Hello Sebastian,
I got it: the CSV indexer needs a single task to run properly. I tested it
and it worked. Thank you for the advice.
I tried to comment on this Jira issue, but I don't have access, and
unfortunately I don't know how to get it.
I think if a committer changed CSVIndexerWriter.java:
if (fs.exists
Hi Paul,
> the indexer was writing the
> documents info in the file (nutch.csv) twice,
Yes, I see. And now I know what I overlooked:
.../bin/nutch index -Dmapreduce.job.reduces=2
You need to run the CSV indexer with only a single reducer.
In order to do so, please pass the option
--num-tas