Re: CSV indexer file data overwriting

2022-11-24 Thread Paul Escobar
Hello Sebastian, I got it: the CSV indexer needs to run with a single task to work properly. I tested it and it worked. Thank you for the advice. I tried to comment on this Jira issue, but I don't have access, and unfortunately I don't know how to get it. I think if a committer changed CSVIndexerWriter.java: if (fs.exists

Re: CSV indexer file data overwriting

2022-11-24 Thread Sebastian Nagel
Hi Paul,
> the indexer was writing the
> documents info in the file (nutch.csv) twice,
Yes, I see. And now I know what I overlooked:
.../bin/nutch index -Dmapreduce.job.reduces=2
You need to run the CSV indexer with only a single reducer. To do so, please pass the option --num-tas