Hi people, I urgently need your help!
I have Solr 3.3 configured and running. I do incremental indexing four times a day using bulk updates. Some documents are identical to some extent, and I want to skip them rather than index them again.

The problem is that I cannot find a way to tell Solr to ignore new duplicate documents and keep the ones already indexed. I don't care that the incoming document is newer. Solr should just determine by ID that the document is already in the index and leave it at that.

I use SolrJ for indexing. I have tried setting overwrite=false and the dedupe approach, but neither helped: either the newer document overwrites the old one, or I end up with duplicates.

I think this is a very simple and basic feature, so it must exist. What did I do wrong, or what did I miss? I searched Google but could not find a solution, although many people have encountered this problem.

I am starting to think that I must query the index to check whether each document is already there and leave it out of the batch if so, but I have so many documents that I am afraid this is not a good solution.

Best Regards
Alexander Aristov
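
P.S. To make the fallback idea concrete, here is a rough sketch of the "check first, then add only the new ones" filtering I have in mind. It is plain Java with no Solr calls in it. The names DuplicateFilter, IdLookup, findExisting and filterNew are made up for illustration. In real use findExisting would presumably run one query per chunk against the unique key field, and that part is not written here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateFilter {

    /**
     * Hypothetical hook: given some IDs, return those already in the index.
     * A real implementation would query the index; none is provided here.
     */
    public interface IdLookup {
        Set<String> findExisting(Collection<String> ids);
    }

    /**
     * Returns the documents from the batch whose IDs are not indexed yet.
     * The batch maps document ID to document, so duplicates inside the
     * batch itself are already collapsed by the map.
     */
    public static <D> List<D> filterNew(Map<String, D> batch, IdLookup lookup, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> ids = new ArrayList<String>(batch.keySet());
        Set<String> existing = new HashSet<String>();
        // Look up IDs in chunks so that no single lookup gets too large.
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            existing.addAll(lookup.findExisting(ids.subList(start, end)));
        }
        // Keep only the documents whose IDs were not reported as existing.
        List<D> fresh = new ArrayList<D>();
        for (Map.Entry<String, D> entry : batch.entrySet()) {
            if (!existing.contains(entry.getKey())) {
                fresh.add(entry.getValue());
            }
        }
        return fresh;
    }
}
```

Only the list returned by filterNew would then be sent to Solr. My worry is the cost: this needs one lookup per chunk on every indexing run, which is why I hope there is a built-in way instead.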