You can check the Hadoop job's counters to see how many documents are being
deleted. If some are, then -deleteGone is on in your case. Documents are
deleted only when that setting is enabled.
-Original message-
> From: Michael Coffey
> Sent: Monday 2nd October 2017
So, I had these numbers in my index:
Num Docs: 189550
Max Docs: 285531
Deleted Docs: 95981
Then I did a crawl and index, which told me
indexed (add/update): 13,423
And now I have these numbers in my index:
Num Docs: 190785
Max Docs: 223339
Deleted Docs: 32554
So, I am completely confused. I don't
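A quick sanity check on the quoted numbers may help here. This is only a sketch, assuming the usual Lucene/Solr relationship Max Docs = Num Docs + Deleted Docs; the function and variable names are illustrative and not part of Nutch or Solr.

```python
def deleted_docs(max_docs: int, num_docs: int) -> int:
    """Deleted docs implied by the index stats (assumes Max = Num + Deleted)."""
    return max_docs - num_docs


# Index statistics as quoted in the message, before and after the crawl.
before = {"num_docs": 189550, "max_docs": 285531, "deleted_docs": 95981}
after = {"num_docs": 190785, "max_docs": 223339, "deleted_docs": 32554}

# Both snapshots are internally consistent with the assumed relationship.
for stats in (before, after):
    assert deleted_docs(stats["max_docs"], stats["num_docs"]) == stats["deleted_docs"]

# Net growth in live documents, compared with the reported add/update count.
indexed_add_update = 13423
net_new = after["num_docs"] - before["num_docs"]
likely_updates = indexed_add_update - net_new
print(f"net new docs: {net_new}, add/update not accounted for by growth: {likely_updates}")
```

Under that assumption, both snapshots add up, and the live document count grew by far less than the add/update count. That would be consistent with most of the indexed documents being updates to existing ones, and with a segment merge having purged previously deleted documents, though the counters from the indexing job are the place to confirm it.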
With my new news crawl, I would like to keep web pages in the index, even after
they have disappeared from the web, so I can continue using them in
machine-learning processes. I thought I could achieve this by avoiding running
cleaning jobs. However, I still notice increasing numbers of