RE: deletions from index

2017-10-02 Thread Markus Jelsma
You can check the Hadoop job's counters to see how many documents are being deleted. If some are, then -deleteGone is on in your case. Documents are deleted only when that setting is enabled.

-Original message-
> From: Michael Coffey
> Sent: Monday 2nd October 2017

Re: deletions from index

2017-10-02 Thread Michael Coffey
So, I had these numbers in my index:

Num Docs: 189550
Max Docs: 285531
Deleted Docs: 95981

Then I did a crawl and index, which told me

indexed (add/update): 13,423

And now I have these numbers in my index:

Num Docs: 190785
Max Docs: 223339
Deleted Docs: 32554

So, I am completely confused. I don't
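
For what it's worth, here is a quick arithmetic check of the figures quoted above, as a minimal sketch. It assumes the usual relationship Max Docs = Num Docs + Deleted Docs, and the helper name `check` is made up for illustration. The Deleted Docs count typically also covers old versions of documents that were replaced by an update, and it can shrink when index segments are merged, so a drop there does not by itself mean live pages were removed.

```python
# Figures copied from the message above.
before = {"num_docs": 189550, "max_docs": 285531, "deleted_docs": 95981}
after = {"num_docs": 190785, "max_docs": 223339, "deleted_docs": 32554}


def check(stats):
    """Return True if Max Docs equals Num Docs plus Deleted Docs (assumed relationship)."""
    return stats["num_docs"] + stats["deleted_docs"] == stats["max_docs"]


# Change in each counter between the two snapshots.
delta = {key: after[key] - before[key] for key in before}

print("before consistent:", check(before))
print("after consistent:", check(after))
print("change in Num Docs:", delta["num_docs"])
print("change in Deleted Docs:", delta["deleted_docs"])
print("change in Max Docs:", delta["max_docs"])
```

Both snapshots are internally consistent, and the number of live documents went up rather than down between them.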

deletions from index

2017-10-02 Thread Michael Coffey
With my new news crawl, I would like to keep web pages in the index, even after they have disappeared from the web, so I can continue using them in machine-learning processes. I thought I could achieve this by avoiding running cleaning jobs. However, I still notice increasing numbers of