Re: How could I avoid reindexing same files?

2009-04-08 Thread Veselin Kantsev
Hi Fergus, On Tue, Apr 07, 2009 at 05:06:23PM +0100, Fergus McMenemie wrote: Thank you much Fergus, I was considering implementing a database which would hold a path name and an MD5 sum of each file. Snap. That is close to what we did. However due to our previous duff full text search

Re: How could I avoid reindexing same files?

2009-04-08 Thread Fergus McMenemie
Hi Fergus, On Tue, Apr 07, 2009 at 05:06:23PM +0100, Fergus McMenemie wrote: Thank you much Fergus, I was considering implementing a database which would hold a path name and an MD5 sum of each file. Snap. That is close to what we did. However due to our previous duff full text search

Re: How could I avoid reindexing same files?

2009-04-08 Thread Veselin K
Useful tip Erik, this will save a lot of hassle. Thank you much. Regards, Veselin K On Tue, Apr 07, 2009 at 11:29:38AM -0400, Erik Hatcher wrote: Note that Solr (trunk, soon to be 1.4) has a duplicate detection feature that may work for your need. See

Re: How could I avoid reindexing same files?

2009-04-07 Thread Fergus McMenemie
Veselin, Well, as far as Solr is concerned, there are two issues here: 1) To stop the same document ending up in the indexes twice, use the document pathname as the unique ID. Then if you do index it twice, the previous index information will be discarded. Not very efficient, but it may be
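
A minimal sketch of the pathname-as-unique-ID idea from the message above, in Python. It only builds the XML body of a Solr add command; it does not talk to a server. The field names `id` and `text` are assumptions (they depend on the uniqueKey and fields declared in your own schema), and `build_add_xml` is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET


def build_add_xml(path: str, text: str) -> str:
    """Build a Solr <add> message whose unique key is the file's pathname.

    Assumes the schema's uniqueKey field is called "id" and that a
    "text" field exists; both names are illustrative.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    # Using the pathname as the unique key means a second submission of
    # the same file replaces the earlier document instead of adding a copy.
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = path

    body = ET.SubElement(doc, "field", name="text")
    body.text = text

    return ET.tostring(add, encoding="unicode")
```

Posting the same pathname twice still costs a full reindex of that file, which is the inefficiency the message mentions; it only guarantees that the index does not end up holding two copies.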

Re: How could I avoid reindexing same files?

2009-04-07 Thread Veselin K
Thank you much Fergus, I was considering implementing a database which would hold a path name and an MD5 sum of each file. Then as a part of Solr indexing, one could check against the DB if a file path exists, if Yes, then compare MD5 and only index if different. Regards, Veselin K On Tue,
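
The path-plus-MD5 check described above could be sketched roughly as follows, in Python with SQLite standing in for "a database". This is one possible reading of the idea, not what anyone in the thread actually ran: the table name `checksums` and the helpers `open_db`, `needs_indexing` and `mark_indexed` are all hypothetical, and the actual call that sends a file to Solr is left out.

```python
import hashlib
import sqlite3
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path: str) -> sqlite3.Connection:
    """Open (or create) the checksum store; the schema is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    return conn


def needs_indexing(conn: sqlite3.Connection, path: Path) -> bool:
    """True if the path is unknown or its stored MD5 differs from the file's."""
    row = conn.execute(
        "SELECT md5 FROM checksums WHERE path = ?", (str(path),)
    ).fetchone()
    return row is None or row[0] != md5_of(path)


def mark_indexed(conn: sqlite3.Connection, path: Path) -> None:
    """Record the current MD5; call this only after indexing succeeded."""
    conn.execute(
        "INSERT OR REPLACE INTO checksums (path, md5) VALUES (?, ?)",
        (str(path), md5_of(path)),
    )
    conn.commit()
```

A scan of the folder would then call `needs_indexing` for each file, submit only the files for which it returns True, and call `mark_indexed` afterwards. Recording the checksum only after a successful submission means a failed upload is retried on the next scan.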

Re: How could I avoid reindexing same files?

2009-04-07 Thread Erik Hatcher
Note that Solr (trunk, soon to be 1.4) has a duplicate detection feature that may work for your need. See http://wiki.apache.org/solr/Deduplication (looks like docs need updating to say 1.4 here) and http://issues.apache.org/jira/browse/SOLR-799 Erik On Apr 7, 2009, at 11:25 AM,

Re: How could I avoid reindexing same files?

2009-04-07 Thread Fergus McMenemie
Thank you much Fergus, I was considering implementing a database which would hold a path name and an MD5 sum of each file. Snap. That is close to what we did. However due to our previous duff full text search engine we had to hold this information in a separate checksums file. Solr is much better

Re: How could I avoid reindexing same files?

2009-04-06 Thread Veselin K
Hello Paul, I'm indexing with curl http://localhost... -F myfi...@file.pdf Regards, Veselin K On Mon, Apr 06, 2009 at 02:56:20PM +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote: how are you indexing? On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev

How could I avoid reindexing same files?

2009-04-06 Thread Veselin Kantsev
Hello, apologies for the basic question. How can I avoid double indexing files? In case all my files are in one folder which is scanned frequently, is there a Solr feature of checking and skipping a file if it has already been indexed and not changed since? Thank you. Regards, Veselin K

Re: How could I avoid reindexing same files?

2009-04-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
how are you indexing? On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev vese...@campbell-lange.net wrote: Hello, apologies for the basic question. How can I avoid double indexing files? In case all my files are in one folder which is scanned frequently, is there a Solr feature of checking