Veselin,

Well, as far as Solr is concerned, there are two issues here:

1) To stop the same document ending up in the indexes twice, use the
   document pathname as the unique ID. Then if you do index it twice,
   the previous index information will be discarded. Not very
   efficient, but it may be tolerable. IMHO using the pathname as the
   unique ID is often best practice.

2) To stop a document even being submitted to Solr, you need to
   implement some middleware that either performs a search/lookup
   using a document's pathname to see if it is already indexed or,
   after examining timestamps, only submits documents which have
   changed since the last folder scan.

Fergus.

>Hello Paul,
>I'm indexing with "curl http://localhost... -F myfi...@file.pdf"
>
>Regards,
>Veselin K
>
>
>On Mon, Apr 06, 2009 at 02:56:20PM +0530, Noble Paul wrote:
>> how are you indexing?
>>
>> On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev
>> <vese...@campbell-lange.net> wrote:
>> > Hello,
>> > apologies for the basic question.
>> >
>> > How can I avoid double indexing files?
>> >
>> > In case all my files are in one folder which is scanned frequently, is
>> > there a Solr feature of checking and skipping a file if it has already
>> > been indexed
>> > and not changed since?
>> >
>> >
>> > Thank you.
>> >
>> > Regards,
>> > Veselin K
>> >
>>
>> --
>> --Noble Paul

-- 

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
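
P.S. A minimal sketch of the timestamp variant of point 2, in case it helps. It is only an illustration under stated assumptions, not a Solr feature: the function names, the JSON state file and the submit callback are all hypothetical, and submit stands in for whatever actually posts the file to Solr (for example a wrapper around the curl command, passing the pathname as the unique ID).

```python
import json
import os


def load_state(state_path):
    """Return the {pathname: mtime} map saved by the previous scan, or {} on the first run."""
    if not os.path.exists(state_path):
        return {}
    with open(state_path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def find_changed(folder, state):
    """Return (changed_paths, new_state) for the regular files directly inside folder."""
    changed = []
    new_state = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders and anything that is not a plain file
        mtime = os.path.getmtime(path)
        new_state[path] = mtime
        # New file, or modification time differs from the last recorded one.
        if state.get(path) != mtime:
            changed.append(path)
    return changed, new_state


def scan_and_submit(folder, state_path, submit):
    """Submit only new or modified files, then record what was seen.

    submit is a hypothetical callable taking a pathname; it is assumed to
    raise an exception if the upload fails.
    """
    state = load_state(state_path)
    changed, new_state = find_changed(folder, state)
    for path in changed:
        submit(path)
    # Only reached if every submit succeeded, so a failed run is retried in full.
    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump(new_state, fh)
    return changed
```

Keep the state file outside the scanned folder, otherwise it will be picked up as a document itself. If a run fails part way through, some files get submitted again on the next scan, which is harmless as long as the pathname is the unique ID as in point 1.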