Chris wrote:
I'm building a crawler that needs to find all the documents in a
repository. Once I do the first crawl, how do I go back later and get
all the documents that have changed?
I could do a full recrawl, but I was hoping there was a faster way to
find the nodes that had been inserted/updated/deleted since the last crawl.
If your use case allows you to register a listener, you can listen for
modification events.
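
To make the listener idea concrete, here is a minimal sketch in Python. It
assumes a toy in-memory repository; the Repository class, register_listener,
save, remove and the event kinds are all hypothetical names invented for the
example, not the API of any real repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (kind, path), where kind is
# "added", "changed" or "removed".
Event = Tuple[str, str]


@dataclass
class Repository:
    """Toy in-memory repository (an assumption, not a real repository API)."""
    nodes: Dict[str, str] = field(default_factory=dict)
    listeners: List[Callable[[Event], None]] = field(default_factory=list)

    def register_listener(self, listener: Callable[[Event], None]) -> None:
        self.listeners.append(listener)

    def _notify(self, kind: str, path: str) -> None:
        # Deliver the event to every registered listener.
        for listener in self.listeners:
            listener((kind, path))

    def save(self, path: str, content: str) -> None:
        kind = "changed" if path in self.nodes else "added"
        self.nodes[path] = content
        self._notify(kind, path)

    def remove(self, path: str) -> None:
        del self.nodes[path]
        self._notify("removed", path)


repo = Repository()
pending: List[Event] = []               # work queue for the next crawl
repo.register_listener(pending.append)  # the crawler just records events

repo.save("/docs/a", "v1")
repo.save("/docs/a", "v2")
repo.remove("/docs/a")
```

The crawler then only has to revisit the paths collected in pending instead
of walking the whole repository, and removals show up as events too.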
On the other hand, if you are taking snapshots, you can add a modified-time
attribute to every node; when you need to find everything that was updated,
just select the nodes whose modified-time is later than your last snapshot.
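
A rough sketch of that selection, assuming each node is reduced to a path
mapped to its modified-time (the dictionary layout, the changed_since helper
and the sample paths are made up for illustration):

```python
from datetime import datetime, timezone


def changed_since(nodes, last_snapshot):
    """Return the paths whose modified-time is later than last_snapshot."""
    return sorted(path for path, modified in nodes.items()
                  if modified > last_snapshot)


# Hypothetical current state: path -> modified-time attribute.
nodes = {
    "/docs/a": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "/docs/b": datetime(2024, 1, 5, tzinfo=timezone.utc),
    "/docs/c": datetime(2024, 1, 9, tzinfo=timezone.utc),
}
last_snapshot = datetime(2024, 1, 3, tzinfo=timezone.utc)

updated = changed_since(nodes, last_snapshot)

# Paths seen during the previous crawl (also hypothetical).
previous_paths = {"/docs/a", "/docs/b", "/docs/old"}
deleted = sorted(previous_paths - set(nodes))
```

Note that a timestamp alone cannot reveal deleted nodes, because a removed
node has no attribute left to select on. The last two lines show one possible
workaround: keep the set of paths from the previous crawl and compare it with
the current one.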
But this is the same problem you have with an RDBMS: how do you select all
the updated rows from a table ...
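
For comparison, the RDBMS version of the same idea might look like the sketch
below, using Python's built-in sqlite3 module. The documents table, its
modified_time column and the integer timestamps are assumptions chosen for
the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per document, with a modified-time column.
conn.execute(
    "CREATE TABLE documents ("
    "path TEXT PRIMARY KEY, modified_time INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO documents (path, modified_time) VALUES (?, ?)",
    [("/docs/a", 100), ("/docs/b", 250), ("/docs/c", 300)],
)

last_snapshot = 200  # timestamp recorded at the end of the previous crawl

# Select only the rows touched after the last snapshot.
rows = conn.execute(
    "SELECT path FROM documents WHERE modified_time > ? ORDER BY path",
    (last_snapshot,),
).fetchall()
updated_paths = [path for (path,) in rows]
```

As with the node attribute, this only works if every write also updates
modified_time, and deleted rows are not reported by such a query.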
--
Ivan Latysh
[EMAIL PROTECTED]