On 09/11/2011 16:30, Marek Bachmann wrote:
On 09.11.2011 16:27, Markus Jelsma wrote:
The most recent item is kept.

On Wednesday 09 November 2011 16:23:28 Marek Bachmann wrote:
Hello all,

When I have segments from two crawls, the first from the initial
crawl and the second from a recrawl, how will they be merged?

I mean:

*) When site A has changed between the crawls, what content will be in
the merged segment: the old one or the new one (or both)?

Thanks :)


Thank you! :)


Note: please consult the Javadocs for SegmentMerger. The timestamps of some parts of a segment are difficult to determine, so "latest" means "coming from the segment whose name is highest in lexicographic order".

In practice, if your segments are named after a timestamp, everything should work fine. However, if you rename the latest segment to e.g. 0000-most-recent, the results will not be what you expect.
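
To illustrate the ordering rule only, here is a minimal Python sketch. It is not Nutch code: the function name latest_segment and the sample segment names are made up for the example, and it assumes nothing beyond plain string comparison of segment names.

```python
# Illustration only: which segment name counts as "latest" under
# plain lexicographic (string) ordering. Not part of Nutch.

def latest_segment(names):
    """Return the name that sorts highest in lexicographic order."""
    return max(names)

# Hypothetical timestamp-style names (yyyyMMddHHmmss): string order
# matches chronological order, so the recrawl wins.
timestamped = ["20111102101500", "20111109162328"]
print(latest_segment(timestamped))

# The same recrawl segment renamed to "0000-most-recent": it now sorts
# before the older segment, so the older one is treated as "latest".
renamed = ["20111102101500", "0000-most-recent"]
print(latest_segment(renamed))
```

With fixed-width timestamp names the string order and the time order agree, which is why the default naming is safe.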

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
