On 09.11.2011 18:03, Andrzej Bialecki wrote:
> On 09/11/2011 16:30, Marek Bachmann wrote:
>> On 09.11.2011 16:27, Markus Jelsma wrote:
>>> the most recent item
>>>
>>> On Wednesday 09 November 2011 16:23:28 Marek Bachmann wrote:
>>>> Hello all,
>>>>
>>>> when I have segments from two crawls, the first one from initial
>>>> crawling and the second one from a recrawl, how will they be merged?
>>>>
>>>> I mean:
>>>>
>>>> *) When site A has changed between the crawls, what content will be in
>>>> the merged segment? The old one or the new one (or both)?
>>>>
>>>> Thanks :)
>>>
>>
>> Thank you! :)
>>
>
> Note: please consult the javadocs for SegmentMerger. Timestamps of some
> parts of segments are difficult to determine, so the "latest" means
> "coming from a segment with a name in highest lexicographic order".
>
> In practice, if your segments are named after a timestamp, all things
> should work ok. However, if you rename the latest segment to e.g.
> 0000-most-recent then results will not be what you expected.
>

Thank you, Andrzej, for the advice! :) I won't rename them, since I need the timestamp structure to find the ongoing one in my crawl scripts. So it should work for me.
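
For anyone finding this thread later, here is a small sketch of what "highest lexicographic order" means for segment names. It only illustrates plain string comparison and is not taken from SegmentMerger itself. The segment names and the helper `latest_segment` are made up for the example, and I am assuming the usual timestamp-style names (yyyyMMddHHmmss).

```python
# Illustration only: plain string comparison, not SegmentMerger's code.
# The segment names below are hypothetical examples.

def latest_segment(names):
    """Return the name that sorts last in lexicographic (string) order."""
    return max(names)

# Timestamp-style names: string order matches chronological order.
timestamped = ["20111101093000", "20111109162328"]
print(latest_segment(timestamped))

# A renamed segment starting with "0000" sorts before every timestamp,
# so it would be treated as the oldest one, not the most recent.
renamed = ["20111101093000", "0000-most-recent"]
print(latest_segment(renamed))
```

With fixed-width timestamp names the newest crawl sorts last, which is why leaving the names alone should be safe.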

