Hi,

Once you have sent the content to Solr you no longer need the Nutch data,
neither the segments nor the index, since Solr creates its own index.

As for segment naming: Nutch uses the segment name at different stages of
processing, for example during dedup. It might take the date from the
segment name to compare two links and decide which one is older.
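
To illustrate why a dated name can matter, here is a rough sketch in Python
(not Nutch code). It assumes the segment directories keep the default
timestamp-style names (yyyyMMddHHmmss); the helper names segment_time and
older_segment are made up for this example, and I have not checked how the
dedup job itself does the comparison.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumption: segment names follow the default timestamp pattern yyyyMMddHHmmss.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def segment_time(segment_path):
    """Return the timestamp encoded in a segment directory name, or None."""
    name = PurePosixPath(segment_path).name
    if len(name) != 14 or not name.isdigit():
        return None  # e.g. "segment1": no date can be recovered
    try:
        return datetime.strptime(name, SEGMENT_NAME_FORMAT)
    except ValueError:
        return None  # 14 digits, but not a valid date


def older_segment(a, b):
    """Return whichever path names the older segment, or None if undecidable."""
    ta, tb = segment_time(a), segment_time(b)
    if ta is None or tb is None:
        return None
    return a if ta <= tb else b
```

With a name like segment1 the first function returns None, so any step that
relies on the date in the name would have nothing to compare.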

Best Regards
Alexander Aristov


On 11 May 2010 21:58, Markus Jelsma <[email protected]> wrote:

> Hi Joshua,
>
> I'm now using Nutch only to fetch and parse before sending the content to
> a Solr instance. During my steps, such as creating a new segment from the
> crawldb, fetching and parsing, and reinserting new URLs into the crawldb, I
> can give any arbitrary path to a segment directory, so renaming would not
> be a problem at all in my case. I don't know whether this holds if you use
> Nutch's built-in indexer, though.
>
> By the way, I delete the newly created segment directories anyway after
> they've been sent to Solr.
>
> Cheers,
>
> -----Original message-----
> From: Joshua J Pavel <[email protected]>
> Sent: Tue 11-05-2010 19:47
> To: [email protected];
> Subject: Renaming segments?
>
> Hi everyone!
>
> I crawl often, and move my crawl to a different server to serve out the
> results, replacing the previous crawl's filesystem.  This can quickly lead
> to inactive segments accruing on the server running the web portion.
>
> I would like to rename my segments to a standard, non-dated format (e.g.,
> segment1, segment2, segment3, ...) to make them more portable.  Is this
> possible?
>
> Thanks!
> -Josh
