Re: splitting an index (yes, again)

Alexander Aristov Wed, 23 Sep 2009 04:46:41 -0700

OK, let me rephrase the question.

Suppose I want to use distributed search with 3 servers: one primary and
two secondary nodes.

I create a single BIG index with a distributed crawler running on other
computers. Now I want to split this single BIG index into two parts to put
on the search nodes.

How can this be achieved?

Best Regards
Alexander Aristov


2009/9/23 Koch Martina <k...@huberverlag.de>

> Hi Jesse,
>
> I'm not sure what you're trying to achieve. Do you want to use the
> distributed search or do you want to split an existing index? Neither of
> these tasks is a prerequisite for the other.
> If you want to split an index, there are several ways to do this. Which way
> to choose depends on the reason for the split.
> If you want to use the distributed search, you just need two or more
> separate indexes: start a search server for each and set the
> searcher.dir property in nutch-site.xml to point to the directory that
> contains the search-servers.txt file, where you entered the hosts and
> ports of your search servers (detailed description:
> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg12730.html).
>
> Kind regards,
> Martina
>
>
> -----Original Message-----
> From: Jesse Hires [mailto:jhi...@gmail.com]
> Sent: Wednesday, 23 September 2009 04:59
> To: nutch-user@lucene.apache.org
> Subject: splitting an index (yes, again)
>
> My apologies in advance.
>
> I've been digging through the mail archives searching for information on
> splitting the index after crawling, but I am getting even more confused or
> the information is too incomplete for a newbie like myself.
>
> I see references to using mergesegs, but not enough to make an educated
> guess (at least at my level, which I admit is low right now).
>
> I've gotten to the point of having worked my way through the tutorial here:
> http://wiki.apache.org/nutch/Nutch0.9-Hadoop0.10-Tutorial
> and have a working site using a single computer. I have four more computers
> to add, and would like to try distributed search.
>
> When I read that tutorial as far as the Distributed Searching portion, the
> "split the index" step mentions this link:
>
> http://wiki.apache.org/nutch/%5Bhttp%3A//www.nabble.com/Lucene-index-manipulation-tools-tf2781692.html#a7760917
>
> But that may as well be saying "then some magic happens".
>
> Does anyone have "step by step" instructions for splitting the index for use
> in distributed search using mergesegs or otherwise? It doesn't have to have
> a lot of explanation, just a list of example steps.
>
>
> Mostly this is experimental for me with no major plans other than my own
> education, but because I am starting completely fresh at this, some things
> are still quite confusing.
>
> Thanks,
> Jesse
>
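
For the distributed search setup Martina describes in the quoted message, here is a rough sketch of the search-servers.txt part. It assumes the file holds one "host port" pair per line and lives in the directory that searcher.dir names; both are worth checking against your Nutch version. The host names, the port and the helper name write_search_servers are made up for illustration. As far as I know, each search node is started separately with something like "bin/nutch server <port> <crawl dir>", but treat that as an assumption too.

```python
from pathlib import Path
import tempfile


def write_search_servers(conf_dir, nodes):
    """Write one "host port" line per search node to conf_dir/search-servers.txt.

    The "host port" line format is an assumption about what Nutch expects.
    """
    lines = [f"{host} {port}" for host, port in nodes]
    path = Path(conf_dir) / "search-servers.txt"
    path.write_text("\n".join(lines) + "\n")
    return path


# Hypothetical search nodes, one per part of the split index.
NODES = [("search1.example.com", 9999), ("search2.example.com", 9999)]

if __name__ == "__main__":
    # A temporary directory stands in for the directory searcher.dir points to.
    with tempfile.TemporaryDirectory() as conf_dir:
        servers_file = write_search_servers(conf_dir, NODES)
        print(servers_file.read_text(), end="")
```

The front end (the primary node in Alexander's setup) would then have searcher.dir set to that directory, and each listed host would serve its own part of the index.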
