There is no need for partitioning in Nutch code anymore. CloudSolrServer now
does that for us, plus the upcoming document routing.
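
For readers new to the thread, "partitioning" here means deciding which shard
each document goes to, typically by hashing its id. A minimal sketch of the
idea follows. It is an illustration only: the class name ShardPartitioner and
the method shardFor are hypothetical, CRC32 is swapped in for the murmur hash
only because it ships with the JDK, and this is neither the NUTCH-945
partitioner nor what CloudSolrServer does internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration of hash-based partitioning; not Nutch or SolrJ code.
public class ShardPartitioner {

    /** Maps a document id to a shard index in the range [0, numShards). */
    public static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() returns a non-negative long, so the remainder
        // is non-negative as well and safely fits in an int.
        return (int) (crc.getValue() % numShards);
    }

    public static void main(String[] args) {
        // Example ids; in Nutch the document id is normally the page URL.
        String[] urls = {
            "http://example.com/",
            "http://example.com/about",
            "http://example.org/index.html"
        };
        int numShards = 4; // matches the 4-shard cluster mentioned in the thread
        for (String url : urls) {
            System.out.println(url + " -> shard " + shardFor(url, numShards));
        }
    }
}
```

The same id always lands on the same shard, which is the property that lets
updates and deletes for a document reach the node that holds it.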
-----Original message-----
> From:Tuğcem Oral <[email protected]>
> Sent: Saturday 6th July 2013 20:24
> To: [email protected]
> Subject: Re: Indexing from nutch 1.6 to solr 4.3.1 cloud
>
> Hi,
>
> Well, I see that 1377 is closely related to 1480, and both deal with indexing
> the "same" documents to multiple Solr servers within a cluster for Nutch v1.6.
> Actually, the crucial point is partitioning the documents over different
> Solr instances, which is discussed in 945. But the MurmurHashPartitioner
> patch provided in 945 is for Nutch v2.2. What I wrote is a kind of
> combination of these two patches, partitioning and indexing for v1.6.
>
> I will attach the patch probably on Monday; the original source code is on
> my other computer.
>
> Best,
>
> Tugcem Oral
>
>
>
> On Fri, Jul 5, 2013 at 5:48 PM, Markus Jelsma
> <[email protected]> wrote:
>
> > Hi,
> >
> > 1480 and 1377 are different. We already use CloudSolrServer (I haven't
> > added the patch yet) but also use 1480 to write to multiple Solr clusters!
> > Both still need patches and I haven't had time yet to provide them,
> > although we already use both features in our Nutch.
> >
> > I'll try to find some time next week, should be easy.
> >
> > Cheers
> >
> >
> >
> > -----Original message-----
> > > From:Tuğcem Oral <[email protected]>
> > > Sent: Friday 5th July 2013 16:24
> > > To: [email protected]
> > > Subject: Indexing from nutch 1.6 to solr 4.3.1 cloud
> > >
> > > Hi all,
> > >
> > > There are several issues and patches about indexing Nutch segments to
> > > multiple Solr servers. However,
> > > NUTCH-1377 <https://issues.apache.org/jira/browse/NUTCH-1377>
> > > and NUTCH-1480 <https://issues.apache.org/jira/browse/NUTCH-1480>
> > > cover indexing the *same* document set to different Solr servers. Also,
> > > NUTCH-945 <https://issues.apache.org/jira/browse/NUTCH-945> provides a
> > > patch for partitioning documents with a murmur hash partitioner and
> > > indexing them to different Solr servers, but for Nutch version 2.x. As
> > > we're still using Nutch 1.6 (w/o Gora), I couldn't find a valid patch,
> > > so I wrote my own. I tested it by indexing ~8M crawled documents to a
> > > Solr cluster with 4 shards (each with 1 replica, total 8 nodes), and it
> > > scaled quite well. I'd love to share this patch with you guys.
> > >
> > > Best,
> > >
> > > Tugcem Oral
> > >
> > > --
> > > TO
> >
>
>
>
> --
> TO
>