RE: Indexing from nutch 1.6 to solr 4.3.1 cloud

Markus Jelsma Tue, 09 Jul 2013 03:57:29 -0700

Hi,

Just as I explained: the DistributedUpdateRequestProcessor on the Solr node 
does that for you. There's an open Solr issue for client-based document 
routing, which we will use once it is committed and released. Then indexing 
will be as efficient as it can be. See https://issues.apache.org/jira/browse/SOLR-4816
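
To make the "who distributes the 1000 documents" question concrete, here is a minimal sketch of hash-based routing, the general idea behind server-side redistribution. It is not Solr's code: the class ShardRoutingSketch and the helpers shardFor and route are hypothetical, and String.hashCode() with a modulo is an assumption standing in for whatever hash and shard ranges Solr really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a simplified model of routing documents to shards by ID.
class ShardRoutingSketch {

    // Hypothetical helper: map a document ID to a shard index in [0, numShards).
    // Assumption: String.hashCode() stands in for the hash a real cluster uses.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Group a batch of document IDs by the shard that would own each one.
    static Map<Integer, List<String>> route(List<String> docIds, int numShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String id : docIds) {
            byShard.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        // Build a batch of 1000 made-up document IDs (URLs, as Nutch would use).
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("http://example.com/page/" + i);
        }
        // Print how many documents each of 3 shards would receive.
        route(ids, 3).forEach((shard, docs) ->
            System.out.println("shard " + shard + ": " + docs.size() + " documents"));
    }
}
```

The point of the sketch is that the client can send the whole batch to any node; whoever does the routing only needs the document ID and the number of shards to decide where each document belongs.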
 
The problem with CommonsHttpSolrServer is that it does not fail over the way 
CloudSolrServer does, which uses LBHttpSolrServer underneath. 
CommonsHttpSolrServer also no longer exists in Solr 4.x (HttpSolrServer 
replaced it), so it will stop working once NUTCH-1486 is committed. Keep an 
eye on SOLR-4816. Hopefully it will make Solr 4.4, which is probably going to 
be released in August.
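
The fail-over difference can be sketched as follows. This is a simplified model, not SolrJ code: the class FailoverSketch, the method sendWithFailover, and the `send` function are hypothetical stand-ins for an HTTP update request, and the node URLs are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustrative only: a client that moves on to the next node when one fails.
class FailoverSketch {

    // Try each node URL in order and return the first successful response.
    // `send` is a hypothetical stand-in for posting documents to one node.
    static String sendWithFailover(List<String> nodeUrls, Function<String, String> send) {
        RuntimeException last = null;
        for (String url : nodeUrls) {
            try {
                return send.apply(url);
            } catch (RuntimeException e) {
                last = e; // this node failed, try the next one
            }
        }
        throw new IllegalStateException("no live node among " + nodeUrls, last);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr");
        // Simulate the first node being down and the second one accepting the request.
        String result = sendWithFailover(nodes, url -> {
            if (url.contains("solr1")) {
                throw new RuntimeException("connection refused: " + url);
            }
            return "indexed via " + url;
        });
        System.out.println(result); // prints "indexed via http://solr2:8983/solr"
    }
}
```

A client bound to a single URL has no second entry in that list, so one dead node means a failed indexing job.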


Cheers
 
 
-----Original message-----
> From:Tuğcem Oral <[email protected]>
> Sent: Tuesday 9th July 2013 12:51
> To: [email protected]
> Subject: Re: Indexing from nutch 1.6 to solr 4.3.1 cloud
> 
> Every point is OK except one: if there's no partitioning in SolrJ, how
> would, say, 1000 documents be distributed across the nodes? One by one? What
> would the strategy be?
> 
> No need to open a new issue; my patch does a similar job without
> CloudSolrServer, using CommonsHttpSolrServer instances instead. I'll give
> your patch a shot.
> 
> Best
> 
> 
> On Tue, Jul 9, 2013 at 1:34 PM, Markus Jelsma <[email protected]> wrote:
> 
> > Yes, it only takes URLs for your ensemble because that is how
> > CloudSolrServer works, and it is the best method of connecting to a Solr
> > cloud from Java. As said, there is no partitioning at all (SolrJ document
> > routing is not yet committed), but your Solr nodes'
> > DistributedUpdateRequestProcessor does the redistribution of incoming
> > documents. Documents are also not sent over Zookeeper; CloudSolrServer only
> > uses the Zookeeper ensemble to find all nodes of the cluster and
> > distinguish between masters and slaves, so documents are sent to masters
> > only.
> >
> > Depending on what exactly your patch does, you may need to open a new
> > issue. If it's also about writing data to a SolrCloud cluster, NUTCH-1377
> > via Zookeeper is the only proper way to go.
> >
> > Cheers
> >
> > -----Original message-----
> > > From:Tuğcem Oral <[email protected]>
> > > Sent: Tuesday 9th July 2013 12:29
> > > To: [email protected]
> > > Subject: Re: Indexing from nutch 1.6 to solr 4.3.1 cloud
> > >
> > > Markus,
> > >
> > > I checked yours; they're quite similar, but yours only takes Zookeeper
> > > ensemble URLs, while mine looks for all Solr URLs of a cluster. How do you
> > > partition the documents? Is sending them over Zookeeper enough?
> > >
> > > BTW my patch is ready; how am I supposed to attach it?
> > >
> > > Best
> > >
> > >
> > > On Tue, Jul 9, 2013 at 1:11 PM, Markus Jelsma <[email protected]> wrote:
> > >
> > > > I attached a patch for support of CloudSolrServer and a Zookeeper
> > > > ensemble. Use solr.zookeeper.hosts and solr.collection to enable it.
> > > > The patch also requires NUTCH-1486.
> > > > https://issues.apache.org/jira/browse/NUTCH-1377
> > > >
> > > >
> > > >
> > > > -----Original message-----
> > > > > From:Tuğcem Oral <[email protected]>
> > > > > Sent: Tuesday 9th July 2013 9:31
> > > > > To: [email protected]
> > > > > Subject: Re: Indexing from nutch 1.6 to solr 4.3.1 cloud
> > > > >
> > > > > So your org.apache.nutch.indexer.solr.SolrIndexer utility is not
> > > > > running from Nutch 1.6, I suppose; it might be used from Nutch 2.1.
> > > > > In 1.6 you cannot do such a thing, as multiple Solr instances (so
> > > > > SolrCloud) and partitioning are not supported in that version.
> > > > >
> > > > >
> > > > > On Tue, Jul 9, 2013 at 12:55 AM, <[email protected]> wrote:
> > > > >
> > > > > > I give only one URL to the solrindex command and SolrCloud takes care
> > > > > > of partitioning. I do not use SolrJ and actually did not understand
> > > > > > Markus's comments. I use Solr 4.2.0 with the cloud feature.
> > > > > >
> > > > > > Thanks.
> > > > > > Alex.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Tuğcem Oral <[email protected]>
> > > > > > To: user <[email protected]>
> > > > > > Sent: Mon, Jul 8, 2013 1:26 pm
> > > > > > Subject: Indexing from nutch 1.6 to solr 4.3.1 cloud
> > > > > >
> > > > > >
> > > > > > @alex, I don't understand how you could give multiple Solr URLs
> > > > > > while indexing from 1.6, because solrindex handles the given Solr URL
> > > > > > with a single SolrServer instance, not a List<SolrServer>, and also,
> > > > > > as @Markus said, SolrJ doesn't support partitioning. The phrase you
> > > > > > used, "indexing using with nutch 1.6 and 2.1", is a bit confusing to
> > > > > > me; which versions of SolrJ and Solr (cloud) you are using is
> > > > > > important, I suppose.
> > > > > >
> > > > > > @erol, I can upload the patch tomorrow and notify you about it,
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Tugcem
> > > > > >
> > > > > > On Monday, July 8, 2013, eakarsu wrote:
> > > > > >
> > > > > > > Tugcem,
> > > > > > >
> > > > > > > Can you please send me the patch as well?
> > > > > > > I would like to test it.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > Erol Akarsu
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > View this message in context:
> > > > > > > http://lucene.472066.n3.nabble.com/Indexing-from-nutch-1-6-to-solr-4-3-1-cloud-tp4075737p4076346.html
> > > > > > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > TO
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > TO
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > TO
> > >
> >
> 
> 
> 
> -- 
> TO
>
