I'll try to index ~5M crawled documents into an 8-node cluster with this patch and let you all know the result.
Best.

On Tue, Jul 9, 2013 at 1:55 PM, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> Just as I explained. The DistributedUpdateRequestProcessor does that on
> the Solr node for you. There's an issue at Solr for client-based document
> routing, which we will use when it is committed and released. Then indexing
> is as efficient as it can be. See
> https://issues.apache.org/jira/browse/SOLR-4816
>
> The problem with CommonsHttpSolrServer is that it does not fail over as
> CloudSolrServer does, which uses LBSolrServer underneath.
> CommonsHttpSolrServer also doesn't exist anymore in Solr 4.x, so that won't
> work anymore when NUTCH-1486 is committed. Keep an eye on SOLR-4816.
> Hopefully it will make Solr 4.4, which is probably going to be released in
> August.
>
> Cheers
>
> -----Original message-----
> > From: Tuğcem Oral <[email protected]>
> > Sent: Tuesday 9th July 2013 12:51
> > To: [email protected]
> > Subject: Re: Indexing from nutch 1.6 to solr 4.3.1 cloud
> >
> > Every point is OK except one: if there's no partitioning for SolrJ, how
> > would, say, 1000 documents be distributed across the nodes? One by one?
> > What will be the strategy?
> >
> > No need to open a new issue; my patch does a similar job without
> > CloudSolrServer, using CommonsHttpSolrServer(s) instead. I'll give your
> > patch a shot.
> >
> > Best
> >
> > On Tue, Jul 9, 2013 at 1:34 PM, Markus Jelsma <[email protected]> wrote:
> >
> > > Yes, it only takes URLs for your ensemble because that is how
> > > CloudSolrServer works, and it is the best method of connecting to a
> > > Solr cloud from Java. As said, there is no partitioning at all (SolrJ
> > > document routing is not yet committed), but your Solr nodes'
> > > DistributedUpdateRequestProcessor does the redistribution of incoming
> > > documents. Documents are also not sent over Zookeeper; CloudSolrServer
> > > only uses the Zookeeper ensemble to find all nodes of the cluster and
> > > distinguish between masters and slaves, so documents are sent to
> > > masters only.
> > >
> > > Depending on what exactly your patch does, you may need to open a new
> > > issue. If it's also about writing data to a SolrCloud cluster,
> > > NUTCH-1377 via Zookeeper is the only proper way to go.
> > >
> > > Cheers
> > >
> > > -----Original message-----
> > > > From: Tuğcem Oral <[email protected]>
> > > > Sent: Tuesday 9th July 2013 12:29
> > > > To: [email protected]
> > > > Subject: Re: Indexing from nutch 1.6 to solr 4.3.1 cloud
> > > >
> > > > Markus,
> > > >
> > > > I checked yours. They're quite similar, but yours only takes
> > > > Zookeeper ensemble URLs, while mine looks for all Solr URLs for a
> > > > cluster. How do you partition the documents? Is sending them over
> > > > Zookeeper enough?
> > > >
> > > > BTW, my patch is ready. How am I supposed to attach it?
> > > >
> > > > Best
> > > >
> > > > On Tue, Jul 9, 2013 at 1:11 PM, Markus Jelsma <[email protected]> wrote:
> > > >
> > > > > I attached a patch for support of CloudSolrServer and a Zookeeper
> > > > > ensemble. Use solr.zookeeper.hosts and solr.collection to enable
> > > > > it. The patch also requires NUTCH-1486.
> > > > > https://issues.apache.org/jira/browse/NUTCH-1377
> > > > >
> > > > > -----Original message-----
> > > > > > From: Tuğcem Oral <[email protected]>
> > > > > > Sent: Tuesday 9th July 2013 9:31
> > > > > > To: [email protected]
> > > > > > Subject: Re: Indexing from nutch 1.6 to solr 4.3.1 cloud
> > > > > >
> > > > > > So your org.apache.nutch.indexer.solr.SolrIndexer utility is
> > > > > > not running from Nutch 1.6, I suppose; it might be used from
> > > > > > Nutch 2.1. In 1.6 you cannot do such a thing, as multiple Solr
> > > > > > instances (so SolrCloud) and partitioning are not supported in
> > > > > > that version.
> > > > > >
> > > > > > On Tue, Jul 9, 2013 at 12:55 AM, <[email protected]> wrote:
> > > > > >
> > > > > > > I give only one URL to the solrindex command and SolrCloud
> > > > > > > takes care of partitioning. I do not use SolrJ and actually
> > > > > > > did not understand Markus's comments. I use Solr 4.2.0 with
> > > > > > > the cloud feature.
> > > > > > >
> > > > > > > Thanks.
> > > > > > > Alex.
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Tuğcem Oral <[email protected]>
> > > > > > > To: user <[email protected]>
> > > > > > > Sent: Mon, Jul 8, 2013 1:26 pm
> > > > > > > Subject: Indexing from nutch 1.6 to solr 4.3.1 cloud
> > > > > > >
> > > > > > > @alex, I don't understand how you could give multiple Solr
> > > > > > > URLs while indexing from 1.6, because solrindex handles the
> > > > > > > given Solr URL with a single SolrServer instance, not a
> > > > > > > List<SolrServer>, and also, as @Markus said, SolrJ doesn't
> > > > > > > support partitioning. The phrase you used, "indexing using
> > > > > > > with nutch 1.6 and 2.1", is a bit confusing to me; which
> > > > > > > version of SolrJ and Solr (cloud) you are using is important,
> > > > > > > I suppose.
> > > > > > >
> > > > > > > @erol, I can upload the patch tomorrow and notify you about it.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Tugcem
> > > > > > >
> > > > > > > On Monday, July 8, 2013, eakarsu wrote:
> > > > > > >
> > > > > > > > Tugcem,
> > > > > > > >
> > > > > > > > Can you please send me the patch also?
> > > > > > > > I would like to test it.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Erol Akarsu
> > > > > > > >
> > > > > > > > --
> > > > > > > > View this message in context:
> > > > > > > > http://lucene.472066.n3.nabble.com/Indexing-from-nutch-1-6-to-solr-4-3-1-cloud-tp4075737p4076346.html
> > > > > > > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > > > > > >
> > > > > > > --
> > > > > > > TO
> > > > > >
> > > > > > --
> > > > > > TO
> > > >
> > > > --
> > > > TO
> >
> > --
> > TO

--
TO
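The open question in the thread, how 1000 documents end up spread across the nodes, comes down to hashing each document id to a shard. Below is a minimal Java sketch of that general idea only; it is not the DistributedUpdateRequestProcessor code and does not use SolrJ. The class name HashRouterSketch, the methods shardFor and partition, the example URLs, and the use of String.hashCode() are all assumptions made for illustration; Solr's real router differs in its hash function and in how shards are assigned.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: spreads document ids over shards by hashing the id. */
final class HashRouterSketch {

    /** Maps a document id to a shard index in [0, numShards). */
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // String.hashCode() stands in for the real hash function (assumption).
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Groups ids into one batch per shard, as a routing layer might. */
    static List<List<String>> partition(List<String> docIds, int numShards) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            batches.add(new ArrayList<>());
        }
        for (String id : docIds) {
            batches.get(shardFor(id, numShards)).add(id);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical document ids; a crawl would use the page URLs.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("http://example.org/page/" + i);
        }
        List<List<String>> batches = partition(ids, 8);
        for (int shard = 0; shard < batches.size(); shard++) {
            System.out.println("shard " + shard + ": "
                    + batches.get(shard).size() + " docs");
        }
    }
}
```

Because the shard is a pure function of the id, the same document always lands on the same shard no matter which node first receives it, which is why a client can send to a single URL and let the cluster forward the documents.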

