Yup, DIH is not optimal for SolrCloud yet. I filed a few JIRA issues a short while ago that may help.
I've seen people use it with SolrCloud in the past though - and it wasn't so slow... (though I'm sure slower than a single node). Search me...

- Mark

On Nov 27, 2012, at 1:24 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> It sounds like DataImportHandler will not be really performant with
> SolrCloud. From what I see it should essentially work - it sends docs to
> the chain, which should distribute them via DistributedUpdateProcessor. But
> it works synchronously - no multithreading in DIH since 4.0!
> Does anyone have experience with, or ideas for, fast data acquisition with
> DIH & SolrCloud?
> Excuse me for thread hijacking.
>
>
> On Tue, Nov 27, 2012 at 8:10 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> To get the best speed out of SolrCloud you have to index from many clients
>> (or threads). Even better is if you index to many nodes rather than one.
>>
>> Using a single thread against a single instance with replicas will be a
>> fair amount slower with cloud than if you just used one node.
>>
>> - Mark
>>
>> On Nov 27, 2012, at 12:02 AM, deniz <denizdurmu...@gmail.com> wrote:
>>
>>> As I am kind of confused, I want to check whether anyone else has the
>>> same confusion as me about SolrCloud.
>>>
>>> I have set up an environment with 3 Solr instances and 2 ZooKeepers, and
>>> tried to index some documents from a MySQL db. The total number of docs
>>> is around 3.5M. Before indexing I was expecting a somewhat longer time
>>> for cloud, as it does replication between nodes, but I am kind of
>>> disappointed after seeing that indexing took 4 to 5 times longer than
>>> indexing on a single Solr instance. On a single Solr instance I am able
>>> to index those docs in around 17 mins, while with cloud it took around
>>> 60 minutes. And as a possible production environment will have more
>>> instances and machines available for the cloud, I can't imagine the
>>> indexing time... In addition to the initial indexing time, we will be
>>> updating our indexes frequently, which makes me sceptical about
>>> SolrCloud.
>>>
>>> So in a possible production environment with SolrCloud, in case there is
>>> a serious failure on some nodes, the sync operation on cloud will take a
>>> long time... In this case, reindexing everything on a single instance
>>> will take less than 17 mins, which is a reasonable amount of time for a
>>> crash. So in this case does it make sense to use SolrCloud although
>>> indexing time will be much higher than on a single instance? Or would a
>>> traditional master-slave structure be better for this case?
>>>
>>> I am aware that cloud provides load balancing and some other stuff
>>> largely concerned with searching rather than indexing, but for a
>>> frequently updated system, is it still useful to set up a cloud
>>> environment?
>>>
>>> And are there any workarounds for indexing speed on cloud, other than
>>> the known ones for Solr?
>>>
>>>
>>>
>>> -----
>>> Smart, but doesn't work... If he worked, he would do it...
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Indexing-tp4022549.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
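
To make the "index from many clients (or threads)" and "index to many nodes" advice concrete, here is a minimal sketch of one way to do it from outside DIH. It is not anything DIH or SolrJ does internally. It assumes the documents are already available as Python dicts, that each node exposes an update handler accepting a JSON array of documents, and that the URLs are whatever your nodes actually use. The names `batches`, `post_batch` and `index_parallel` are made up for this example, and commits are left out.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle, islice
from urllib.request import Request, urlopen


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(update_url, batch):
    """POST one batch as a JSON array (assumed update handler format)."""
    req = Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return resp.status


def index_parallel(docs, update_urls, batch_size=1000, workers=8,
                   send=post_batch):
    """Send batches from several threads, round-robin over several nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: this queues every batch up front, so a very large input
        # is held in memory until the workers catch up.
        futures = [
            pool.submit(send, url, batch)
            for url, batch in zip(cycle(update_urls),
                                  batches(docs, batch_size))
        ]
        # result() re-raises any exception raised in a worker thread.
        return [f.result() for f in futures]
```

Called as `index_parallel(docs, [url_of_node_1, url_of_node_2, url_of_node_3])`, this keeps several requests in flight at once instead of the one-document-stream-at-a-time behaviour described above. The `send` parameter exists only so the transport can be swapped out, for example for a dry run without a cluster.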