Shard splitting and router.field

2020-09-01 Thread Niko Himanen
Hello, I recently ran into a problem that documents disappear from our collections when I split a shard. To be specific, they are not copied to new shards made by the split command. After some debugging I figured out that it is related to router.field we have defined for our collections and that

Re: Solr Shard Splitting Issue with 60 GB index data

2017-02-07 Thread ekta
this message in context: http://lucene.472066.n3.nabble.com/Solr-Shard-Splitting-Issue-tp4314145p4319149.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Shard Splitting Issue

2017-01-30 Thread Anshum Gupta
t shard had -60GB). > 3. Still, the state.json is showing > 3.1 Parent - Active > 3.2 Child - Construction > 4.Yeah i do have logs , i am attaching the file with mail. Please check it > out. > 5. I did shard splitting by this command > > " > http://10.1.1.78:4

Re: Solr Shard Splitting Issue

2017-01-19 Thread ekta
the data got frozen to 24GB in both shards(my parent shard had -60GB). 3. Still, the state.json is showing 3.1 Parent - Active 3.2 Child - Construction 4.Yeah i do have logs , i am attaching the file with mail. Please check it out. 5. I did shard splitting by this command "

Re: Solr Shard Splitting Issue

2017-01-18 Thread Anshum Gupta
to the mailing list. -Anshum On Mon, Jan 16, 2017 at 2:33 AM Ekta Bhalwara <ekta.bhalw...@e-arc.com> wrote: > Hi , > > I tried Shard Splitting with 6.3 version of Solr,with the following steps:- > > Step 1 : > > I have issued > "collections?action=SPLITSHARD==shard1&

Solr Shard Splitting Issue

2017-01-16 Thread Ekta Bhalwara
Hi, I tried shard splitting with Solr 6.3, using the following steps: Step 1: I issued "collections?action=SPLITSHARD==shard1". Step 2: I noticed 2 child shards got created, shard1_0 and shard1_1. Step 3: After completing step 2, I still see shard1 stat

Re: Shard splitting for immediate performance boost?

2016-03-20 Thread Erick Erickson
Well, I do tend to go on... As Shawn mentioned, memory is usually the most precious resource, and splitting to more shards, assuming they're in separate JVMs and preferably on separate machines, certainly will relieve some of that pressure. My only caution there is that splitting to more shards

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Thanks Erick, I have another index with the same infrastructure setup, but only 10m docs, and never see these slow-downs; that's why my first instinct was to look at creating more shards. I'll definitely make a point of investigating further though with all the things you and Shawn mentioned,

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Erick Erickson
Be _very_ cautious when you're looking at these timings. Random spikes are often due to opening a new searcher (assuming you're indexing as you query) and are eminently tunable by autowarming. Obviously you can't fire the same query again and again, but if you collect a set of "bad" queries and,

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Shawn Heisey
On 3/19/2016 11:12 AM, Robert Brown wrote: > I have an index of 60m docs split across 2 shards (each with a replica). > > When load testing queries (picking random keywords I know exist), and > randomly requesting facets too, 95% of my responses are under 0.5s. > > However, during some random

Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Hi, I have an index of 60m docs split across 2 shards (each with a replica). When load testing queries (picking random keywords I know exist), and randomly requesting facets too, 95% of my responses are under 0.5s. However, during some random manual tests, sometimes I see searches taking

Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-30 Thread Matteo Grolla
Wow, thanks both for the suggestions. Erick: good point about the uneven shard load. I'm not worried about the growth of a particular shard; in that case I'd use shard splitting and, if necessary, add a server to the cluster. But even if I manage to spread docs of typeA producer

RE: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-29 Thread Reitzel, Charles
shard splitting Charles: You raise good points, and I didn't mean to say that co-locating docs due to some criteria was never a good idea. That said, it does add administrative complexity that I'd prefer to avoid unless necessary. I suppose it largely depends on what the load and response SLAs

RE: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-28 Thread Reitzel, Charles
, May 21, 2015 11:30 AM To: solr-user@lucene.apache.org Subject: Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting I question your base assumption: bq: So shard by document producer seems a good choice Because what this _also_ does is force

Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-28 Thread Erick Erickson
[mailto:erickerick...@gmail.com] Sent: Thursday, May 21, 2015 11:30 AM To: solr-user@lucene.apache.org Subject: Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting I question your base assumption: bq: So shard by document producer seems a good

optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-21 Thread Matteo Grolla
all type A producers) type B cardinality ~10k produce 4M docs/year type C cardinality ~10M produce 9M docs/year I'm thinking about using compositeId ( solrDocId = producerId!docId ) to send all docs of the same producer to the same shards. When a shard becomes too large I can use shard splitting

Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-21 Thread Erick Erickson
cardinality ~10M produce 9M docs/year I'm thinking about using compositeId ( solrDocId = producerId!docId ) to send all docs of the same producer to the same shards. When a shard becomes too large I can use shard splitting. problems -documents from type A producers could be oddly distributed among

shard splitting (solr 4.4.0)

2015-04-01 Thread Ashwin Kumar
Hello Solr Community, Greetings ! This is my first post to this group. I am very new to solr, so please do not mind if some of my questions below sound dumb :) Let me explain my present setup: Solr version : Solr_4.4.0 Zookeeper version: zookeeper-3.4.5 -

Re: shard splitting (solr 4.4.0)

2015-04-01 Thread Erick Erickson
Ashwin: First, if at all possible I would simply set up my new SolrCloud structure (2 shards, a leader and follower each) and re-index the entire corpus. 24M docs isn't really very many, and you'll have to have this capability sometime since someone, somewhere will want to change the schema in

Re: Does shard splitting double host count

2015-03-02 Thread tuxedomoon
is it that you say I can just start up new hosts, especially without modifying the numShards parameter from 3 to 4? And then probably reindexing because the other options look risky (my company has no backup system).

Re: Does shard splitting double host count

2015-03-02 Thread Shawn Heisey
On 3/2/2015 6:12 AM, tuxedomoon wrote: Shawn, in light of Garth's response below You can't just add a new core to an existing collection. You can add the new node to the cloud, but it won't be part of any collection. You're not going to be able to just slide it in as a 4th shard to an

RE: Does shard splitting double host count

2015-02-27 Thread Garth Grimm
, February 27, 2015 8:16 AM To: solr-user@lucene.apache.org Subject: Does shard splitting double host count I currently have a SolrCloud with 3 shards + replicas, it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, I will have

Re: Does shard splitting double host count

2015-02-27 Thread Shawn Heisey
On 2/27/2015 7:15 AM, tuxedomoon wrote: I currently have a SolrCloud with 3 shards + replicas, it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, I will have to reindex to a 4.3+ version. If I had that feature would I

Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I currently have a SolrCloud with 3 shards + replicas, it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, I will have to reindex to a 4.3+ version. If I had that feature would I need to split each shard into 2 subshards

Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
removing it from shard2. I'm looking for a migration strategy to achieve 25% docs per shard. I would also consider deleting docs by daterange from shards1,2,3 and reindexing them to redistribute evenly.

RE: Does shard splitting double host count

2015-02-27 Thread Garth Grimm
and be routed to the same shard. Shard splitting just divides the range of the shard in half, and copies documents to the 2 new shards based upon where their id's now fall in the new range. That's a little easier to manage than the more complex process of adding one shard, then having to adjust
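The range-halving step described above can be sketched as follows. This is only an illustration of the idea, not Solr's code: the 32-bit signed parent range and the helper names `split_range` and `child_for` are assumptions made for the example.

```python
def split_range(lo, hi):
    """Split the inclusive integer range [lo, hi] into two adjacent halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def child_for(doc_hash, children):
    """Return the index of the child range that contains doc_hash."""
    for index, (lo, hi) in enumerate(children):
        if lo <= doc_hash <= hi:
            return index
    raise ValueError("hash lies outside the parent range")


# Assumed parent range: the full 32-bit signed integer space.
parent = (-2**31, 2**31 - 1)
children = split_range(*parent)
print(children)
print(child_for(-5, children), child_for(7, children))
```

Each document is copied to whichever child range its hash now falls in, which is why no per-document routing decision has to be made at split time.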

Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I'd forgotten that DzkHost refers to the Zookeeper hosts not SOLR hosts. Thanks.

Re: Does shard splitting double host count

2015-02-27 Thread Shawn Heisey
On 2/27/2015 11:42 AM, tuxedomoon wrote: What about adding one new leader/replica pair? It seems that would entail a) creating the r3.large instances and volumes b) adding 2 new Zookeeper hosts? c) updating my Zookeeper configs (new hosts, new ids, new SOLR config) d) restarting all ZKs e)

Re: clarification regarding shard splitting and composite IDs

2015-02-05 Thread Dan Davis
gilinac...@gmail.com wrote: Alright. So shard splitting and composite routing plays nicely together. Thank you Anshum. On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta ans...@anshumgupta.net wrote: In one line, shard splitting doesn't cater to depend on the routing mechanism

Re: clarification regarding shard splitting and composite IDs

2015-02-04 Thread Gili Nachum
Hi, I'm also interested. When using the composite ID, the _route_ information is not kept on the document itself, so to me it looks like it's not possible as the split API https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 doesn't have a relevant parameter to

Re: clarification regarding shard splitting and composite IDs

2015-02-04 Thread Anshum Gupta
In one line, shard splitting doesn't cater to depend on the routing mechanism but just the hash range so you could have documents for the same prefix split up. Here's an overview of routing in SolrCloud: * Happens based on a hash value * The hash is calculated using the multiple parts
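A rough sketch of why documents sharing a prefix can end up on different sides of a split. Everything here is an assumption for illustration: the `prefix!docId` form, the 16/16 bit layout, and `zlib.crc32` as a stand-in hash (it is not the hash function Solr uses).

```python
import zlib


def composite_hash(doc_id: str) -> int:
    """Build one 32-bit value from the hashes of the parts of 'prefix!rest'."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        # No prefix present: hash the whole ID.
        return zlib.crc32(doc_id.encode("utf-8"))
    high = zlib.crc32(prefix.encode("utf-8")) & 0xFFFF0000  # bits from the prefix
    low = zlib.crc32(rest.encode("utf-8")) & 0x0000FFFF     # bits from the doc part
    return high | low


a = composite_hash("user1!doc1")
b = composite_hash("user1!doc2")

# Both hashes share the upper 16 bits, so they sit in one contiguous block.
block_start = a & 0xFFFF0000
block_end = block_start | 0xFFFF
print(block_start <= b <= block_end)
```

Because a split only looks at the hash range, a split point that happens to fall between `block_start` and `block_end` would place some documents of that prefix in one sub-shard and the rest in the other.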

Re: clarification regarding shard splitting and composite IDs

2015-02-04 Thread Gili Nachum
Alright. So shard splitting and composite routing plays nicely together. Thank you Anshum. On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta ans...@anshumgupta.net wrote: In one line, shard splitting doesn't cater to depend on the routing mechanism but just the hash range so you could have

Re: clarification regarding shard splitting and composite IDs

2015-02-04 Thread Anshum Gupta
: Alright. So shard splitting and composite routing plays nicely together. Thank you Anshum. On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta ans...@anshumgupta.net wrote: In one line, shard splitting doesn't cater to depend on the routing mechanism but just the hash range so you could

Re: clarification regarding shard splitting and composite IDs

2015-02-04 Thread Dan Davis
Doesn't relevancy for that assume that the IDF and TF for user1 and user2 are not too different? SolrCloud still doesn't use a distributed IDF, correct? On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum gilinac...@gmail.com wrote: Alright. So shard splitting and composite routing plays nicely

solrcloud shard splitting with lock type native

2015-01-26 Thread calin.grecu
Hi there, Shard splitting seems to fail if the lock type is native. Here is my config setting: <indexConfig> <lockType>native</lockType> <writeLockTimeout>1000</writeLockTimeout> </indexConfig> Shard splitting works if i set the lock type to single or none. However, after splitting, i am

SolrCloud - Shard splitting and re-sizing nodes

2014-12-12 Thread Trilok Prithvi
Hello, We have a 2 shards (S1, S2), 2 replica (R1, R2) setup (Solr Cloud) using 4.10.2 version. Each shard and replica resides on its own nodes (so, total of 4 nodes). As the data increased, we would like to split the shards. So, we are thinking about creating 4 more nodes (2 for shards (S3, S4)

Re: SolrCloud - Shard splitting and re-sizing nodes

2014-12-12 Thread Erick Erickson
A couple of options: 1) physically copy the index over; 2) (what I prefer) use the ADDREPLICA command from the Collections API to bring up a new node on the new machine as a replica of one of your splits. It'll automatically synchronize, and after it's done then shut down the original split.

Re: More HDFS and Shard Splitting

2014-11-20 Thread Joseph Obernberger
Just confirmed that you do need to create the core directory before doing the SHARDSPLIT (at least with HDFS) - otherwise it fails saying that it cannot find classes - like the cluster classes. I've noticed that the disk usage on HDFS goes up when I do the split - for example, if I split a 100G

Shard splitting and HDFS

2014-11-17 Thread Joseph Obernberger
I tried to split a shard using HDFS storage, and at first I received this error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'COLLECT1_shard1_0_replica1': Unable to create core [COLLECT1_shard1_0_replica1] Caused by: Direct buffer memory

Shard splitting and HDFS

2014-11-17 Thread Joseph Obernberger
If I create the directory manually on the server that I'm splitting: COLLECT_shard1_0_replica1 Then do the shard split command, it works OK. -Joe

More HDFS and Shard Splitting

2014-11-17 Thread Joseph Obernberger
Originally I had two shards on two machines - shard1 and shard2. I did a SHARDSPLIT on shard1. Now have shard1, shard2, and shard1_0 If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get all the docs OK, but if I specify distrib=false, I get 0 documents. Under HDFS -

Re: More HDFS and Shard Splitting

2014-11-17 Thread Erick Erickson
Tell us more about your HDFS stuff. Specifically, how do you have your HDFSDirectoryFactory specified in solrconfig.xml? Cause you shouldn't have to do things like create the directory ahead of time I don't think. Best, Erick On Mon, Nov 17, 2014 at 12:17 PM, Joseph Obernberger

Re: More HDFS and Shard Splitting

2014-11-17 Thread Joseph Obernberger
Looks like the shard split failed, and only created one additional shard. I didn't allocate enough memory for 3x - since two additional shards needed to be created. I was allocating 20G for each shard, so in order to do the split, I needed to give 60G for the direct memory access. I've now switched

Re: Shard splitting error: cannot uncache file=_1.nvm

2014-06-07 Thread prem1980
Were you guys able to fix this issue?

Re: Shard splitting error: cannot uncache file=_1.nvm

2014-01-09 Thread rafal janik
) at org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:216) Hi Greg, have you figured it out? I have the same problem... rafal

SolrCloud shard splitting keeps failing

2013-10-08 Thread Kalle Aaltonen
I have a test system where I have an index of 15M documents in one shard that I would like to split in two. I've tried it four times now. I have a stand-alone zookeeper running on the same machine. The end result is that I have two new shards with state construction, and each has one replica which

Re: SolrCloud shard splitting keeps failing

2013-10-08 Thread Harald Kirsch
Hello Kalle, we noticed the same problem some weeks ago: http://lucene.472066.n3.nabble.com/Share-splitting-at-23-million-documents-gt-OOM-td4085064.html Would be interesting to hear if there is more positive feedback this time. We finally concluded that it may be worth to start with many

Re: SolrCloud shard splitting keeps failing

2013-10-08 Thread Shalin Shekhar Mangar
Hi Kalle, The problem here is that certain actions are taking too long causing the split process to terminate in between. For example, a commit on the parent shard leader took 83 seconds in your case but the read timeout value is set to 60 seconds only. We actually do not need to open a searcher

Re: SolrCloud shard splitting keeps failing

2013-10-08 Thread Shalin Shekhar Mangar
I was wrong in saying that we don't need to open a searcher; we do. I committed a fix in SOLR-5314 to use soft commits instead of hard commits. I also increased the read timeout value. Both of these together will reduce the likelihood of such a thing happening.

Shard splitting error: cannot uncache file=_1.nvm

2013-08-27 Thread Greg Preston
I haven't been able to successfully split a shard with Solr 4.4.0. If I have an empty index, or all documents would go to one side of the split, I hit SOLR-5144. But if I avoid that case, I consistently get this error: 290391 [qtp243983770-60] INFO

Re: Shard splitting failure, with and without composite hashing

2013-08-14 Thread mewmewball

Re: Shard splitting failure, with and without composite hashing

2013-08-13 Thread Srivatsan
I am also getting the same error when performing shard splitting using Solr 4.4.0.

[4.4.0] Shard splitting failure (simplified case)

2013-08-12 Thread Greg Preston
I've simplified things from my previous email, and I'm still seeing errors. Using solr 4.4.0 with two nodes, starting with a single shard. Collection is named marin, host names are dumbo and solrcloud1. I bring up an empty cloud and index 50 documents. I can query them and everything looks

Re: Shard splitting failure, with and without composite hashing

2013-08-12 Thread mewmewball

Re: Shard splitting failure, with and without composite hashing

2013-08-11 Thread Erick Erickson
...@marinsoftware.com wrote: Howdy, I'm trying to test shard splitting, and it's not working for me. I've got a 4 node cloud with a single collection and 2 shards. I've indexed 170k small documents, and I'm using the compositeId router, with an internal client id as the shard key, with 4 distinct values across

Re: Shard splitting failure, with and without composite hashing

2013-08-11 Thread Greg Preston
Oops, I somehow forgot to mention that. The errors I'm seeing are with the release version of Solr 4.4.0. I mentioned 4.1.0 as that's what we currently have in prod, and we want to upgrade to 4.4.0 so we can do shard splitting. Towards that end, I'm testing shard splitting in 4.4.0 and seeing

Shard splitting failure, with and without composite hashing

2013-08-09 Thread Greg Preston
Howdy, I'm trying to test shard splitting, and it's not working for me. I've got a 4 node cloud with a single collection and 2 shards. I've indexed 170k small documents, and I'm using the compositeId router, with an internal client id as the shard key, with 4 distinct values across the data set

Shard splitting and document routing

2013-06-18 Thread Otis Gospodnetic
Hi, Imagine a (common) situation where you use document routing and you end up with 1 large shard (e.g. 1 large user with lots of docs). Shard splitting will help here, because we can break up that 1 shard into 2 smaller shards (and maybe do that recursively to make shards sufficiently small

Re: Shard splitting and document routing

2013-06-18 Thread Mark Miller
). Shard splitting will help here, because we can break up that 1 shard in 2 smaller shards (and maybe do that recursively to make shards sufficiently small). But what happens with document routing after a big shard is split? I assume new docs keep going to just one of the 2 new shards, right

Re: Shard splitting and document routing

2013-06-18 Thread Otis Gospodnetic
On Jun 18, 2013, at 12:25 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Imagine a (common) situation where you use document routing and you end up with 1 large shard (e.g. 1 large user with lots of docs). Shard splitting will help here, because we can break up that 1 shard

Re: shard splitting

2013-06-11 Thread Shalin Shekhar Mangar
can use shard splitting to increase the number of shards. On Tue, Jun 11, 2013 at 10:53 AM, Mingfeng Yang mfy...@wisewindow.com wrote: Hi Shalin, Do you mean that we can do 1-2, 2-4, 4-8 to get 8 shards eventually? After splitting, if we want to set up a solrcloud with all 8 shards, how shall

shard splitting

2013-06-10 Thread Mingfeng Yang
From the Solr wiki, I saw this command ( http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection_name&shard=shardId ) which splits one index into 2 shards. However, is there some way to split into more shards? Thanks, Ming-
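For scripting, the same request can be assembled programmatically. A minimal sketch using only the `action`, `collection` and `shard` parameters named in the message; the helper name `splitshard_url` is hypothetical, and the base URL and the `collection_name` / `shardId` values are the placeholders from the message itself.

```python
from urllib.parse import urlencode


def splitshard_url(base, collection, shard):
    """Build a Collections API SPLITSHARD request URL."""
    query = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    })
    return f"{base}/admin/collections?{query}"


url = splitshard_url("http://localhost:8983/solr", "collection_name", "shardId")
print(url)
```

Using `urlencode` keeps the `&` separators and any escaping correct, which is easy to get wrong when the URL is typed by hand.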

Re: shard splitting

2013-06-10 Thread Shalin Shekhar Mangar
No, it is hard coded to split into two shards only. You can call it recursively on a sub shard to split into more pieces. Please note that some serious bugs were found in that command which will be fixed in the next (4.3.1) release of Solr. On Tue, Jun 11, 2013 at 9:43 AM, Mingfeng Yang
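Since each call produces exactly two sub-shards, reaching more pieces means repeating the split on every sub-shard. The sketch below only plans the order of calls; it assumes sub-shards are named by appending `_0` and `_1` to the parent name, and `split_plan` is a hypothetical helper, not part of Solr.

```python
def split_plan(shard, rounds):
    """Return (plan, leaves): the shards to split in each round, and the final shards."""
    plan = []
    current = [shard]
    for _ in range(rounds):
        plan.append(list(current))
        # Assumed naming: each split yields <parent>_0 and <parent>_1.
        current = [f"{name}_{i}" for name in current for i in (0, 1)]
    return plan, current


plan, leaves = split_plan("shard1", 3)
for number, shards in enumerate(plan, start=1):
    print(f"round {number}: split {shards}")
print(len(leaves), "shards after the last round")
```

Three rounds starting from one shard issue 1 + 2 + 4 = 7 split calls and end with 8 shards.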

Re: shard splitting

2013-06-10 Thread Mingfeng Yang
Hi Shalin, Do you mean that we can do 1-2, 2-4, 4-8 to get 8 shards eventually? After splitting, if we want to set up a solrcloud with all 8 shards, how shall we allocate the shards then? Thanks, Ming- On Mon, Jun 10, 2013 at 9:55 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: No,

shard splitting

2013-05-22 Thread Arkadi Colson
Hi, I tried to split a shard but it failed. If I try to do it again it does not start again. I see the two extra shards in /collections/messages/leader_elect/ and /collections/messages/leaders/ How can I fix this? root@solr07-dcg:/solr/messages_shard3_replica2# curl

Re: shard splitting

2013-05-22 Thread Arkadi Colson
clusterstate.json is now reporting shard3 as inactive. Any idea how to change clusterstate.json manually from the command line? On 05/22/2013 08:59 AM, Arkadi Colson wrote: Hi, I tried to split a shard but it failed. If I try to do it again it does not start again. I see the two extra shards in

Re: shard splitting

2013-05-22 Thread Yago Riveiro
You will need to edit it manually and upload it using a ZooKeeper client. You can use kazoo; it's very easy to use. -- Yago Riveiro On Wednesday, May 22, 2013 at 10:04 AM, Arkadi Colson wrote: clusterstate.json is now reporting shard3 as
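A hedged sketch of what such a manual edit might look like with kazoo. The `/clusterstate.json` path, the JSON layout (`collection -> "shards" -> shard -> "state"`) and both function names are assumptions for illustration; check them against your own cluster and keep a backup of the original node data before writing anything.

```python
import json


def set_shard_state(raw: bytes, collection: str, shard: str, state: str) -> bytes:
    """Return a copy of the cluster state JSON with one shard's state changed."""
    cluster = json.loads(raw.decode("utf-8"))
    # Assumed layout: {collection: {"shards": {shard: {"state": ...}}}}
    cluster[collection]["shards"][shard]["state"] = state
    return json.dumps(cluster, indent=2).encode("utf-8")


def update_clusterstate(zk_hosts: str, collection: str, shard: str, state: str) -> None:
    """Read, edit and write back the cluster state through ZooKeeper."""
    from kazoo.client import KazooClient  # third-party package: kazoo

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        raw, stat = zk.get("/clusterstate.json")  # assumed node path
        updated = set_shard_state(raw, collection, shard, state)
        # Passing the version makes the write fail if someone else changed the node.
        zk.set("/clusterstate.json", updated, version=stat.version)
    finally:
        zk.stop()
```

A call such as `update_clusterstate("localhost:2181", "messages", "shard3", "active")` would then apply the change; the host string here is a placeholder.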