Hello,
I recently ran into a problem where documents disappear from our collections
when I split a shard. To be specific, they are not copied to the new shards
created by the split command.
After some debugging I figured out that it is related to the router.field we
have defined for our collections.
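For anyone trying to reproduce this, a minimal sketch of the check (Python with
the requests package; the collection name "mycoll", the router field, and the
URL are placeholders, not details from the original report):

import requests

SOLR = "http://localhost:8983/solr"

def num_docs(collection):
    # rows=0 returns only the total hit count
    r = requests.get(f"{SOLR}/{collection}/select",
                     params={"q": "*:*", "rows": 0, "wt": "json"})
    return r.json()["response"]["numFound"]

before = num_docs("mycoll")

# split shard1 of a collection that was created with router.field set
requests.get(f"{SOLR}/admin/collections",
             params={"action": "SPLITSHARD", "collection": "mycoll",
                     "shard": "shard1", "wt": "json"})

# ... wait for the child shards to go active, then compare:
after = num_docs("mycoll")
print(before, after)  # a drop here is the symptom described above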
the data got frozen to 24GB in both shards (my parent shard had ~60GB).
3. Still, the state.json is showing
3.1 Parent - Active
3.2 Child - Construction
4. Yes, I do have logs; I am attaching the file to this mail. Please check it
out.
5. I did shard splitting with this command:
"
to the mailing list.
-Anshum
On Mon, Jan 16, 2017 at 2:33 AM Ekta Bhalwara <ekta.bhalw...@e-arc.com>
wrote:
> Hi,
>
> I tried shard splitting with the 6.3 version of Solr, with the following steps:
>
> Step 1:
> I issued
> "collections?action=SPLITSHARD&collection=...&shard=shard1"
>
> Step 2:
> I noticed 2 child shards got created: shard1_0 and shard1_1.
>
> Step 3:
> After completing step 2, I still see the shard1 state as active
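Since SPLITSHARD can run for a long time on a big shard, one common pattern is
to submit it asynchronously and poll until it finishes; a sketch of that
pattern (Python with requests; the collection name and request id are
placeholders):

import requests, time

SOLR = "http://localhost:8983/solr/admin/collections"

# async=<id> makes the call return immediately with a request id
requests.get(SOLR, params={"action": "SPLITSHARD", "collection": "mycoll",
                           "shard": "shard1", "async": "split-1", "wt": "json"})

while True:
    r = requests.get(SOLR, params={"action": "REQUESTSTATUS",
                                   "requestid": "split-1", "wt": "json"})
    state = r.json()["status"]["state"]  # submitted/running/completed/failed
    if state in ("completed", "failed"):
        print("split", state)
        break
    time.sleep(10)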
Well, I do tend to go on.
As Shawn mentioned, memory is usually the most
precious resource, and splitting to more shards, assuming
they're in separate JVMs and preferably on separate
machines, will certainly relieve some of that pressure.
My only caution there is that splitting to more shards
Thanks Erick,
I have another index with the same infrastructure setup, but only 10m
docs, and I never see these slow-downs; that's why my first instinct was
to look at creating more shards.
I'll definitely make a point of investigating further, though, with all the
things you and Shawn mentioned,
Be _very_ cautious when you're looking at these timings. Random
spikes are often due to opening a new searcher (assuming
you're indexing as you query) and are eminently tunable by
autowarming. Obviously you can't fire the same query again and again,
but if you collect a set of "bad" queries and,
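One way to follow that advice: collect the slow queries from the logs and
replay the whole set, watching QTime; a rough sketch (Python with requests;
the collection name and the queries are made up):

import requests

SOLR = "http://localhost:8983/solr/mycoll/select"
bad_queries = ["keyword1", "keyword2 AND keyword3"]  # collected from slow-query logs

for q in bad_queries:
    r = requests.get(SOLR, params={"q": q, "rows": 10, "wt": "json"})
    # QTime is Solr's internal query time in ms; spikes that vanish on
    # replay usually point at cold searchers rather than the query itself
    print(q, r.json()["responseHeader"]["QTime"])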
On 3/19/2016 11:12 AM, Robert Brown wrote:
> Hi,
>
> I have an index of 60m docs split across 2 shards (each with a replica).
>
> When load testing queries (picking random keywords I know exist), and
> randomly requesting facets too, 95% of my responses are under 0.5s.
>
> However, during some random manual tests, I sometimes see searches taking
Wow,
thanks both for the suggestions.
Erick: good point about the uneven shard load.
I'm not worried about the growth of a particular shard; in that case I'd use
shard splitting and, if necessary, add a server to the cluster.
But even if I manage to spread docs of type A producers
Charles:
You raise good points, and I didn't mean to say that co-locating docs due to
some criteria was never a good idea. That said, it does add administrative
complexity that I'd prefer to avoid unless necessary.
I suppose it largely depends on what the load and response SLAs
From: erickerick...@gmail.com
Sent: Thursday, May 21, 2015 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: optimal shard assignment with low shard key cardinality using
compositeId to enable shard splitting

I question your base assumption:

bq: So shard by document producer seems a good choice

Because what this _also_ does is force
all type A producers)

type B
cardinality ~10k
produces 4M docs/year

type C
cardinality ~10M
produces 9M docs/year

I'm thinking about using compositeId (solrDocId = producerId!docId) to send
all docs of the same producer to the same shard. When a shard becomes too
large I can use shard splitting.

Problems:
- documents from type A producers could be oddly distributed among
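For reference, routing by producer is just a prefix in the document id; a
sketch of indexing with compositeId routing (Python with requests; the
collection and field names are placeholders):

import requests

docs = [
    # everything with the "producerA!" prefix hashes to the same shard
    {"id": "producerA!1", "producer_s": "producerA"},
    {"id": "producerA!2", "producer_s": "producerA"},
    {"id": "producerB!1", "producer_s": "producerB"},
]
requests.post("http://localhost:8983/solr/mycoll/update?commit=true",
              json=docs)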
Hello Solr Community,
Greetings! This is my first post to this group.
I am very new to Solr, so please do not mind if some of my questions below
sound dumb :)
Let me explain my present setup:
Solr version : Solr_4.4.0
Zookeeper version: zookeeper-3.4.5
-
Ashwin:
First, if at all possible I would simply set up my new SolrCloud
structure (2 shards, a leader and follower each) and re-index the
entire corpus. 24M docs isn't really very many, and you'll have to
have this capability sometime, since someone, somewhere will want to
change the schema in
is it that you say I can just start up new hosts, especially without
modifying the numShards parameter from 3 to 4? And then probably reindexing,
because the other options look risky (my company has no backup system).
On 3/2/2015 6:12 AM, tuxedomoon wrote:
> Shawn, in light of Garth's response below

You can't just add a new core to an existing collection. You can add the
new node to the cloud, but it won't be part of any collection. You're not
going to be able to just slide it in as a 4th shard to an
Sent: February 27, 2015 8:16 AM
To: solr-user@lucene.apache.org
Subject: Does shard splitting double host count

I currently have a SolrCloud with 3 shards + replicas; it is holding 130M
documents and the r3.large hosts are running out of memory. As it's on 4.2
there is no shard splitting, so I will have to reindex to a 4.3+ version.
If I had that feature, would I need to split each shard into 2 subshards
removing it from shard2.
I'm looking for a migration strategy to achieve 25% of docs per shard. I would
also consider deleting docs by date range from shards 1, 2, and 3 and
reindexing them to redistribute evenly.
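If you go the delete-and-reindex route, the delete can be a single
delete-by-query over the date range; a sketch (Python with requests; the
field name timestamp_dt and the date window are made up):

import requests

# delete everything in the chosen window from the oversized shards,
# then reindex those documents so the hash ranges redistribute them
requests.post("http://localhost:8983/solr/mycoll/update?commit=true",
              json={"delete": {"query":
                    "timestamp_dt:[2014-01-01T00:00:00Z TO 2014-06-30T23:59:59Z]"}})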
and be routed to the same shard.
Shard splitting just divides the range of the shard in half, and copies
documents to the 2 new shards based upon where their IDs now fall in the new
ranges. That's a little easier to manage than the more complex process of
adding one shard, then having to adjust
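To make "divides the range in half" concrete, here is a small sketch of how a
parent hash range maps to the two children; the ranges parameter shown in the
comment is the documented way to pass explicit sub-ranges to SPLITSHARD (the
range values here are just an example):

# a shard's hash range is a 32-bit span, e.g. 80000000-ffffffff
lo, hi = 0x80000000, 0xffffffff
mid = (lo + hi) // 2
print(f"{lo:08x}-{mid:08x}")       # first child's range:  80000000-bfffffff
print(f"{mid+1:08x}-{hi:08x}")     # second child's range: c0000000-ffffffff
# equivalent explicit call:
# /admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1
#     &ranges=80000000-bfffffff,c0000000-ffffffff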
I'd forgotten that -DzkHost refers to the ZooKeeper hosts, not the Solr hosts.
Thanks.
On 2/27/2015 11:42 AM, tuxedomoon wrote:
> What about adding one new leader/replica pair? It seems that would entail
> a) creating the r3.large instances and volumes
> b) adding 2 new Zookeeper hosts?
> c) updating my Zookeeper configs (new hosts, new ids, new Solr config)
> d) restarting all ZKs
> e)
gilinac...@gmail.com wrote:
Alright. So shard splitting and composite routing play nicely together.
Thank you Anshum.
On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta ans...@anshumgupta.net
wrote:
In one line, shard splitting doesn't depend on the routing
mechanism
Hi, I'm also interested. When using the composite ID, the _route_
information is not kept on the document itself, so to me it looks like it's
not possible, as the split API
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
doesn't have a relevant parameter to
In one line, shard splitting doesn't depend on the routing mechanism, just
the hash range, so you could have documents for the same prefix split up.
Here's an overview of routing in SolrCloud:
* It happens based on a hash value.
* The hash is calculated using the multiple parts
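To fill in the "multiple parts" bit for a two-part compositeId: the top 16
bits of the hash come from the route prefix and the bottom 16 from the rest
of the id. A sketch (Python with the mmh3 package, which implements the
MurmurHash3_x86_32 algorithm Solr uses; treat this as an approximation of the
internal code, not a verbatim copy):

import mmh3

def composite_hash(doc_id: str) -> int:
    prefix, _, rest = doc_id.partition("!")
    h_prefix = mmh3.hash(prefix, signed=False)
    h_rest = mmh3.hash(rest, signed=False)
    # top 16 bits from the route prefix, bottom 16 from the unique part,
    # so all ids sharing a prefix land in the same shard's hash range
    return (h_prefix & 0xFFFF0000) | (h_rest & 0x0000FFFF)

print(hex(composite_hash("producerA!doc1")))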
Doesn't relevancy for that assume that the IDF and TF for user1 and user2
are not too different? SolrCloud still doesn't use a distributed IDF,
correct?
On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum gilinac...@gmail.com wrote:
Alright. So shard splitting and composite routing play nicely
Hi there,
Shard splitting seems to fail if the lock type is native. Here is my config
setting:

<indexConfig>
  <lockType>native</lockType>
  <writeLockTimeout>1000</writeLockTimeout>
</indexConfig>

Shard splitting works if I set the lock type to single or none. However,
after splitting, I am
Hello,
We have a 2-shard (S1, S2), 2-replica (R1, R2) SolrCloud setup using
version 4.10.2. Each shard and replica resides on its own node (so, a total
of 4 nodes).
As the data has increased, we would like to split the shards. So, we are
thinking about creating 4 more nodes (2 for shards (S3, S4)
A couple of options:
1) physically copy the index over
2) (what I prefer) use the ADDREPLICA
command from the Collections API to bring
up a new node on the new machine as a replica
of one of your splits. It'll automatically synchronize,
and after it's done, shut down the original split.
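A sketch of option 2 against the Collections API (Python with requests; the
collection, shard, node, and replica names are placeholders; the real replica
name for DELETEREPLICA can be read from CLUSTERSTATUS):

import requests

SOLR = "http://localhost:8983/solr/admin/collections"

# bring up a copy of the split shard on the new machine; it syncs automatically
requests.get(SOLR, params={"action": "ADDREPLICA", "collection": "mycoll",
                           "shard": "shard1_0", "node": "newhost:8983_solr",
                           "wt": "json"})

# once the new replica is active, drop the original
requests.get(SOLR, params={"action": "DELETEREPLICA", "collection": "mycoll",
                           "shard": "shard1_0", "replica": "core_node3",
                           "wt": "json"})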
Just confirmed that you do need to create the core directory before doing
the SHARDSPLIT (at least with HDFS) - otherwise it fails saying that it
cannot find classes, like the cluster classes.
I've noticed that the disk usage on HDFS goes up when I do the split - for
example, if I split a 100G
I tried to split a shard using HDFS storage, and at first I received this
error:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'COLLECT1_shard1_0_replica1': Unable to create core
[COLLECT1_shard1_0_replica1] Caused by: Direct buffer memory
If I create the directory manually on the server that I'm splitting:
COLLECT_shard1_0_replica1
and then run the shard split command, it works OK.
-Joe
Originally I had two shards on two machines - shard1 and shard2.
I did a SHARDSPLIT on shard1.
Now I have shard1, shard2, and shard1_0.
If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get
all the docs OK, but if I specify distrib=false, I get 0 documents.
Under HDFS -
Tell us more about your HDFS stuff. Specifically, how
do you have your HdfsDirectoryFactory specified in
solrconfig.xml?
Because you shouldn't have to do things like create the
directory ahead of time, I don't think.
Best,
Erick
On Mon, Nov 17, 2014 at 12:17 PM, Joseph Obernberger
Looks like the shard split failed, and only created one additional shard.
I didn't allocate enough memory for 3x, since two additional shards needed
to be created. I was allocating 20G for each shard, so in order to do the
split, I needed to give 60G for the direct memory access. I've now
switched
Were you guys able to fix this issue?
)
at org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:216)
Hi Greg, have you figured it out? I have the same problem...
rafal
I have a test system where I have an index of 15M documents in one shard
that I would like to split in two. I've tried it four times now. I have a
stand-alone ZooKeeper running on the same machine.
The end result is that I have two new shards in the construction state, and
each has one replica which
Hello Kalle,
we noticed the same problem some weeks ago:
http://lucene.472066.n3.nabble.com/Share-splitting-at-23-million-documents-gt-OOM-td4085064.html
It would be interesting to hear if there is more positive feedback this time.
We finally concluded that it may be worthwhile to start with many
Hi Kalle,
The problem here is that certain actions are taking too long, causing the
split process to terminate midway. For example, a commit on the parent
shard leader took 83 seconds in your case, but the read timeout value is set
to only 60 seconds. We actually do not need to open a searcher
I was wrong in saying that we don't need to open a searcher; we do. I
committed a fix in SOLR-5314 to use soft commits instead of hard commits. I
also increased the read timeout value. Both of these together will reduce
the likelihood of such a thing happening.
I haven't been able to successfully split a shard with Solr 4.4.0.
If I have an empty index, or all documents would go to one side of the
split, I hit SOLR-5144. But if I avoid that case, I consistently get
this error:
290391 [qtp243983770-60] INFO
I am also getting the same error when performing shard splitting using Solr 4.4.0.
I've simplified things from my previous email, and I'm still seeing errors.
Using Solr 4.4.0 with two nodes, starting with a single shard. The collection
is named marin; the host names are dumbo and solrcloud1. I bring up an empty
cloud and index 50 documents. I can query them and everything looks
Oops, I somehow forgot to mention that. The errors I'm seeing are with the
release version of Solr 4.4.0. I mentioned 4.1.0 as that's what we
currently have in prod, and we want to upgrade to 4.4.0 so we can do shard
splitting. Towards that end, I'm testing shard splitting in 4.4.0 and
seeing
Howdy,
I'm trying to test shard splitting, and it's not working for me. I've got
a 4 node cloud with a single collection and 2 shards.
I've indexed 170k small documents, and I'm using the compositeId router,
with an internal client id as the shard key, with 4 distinct values
across the data set
Hi,
Imagine a (common) situation where you use document routing and you
end up with 1 large shard (e.g. 1 large user with lots of docs).
Shard splitting will help here, because we can break up that 1 shard
into 2 smaller shards (and maybe do that recursively to make shards
sufficiently small).
But what happens with document routing after a big shard is split?
I assume new docs keep going to just one of the 2 new shards, right
can use
shard splitting to increase the number of shards.
From the Solr wiki, I saw this command (
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection_name&shard=shardId)
which splits one index into 2 shards. However, is there some way to split
into more shards?
Thanks,
Ming-
No, it is hard-coded to split into two shards only. You can call it
recursively on a sub-shard to split into more pieces. Please note that some
serious bugs were found in that command which will be fixed in the next
(4.3.1) release of Solr.
On Tue, Jun 11, 2013 at 9:43 AM, Mingfeng Yang
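So 1 -> 8 is three rounds, splitting every live shard each round; a sketch
(Python with requests; synchronous calls for brevity, the collection name is
a placeholder, and each round should really wait for the children to go
active first):

import requests

SOLR = "http://localhost:8983/solr/admin/collections"

shards = ["shard1"]
for _ in range(3):                      # 1 -> 2 -> 4 -> 8
    children = []
    for s in shards:
        requests.get(SOLR, params={"action": "SPLITSHARD",
                                   "collection": "mycoll", "shard": s,
                                   "wt": "json"})
        children += [s + "_0", s + "_1"]
    shards = children                   # e.g. shard1_0, shard1_1 after round 1
print(shards)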
Hi Shalin,
Do you mean that we can do 1-2, 2-4, 4-8 to get 8 shards eventually?
After splitting, if we want to set up a SolrCloud with all 8 shards, how
shall we allocate the shards then?
Thanks,
Ming-
On Mon, Jun 10, 2013 at 9:55 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
No,
Hi
I tried to split a shard but it failed. If I try to do it again, it does
not start again.
I see the two extra shards in /collections/messages/leader_elect/ and
/collections/messages/leaders/
How can I fix this?
root@solr07-dcg:/solr/messages_shard3_replica2# curl
clusterstate.json is now reporting shard3 as inactive. Any idea how to
change clusterstate.json manually from the command line?
You will need to edit it manually and upload it using a ZooKeeper client;
you can use kazoo, it's very easy to use.
--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
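A sketch of the kazoo route for the stuck shard above (Python; the ZooKeeper
hosts are placeholders, the collection/shard names are taken from the thread,
and on 4.x the state lives in /clusterstate.json):

import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

data, stat = zk.get("/clusterstate.json")
state = json.loads(data.decode("utf-8"))

# flip the stuck shard back to active (adjust collection/shard names)
state["messages"]["shards"]["shard3"]["state"] = "active"

zk.set("/clusterstate.json", json.dumps(state).encode("utf-8"))
zk.stop()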