I would need to look at the code to figure out how it works, but I would
imagine that the shards are shuffled randomly among the hosts so that
multiple collections will be evenly distributed across the cluster. It
would take me quite a while to familiarize myself with the code before I
could
Yes adding _solr worked, thx. But I also had to populate the SOLR_HOST param
for each of the 4 hosts, as in
SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com. I'm in an EC2 VPN
environment which might be the problem.
This command now works (leaving off port)
I ran this command with Solr hosts s1 s2 running.
http://s1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=mycollection_cloud_conf&createNodeSet=s1:8983,s2:8983
I referred to this link
http://heliosearch.org/solrcloud-assigning-nodes-machines/
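The CREATE request above can also be assembled with a query-string encoder rather than typed by hand. This is only a minimal sketch: `build_create_url` is a hypothetical helper (not part of Solr or SolrJ), the hosts and names are the ones quoted in this thread, and the `_solr` suffix on each node name is an assumption based on the "adding _solr worked" reply.

```python
from urllib.parse import urlencode


def build_create_url(base, name, num_shards, config_name, nodes):
    """Build a Collections API CREATE URL (hypothetical helper)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
        # Assumption: node names are host:port plus the "_solr" context suffix.
        "createNodeSet": ",".join(f"{node}_solr" for node in nodes),
    }
    # Keep ":" and "," readable instead of percent-encoding them.
    return f"{base}/admin/collections?" + urlencode(params, safe=":,")


url = build_create_url(
    "http://s1:8983/solr",
    "mycollection",
    2,
    "mycollection_cloud_conf",
    ["s1:8983", "s2:8983"],
)
print(url)
```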
ok thanks, continuing...
numShards in SOLR_OPTS isn't a good idea, what happens if you want to
create a collection with 5 shards?)
yes I was following my old pattern CATALINA_OPTS=${CATALINA_OPTS}
-DnumShards=n
down the nodes and nuke the directories you created by hand and bring the
nodes
I followed these steps and I am unable to launch in cloud mode.
1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3
2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2
3. uploaded a configset to zk1 (solr home is /volume/solr/data)
This is fixed. My SolrJ client was putting a JSON object into a multivalued
field in the SolrInputDocument. Solr returned a 0 status code but did not
add the bad object, instead it performed what looks like an atomic index as
described above. Once I removed the illegal JSON object from the
I'm relying on an autocommit of 60 secs.
I just ran the same test via my SolrJ client and result was the same,
SolrCloud query always returns correct number of fields.
Is there a way to find out which shard and replica a particular document
lives on?
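One possible way to check this is to request the `[shard]` document transformer in the field list, which reports the shard address a document was returned from. The sketch below only builds the query URL; `build_shard_lookup_url` is a hypothetical helper, and the collection URL, the `id` field name and the key `1234` are placeholders rather than values from the poster's cluster.

```python
from urllib.parse import urlencode


def build_shard_lookup_url(collection_url, unique_key_field, doc_id):
    """Build a select URL that asks Solr to report the shard for one document."""
    params = {
        # Look the document up by its uniqueKey value.
        "q": f'{unique_key_field}:"{doc_id}"',
        # [shard] is a document transformer added to the field list.
        "fl": f"{unique_key_field},[shard]",
        "wt": "json",
    }
    return f"{collection_url}/select?" + urlencode(params)


lookup_url = build_shard_lookup_url("http://s1:8983/solr/mycollection", "id", "1234")
print(lookup_url)
```

Sending the same query directly to a single core with `distrib=false` is another way to confirm whether that core holds the document.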
--
View this message in context:
a few further clues to this unresolved problem
1. I found one of my 5 zookeeper instances was down
2. I tried another reindex of a bad document but no change on the SOLR side
3. I deleted and reindexed the same doc, that worked (obviously, but at this
point I don't know what to expect)
If it is implicit then
you may have indexed the new document to a different shard, which means
that it is now in your index more than once, and which one gets returned
may not be predictable.
If a document with uniqueKey 1234 is assigned to a shard by SolrCloud,
implicit routing won't a
let's see the code.
simplified code and some comments
1. solrUrl points at leader 1 of 3 leaders, each with a replica
2. createSolrDoc takes a full Mongo doc and returns a valid
SolrInputDocument
3. I have done dumps of the returned solrDoc and verified it does not have
the unwanted
I'm posting the fields from one of my problem documents, based on this comment
I found from Shawn on Grokbase.
If you are trying to use a Map object as the value of a field, that is
probably why it is interpreting your add request as an atomic update.
If this is the case, and you're doing it
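A pre-indexing check along these lines could flag map-valued fields before they reach Solr. This is a Python analogue using plain dicts, not SolrJ code; `find_map_valued_fields` and the sample document are hypothetical.

```python
def find_map_valued_fields(doc):
    """Return the sorted names of fields whose value is, or contains, a dict."""
    suspicious = []
    for field, value in doc.items():
        # Treat single values and multivalued fields uniformly.
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(isinstance(v, dict) for v in values):
            suspicious.append(field)
    return sorted(suspicious)


# Hypothetical document: "tags" is multivalued and carries a nested object.
doc = {"id": "1234", "title": "example", "tags": ["a", {"nested": "object"}]}
print(find_map_valued_fields(doc))
```

Any field reported here is a candidate for being read as an atomic-update instruction instead of a plain value.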
I'm doing all my indexing to leader 1 and have not specified any router
configuration. But there is an equal distribution of 240M docs across 5
shards. I think I've been stating I have 3 shards in these posts, I have 5,
sorry.
How do I know what kind of routing I am using?
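The router is normally recorded in the cluster state that SolrCloud keeps in ZooKeeper. The sketch below reads it from a copy of that JSON. The layout is an assumption: a top-level key per collection with a `router` entry that is either a plain string or an object with a `name`. `router_name` and the sample state are hypothetical.

```python
import json


def router_name(clusterstate_json, collection):
    """Return the router name recorded for a collection, or None if absent."""
    state = json.loads(clusterstate_json)
    router = state[collection].get("router")
    # Assumption: newer layouts store {"name": ...}, older ones a plain string.
    if isinstance(router, dict):
        return router.get("name")
    return router


sample_state = '{"mycollection": {"router": {"name": "compositeId"}, "shards": {}}}'
print(router_name(sample_state, "mycollection"))
```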
OK it is composite
I've just used post.sh to index a test doc with 3 fields to leader 1 of my
SolrCloud. I then reindexed it with 1 field removed and the query on it
shows 2 fields. I repeated this a few times and always get the correct
field count from Solr.
I'm now wondering if SolrJ is
I'm reindexing Mongo docs into SolrCloud. The new docs have had a few fields
removed so upon reindexing those fields should be gone in Solr. They are
not. So the result is a new doc merged with an old doc rather than a
replacement which is what I need.
I do not know whether the issue is with
The uniqueKey value is the same.
The new documents contain fewer fields than the already indexed ones. Could
this cause the updates to be treated as atomic? With the persisting fields
treated as un-updated?
Routing should be implicit since the collection was created using numShards.
Many
@Shawn,
I can definitely upgrade to SolrJ 4.x and would prefer that so as to target
4.x cores as well. I'm already on Java 7.
One attempt I made was this
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.setParam("collection", collectionName);
I have a SolrJ application that reads from a Redis queue and updates
different collections based on the message content. New collections are
added without my knowledge, so I am creating SolrServer objects on the fly
as follows:
def solrHost = "http://myhost/solr/"; // defined at startup
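One way to avoid rebuilding a client for every message is to cache one client per collection name. The sketch below is not the poster's code: `CollectionClientCache` is hypothetical, and the `factory` argument stands in for whatever constructs the real client (in SolrJ 4.x that might be `HttpSolrServer`).

```python
class CollectionClientCache:
    """Create one client per collection on first use and reuse it afterwards."""

    def __init__(self, solr_host, factory):
        self._solr_host = solr_host.rstrip("/")
        self._factory = factory  # callable taking the collection URL
        self._clients = {}

    def get(self, collection):
        # Build the client lazily the first time a collection is seen.
        if collection not in self._clients:
            url = f"{self._solr_host}/{collection}"
            self._clients[collection] = self._factory(url)
        return self._clients[collection]


# Stand-in factory that only records the URL it was given.
cache = CollectionClientCache("http://myhost/solr/", lambda url: {"url": url})
first = cache.get("collectionA")
second = cache.get("collectionA")
print(first["url"], first is second)
```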
@Shawn
I'm getting the Bad Request again, with the original code snippet I posted,
it appears to be an 'illegal' string field.
SOLR log
-
INFO:
{add=[mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30]} 0 7
I want SOLR 4.9 to log to my rolling tomcat logs like
catalina.2015-03-06.log. Instead I'm just getting a solr.log with no
timestamp. Maybe this is just the way it has to be now?
I'm also not sure if I need to copy more SOLR jars into my tomcat lib.
This is my setup.
Shawn, in light of Garth's response below
You can't just add a new core to an existing collection. You can add the
new node to the cloud, but it won't be part of any collection. You're not
going to be able to just slide it in as a 4th shard to an established
collection of 3 shards.
how is it
I currently have a SolrCloud with 3 shards + replicas, it is holding 130M
documents and the r3.large hosts are running out of memory. As it's on 4.2
there is no shard splitting, I will have to reindex to a 4.3+ version.
If I had that feature would I need to split each shard into 2 subshards
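For reference, on versions that have the feature, splitting is requested one shard at a time through the Collections API `SPLITSHARD` action, and each call divides one shard into two sub-shards. The sketch below only builds the request URLs; `build_split_urls` is a hypothetical helper, and the host, the collection name and the shard names `shard1` to `shard3` are assumptions.

```python
from urllib.parse import urlencode


def build_split_urls(base, collection, shards):
    """Build one SPLITSHARD request URL per shard (hypothetical helper)."""
    return [
        f"{base}/admin/collections?"
        + urlencode({"action": "SPLITSHARD", "collection": collection, "shard": shard})
        for shard in shards
    ]


# Assumed shard names for a 3-shard collection.
split_urls = build_split_urls(
    "http://s1:8983/solr", "mycollection", ["shard1", "shard2", "shard3"]
)
for split_url in split_urls:
    print(split_url)
```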
What about adding one new leader/replica pair? It seems that would entail
a) creating the r3.large instances and volumes
b) adding 2 new Zookeeper hosts?
c) updating my Zookeeper configs (new hosts, new ids, new SOLR config)
d) restarting all ZKs
e) restarting SOLR hosts in sequence needed for
I'd forgotten that -DzkHost refers to the Zookeeper hosts, not the SOLR hosts.
Thanks.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595p4189703.html
Sent from the Solr - User mailing list archive at Nabble.com.
Great info. Can I ask how much data you are handling with that 6G or 7G
heap?
--
View this message in context:
http://lucene.472066.n3.nabble.com/SolrCloud-OOM-Problem-tp4152389p4152712.html
Have you used a queue to intercept queries and if so what was your
implementation? We are indexing huge amounts of data from 7 SolrJ instances
which run independently, so there's a lot of concurrent indexing.
I applied the OPTS you pointed me to, here's the full string:
CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m
-Xms12288m -Xmx12288m -XX:NewRatio=3 -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to
ask this but can you recommend Java memory and GC settings for 90G data and
the above memory? Currently I have
CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m
-Xms5120m -Xmx5120m -XX:+UseParNewGC