Re: SolrCloud 5.1 startup looking for standalone config

2015-06-05 Thread tuxedomoon
I would need to look at the code to figure out how it works, but I would imagine that the shards are shuffled randomly among the hosts so that multiple collections will be evenly distributed across the cluster. It would take me quite a while to familiarize myself with the code before I could

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-03 Thread tuxedomoon
Yes adding _solr worked, thx. But I also had to populate the SOLR_HOST param for each of the 4 hosts, as in SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com. I'm in an EC2 VPN environment which might be the problem. This command now works (leaving off port)

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-02 Thread tuxedomoon
I ran this command with Solr hosts s1 and s2 running: http://s1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=mycollection_cloud_conf&createNodeSet=s1:8983,s2:8983 I referred to this link: http://heliosearch.org/solrcloud-assigning-nodes-machines/
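For reference, a minimal sketch of how a CREATE request like this one can be assembled so that every parameter is separated by `&`. The follow-up in this thread reports that adding `_solr` to each entry worked, so the node names below use the `host:port_solr` form. The function name `create_collection_url` is made up for illustration; this only builds the URL string and does not contact Solr.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, config_name, nodes):
    """Build a Collections API CREATE URL with '&'-separated parameters.

    `nodes` are Solr node names such as "s1:8983_solr" (host:port_context).
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
        "createNodeSet": ",".join(nodes),
    }
    # Keep ':' and ',' readable instead of percent-encoding them.
    query = urlencode(params, safe=":,")
    return base_url.rstrip("/") + "/admin/collections?" + query


url = create_collection_url(
    "http://s1:8983/solr",
    "mycollection",
    2,
    "mycollection_cloud_conf",
    ["s1:8983_solr", "s2:8983_solr"],
)
print(url)
```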

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-02 Thread tuxedomoon
ok thanks, continuing... numShards in SOLR_OPTS isn't a good idea, what happens if you want to create a collection with 5 shards?) yes I was following my old pattern CATALINA_OPTS=${CATALINA_OPTS} -DnumShards=n down the nodes and nuke the directories you created by hand and bring the nodes

SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread tuxedomoon
I followed these steps and I am unable to launch in cloud mode. 1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3 2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2 3. uploaded a configset to zk1 (solr home is /volume/solr/data)
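A Solr 5 node only starts in cloud mode when it is given a ZooKeeper address (for example through the `-z` option of `bin/solr` or the `ZK_HOST` setting in the include file). A small sketch of building that connection string for the three hosts named in the steps above follows. The port 2181 is ZooKeeper's usual default and is an assumption here, as is the optional chroot; `zk_connect_string` is a hypothetical helper.

```python
def zk_connect_string(hosts, port=2181, chroot=""):
    """Join ZooKeeper hosts into a single connection string.

    The chroot (e.g. "/solr"), if any, is appended once after the last host,
    not after every host.
    """
    servers = ",".join(f"{host}:{port}" for host in hosts)
    return servers + chroot


zk_host = zk_connect_string(["zk1", "zk2", "zk3"])
print(zk_host)
```

The resulting value is what would be passed as the ZooKeeper address on each Solr host.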

Re: Reindex of document leaves old fields behind

2015-05-22 Thread tuxedomoon
This is fixed. My SolrJ client was putting a JSON object into a multivalued field in the SolrInputDocument. Solr returned a 0 status code but did not add the bad object; instead it performed what looks like an atomic update, as described above. Once I removed the illegal JSON object from the

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm relying on an autocommit of 60 secs. I just ran the same test via my SolrJ client and result was the same, SolrCloud query always returns correct number of fields. Is there a way to find out which shard and replica a particular document lives on? -- View this message in context:
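One way to see where a document lives is to query each core directly with `distrib=false`, which keeps the request from being fanned out to the other shards. The sketch below only builds those per-core query URLs. The core URLs, the uniqueKey field name `id`, and the helper name `per_core_lookup_urls` are assumptions for illustration, and ids containing query-syntax characters such as `:` would need escaping first.

```python
from urllib.parse import urlencode


def per_core_lookup_urls(core_urls, unique_key, doc_id):
    """Return one non-distributed lookup URL per core."""
    params = {
        "q": f"{unique_key}:{doc_id}",
        "distrib": "false",  # answer from this core only
        "fl": unique_key,
        "wt": "json",
    }
    query = urlencode(params, safe=":")
    return [f"{core.rstrip('/')}/select?{query}" for core in core_urls]


# Hypothetical core URLs; real core names come from the Solr admin UI.
cores = [
    "http://s1:8983/solr/mycollection_shard1_replica1",
    "http://s2:8983/solr/mycollection_shard2_replica1",
]
lookup_urls = per_core_lookup_urls(cores, "id", "1234")
for u in lookup_urls:
    print(u)
```

Whichever core reports `numFound` of 1 holds a copy of the document; more than one shard reporting it would point to a duplicate.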

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
A few further clues to this unresolved problem: 1. I found one of my 5 zookeeper instances was down. 2. I tried another reindex of a bad document, but there was no change on the SOLR side. 3. I deleted and reindexed the same doc, and that worked (obviously, but at this point I don't know what to expect).

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
If it is implicit then you may have indexed the new document to a different shard, which means that it is now in your index more than once, and which one gets returned may not be predictable. If a document with uniqueKey 1234 is assigned to a shard by SolrCloud, implicit routing won't a

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
let's see the code. simplified code and some comments 1. solrUrl points at leader 1 of 3 leaders, each with a replica 2. createSolrDoc takes a full Mongo doc and returns a valid SolrInputDocument 3. I have done dumps of the returned solrDoc and verified it does not have the unwanted

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm posting the fields from one of my problem documents, based on this comment I found from Shawn on Grokbase. If you are trying to use a Map object as the value of a field, that is probably why it is interpreting your add request as an atomic update. If this is the case, and you're doing it
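The quoted remark suggests a simple pre-flight check: before sending a document, look for any field whose value (or any element of a multivalued field) is a map, since that is the shape of an atomic-update instruction. The sketch below does this on a plain dictionary standing in for the document; it is not SolrJ code, and `map_valued_fields` and the sample document are made up for illustration.

```python
def map_valued_fields(doc):
    """Return names of fields holding a dict, directly or inside a list.

    Such values can be read as atomic-update operations instead of
    plain field values.
    """
    suspects = []
    for field, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(isinstance(v, dict) for v in values):
            suspects.append(field)
    return suspects


# Hypothetical document: "tags" carries a stray JSON object.
sample_doc = {
    "id": "1234",
    "title": "example",
    "tags": ["a", {"unexpected": "object"}],
}
print(map_valued_fields(sample_doc))
```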

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm doing all my indexing to leader 1 and have not specified any router configuration. But there is an equal distribution of 240M docs across 5 shards. I think I've been stating I have 3 shards in these posts; I have 5, sorry. How do I know what kind of routing I am using?
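The router in use is recorded in the cluster state that SolrCloud keeps in ZooKeeper (`/clusterstate.json` on 4.x). A minimal sketch of reading it out of that JSON follows. The sample state is invented and heavily trimmed, and the fallback for a plain-string `router` value is an assumption about how some earlier 4.x releases stored it.

```python
import json


def router_name(clusterstate_json, collection):
    """Extract the router name for a collection from cluster state JSON."""
    state = json.loads(clusterstate_json)
    router = state[collection].get("router", "unknown")
    if isinstance(router, dict):
        # Newer layout: "router": {"name": "compositeId"}
        return router.get("name", "unknown")
    # Assumed older layout: "router": "compositeId"
    return router


# Hypothetical, trimmed cluster state for illustration only.
sample_state = """
{
  "mycollection": {
    "shards": {"shard1": {}, "shard2": {}},
    "router": {"name": "compositeId"}
  }
}
"""
print(router_name(sample_state, "mycollection"))
```

A collection created with `numShards` and no explicit router would normally report `compositeId`, which fits an even spread of documents across shards.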

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
OK, it is composite. I've just used post.sh to index a test doc with 3 fields to leader 1 of my SolrCloud. I then reindexed it with 1 field removed and the query on it shows 2 fields. I repeated this a few times and always get the correct field count from Solr. I'm now wondering if SolrJ is

Reindex of document leaves old fields behind

2015-05-20 Thread tuxedomoon
I'm reindexing Mongo docs into SolrCloud. The new docs have had a few fields removed so upon reindexing those fields should be gone in Solr. They are not. So the result is a new doc merged with an old doc rather than a replacement which is what I need. I do not know whether the issue is with

Re: Reindex of document leaves old fields behind

2015-05-20 Thread tuxedomoon
The uniqueKey value is the same. The new documents contain fewer fields than the already indexed ones. Could this cause the updates to be treated as atomic? With the persisting fields treated as un-updated? Routing should be implicit since the collection was created using numShards. Many

Re: Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
@Shawn, I can definitely upgrade to SolrJ 4.x and would prefer that so as to target 4.x cores as well. I'm already on Java 7. One attempt I made was this: UpdateRequest updateRequest = new UpdateRequest(); updateRequest.setParam("collection", collectionName);

Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
I have a SolrJ application that reads from a Redis queue and updates different collections based on the message content. New collections are added without my knowledge, so I am creating SolrServer objects on the fly as follows: def solrHost = "http://myhost/solr/"; (defined at startup)
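If clients do have to be created on the fly, one common pattern is to create each one lazily and reuse it for later messages that target the same collection. The sketch below shows that pattern in Python, not SolrJ; `ClientCache` and the `factory` callable are hypothetical stand-ins for whatever constructs the real client from a collection URL.

```python
class ClientCache:
    """Lazily create and reuse one client per collection."""

    def __init__(self, solr_host, factory):
        self._solr_host = solr_host.rstrip("/")
        self._factory = factory  # callable taking a collection URL
        self._clients = {}

    def get(self, collection):
        # Build the client on first use, then hand back the same object.
        if collection not in self._clients:
            url = f"{self._solr_host}/{collection}"
            self._clients[collection] = self._factory(url)
        return self._clients[collection]


# Usage with a stand-in factory that just records the URL it was given.
created_urls = []


def fake_factory(url):
    created_urls.append(url)
    return object()


cache = ClientCache("http://myhost/solr/", fake_factory)
first = cache.get("products")
second = cache.get("products")
other = cache.get("articles")
print(created_urls)
```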

Re: Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
@Shawn I'm getting the Bad Request again, with the original code snippet I posted, it appears to be an 'illegal' string field. SOLR log - INFO: {add=[mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30]} 0 7

How to direct SOLR 4.9 log output to regular Tomcat logs

2015-03-06 Thread tuxedomoon
I want SOLR 4.9 to log to my rolling tomcat logs like catalina.2015-03-06.log. Instead I'm just getting a solr.log with no timestamp. Maybe this is just the way it has to be now? I'm also not sure if I need to copy more SOLR jars into my tomcat lib. This is my setup.

Re: Does shard splitting double host count

2015-03-02 Thread tuxedomoon
Shawn, in light of Garth's response below You can't just add a new core to an existing collection. You can add the new node to the cloud, but it won't be part of any collection. You're not going to be able to just slide it in as a 4th shard to an established collection of 3 shards. how is it

Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I currently have a SolrCloud with 3 shards + replicas; it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, so I will have to reindex to a 4.3+ version. If I had that feature would I need to split each shard into 2 subshards
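For orientation, a sketch of what the split requests would look like on a release that has the feature: one Collections API SPLITSHARD call per existing shard. The collection name, shard names and host below are assumptions, and `split_shard_urls` is a made-up helper that only builds the URLs. As far as I know the two sub-shards are created on the node that hosts the parent shard, so the split by itself does not add hosts.

```python
from urllib.parse import urlencode


def split_shard_urls(base_url, collection, shards):
    """Build one SPLITSHARD request URL per shard."""
    urls = []
    for shard in shards:
        query = urlencode(
            {"action": "SPLITSHARD", "collection": collection, "shard": shard}
        )
        urls.append(base_url.rstrip("/") + "/admin/collections?" + query)
    return urls


# Hypothetical names for a 3-shard collection.
split_urls = split_shard_urls(
    "http://s1:8983/solr", "mycollection", ["shard1", "shard2", "shard3"]
)
for u in split_urls:
    print(u)
```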

Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
What about adding one new leader/replica pair? It seems that would entail a) creating the r3.large instances and volumes b) adding 2 new Zookeeper hosts? c) updating my Zookeeper configs (new hosts, new ids, new SOLR config) d) restarting all ZKs e) restarting SOLR hosts in sequence needed for

Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I'd forgotten that DzkHost refers to the Zookeeper hosts not SOLR hosts. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595p4189703.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Great info. Can I ask how much data you are handling with that 6G or 7G heap?

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Have you used a queue to intercept queries and if so what was your implementation? We are indexing huge amounts of data from 7 SolrJ instances which run independently, so there's a lot of concurrent indexing.

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
I applied the OPTS you pointed me to, here's the full string: CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms12288m -Xmx12288m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark
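To make the sizes in an option string like this easier to compare, here is a small sketch that pulls out the heap and young-generation settings and converts them to bytes. It only understands the four size flags listed in the pattern and ignores everything else; `jvm_sizes` is a hypothetical helper, and the sample string is just the visible start of the options quoted above.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE_FLAG = re.compile(r"-(Xms|Xmx|XX:NewSize=|XX:MaxNewSize=)(\d+)([kmgKMG]?)")


def jvm_sizes(opts):
    """Map Xms, Xmx, NewSize and MaxNewSize to their size in bytes."""
    sizes = {}
    for token in opts.split():
        match = _SIZE_FLAG.fullmatch(token)
        if match:
            name = match.group(1).rstrip("=").replace("XX:", "")
            factor = _UNITS.get(match.group(3).lower(), 1)
            sizes[name] = int(match.group(2)) * factor
    return sizes


sample_opts = (
    "-XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms12288m -Xmx12288m -XX:NewRatio=3"
)
sizes = jvm_sizes(sample_opts)
print({name: value / 1024 ** 3 for name, value in sizes.items()})
```

With these values the heap is fixed at 12 GiB and the young generation at 1.5 GiB.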

Re: SolrCloud OOM Problem

2014-08-12 Thread tuxedomoon
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this but can you recommend Java memory and GC settings for 90G data and the above memory? Currently I have CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms5120m -Xmx5120m -XX:+UseParNewGC