Re: SolrCloud 5.1 startup looking for standalone config
I would need to look at the code to figure out how it works, but I would imagine that the shards are shuffled randomly among the hosts so that multiple collections will be evenly distributed across the cluster. It would take me quite a while to familiarize myself with the code before I could figure out where to look.

The random assignment is OK; wherever shard3 is created will become node3 for my system, as long as each leader and replica pair remain partnered:

mycollection_shard1_replica1 -- mycollection_shard1_replica2
mycollection_shard2_replica1 -- mycollection_shard2_replica2
etc.

Does this remain 'fixed' in Zookeeper once established, so that restarting nodes will not affect their shardN assignment?

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118p4209990.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud 5.1 startup looking for standalone config
Yes, adding _solr worked, thx. But I also had to populate the SOLR_HOST param for each of the 4 hosts, as in SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com. I'm in an EC2 VPN environment, which might be the problem. This command now works (leaving off the port):

http://s1/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=mycollection_cloud_conf&createNodeSet=s1_solr,s2_solr,s3_solr

The shard directories do now appear on s1, s2, s3, but the order is different every time I DELETE the collection and rerun the CREATE. Right now it is:

s1: mycollection_shard2_replica1
s2: mycollection_shard3_replica1
s3: mycollection_shard1_replica1

I'll look further at your article, but any advice is appreciated on controlling which hosts the shards land on. Also, are these considered leaders? If so, I don't understand the replica1 suffix.
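For what it's worth, the separator mistakes in that URL are easy to make by hand. A small sketch (hypothetical helper, just the parameter names used above) that builds the CREATE call with proper ?/& separators and escaping:

```python
from urllib.parse import urlencode

def create_collection_url(host, name, num_shards, config_name, node_set):
    # urlencode takes care of the '&' separators and percent-escaping;
    # the '?' introduces the query string.
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
        "createNodeSet": ",".join(node_set),
    }
    return "http://%s/solr/admin/collections?%s" % (host, urlencode(params))

url = create_collection_url("s1", "mycollection", 3,
                            "mycollection_cloud_conf",
                            ["s1_solr", "s2_solr", "s3_solr"])
print(url)
```

Note the commas in createNodeSet come out percent-encoded (%2C), which is equivalent on the wire.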
Re: SolrCloud 5.1 startup looking for standalone config
I ran this command with Solr hosts s1 and s2 running:

http://s1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=mycollection_cloud_conf&createNodeSet=s1:8983,s2:8983

I referred to this link, http://heliosearch.org/solrcloud-assigning-nodes-machines/, which looks like it is only passing the desired leaders to createNodeSet. But I'm getting this error:

Cannot create collection mycollection. Value of maxShardsPerNode is 1, and the number of nodes currently live or live and part of your createNodeSet is 0. This allows a maximum of 0 to be created. Value of numShards is 2 and value of replicationFactor is 1. This requires 2 shards to be created (higher than the allowed number)

I get the same error with createNodeSet=s1:8983,s2:8983,s3:8983,s4:8983 with all four Solr hosts running. But the service status command shows that Zookeeper sees all my running nodes:

Solr process 24603 running on port 8983
{
  "solr_home":"/volume/solr/data/",
  "version":"5.1.0 1672403 - timpotter - 2015-04-09 10:37:54",
  "startTime":"2015-06-02T18:00:06.665Z",
  "uptime":"0 days, 0 hours, 4 minutes, 35 seconds",
  "memory":"19.6 MB (%4) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"zk1:2181,zk2:2181,zk3:2181",
    "liveNodes":"4",
    "collections":"0"}}

I was expecting the absent maxShardsPerNode param to default to 1 and give me 2 leaders and 2 replicas.
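The arithmetic in that error message can be modeled directly. This is an illustrative sketch (not Solr's actual code) of the capacity check, showing how createNodeSet entries that don't exactly match the registered node_name values (which carry the _solr suffix) count as zero usable nodes:

```python
def max_creatable(live_nodes, create_node_set, max_shards_per_node=1):
    """Illustrative model of the capacity check behind the CREATE error:
    only createNodeSet entries that exactly match a live node_name
    count toward capacity."""
    if create_node_set is None:
        usable = len(live_nodes)
    else:
        usable = len(set(live_nodes) & set(create_node_set))
    return usable * max_shards_per_node

live = ["s1:8983_solr", "s2:8983_solr"]

# Entries like "s1:8983" do not match the registered node names,
# so the usable node count is 0 -> "allows a maximum of 0 to be created".
assert max_creatable(live, ["s1:8983", "s2:8983"]) == 0
assert max_creatable(live, ["s1:8983_solr", "s2:8983_solr"]) == 2
```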
Re: SolrCloud 5.1 startup looking for standalone config
OK thanks, continuing...

> numShards in SOLR_OPTS isn't a good idea, what happens if you want to
> create a collection with 5 shards?

Yes, I was following my old pattern: CATALINA_OPTS="${CATALINA_OPTS} -DnumShards=n"

> down the nodes and nuke the directories you created by hand and bring
> the nodes back up

Yes, I did this.

> create the collection via the Collections API CREATE

I did this but kept getting "not running in SolrCloud mode". I added the -c option to my service script like this:

su -c "SOLR_INCLUDE=$SOLR_ENV $SOLR_INSTALL_DIR/bin/solr $SOLR_CMD -c" - $RUNAS

and it did start in cloud mode. Is the -c necessary, and is that the right place for it? I thought uncommenting the ZK param in solr.in.sh would put it in cloud mode.

I reran the CREATE and got a shard1 and shard2 in the GUI cloud view. The new directories are arc_search_shard1_replica1 and arc_search_shard2_replica1. Is this because I have only 2 Solr hosts running? I'm used to adding nodes one by one and having the replica assignments start when the numShards count is exceeded. Transitioning from 4.2 to 5.1, and it's quite different!
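For reference, a hypothetical solr.in.sh fragment (hosts and ports are placeholders); if my understanding is right, defining ZK_HOST here is what makes bin/solr start in cloud mode, so the explicit -c flag on the command line shouldn't be needed:

```shell
# Example solr.in.sh fragment (hosts and ports are placeholders).
# When ZK_HOST is set and this file is actually sourced via SOLR_INCLUDE,
# bin/solr should start the node in SolrCloud mode on its own,
# making the explicit -c flag redundant.
ZK_HOST="zk1:2181,zk2:2181,zk3:2181"
SOLR_HOST="s1"
ZK_CLIENT_TIMEOUT="15000"
```

If -c was required, it may be worth checking that SOLR_INCLUDE really points at this file when the service script runs.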
SolrCloud 5.1 startup looking for standalone config
I followed these steps and I am unable to launch in cloud mode.

1. Created/started 3 external Zookeeper hosts: zk1, zk2, zk3

2. Installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2

3. Uploaded a configset to zk1 (solr home is /volume/solr/data):

/opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost zk1:2181 -confname mycollection_cloud_conf -solrhome /volume/solr/data -confdir /home/ec2-user/mycollection/conf

4. On s1, added these params to solr.in.sh:

ZK_HOST="zk1:2181,zk2:2181,zk3:2181"
SOLR_HOST="s1"
ZK_CLIENT_TIMEOUT="15000"
SOLR_OPTS="$SOLR_OPTS -DnumShards=2"

5. On s1, created the core directory and file /volume/solr/data/mycollection/core.properties (name=mycollection)

6. Repeated steps 4 and 5 for s2, minus the numShards param

Starting the service on s1 gives me:

mycollection: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core mycollection: Error loading solr config from /volume/solr/data/mycollection/conf/solrconfig.xml

But aren't the config files supposed to be in Zookeeper?

Tux
Re: Reindex of document leaves old fields behind
This is fixed. My SolrJ client was putting a JSON object into a multivalued field in the SolrInputDocument. Solr returned a 0 status code but did not add the bad object; instead it performed what looks like an atomic update as described above. Once I removed the illegal JSON object from the SolrInputDocument, a regular document replacement occurred and my unwanted fields were removed in Solr. Is this a known behaviour, for Solr to switch into atomic update mode based on attributes of the SolrInputDocument?
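A rough model of what I think happened (illustrative Python, not Solr's code): when a field value is a map/JSON object, the update path can read it as a set of atomic-update operations such as {"set": ...} or {"inc": ...} instead of a plain value, which would explain the merge-instead-of-replace behaviour:

```python
def looks_like_atomic_update(doc):
    """Return the fields whose values would be read as atomic-update
    operation maps rather than plain values (illustrative check only)."""
    suspects = []
    for field, value in doc.items():
        values = value if isinstance(value, list) else [value]
        # A dict/map anywhere in the field's values is the red flag.
        if any(isinstance(v, dict) for v in values):
            suspects.append(field)
    return suspects

doc = {
    "id": "1234",
    "title_t": "In My Imagination",
    # a raw JSON object accidentally placed in a multivalued field
    "links_s_mv": [{"$uuid": "4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e"}],
}
assert looks_like_atomic_update(doc) == ["links_s_mv"]
```

A pre-flight check like this on the SolrInputDocument contents (before the add) would have caught my bad field.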
Re: Reindex of document leaves old fields behind
I'm relying on an autocommit of 60 secs. I just ran the same test via my SolrJ client and the result was the same: the SolrCloud query always returns the correct number of fields. Is there a way to find out which shard and replica a particular document lives on?
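One approach (assuming the [shard] document transformer is available in your Solr version) is to request the pseudo-field [shard] in fl, which makes Solr report which shard each returned document came from. A sketch that just builds such a query URL:

```python
from urllib.parse import urlencode

def locate_doc_query(collection_url, doc_id):
    # The [shard] pseudo-field asks Solr to annotate each result with
    # the shard it was served from (assumes the [shard] transformer
    # exists in the running Solr version).
    params = {"q": "id:%s" % doc_id, "fl": "id,[shard]", "wt": "json"}
    return "%s/select?%s" % (collection_url, urlencode(params))

url = locate_doc_query("http://s1:8983/solr/mycollection", "1234")
print(url)
```

Querying each replica directly with distrib=false is another way to pin down which copy holds which version of the document.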
Re: Reindex of document leaves old fields behind
A few further clues to this unresolved problem:

1. I found one of my 5 zookeeper instances was down
2. I tried another reindex of a bad document, but no change on the SOLR side
3. I deleted and reindexed the same doc, and that worked (obviously, but at this point I don't know what to expect)
Re: Reindex of document leaves old fields behind
> If it is implicit then you may have indexed the new document to a
> different shard, which means that it is now in your index more than
> once, and which one gets returned may not be predictable.

If a document with uniqueKey 1234 is assigned to a shard by SolrCloud implicit routing, won't a reindex of 1234 be assigned to the same shard? If not, you'd have dups all over the cluster.
Re: Reindex of document leaves old fields behind
> let's see the code.

Simplified code and some comments:

1. solrUrl points at leader 1 of 3 leaders, each with a replica
2. createSolrDoc takes a full Mongo doc and returns a valid SolrInputDocument
3. I have done dumps of the returned solrDoc and verified it does not have the unwanted fields

SolrServer solrServer = new HttpSolrServer(solrUrl);
SolrInputDocument solrDoc = solrDocFactory.createSolrDoc(mongoDoc, dbName);
UpdateResponse uresponse = solrServer.add(solrDoc);

> issue a query on some of the unique ids in question

SolrCloud is returning only 1 document per uniqueKey.

> Did you push your schema up to Zookeeper and reload (or restart) your
> collection before re-indexing things?

No. The config was pushed up to Zookeeper only once, a few months ago. The documents in question were updated in Mongo and given an updated create_date. Based on this new create_date, my SolrJ client detects and reindexes them.

> are you sure the documents are actually getting indexed and that the
> update is succeeding?

Yes, I see a new value in the timestamp field each time I reindex.
Re: Reindex of document leaves old fields behind
I'm posting the fields from one of my problem documents, based on this comment I found from Shawn on Grokbase:

> If you are trying to use a Map object as the value of a field, that is
> probably why it is interpreting your add request as an atomic update.
> If this is the case, and you're doing it because you have a multivalued
> field, you can use a List object rather than a Map.

This is just a solrDoc.toString() with linebreaks where commas were. Maybe some of these are being seen as map fields by SOLR.

SolrInputDocument[
mynamespaces_s_mv=[drama],
changedates_s_mv=[Tue May 19 17:21:26 EDT 2015, Thu Dec 30 19:00:00 EST ],
networks_t_mv=[{ abcitem-id : 288578fd-6596-47bc-af95-80daecd1f24a , abccontentType : Standard:SocialHandle , SocialNetwork : { $uuid : 73553c4c-4919-4ba9-b16c-fb340f3e4c31} , Handle : in my imaginationseries}],
links_s_mv=[ { $uuid : 4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e} , { $uuid : 9fd75c26-35f2-4f48-b55a-6e82089cc3ba} , { $uuid : 150e43ed-9ebe-41b4-86cc-bdf4885a50fe} , { $uuid : e20b0040-561f-4c34-9dd3-df85250b5a5b} , { $uuid : 0cff75d0-4f32-46c9-9092-60eec2dc847a} , { $uuid : 73553c4c-4919-4ba9-b16c-fb340f3e4c31}],
ratings_t_mv=[{ abcitem-id : 56058649-579a-4160-9439-e59448eb3dff , abccontentType : Standard:TVPG , Rating : { $uuid : 150e43ed-9ebe-41b4-86cc-bdf4885a50fe}}],
title_ci_t=in my imagination,
urlkey_s=in-my imagination,
title_cs_t=In My Imagination,
dp2_1_s_mv=[ { _id : { $uuid : 4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e} , _rules : [ { _startDate : { $date : 2015-03-23T14:58:00.000Z} , _endDate : { $date : -12-31T00:00:00.000Z} , _r : { $uuid : 47b6b31d-d690-437a-9bab-6eeb7be3c8a4} , _p : { $uuid : d478874f-8fc7-4b3d-97f3-f7e63222d633} , _o : { $uuid : 983b6ae9-7882-4af8-bb2f-cff342be99b3} , _a : null }]}],
seriestype_s=e20b0040-561f-4c34-9dd3-df85250b5a5b,
shortid_s=x5jqqf,
shorttitle_t=In My Imagination,
uuid_s=90a1fbbf-ddf8-47a7-9f00-55f05e7dc297,
status_s=DEFAULT,
updatedby_s=maceirar,
description_t=sometext,
review_s_mv=[{ abcpublished : { $date : 2015-05-19T21:21:30.930Z} , abcpublishedBy : jelly , abctargetEnvironment : entertainment-staging , abcrequestId : { $uuid : 56769138-4a03-4ed6-8b29-8030d0941b08} , abcsourceEnvironment : fishing , abcstate : true}, { abcpublished : { $date : 2015-05-19T21:21:31.731Z} , abcpublishedBy : jelly , abctargetEnvironment : myshow-live , abcrequestId : { $uuid : 56769138-4a03-4ed6-8b29-8030d0941b08} , abcsourceEnvironment : myshow-staging , abcstate : true}],
sorttitle_t=In My Imagination,
images_s_mv=[ { $uuid : 9fd75c26-35f2-4f48-b55a-6e82089cc3ba} , { $uuid : 0cff75d0-4f32-46c9-9092-60eec2dc847a}],
title_ci_s=in my imagination,
firmuuids_s_mv=[ { $uuid : 4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e}],
id=mongo-v2.abcnservices.com-fishing-90a1fbbf-ddf8-47a7-9f00-55f05e7dc297,
timestamp=Thu May 21 17:29:58 EDT 2015
]
Re: Reindex of document leaves old fields behind
I'm doing all my indexing to leader 1 and have not specified any router configuration. But there is an equal distribution of 240M docs across 5 shards. I think I've been stating I have 3 shards in these posts; I have 5, sorry. How do I know what kind of routing I am using?
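If the collection was created with numShards, it should be using the compositeId router, which hashes the uniqueKey with MurmurHash3 (x86, 32-bit) and gives each shard an equal slice of the hash ring; that would explain the even distribution. A pure-Python sketch of the idea (illustrative only; Solr's real range bookkeeping differs in detail):

```python
def murmur3_32(data, seed=0):
    """Pure-Python MurmurHash3 x86 32-bit, the family of hash Solr's
    compositeId router applies to the uniqueKey (sketch for illustration)."""
    c1, c2 = 0xcc9e2d51, 0x1b873593
    h = seed
    length = len(data)
    rounded = length - (length % 4)
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff   # rotl 15
        k = (k * c2) & 0xffffffff
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xffffffff   # rotl 13
        h = (h * 5 + 0xe6546b64) & 0xffffffff
    k = 0
    tail = data[rounded:]
    if len(tail) >= 3: k ^= tail[2] << 16
    if len(tail) >= 2: k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff
        k = (k * c2) & 0xffffffff
        h ^= k
    # finalization mix
    h ^= length
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xffffffff
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xffffffff
    h ^= h >> 16
    return h

def shard_for(doc_id, num_shards):
    # Each shard owns an equal contiguous slice of the 32-bit ring, so
    # the same id always lands on the same shard.
    return murmur3_32(doc_id.encode("utf-8")) * num_shards // 2**32

assert shard_for("1234", 5) == shard_for("1234", 5)  # deterministic
assert 0 <= shard_for("1234", 5) < 5
```

Deterministic hashing is also why a reindex of an existing id cannot migrate to a different shard under compositeId routing.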
Re: Reindex of document leaves old fields behind
OK, it is compositeId.

I've just used post.sh to index a test doc with 3 fields to leader 1 of my SolrCloud. I then reindexed it with 1 field removed, and the query on it shows 2 fields. I repeated this a few times and always get the correct field count from Solr. I'm now wondering if SolrJ is somehow involved in performing an atomic update rather than a replacement. I will try the above test via SolrJ.
Reindex of document leaves old fields behind
I'm reindexing Mongo docs into SolrCloud. The new docs have had a few fields removed, so upon reindexing those fields should be gone in Solr. They are not. The result is a new doc merged with an old doc, rather than the replacement I need. I do not know whether the issue is with my SolrJ client, my Solr config, or something else.
Re: Reindex of document leaves old fields behind
The uniqueKey value is the same. The new documents contain fewer fields than the already-indexed ones; could this cause the updates to be treated as atomic, with the persisting fields treated as un-updated? Routing should be implicit, since the collection was created using numShards. Many requests for the same document with cache busting produce the same unwanted fields, so I doubt the correct one is hiding somewhere. I can also see the timestamp going up with each reindex.
Re: Can a single SolrServer instance update multiple collections?
@Shawn, I can definitely upgrade to SolrJ 4.x and would prefer that, so as to target 4.x cores as well. I'm already on Java 7. One attempt I made was this:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.setParam("collection", collectionName);
updateRequest.setMethod(SolrRequest.METHOD.POST);
updateRequest.add(solrdoc);
UpdateResponse updateResponse = updateRequest.process(solrServer);

but I kept getting Bad Request, which I suspect was a SOLR/SolrJ version conflict. I'm all ears!

Dan
Can a single SolrServer instance update multiple collections?
I have a SolrJ application that reads from a Redis queue and updates different collections based on the message content. New collections are added without my knowledge, so I am creating SolrServer objects on the fly as follows:

def solrHost = "http://myhost/solr/" (defined at startup)
def solrTarget = solrHost + collectionName
SolrServer solrServer = new CommonsHttpSolrServer(solrTarget)
updateResponse = solrServer.add(solrdoc)

This does work, but obviously creates a new CommonsHttpSolrServer instance for each message. I assume GC will eliminate these, but is there a way to do this with a single SolrServer object? The SOLR host is version 3.5 and I am using the 3.5 jars for my application (not sure if that is necessary).
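When a single client object can't switch collections, a common workaround is one cached client per collection rather than one per message. A sketch of that pattern in Python (the factory is purely hypothetical, standing in for CommonsHttpSolrServer construction):

```python
class ClientCache:
    """Lazily create and reuse one client object per collection,
    so new collections are handled without per-message churn."""

    def __init__(self, base_url, factory):
        self.base_url = base_url.rstrip("/")
        self.factory = factory          # builds a client from a URL
        self.clients = {}               # collection name -> client

    def for_collection(self, name):
        if name not in self.clients:
            self.clients[name] = self.factory(self.base_url + "/" + name)
        return self.clients[name]

# Demo with a stand-in factory that just records the URL.
cache = ClientCache("http://myhost/solr/", lambda url: {"url": url})
a1 = cache.for_collection("collA")
a2 = cache.for_collection("collA")
assert a1 is a2                                   # reused, not recreated
assert a1["url"] == "http://myhost/solr/collA"
```

This keeps one persistent connection pool per collection instead of leaning on GC to clean up throwaway instances.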
Re: Can a single SolrServer instance update multiple collections?
@Shawn, I'm getting the Bad Request again with the original code snippet I posted; it appears to be an 'illegal' string field.

SOLR log:

INFO: {add=[mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30]} 0 7
Mar 12, 2015 12:15:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30] multiple values encountered for non multiValued field image_url_s: [mgid:file:gsp:movie-assets:/movie-assets/cc/images/shows/miami-beach/episode-thumbnails/specials/iamstupid-the-movie_4x3.jpg, mgid:file:gsp:movie-assets:/movie-assets/cc/images/shows/miami-beach/episode-thumbnails/specials/iamstupid-the-movie_4x3.jpg]
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)

The SolrJ log shows the doc being sent (this is the offending field only):

<field name="image_url_s">...</field>

I will investigate on the feeds side; the existing SolrJ code is not the culprit. But I'd still like a more elegant solution. If a SolrJ 5 client can talk to a 3.5 host, I'm willing to go there. I know I'm not the only one who would like to address collections on the fly.

thx
Dan
How to direct SOLR 4.9 log output to regular Tomcat logs
I want SOLR 4.9 to log to my rolling tomcat logs, like catalina.2015-03-06.log. Instead I'm just getting a solr.log with no timestamp. Maybe this is just the way it has to be now? I'm also not sure if I need to copy more SOLR jars into my tomcat lib. This is my setup.

tomcat6/conf/log4j.properties:

log4j.rootLogger=debug, R
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=${catalina.home}/logs/tomcat.log
log4j.appender.R.MaxFileSize=10MB
log4j.appender.R.MaxBackupIndex=10
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.catalina=DEBUG, R
log4j.logger.org.apache.catalina.core.ContainerBase.[Catalina].[localhost]=DEBUG, R
log4j.logger.org.apache.catalina.core=DEBUG, R
log4j.logger.org.apache.catalina.session=DEBUG, R

tomcat6/conf/logging.properties:

handlers = 1catalina.org.apache.juli.FileHandler, 2localhost.org.apache.juli.FileHandler, 3manager.org.apache.juli.FileHandler, 4host-manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
.handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = /data/tomcatlogs
1catalina.org.apache.juli.FileHandler.prefix = catalina.
2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = /data/tomcatlogs
2localhost.org.apache.juli.FileHandler.prefix = localhost.
3manager.org.apache.juli.FileHandler.level = FINE
3manager.org.apache.juli.FileHandler.directory = /data/tomcatlogs
3manager.org.apache.juli.FileHandler.prefix = manager.
4host-manager.org.apache.juli.FileHandler.level = FINE
4host-manager.org.apache.juli.FileHandler.directory = /data/tomcatlogs
4host-manager.org.apache.juli.FileHandler.prefix = host-manager.
java.util.logging.ConsoleHandler.level = FINE
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = 2localhost.org.apache.juli.FileHandler
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers = 3manager.org.apache.juli.FileHandler
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].handlers = 4host-manager.org.apache.juli.FileHandler

I copied solr-4.9.0/example/lib/ext/*.jar to tomcat6/lib, not the solrj-lib + dist jars as some tutorials suggested:

jcl-over-slf4j-1.7.6.jar
jul-to-slf4j-1.7.6.jar
log4j-1.2.17.jar
slf4j-api-1.7.6.jar
slf4j-log4j12-1.7.6.jar

I copied ./solr-4.9.0/example/resources/log4j.properties to tomcat6/lib and pointed solr.log to my chosen directory. I also have a tomcat6/conf/log4j.properties and don't know if I should delete it.

# Logging level
solr.log=/data/tomcatlogs
log4j.rootLogger=INFO, file, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x \u2013 %m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{-MM-dd HH:mm:ss.SSS}; %C; %m\n

log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF
Re: Does shard splitting double host count
Shawn, in light of Garth's response below:

> You can't just add a new core to an existing collection. You can add
> the new node to the cloud, but it won't be part of any collection.
> You're not going to be able to just slide it in as a 4th shard to an
> established collection of 3 shards.

how is it that you say I can just start up new hosts, especially without modifying the numShards parameter from 3 to 4? And then probably reindexing, because the other options look risky (my company has no backup system).
Does shard splitting double host count
I currently have a SolrCloud with 3 shards + replicas; it is holding 130M documents, and the r3.large hosts are running out of memory. As it's on 4.2, there is no shard splitting; I will have to reindex to a 4.3+ version. If I had that feature, would I need to split each shard into 2 subshards, resulting in a total of 6 subshards, in order to keep all shards relatively equal? And since host memory is the problem, I'd be migrating subshards to new hosts. So it seems I'd be going from 6 hosts to 12. Are these assumptions correct, or is there a way to avoid doubling my host count?
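As I understand SPLITSHARD (hedged, I haven't run it): each parent shard's hash range is cut into two contiguous sub-ranges, so each subshard inherits roughly half the documents, assuming an even hash distribution. Whether 6 subshards means 6 or 12 hosts then depends on where you place them, since subshards initially live on the parent's host and only move if you move them. A sketch of the range math:

```python
def split_range(lo, hi):
    """Sketch of SPLITSHARD's effect on a shard's hash range: cut the
    range into two contiguous halves, one per subshard."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid + 1, hi)

# Splitting a full 32-bit ring slice in half:
left, right = split_range(0, 2**32 - 1)
assert left == (0, 2**31 - 1)
assert right == (2**31, 2**32 - 1)
```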
Re: Does shard splitting double host count
What about adding one new leader/replica pair? It seems that would entail:

a) creating the r3.large instances and volumes
b) adding 2 new Zookeeper hosts?
c) updating my Zookeeper configs (new hosts, new ids, new SOLR config)
d) restarting all ZKs
e) restarting SOLR hosts in the sequence needed for correct shard/replica assignment
f) starting indexing again

So shards 1, 2, 3 start with 33% of the docs each. As I start indexing, new documents get sharded at 25% per shard. If I reindex a document that exists already in shard2, does it remain in shard2, or could it migrate to another shard, thus removing it from shard2? I'm looking for a migration strategy to achieve 25% docs per shard. I would also consider deleting docs by daterange from shards 1, 2, 3 and reindexing them to redistribute evenly.
Re: Does shard splitting double host count
I'd forgotten that -DzkHost refers to the Zookeeper hosts, not the SOLR hosts. Thanks.
Re: SolrCloud OOM Problem
Great info. Can I ask how much data you are handling with that 6G or 7G heap?
Re: SolrCloud OOM Problem
Have you used a queue to intercept queries, and if so, what was your implementation? We are indexing huge amounts of data from 7 SolrJ instances which run independently, so there's a lot of concurrent indexing.
Re: SolrCloud OOM Problem
I applied the OPTS you pointed me to; here's the full string:

CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms12288m -Xmx12288m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:CMSTriggerPermRatio=80 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts"

jConsole is now showing lower heap usage. It had been climbing to 12G consistently; now it is only spiking to 10G every 10 minutes or so. Here's my top output:

PID   USER  PR  NI  VIRT  RES  SHR   S  %CPU  %MEM  TIME+     COMMAND
4250  root  20  0   129g  14g  1.9g  S  2.0   21.3  17:40.61  java
Re: SolrCloud OOM Problem
I have modified my instances to m2.4xlarge, 64-bit with 68.4G memory. Hate to ask this, but can you recommend Java memory and GC settings for 90G of data and the above memory? Currently I have:

CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms5120m -Xmx5120m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC"

Doesn't this mean I am starting with 5G and never going over 5G? I've seen a few of those uninverted multi-valued field OOMs already on the upgraded host.

Thanks
Tux
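On the -Xms/-Xmx question: yes, -Xms5120m -Xmx5120m means the heap starts at 5 GiB and can never grow beyond 5 GiB; an allocation that needs more triggers OutOfMemoryError rather than heap growth. A small sketch that parses these values (simplified; real JVM flag parsing has more forms):

```python
import re

def jvm_heap_bytes(opt):
    """Parse a -Xms/-Xmx value like '-Xmx5120m' or '-Xms12g' into bytes
    (simplified sketch of the JVM's size-suffix convention)."""
    m = re.fullmatch(r"-Xm[sx](\d+)([kKmMgG]?)", opt)
    if not m:
        raise ValueError("unrecognized heap option: %s" % opt)
    number, unit = int(m.group(1)), m.group(2).lower()
    factor = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}[unit]
    return number * factor

# -Xms5120m and -Xmx5120m are the same 5 GiB: fixed floor and ceiling.
assert jvm_heap_bytes("-Xmx5120m") == 5 * 1024**3
assert jvm_heap_bytes("-Xms12288m") == jvm_heap_bytes("-Xmx12g")
```

Setting Xms equal to Xmx is deliberate in many Solr setups, since it avoids heap-resize pauses; the cap itself is what needs raising here.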