Re: Optimal configuration for high throughput indexing
Hi Shawn,

Thanks for your inputs. The 12GB is for Solr. I did read through your wiki, and your recommended G1 settings are already included. I tried a lower-memory configuration (7G) as well, and it did not produce better results. Right now I am in the process of changing the updates to use the SolrJ CloudSolrServer and testing it.

Thanks
Vinay

On 4 May 2015 at 16:09, Shawn Heisey apa...@elyograg.org wrote:

On 5/4/2015 2:36 PM, Vinay Pothnis wrote:

But nonetheless, we will give the latest SolrJ client + CloudSolrServer a try.
* Yes, the documents are pretty small.
* We are using the G1 collector and there are no major GCs; however, there are a lot of minor GCs, sometimes adding up to 2s per minute overall.
* We are allocating 12G of memory.
* Query rate: 3750 TPS (transactions per second)
* I need to get the exact rate for inserts/updates. I will make the SolrJ client change first and test it.

Whether that 12GB heap size is for Solr itself or for your client code, with a heap that large you should be doing more tuning than simply turning on G1GC. I have spent quite a lot of time working on GC tuning for Solr, and the results of that work can be found here:

http://wiki.apache.org/solr/ShawnHeisey

I cannot claim that these are the best options you can find for Solr, but they've worked well for me, and for others.

Thanks,
Shawn
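As a reference point, a G1 configuration for a large Solr heap typically looks something like the fragment below. This is a generic G1 starting point using the thread's 12GB figure, not necessarily the exact option set from Shawn's wiki page - check the wiki for his tested values:

```
-Xms12g -Xmx12g
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m
-XX:MaxGCPauseMillis=250
```

Pair this with GC logging (-verbose:gc plus log-file options for your JVM version) so stop-the-world pauses can actually be measured rather than guessed at.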
Re: Optimal configuration for high throughput indexing
Hi Erick,

Thanks for your inputs. A while back we made a conscious decision to skip the SolrJ client and use plain HTTP; I think it was because, at the time, the SolrJ client was queueing updates in memory or something similar. But nonetheless, we will give the latest SolrJ client + CloudSolrServer a try.

* Yes, the documents are pretty small.
* We are using the G1 collector and there are no major GCs; however, there are a lot of minor GCs, sometimes adding up to 2s per minute overall.
* We are allocating 12G of memory.
* Query rate: 3750 TPS (transactions per second)
* I need to get the exact rate for inserts/updates. I will make the SolrJ client change first and test it.

Thanks
Vinay

On 3 May 2015 at 09:37, Erick Erickson erickerick...@gmail.com wrote:

First, you shouldn't be using HttpSolrClient; use CloudSolrServer (CloudSolrClient in 5.x). That takes the ZK address and routes the docs to the leader, reducing the network hops docs have to go through. AFAIK, in cloud setups it is in every way superior to HTTP.

I'm guessing your docs aren't huge. You haven't really told us what high indexing rates and high query rates are in your environment, so it's hard to say much. For comparison, I get 2-3K docs/sec on my laptop (with no query load, though).

The most frequent cause of nodes going into recovery in this scenario is the ZK timeout being exceeded. This is often triggered by excessive GC pauses, so some more details would help here:
* How much memory are you allocating to Solr?
* Have you turned on GC logging to see whether you're getting stop-the-world GC pauses?
* What rates _are_ you seeing?

Personally, I'd concentrate on the nodes going into recovery before anything else. Until that's fixed, anything else you do will not be predictive of much.

BTW, I typically start with batch sizes of 1,000 FWIW. Sometimes that's too big, sometimes too small, but it seems pretty reasonable most of the time.
Best,
Erick

On Thu, Apr 30, 2015 at 12:20 PM, Vinay Pothnis poth...@gmail.com wrote:

Hello,

I have a use case with the following characteristics:
- High index update rate (adds/updates)
- High query rate
- Low index size (~800MB for 2.4 million docs)
- The documents that are created at the high rate eventually expire and are deleted regularly, at half-hour intervals

I currently have a SolrCloud setup with 1 shard and 4 replicas.
* My index updates are sent to a VIP/load balancer (which round-robins to one of the 4 Solr nodes).
* I am using an HTTP client to send the updates.
* I use a batch size of 100, with 8 to 10 threads sending the batches of updates to Solr.

When I try to run tests to scale out the indexing rate, I see the following:
* Solr nodes go into recovery.
* Updates take really long to complete.

As I understand it, when a node receives an update:
* If it is the leader, it forwards the update to all the replicas and waits until it receives a reply from all of them before replying to the client that sent the update.
* If it is not the leader, it forwards the update to the leader, which THEN does the steps mentioned above.

How do I go about scaling the index updates?
* As I add more replicas, will my updates get slower and slower?
* Is there a way I can configure the leader to wait for only N out of M replicas?
* Should I be targeting the updates to only the leader?
* Any other approach I should be considering?

Thanks
Vinay
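Erick's two suggestions - CloudSolrServer/CloudSolrClient and batches of ~1,000 - can be sketched as below. The SolrJ calls are shown as comments because they need a live cluster; the ZK address and collection name are made-up values for illustration. The batching helper itself is plain Java:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching Erick suggests: group documents into batches of
// ~1,000 before handing them to CloudSolrServer/CloudSolrClient, which
// routes each batch to the correct leader via ZooKeeper.
public class Batcher {
    // Split a list into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> batches(List<T> docs, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += size) {
            out.add(new ArrayList<>(docs.subList(i, Math.min(i + size, docs.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical SolrJ usage (needs a running SolrCloud cluster):
        // CloudSolrClient client = new CloudSolrClient.Builder()
        //         .withZkHost("zk1:2181").build();
        // client.setDefaultCollection("mycollection");
        // for (List<SolrInputDocument> batch : batches(allDocs, 1000)) {
        //     client.add(batch);
        // }
        List<Integer> demo = new ArrayList<>();
        for (int i = 0; i < 2500; i++) demo.add(i);
        System.out.println(batches(demo, 1000).size() + " batches"); // 3 batches
    }
}
```

Batching this way amortizes per-request overhead without building one giant request that risks timeouts.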
Optimal configuration for high throughput indexing
Hello,

I have a use case with the following characteristics:
- High index update rate (adds/updates)
- High query rate
- Low index size (~800MB for 2.4 million docs)
- The documents that are created at the high rate eventually expire and are deleted regularly, at half-hour intervals

I currently have a SolrCloud setup with 1 shard and 4 replicas.
* My index updates are sent to a VIP/load balancer (which round-robins to one of the 4 Solr nodes).
* I am using an HTTP client to send the updates.
* I use a batch size of 100, with 8 to 10 threads sending the batches of updates to Solr.

When I try to run tests to scale out the indexing rate, I see the following:
* Solr nodes go into recovery.
* Updates take really long to complete.

As I understand it, when a node receives an update:
* If it is the leader, it forwards the update to all the replicas and waits until it receives a reply from all of them before replying to the client that sent the update.
* If it is not the leader, it forwards the update to the leader, which THEN does the steps mentioned above.

How do I go about scaling the index updates?
* As I add more replicas, will my updates get slower and slower?
* Is there a way I can configure the leader to wait for only N out of M replicas?
* Should I be targeting the updates to only the leader?
* Any other approach I should be considering?

Thanks
Vinay
clarification on index-to-ram ratio
Hello All,

The documentation and general feedback on the mailing list suggest the following:

*... Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB ...*
http://wiki.apache.org/solr/SolrPerformanceProblems#General_information

So, when we say "index size", does it include ALL the replicas or just one of the replicas? For example, if the Solr instance had 2 replicas, each of size 8GB, should we consider 16GB as our index size, or just 8GB, for the above index-to-RAM ratio consideration?

Thanks
Vinay
Re: clarification on index-to-ram ratio
Thanks! And yes, the replica belongs to a different shard - not the same data.

-Vinay

On 19 June 2014 11:21, Toke Eskildsen t...@statsbiblioteket.dk wrote:

Vinay Pothnis [poth...@gmail.com] wrote:

*... Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB ...*

So, when we say "index size", does it include ALL the replicas or just one of the replicas? For example, if the Solr instance had 2 replicas, each of size 8GB, should we consider 16GB as our index size, or just 8GB, for the above index-to-RAM ratio consideration?

16GB, according to the above principle: enough RAM to hold all index data on storage. Two things, though:

1) If you have replicas of the same data on the same machine, I hope that you have them on separate physical drives. If not, it is just wasted disk cache with no benefit.

2) The general advice is only really usable when we're either talking fairly small indexes on spinning drives, or there is a strong need for the absolute lowest latency possible. As soon as we scale up and do not have copious amounts of money, solid state drives provide much better bang for the buck than a spinning-drives-plus-RAM combination.

- Toke Eskildsen
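The rule of thumb Toke confirms reduces to simple arithmetic; a small sketch using the numbers from this thread (two 8GB cores on one machine, 4GB for heap, OS, and other programs):

```java
// Ideal RAM = total on-disk index data served by the machine
//           + heap/OS/other-program requirements (per the wiki rule above).
public class RamSizing {
    static int idealRamGb(int indexGbPerCore, int cores, int heapAndOsGb) {
        return indexGbPerCore * cores + heapAndOsGb;
    }

    public static void main(String[] args) {
        // Two 8GB cores on one box, 4GB for everything else -> 20GB ideal.
        System.out.println(idealRamGb(8, 2, 4)); // prints 20
    }
}
```

As Toke notes, this is an upper-bound ideal for spinning disks; SSDs relax it considerably.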
Re: deleting large amount data from solr cloud
Thanks a lot Shalin!

On 16 April 2014 21:26, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

You can specify the maxSegments parameter, e.g. maxSegments=5, while optimizing.

On Thu, Apr 17, 2014 at 6:46 AM, Vinay Pothnis poth...@gmail.com wrote:

Hello,

A couple of follow-up questions:
* When the optimize command is run, it looks like it creates one big segment (forceMerge = 1). Will it get split at any point later, or will that big segment remain?
* Is there any way to maintain the number of segments but still merge to reclaim the space of deleted documents? In other words, can I issue forceMerge=20? If so, how would the command look? Any examples?

Thanks
Vinay

On 16 April 2014 07:59, Vinay Pothnis poth...@gmail.com wrote:

Thank you Erick! Yes - I am using the expungeDeletes option. Thanks for the note on disk space for the optimize command; I should have enough space for that. What about the heap space requirement? I hope it can do the optimize with the memory that is allocated to it.

Thanks
Vinay

On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com wrote:

The optimize should, indeed, reduce the index size. Be aware that it may consume 2x the disk space. You may also try expungeDeletes, see here:
https://wiki.apache.org/solr/UpdateXmlMessages

Best,
Erick

On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com wrote:

Another update: I removed the replicas - to avoid the replication doing a full copy. I am able to delete sizeable chunks of data, but the overall index size remains the same even after the deletes; it does not seem to go down. I understand that Solr would do this in the background, but I don't see a decrease in the overall index size even after 1-2 hours. I can see a bunch of .del files in the index directory, but they do not seem to get cleaned up. Is there any way to monitor/follow the progress of index compaction? Also, does triggering optimize from the admin UI help to compact the index size on disk?

Thanks
Vinay

On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:

Some update: I removed the auto-warm configurations for the various caches and reduced the cache sizes. I then issued a call to delete a day's worth of data (800K documents). There was no out-of-memory this time, but some of the nodes went into recovery mode. I was able to catch some logs this time around, and this is what I see:

WARN [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync] PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] PeerSync Recovery was not successful - trying replication. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] Starting Replication Recovery. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy] Begin buffering updates. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy] Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
INFO [2014-04-14 18:11:01.964] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:connTimeout=5000&socketTimeout=2&allowCompression=false&maxConnections=1&maxConnectionsPerHost=1
INFO [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller] No value set for 'pollInterval'. Timer Task not started.
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Master's generation: 1108645
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Slave's generation: 1108627
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Starting replication process
INFO [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller] Number of files in latest index in master: 814
INFO [2014-04-14 18:11:02.007] [org.apache.solr.core.CachingDirectoryFactory] return new directory for /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
INFO [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller] Starting download
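Regarding the forceMerge question in this thread: the optimize command accepts a maxSegments parameter, and it can be issued either as a URL parameter or as an update XML message. A sketch, with host, port, and collection name as placeholders:

```xml
<!-- POST to http://host:8983/solr/collection1/update -->
<!-- merge down to at most 20 segments instead of the default single segment -->
<optimize maxSegments="20" waitSearcher="true"/>
```

Merging to, say, 20 segments still reclaims space from deleted documents while avoiding one giant segment that later merges must rewrite wholesale.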
Re: deleting large amount data from solr cloud
Thanks Erick!

On 17 April 2014 08:35, Erick Erickson erickerick...@gmail.com wrote:

bq: Will it get split at any point later?

"Split" is a little ambiguous here. Will it be copied into two or more segments? No. Will it disappear? Possibly. Eventually this segment will be merged if you add enough documents to the system. Consider this scenario: you add 1M docs to your system and it results in 10 segments (numbers made up). Then you optimize, and you have 1M docs in 1 segment. Fine so far. Now you add 750K of those docs over again, which will delete them from the 1 big segment. Your merge policy will, at some point, select this segment to merge, and it'll disappear...

FWIW,
er...@pedantic.com

On Thu, Apr 17, 2014 at 7:24 AM, Vinay Pothnis poth...@gmail.com wrote:

Thanks a lot Shalin!

On 16 April 2014 21:26, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

You can specify the maxSegments parameter, e.g. maxSegments=5, while optimizing.

On Thu, Apr 17, 2014 at 6:46 AM, Vinay Pothnis poth...@gmail.com wrote:

Hello,

A couple of follow-up questions:
* When the optimize command is run, it looks like it creates one big segment (forceMerge = 1). Will it get split at any point later, or will that big segment remain?
* Is there any way to maintain the number of segments but still merge to reclaim the space of deleted documents? In other words, can I issue forceMerge=20? If so, how would the command look? Any examples?

Thanks
Vinay

On 16 April 2014 07:59, Vinay Pothnis poth...@gmail.com wrote:

Thank you Erick! Yes - I am using the expungeDeletes option. Thanks for the note on disk space for the optimize command; I should have enough space for that. What about the heap space requirement? I hope it can do the optimize with the memory that is allocated to it.

Thanks
Vinay

On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com wrote:

The optimize should, indeed, reduce the index size. Be aware that it may consume 2x the disk space. You may also try expungeDeletes, see here:
https://wiki.apache.org/solr/UpdateXmlMessages

Best,
Erick

On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com wrote:

Another update: I removed the replicas - to avoid the replication doing a full copy. I am able to delete sizeable chunks of data, but the overall index size remains the same even after the deletes; it does not seem to go down. I understand that Solr would do this in the background, but I don't see a decrease in the overall index size even after 1-2 hours. I can see a bunch of .del files in the index directory, but they do not seem to get cleaned up. Is there any way to monitor/follow the progress of index compaction? Also, does triggering optimize from the admin UI help to compact the index size on disk?

Thanks
Vinay

On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:

Some update: I removed the auto-warm configurations for the various caches and reduced the cache sizes. I then issued a call to delete a day's worth of data (800K documents). There was no out-of-memory this time, but some of the nodes went into recovery mode. I was able to catch some logs this time around, and this is what I see:

WARN [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync] PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] PeerSync Recovery was not successful - trying replication. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] Starting Replication Recovery. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy] Begin buffering updates. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy] Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
INFO [2014-04-14 18:11:01.964] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:connTimeout=5000&socketTimeout
Re: deleting large amount data from solr cloud
Thank you Erick!

Yes - I am using the expungeDeletes option. Thanks for the note on disk space for the optimize command; I should have enough space for that. What about the heap space requirement? I hope it can do the optimize with the memory that is allocated to it.

Thanks
Vinay

On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com wrote:

The optimize should, indeed, reduce the index size. Be aware that it may consume 2x the disk space. You may also try expungeDeletes, see here:
https://wiki.apache.org/solr/UpdateXmlMessages

Best,
Erick

On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com wrote:

Another update: I removed the replicas - to avoid the replication doing a full copy. I am able to delete sizeable chunks of data, but the overall index size remains the same even after the deletes; it does not seem to go down. I understand that Solr would do this in the background, but I don't see a decrease in the overall index size even after 1-2 hours. I can see a bunch of .del files in the index directory, but they do not seem to get cleaned up. Is there any way to monitor/follow the progress of index compaction? Also, does triggering optimize from the admin UI help to compact the index size on disk?

Thanks
Vinay

On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:

Some update: I removed the auto-warm configurations for the various caches and reduced the cache sizes. I then issued a call to delete a day's worth of data (800K documents). There was no out-of-memory this time, but some of the nodes went into recovery mode. I was able to catch some logs this time around, and this is what I see:

WARN [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync] PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] PeerSync Recovery was not successful - trying replication. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] Starting Replication Recovery. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy] Begin buffering updates. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy] Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
INFO [2014-04-14 18:11:01.964] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:connTimeout=5000&socketTimeout=2&allowCompression=false&maxConnections=1&maxConnectionsPerHost=1
INFO [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller] No value set for 'pollInterval'. Timer Task not started.
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Master's generation: 1108645
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Slave's generation: 1108627
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Starting replication process
INFO [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller] Number of files in latest index in master: 814
INFO [2014-04-14 18:11:02.007] [org.apache.solr.core.CachingDirectoryFactory] return new directory for /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
INFO [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller] Starting download to NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe; maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true

So, it looks like the number of updates is too huge for regular replication, and then it goes into a full copy of the index. And since our index size is very huge (350G), this causes the cluster to go into recovery mode forever - trying to copy that huge index.

I also read in the thread http://lucene.472066.n3.nabble.com/Recovery-too-many-updates-received-since-start-td3935281.html that there is a limit of 100 documents. I wonder if this has been made configurable since that thread. If not, the only option I see is to do a trickle delete of 100 documents per second or something.

Also - the other suggestion of using distributed=false might not help, because the issue currently is that the replication is going to full copy
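The expungeDeletes option discussed in this thread is a flag on the commit command; a minimal update XML message (endpoint is a placeholder) looks like:

```xml
<!-- POST to http://host:8983/solr/collection1/update -->
<!-- merges away segments' deleted documents on commit, without a full optimize -->
<commit expungeDeletes="true"/>
```

This reclaims space from .del'd documents more cheaply than optimize, though it still triggers segment merges and the associated I/O.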
Re: deleting large amount data from solr cloud
Hello,

A couple of follow-up questions:
* When the optimize command is run, it looks like it creates one big segment (forceMerge = 1). Will it get split at any point later, or will that big segment remain?
* Is there any way to maintain the number of segments but still merge to reclaim the space of deleted documents? In other words, can I issue forceMerge=20? If so, how would the command look? Any examples?

Thanks
Vinay

On 16 April 2014 07:59, Vinay Pothnis poth...@gmail.com wrote:

Thank you Erick! Yes - I am using the expungeDeletes option. Thanks for the note on disk space for the optimize command; I should have enough space for that. What about the heap space requirement? I hope it can do the optimize with the memory that is allocated to it.

Thanks
Vinay

On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com wrote:

The optimize should, indeed, reduce the index size. Be aware that it may consume 2x the disk space. You may also try expungeDeletes, see here:
https://wiki.apache.org/solr/UpdateXmlMessages

Best,
Erick

On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com wrote:

Another update: I removed the replicas - to avoid the replication doing a full copy. I am able to delete sizeable chunks of data, but the overall index size remains the same even after the deletes; it does not seem to go down. I understand that Solr would do this in the background, but I don't see a decrease in the overall index size even after 1-2 hours. I can see a bunch of .del files in the index directory, but they do not seem to get cleaned up. Is there any way to monitor/follow the progress of index compaction? Also, does triggering optimize from the admin UI help to compact the index size on disk?

Thanks
Vinay

On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:

Some update: I removed the auto-warm configurations for the various caches and reduced the cache sizes. I then issued a call to delete a day's worth of data (800K documents). There was no out-of-memory this time, but some of the nodes went into recovery mode. I was able to catch some logs this time around, and this is what I see:

WARN [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync] PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] PeerSync Recovery was not successful - trying replication. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] Starting Replication Recovery. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy] Begin buffering updates. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy] Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
INFO [2014-04-14 18:11:01.964] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:connTimeout=5000&socketTimeout=2&allowCompression=false&maxConnections=1&maxConnectionsPerHost=1
INFO [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller] No value set for 'pollInterval'. Timer Task not started.
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Master's generation: 1108645
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Slave's generation: 1108627
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Starting replication process
INFO [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller] Number of files in latest index in master: 814
INFO [2014-04-14 18:11:02.007] [org.apache.solr.core.CachingDirectoryFactory] return new directory for /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
INFO [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller] Starting download to NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe; maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true

So, it looks like the number of updates is too huge for regular replication, and then it goes into a full copy of the index. And since our index size is very huge (350G), this is causing the cluster to go into recovery mode forever - trying to copy that huge
Re: deleting large amount data from solr cloud
Another update: I removed the replicas - to avoid the replication doing a full copy. I am able to delete sizeable chunks of data, but the overall index size remains the same even after the deletes; it does not seem to go down. I understand that Solr would do this in the background, but I don't see a decrease in the overall index size even after 1-2 hours. I can see a bunch of .del files in the index directory, but they do not seem to get cleaned up. Is there any way to monitor/follow the progress of index compaction? Also, does triggering optimize from the admin UI help to compact the index size on disk?

Thanks
Vinay

On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:

Some update: I removed the auto-warm configurations for the various caches and reduced the cache sizes. I then issued a call to delete a day's worth of data (800K documents). There was no out-of-memory this time, but some of the nodes went into recovery mode. I was able to catch some logs this time around, and this is what I see:

WARN [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync] PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] PeerSync Recovery was not successful - trying replication. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] Starting Replication Recovery. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy] Begin buffering updates. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy] Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
INFO [2014-04-14 18:11:01.964] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:connTimeout=5000&socketTimeout=2&allowCompression=false&maxConnections=1&maxConnectionsPerHost=1
INFO [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller] No value set for 'pollInterval'. Timer Task not started.
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Master's generation: 1108645
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Slave's generation: 1108627
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Starting replication process
INFO [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller] Number of files in latest index in master: 814
INFO [2014-04-14 18:11:02.007] [org.apache.solr.core.CachingDirectoryFactory] return new directory for /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
INFO [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller] Starting download to NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe; maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true

So, it looks like the number of updates is too huge for regular replication, and then it goes into a full copy of the index. And since our index size is very huge (350G), this is causing the cluster to go into recovery mode forever - trying to copy that huge index.

I also read in the thread http://lucene.472066.n3.nabble.com/Recovery-too-many-updates-received-since-start-td3935281.html that there is a limit of 100 documents. I wonder if this has been made configurable since that thread. If not, the only option I see is to do a trickle delete of 100 documents per second or something.

Also - the other suggestion of using distributed=false might not help, because the issue currently is that the replication is going to full copy.

Any thoughts?

Thanks
Vinay

On 14 April 2014 07:54, Vinay Pothnis poth...@gmail.com wrote:

Yes, that is our approach. We did try deleting a day's worth of data at a time, and that resulted in an OOM as well.

Thanks
Vinay

On 14 April 2014 00:27, Furkan KAMACI furkankam...@gmail.com wrote:

Hi;

I mean you can divide the range (i.e. one week for each delete instead of one month) and check whether you still get an OOM or not.

Thanks;
Furkan KAMACI

2014-04-14 7:09 GMT+03:00 Vinay Pothnis poth...@gmail.com:

Aman, yes - will do!
Furkan, what do you mean by 'bulk delete'?

-Thanks
Vinay

On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote:

Hi;

Do you get any problems when you index
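The "trickle delete" idea above can be sketched as below. The actual SolrJ calls are shown as comments since they need a live cluster; the 100-document batch size comes from the PeerSync limit discussed in the thread, and the query/field names are placeholders:

```java
// Sketch of trickle deletion: delete in small, paced batches so replicas can
// catch up via PeerSync instead of falling back to a full index copy.
public class TrickleDelete {
    // How many batches a given total requires (ceiling division).
    static int batchesNeeded(long totalDocs, int batchSize) {
        return (int) ((totalDocs + batchSize - 1) / batchSize);
    }

    public static void main(String[] args) {
        long toDelete = 800_000;   // a day's worth, per the thread
        int batchSize = 100;       // stay under the PeerSync update limit
        int n = batchesNeeded(toDelete, batchSize);
        System.out.println(n + " batches");
        // Hypothetical SolrJ loop (one narrow date range per iteration):
        // for (int i = 0; i < n; i++) {
        //     client.deleteByQuery("date_param:[" + start + " TO " + end + "]");
        //     client.commit();
        //     Thread.sleep(1000);  // pace at ~100 docs/sec
        // }
    }
}
```

At ~100 docs/sec, 800K documents take a bit over two hours; the tradeoff is wall-clock time for cluster stability.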
Re: Tipping point of solr shards (Num of docs / size)
You could look at this link to understand the factors that affect SolrCloud performance:
http://wiki.apache.org/solr/SolrPerformanceProblems

Especially the sections about RAM and disk cache. If the index grows too big for one node, it can lead to performance issues. From the looks of it, 500 million docs per shard may already be pushing it. How much does that translate to in terms of index size on disk per shard?

-vinay

On 15 April 2014 21:44, Mukesh Jha me.mukesh@gmail.com wrote:

Hi Gurus,

In my Solr cluster I've multiple shards, each shard containing ~500,000,000 documents, with the total index size being ~1 TB. I was just wondering how much more I can keep adding to a shard before we reach a tipping point and performance starts to degrade? Also, as a best practice, what is the recommended number of docs / size of shards?

Txz in advance :)

--
Thanks & Regards,
Mukesh Jha me.mukesh@gmail.com
Re: deleting large amount data from solr cloud
Yes, that is our approach. We did try deleting a day's worth of data at a time, and that resulted in OOM as well. Thanks Vinay On 14 April 2014 00:27, Furkan KAMACI furkankam...@gmail.com wrote: Hi; I mean you can divide the range (i.e. one week per delete instead of one month) and check whether you still get an OOM or not. Thanks; Furkan KAMACI 2014-04-14 7:09 GMT+03:00 Vinay Pothnis poth...@gmail.com: Aman, Yes - Will do! Furkan, What do you mean by 'bulk delete'? -Thanks Vinay On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote: Hi; Do you get any problems when you index your data? On the other hand, deleting in bulk and reducing the size of documents may help you avoid hitting OOM. Thanks; Furkan KAMACI 2014-04-12 8:22 GMT+03:00 Aman Tandon amantandon...@gmail.com: Vinay, please share your experience after trying this solution. On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis poth...@gmail.com wrote: The query is something like this: curl -H 'Content-Type: text/xml' --data '<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO 138516480]</query></delete>' 'http://host:port/solr/coll-name1/update?commit=true' Trying to restrict the number of documents deleted via the date parameter. Had not tried the distrib=false option. I could give that a try. Thanks for the link! I will check on the cache sizes and autowarm values. Will try to disable the caches when I am deleting and give that a try. Thanks Erick and Shawn for your inputs! -Vinay On 11 April 2014 15:28, Shawn Heisey s...@elyograg.org wrote: On 4/10/2014 7:25 PM, Vinay Pothnis wrote: We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. I'm curious about the actual query being used here. Can you share it, or a redacted version of it? Perhaps there might be a clue there. Is this a fully distributed delete request? One thing you might try, assuming Solr even supports it, is sending the same delete request directly to each shard core with distrib=false. Here's a very incomplete list of ways you can reduce Solr heap requirements: http://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements Thanks, Shawn -- With Regards Aman Tandon
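Shawn's distrib=false idea amounts to addressing each shard's core directly instead of the collection as a whole. A minimal sketch of building those per-core requests (the hosts and core names below are hypothetical):

```python
# Sketch of the distrib=false suggestion: address each shard core directly so
# the delete is handled locally instead of being fanned out by Solr.
# Host and core names are made up for illustration.
CORES = [
    "http://host1:8983/solr/coll-name1_shard1_replica1",
    "http://host2:8983/solr/coll-name1_shard2_replica1",
]

def per_core_delete_urls(cores, commit=False):
    """Build per-core /update URLs carrying distrib=false."""
    params = "distrib=false" + ("&commit=true" if commit else "")
    return [f"{core}/update?{params}" for core in cores]

for url in per_core_delete_urls(CORES, commit=True):
    print(url)
```

The same `<delete><query>...</query></delete>` body quoted in the thread would then be POSTed to each URL in turn (e.g. with curl and `-H 'Content-Type: text/xml'`), one core at a time.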
Re: deleting large amount data from solr cloud
Some updates: I removed the autowarm configurations for the various caches and reduced the cache sizes. I then issued a call to delete a day's worth of data (800K documents). There was no out of memory this time - but some of the nodes went into recovery mode. I was able to catch some logs this time around, and this is what I see:

WARN [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync] PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] PeerSync Recovery was not successful - trying replication. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy] Starting Replication Recovery. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy] Begin buffering updates. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy] Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2
INFO [2014-04-14 18:11:00.536] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
INFO [2014-04-14 18:11:01.964] [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http client, config:connTimeout=5000&socketTimeout=2&allowCompression=false&maxConnections=1&maxConnectionsPerHost=1
INFO [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller] No value set for 'pollInterval'. Timer Task not started.
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Master's generation: 1108645
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Slave's generation: 1108627
INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller] Starting replication process
INFO [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller] Number of files in latest index in master: 814
INFO [2014-04-14 18:11:02.007] [org.apache.solr.core.CachingDirectoryFactory] return new directory for /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
INFO [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller] Starting download to NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe; maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true

So, it looks like the number of updates is too large for regular peer sync, and recovery then falls back to a full copy of the index. And since our index is very large (350G), this causes the cluster to stay in recovery mode forever - trying to copy that huge index. I also read in a thread (http://lucene.472066.n3.nabble.com/Recovery-too-many-updates-received-since-start-td3935281.html) that there is a limit of 100 documents. I wonder if this has been made configurable since that thread. If not, the only option I see is to do a trickle delete of 100 documents per second or so. Also - the other suggestion of using distrib=false might not help, because the issue currently is that the replication goes to a full copy. Any thoughts? Thanks Vinay On 14 April 2014 07:54, Vinay Pothnis poth...@gmail.com wrote: Yes, that is our approach. We did try deleting a day's worth of data at a time, and that resulted in OOM as well. Thanks Vinay On 14 April 2014 00:27, Furkan KAMACI furkankam...@gmail.com wrote: Hi; I mean you can divide the range (i.e. one week per delete instead of one month) and check whether you still get an OOM or not. Thanks; Furkan KAMACI 2014-04-14 7:09 GMT+03:00 Vinay Pothnis poth...@gmail.com: Aman, Yes - Will do! Furkan, What do you mean by 'bulk delete'? -Thanks Vinay On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote: Hi; Do you get any problems when you index your data? On the other hand, deleting in bulk and reducing the size of documents may help you avoid hitting OOM. Thanks; Furkan KAMACI 2014-04-12 8:22 GMT+03:00 Aman Tandon amantandon...@gmail.com: Vinay, please share your experience after trying this solution. On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis poth...@gmail.com wrote: The query is something like this: curl -H 'Content-Type: text/xml' --data '<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO 138516480]</query></delete>' 'http://host:port/solr/coll-name1/update?commit=true' Trying to restrict the number of documents deleted via the date parameter
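The "trickle delete" idea from this thread can be sketched as slicing the date range into small windows and pausing between deletes, so replicas stay within the peer-sync update limit instead of falling into full-copy recovery. The window size and pause interval below are guesses to tune against a real cluster:

```python
import time

# Sketch of a trickle delete: split the epoch-seconds date range into small
# windows and delete one window at a time, with a pause between calls.
def date_windows(start, end, step):
    """Split [start, end) into consecutive windows of at most `step` units."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi

def trickle_delete(start, end, step, pause_s=1.0, send=print):
    # `send` stands in for the real POST to /update. Boundary docs may match
    # two adjacent windows, which is harmless since deletes are idempotent.
    for lo, hi in date_windows(start, end, step):
        send(f"<delete><query>date_param:[{lo} TO {hi}]</query></delete>")
        time.sleep(pause_s)

trickle_delete(0, 10, 4, pause_s=0.0)
```

In a real run, `send` would POST each delete to the collection's /update handler and the pause would be sized so each window removes fewer documents than the peer-sync limit per replica.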
Re: deleting large amount data from solr cloud
Aman, Yes - Will do! Furkan, What do you mean by 'bulk delete'? -Thanks Vinay On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote: Hi; Do you get any problems when you index your data? On the other hand, deleting in bulk and reducing the size of documents may help you avoid hitting OOM. Thanks; Furkan KAMACI 2014-04-12 8:22 GMT+03:00 Aman Tandon amantandon...@gmail.com: Vinay, please share your experience after trying this solution. On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis poth...@gmail.com wrote: The query is something like this: curl -H 'Content-Type: text/xml' --data '<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO 138516480]</query></delete>' 'http://host:port/solr/coll-name1/update?commit=true' Trying to restrict the number of documents deleted via the date parameter. Had not tried the distrib=false option. I could give that a try. Thanks for the link! I will check on the cache sizes and autowarm values. Will try to disable the caches when I am deleting and give that a try. Thanks Erick and Shawn for your inputs! -Vinay On 11 April 2014 15:28, Shawn Heisey s...@elyograg.org wrote: On 4/10/2014 7:25 PM, Vinay Pothnis wrote: We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. I'm curious about the actual query being used here. Can you share it, or a redacted version of it? Perhaps there might be a clue there. Is this a fully distributed delete request? One thing you might try, assuming Solr even supports it, is sending the same delete request directly to each shard core with distrib=false. Here's a very incomplete list of ways you can reduce Solr heap requirements: http://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements Thanks, Shawn -- With Regards Aman Tandon
Re: deleting large amount data from solr cloud
Sorry - yes, I meant to say leader. Each JVM has 16G of memory. On 10 April 2014 20:54, Erick Erickson erickerick...@gmail.com wrote: First, there is no master node, just leaders and replicas. But that's a nit. No real clue why you would be going out of memory. Deleting a document, even by query, should just mark the docs as deleted - a pretty low-cost operation. How much memory are you giving the JVM? Best, Erick On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis poth...@gmail.com wrote: [solr version 4.3.1] Hello, I have a solr cloud (4 nodes - 2 shards) with a fairly large amount of documents (~360G of index per shard). Now, a major portion of the data is not required and I need to delete those documents. I would need to delete around 75% of the data. One of the solutions could be to drop the index completely and re-index. But this is not an option at the moment. We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. Thanks! Vinay
Re: deleting large amount data from solr cloud
Tried to increase the memory to 24G, but that wasn't enough either. Agree that the index has now grown too large and that we should have monitored this and taken action much earlier. The search operations seem to run OK with 16G - mainly because the bulk of the data that we are trying to delete is not getting searched. So, now - basically in salvage mode. Does the number of documents deleted at a time have any impact? If I 'trickle delete' - say, 50K documents at a time - would that make a difference? When I delete, does solr try to bring the whole index into memory? Trying to understand what happens under the hood. Thanks Vinay On 11 April 2014 13:53, Erick Erickson erickerick...@gmail.com wrote: Using 16G for a 360G index is probably pushing things. A lot. I'm actually a bit surprised that the problem only occurs when you delete docs. The simplest thing would be to increase the JVM memory. You should be looking at your index to see how big it is; be sure to subtract out the *.fdt and *.fdx files - those are used for verbatim copies of the raw data and don't really count towards the memory requirements. I suspect you're just not giving enough memory to your JVM and this is just the first OOM you've hit. Look on the Solr admin page and see how much heap is being reported; if it's near the limit of your 16G, that's the smoking gun... Best, Erick On Fri, Apr 11, 2014 at 7:45 AM, Vinay Pothnis poth...@gmail.com wrote: Sorry - yes, I meant to say leader. Each JVM has 16G of memory. On 10 April 2014 20:54, Erick Erickson erickerick...@gmail.com wrote: First, there is no master node, just leaders and replicas. But that's a nit. No real clue why you would be going out of memory. Deleting a document, even by query, should just mark the docs as deleted - a pretty low-cost operation. How much memory are you giving the JVM? Best, Erick On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis poth...@gmail.com wrote: [solr version 4.3.1] Hello, I have a solr cloud (4 nodes - 2 shards) with a fairly large amount of documents (~360G of index per shard). Now, a major portion of the data is not required and I need to delete those documents. I would need to delete around 75% of the data. One of the solutions could be to drop the index completely and re-index. But this is not an option at the moment. We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. Thanks! Vinay
Re: deleting large amount data from solr cloud
The query is something like this: curl -H 'Content-Type: text/xml' --data '<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO 138516480]</query></delete>' 'http://host:port/solr/coll-name1/update?commit=true' Trying to restrict the number of documents deleted via the date parameter. Had not tried the distrib=false option. I could give that a try. Thanks for the link! I will check on the cache sizes and autowarm values. Will try to disable the caches when I am deleting and give that a try. Thanks Erick and Shawn for your inputs! -Vinay On 11 April 2014 15:28, Shawn Heisey s...@elyograg.org wrote: On 4/10/2014 7:25 PM, Vinay Pothnis wrote: We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. I'm curious about the actual query being used here. Can you share it, or a redacted version of it? Perhaps there might be a clue there. Is this a fully distributed delete request? One thing you might try, assuming Solr even supports it, is sending the same delete request directly to each shard core with distrib=false. Here's a very incomplete list of ways you can reduce Solr heap requirements: http://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements Thanks, Shawn
deleting large amount data from solr cloud
[solr version 4.3.1] Hello, I have a solr cloud (4 nodes - 2 shards) with a fairly large amount of documents (~360G of index per shard). Now, a major portion of the data is not required and I need to delete those documents. I would need to delete around 75% of the data. One of the solutions could be to drop the index completely and re-index. But this is not an option at the moment. We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. Thanks! Vinay
Re: Solr + SPDY
Hi Otis, While the main goal of SPDY is to reduce page load times, I think we could benefit from it in the Solr context as well. The transport layer is still TCP - but SPDY allows multiplexing of requests. It also uses compression and reduces the overhead of HTTP headers. An excerpt from http://webtide.intalio.com/2012/03/spdy-support-in-jetty/: SPDY reduces roundtrips with the server, reduces the HTTP verboseness by compressing HTTP headers, improves the utilization of the TCP connection, multiplexes requests into a single TCP connection (instead of using a limited number of connections, each serving only one request). 1. Users who use an http client to communicate with Solr, for sending updates or for searching, could benefit from SPDY optimizations. They could make use of the Jetty Http Client and set up Solr on Jetty to enable communication over SPDY. 2. As far as SolrCloud inter-node communication is concerned - I am not very sure how beneficial it would be. I brought this up because, in the SolrCloud context, there's a lot of inter-node chatter happening to facilitate distributed search/distributed indexing. So - I was wondering if anyone else has given this any thought. Cheers Vinay Some references: http://www.chromium.org/spdy/spdy-whitepaper http://webtide.intalio.com/2012/03/spdy-support-in-jetty/ http://www.eclipse.org/jetty/documentation/current/spdy.html On Fri, Oct 25, 2013 at 12:22 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: I'm rusty on SPDY. Can you summarize the benefits in the Solr context? Thanks. Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 25, 2013 10:46 AM, Vinay Pothnis poth...@gmail.com wrote: Hello, A couple of questions related to using SPDY with solr. 1. Does anybody have experience running Solr on Jetty 9 with SPDY support - and using the Jetty Client (a SPDY-capable client) to talk to Solr over SPDY? 2. This is related to SolrCloud inter-node communication. This might not be a user-list question - nonetheless, I was wondering if there would be some way to enable the use of SPDY for inter-node communication in a SolrCloud setup. Is this something that the solr team might look at? Thanks Vinay
Solr + SPDY
Hello, A couple of questions related to using SPDY with solr. 1. Does anybody have experience running Solr on Jetty 9 with SPDY support - and using the Jetty Client (a SPDY-capable client) to talk to Solr over SPDY? 2. This is related to SolrCloud inter-node communication. This might not be a user-list question - nonetheless, I was wondering if there would be some way to enable the use of SPDY for inter-node communication in a SolrCloud setup. Is this something that the solr team might look at? Thanks Vinay
Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads
Thank you Erick! Will look at all these suggestions. -Vinay On Wed, Jun 26, 2013 at 6:37 AM, Erick Erickson erickerick...@gmail.com wrote: Right, unfortunately this is a gremlin lurking in the weeds, see: http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock There are a couple of ways to deal with this: 1) go ahead and up the limit and re-compile; if you look at SolrCmdDistributor, the semaphore is defined there. 2) https://issues.apache.org/jira/browse/SOLR-4816 should address this as well as improve indexing throughput. I'm totally sure Joel (the guy working on this) would be thrilled if you were able to help verify it; I'd ask him (on the JIRA) whether he thinks it's ready to test. 3) Reduce the number of threads you're indexing with. 4) Index docs in small packets, perhaps even one, and just rack together a zillion threads to get throughput. FWIW, Erick On Tue, Jun 25, 2013 at 8:55 AM, Vinay Pothnis poth...@gmail.com wrote: Jason and Scott, Thanks for the replies and pointers! Yes, I will consider the 'maxDocs' value as well. How do I monitor the transaction logs during the interval between commits? Thanks Vinay On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Scott, My comment was meant to be a bit tongue-in-cheek, but my intent in the statement was to represent hard failure along the lines Vinay is seeing. We're talking about OutOfMemoryException conditions, total cluster paralysis requiring restart, or other similar and disastrous conditions. Where that line is is impossible to generically define, but trivial to accomplish. What any of us running Solr has to achieve is a realistic simulation of our desired production load (probably well above peak) and to see what limits are reached. Armed with that information we tweak. In this case, we look at finding the point where data ingestion reaches a natural limit.
For some that may be JVM GC, for others memory buffer size on the client load, and yet for others it may be I/O limits on multithreaded reads from a database or file system. In the old Solr days we had a little less to worry about. We might play with a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial commits and rollback recoveries. But with 4.x we now have more durable write options and NRT to consider, and SolrCloud begs to use this. So we have to consider transaction logs, the file handles they leave open until commit operations occur, and how we want to manage writing to all cores simultaneously instead of a more narrow master/slave relationship. It's all manageable, all predictable (with some load testing) and all filled with many possibilities to meet our specific needs. Considering that each person's data model, ingestion pipeline, request processors, and field analysis steps will be different, 5 threads of input at face value doesn't really contemplate the whole problem. We have to measure our actual data against our expectations and find where the weak chain links are to strengthen them. The symptoms aren't necessarily predictable in advance of this testing, but they're likely addressable and not difficult to decipher. For what it's worth, SolrCloud is new enough that we're still experiencing some uncharted territory with unknown ramifications, but with continued dialog through channels like these there are fewer territories without good cartography :) Hope that's of use! Jason On Jun 24, 2013, at 7:12 PM, Scott Lundgren scott.lundg...@carbonblack.com wrote: Jason, Regarding your statement "push you over the edge" - what does that mean? Does it mean uncharted territory with unknown ramifications, or something more like specific, known symptoms? I ask because our use is similar to Vinay's in some respects, and we want to be able to push the capabilities of write perf - but not over the edge!
In particular, I am interested in knowing the symptoms of failure, to help us troubleshoot the underlying problems if and when they arise. Thanks, Scott On Monday, June 24, 2013, Jason Hellman wrote: Vinay, You may wish to pay attention to how many transaction logs are being created along the way to your hard autoCommit, which should truncate the open handles for those files. I might suggest setting a maxDocs value in parallel with your maxTime value (you can use both) to ensure the commit occurs at either breakpoint. 30 seconds is plenty of time for 5 parallel processes of 20 document submissions to push you over the edge. Jason On Jun 24, 2013, at 2:21 PM, Vinay Pothnis poth...@gmail.com wrote: I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds. On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman jhell...@innoventsolutions.com
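Jason's suggestion of pairing maxDocs with maxTime would look something like this in solrconfig.xml. Only the 30-second hard commit and 1-second soft commit come from the thread; the maxDocs threshold is illustrative and would need tuning:

```xml
<!-- Hard commit fires when EITHER threshold is reached, capping both the
     time and the number of uncommitted docs (and so transaction-log growth). -->
<autoCommit>
  <maxTime>30000</maxTime>   <!-- 30 seconds, as described in the thread -->
  <maxDocs>10000</maxDocs>   <!-- illustrative value; tune to your ingest rate -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>    <!-- 1 second soft commit, as in the thread -->
</autoSoftCommit>
```

With openSearcher=false, the hard commit only flushes and truncates transaction logs; visibility of new documents is still governed by the soft commit.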
Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads
Jason and Scott, Thanks for the replies and pointers! Yes, I will consider the 'maxDocs' value as well. How do I monitor the transaction logs during the interval between commits? Thanks Vinay On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Scott, My comment was meant to be a bit tongue-in-cheek, but my intent in the statement was to represent hard failure along the lines Vinay is seeing. We're talking about OutOfMemoryException conditions, total cluster paralysis requiring restart, or other similar and disastrous conditions. Where that line is is impossible to generically define, but trivial to accomplish. What any of us running Solr has to achieve is a realistic simulation of our desired production load (probably well above peak) and to see what limits are reached. Armed with that information we tweak. In this case, we look at finding the point where data ingestion reaches a natural limit. For some that may be JVM GC, for others memory buffer size on the client load, and yet for others it may be I/O limits on multithreaded reads from a database or file system. In the old Solr days we had a little less to worry about. We might play with a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial commits and rollback recoveries. But with 4.x we now have more durable write options and NRT to consider, and SolrCloud begs to use this. So we have to consider transaction logs, the file handles they leave open until commit operations occur, and how we want to manage writing to all cores simultaneously instead of a more narrow master/slave relationship. It's all manageable, all predictable (with some load testing) and all filled with many possibilities to meet our specific needs. Considering that each person's data model, ingestion pipeline, request processors, and field analysis steps will be different, 5 threads of input at face value doesn't really contemplate the whole problem.
We have to measure our actual data against our expectations and find where the weak chain links are to strengthen them. The symptoms aren't necessarily predictable in advance of this testing, but they're likely addressable and not difficult to decipher. For what it's worth, SolrCloud is new enough that we're still experiencing some uncharted territory with unknown ramifications but with continued dialog through channels like these there are fewer territories without good cartography :) Hope that's of use! Jason On Jun 24, 2013, at 7:12 PM, Scott Lundgren scott.lundg...@carbonblack.com wrote: Jason, Regarding your statement push you over the edge- what does that mean? Does it mean uncharted territory with unknown ramifications or something more like specific, known symptoms? I ask because our use is similar to Vinay's in some respects, and we want to be able to push the capabilities of write perf - but not over the edge! In particular, I am interested in knowing the symptoms of failure, to help us troubleshoot the underlying problems if and when they arise. Thanks, Scott On Monday, June 24, 2013, Jason Hellman wrote: Vinay, You may wish to pay attention to how many transaction logs are being created along the way to your hard autoCommit, which should truncate the open handles for those files. I might suggest setting a maxDocs value in parallel with your maxTime value (you can use both) to ensure the commit occurs at either breakpoint. 30 seconds is plenty of time for 5 parallel processes of 20 document submissions to push you over the edge. Jason On Jun 24, 2013, at 2:21 PM, Vinay Pothnis poth...@gmail.com wrote: I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds. On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Vinay, What autoCommit settings do you have for your indexing process? 
Jason On Jun 24, 2013, at 1:28 PM, Vinay Pothnis poth...@gmail.com wrote: Here is the ulimit -a output:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 179963
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32769
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 14
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro yago.rive...@gmail.com wrote: Hi, I have the same issue too, and my deployment is almost exactly the same: http://lucene.472066.n3
[solr cloud] solr hangs when indexing large number of documents from multiple threads
Hello All, I have the following set up of solr cloud. * solr version 4.3.1 * 3 node solr cloud + replication factor 2 * 3 zookeepers * load balancer in front of the 3 solr nodes I am seeing this strange behavior when I am indexing a large number of documents (10 mil). When I have more than 3-5 threads sending documents (in batches of 20) to solr, sometimes solr goes into a hung state. After this, all the update requests get timed out. What we see via AppDynamics (a performance monitoring tool) is that there are a number of threads that are stalled. The stack trace for one of the threads is shown below. The cluster has to be restarted to recover from this. When I reduce the concurrency to 1, 2, or 3 threads, the indexing goes through smoothly. Any pointers as to what could be wrong here? We send the updates to one of the nodes in the solr cloud through a load balancer. Thanks Vinay Thread Name:qtp2141131052-78 ID:78 Time:Fri Jun 21 23:20:22 GMT 2013 State:WAITING Priority:5 sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.
LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
org.eclipse.jetty.server.Server.handle(Server.java:445)
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
java.lang.Thread.run(Thread.java:722)
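The client-side mitigations discussed in this thread (fewer concurrent senders, smaller batches) amount to bounding how many update batches are in flight at once. A minimal, illustrative sketch of that idea; `send_batch` is a stand-in for the real HTTP update call, and the limits are guesses, not recommendations:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Sketch: cap in-flight update requests so the client never has more
# concurrent batches outstanding than the cluster can absorb.
MAX_IN_FLIGHT = 3            # illustrative; the thread saw hangs above 3-5 senders
in_flight = threading.Semaphore(MAX_IN_FLIGHT)

def batched(docs, size=20):
    """Yield successive batches of at most `size` docs."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_batch(batch):
    # Placeholder for the real POST to /update; just reports the batch size.
    return len(batch)

def send_all(docs, batch_size=20):
    results = []
    def worker(batch):
        with in_flight:      # blocks while MAX_IN_FLIGHT batches are pending
            results.append(send_batch(batch))
    with ThreadPoolExecutor(max_workers=8) as pool:
        pool.map(worker, batched(docs, batch_size))
    return results

print(sum(send_all(list(range(100)))))  # → 100 docs sent, 20 per batch
```

Throttling at the client only reduces the pressure that triggers the distributed deadlock; the actual limit lives in the server-side semaphore visible in the stack trace.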
Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads
Here is the ulimit -a output:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 179963
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32769
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 14
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro yago.rive...@gmail.com wrote: Hi, I have the same issue too, and the deployment in that thread is almost exactly like mine: http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862 With some concurrency and batches of 10, Solr apparently hits some deadlock while distributing updates. Can you dump the ulimit configuration on your servers? Some people have had the same issue because they reached the ulimit maximums defined for file descriptors and processes. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote: Hello All, I have the following Solr Cloud setup: * Solr version 4.3.1 * 3-node Solr Cloud + replication factor 2 * 3 ZooKeepers * a load balancer in front of the 3 Solr nodes I am seeing strange behavior when indexing a large number of documents (10 million). When I have more than 3-5 threads sending documents (in batches of 20) to Solr, sometimes Solr goes into a hung state, after which all update requests time out. What we see via AppDynamics (a performance monitoring tool) is that a number of threads are stalled; the stack trace for one of them is shown below. The cluster has to be restarted to recover. When I reduce the concurrency to 1, 2, or 3 threads, the indexing goes through smoothly. Any pointers as to what could be wrong here?
We send the updates to one of the nodes in the Solr cloud through a load balancer. Thanks Vinay

Thread Name: qtp2141131052-78  ID: 78  Time: Fri Jun 21 23:20:22 GMT 2013  State: WAITING  Priority: 5
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
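Yago's ulimit suggestion from this thread can be checked with a short shell sketch. The limits.conf entries and the 'solr' user name are illustrative examples, not settings taken from the thread:

```shell
# Print the limit most often implicated in this kind of hang for the
# current shell (the one the Solr process inherits if started from here).
open_files=$(ulimit -n)
echo "open files limit: $open_files"

# To raise limits persistently for a dedicated 'solr' user, one common
# approach is adding entries to /etc/security/limits.conf (example values):
#   solr  soft  nofile  65535
#   solr  hard  nofile  65535
#   solr  soft  nproc   65535
#   solr  hard  nproc   65535
```

Note that limits are per-process and inherited at start time, so Solr must be restarted after raising them.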
Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads
I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds. On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Vinay, What autoCommit settings do you have for your indexing process? Jason On Jun 24, 2013, at 1:28 PM, Vinay Pothnis poth...@gmail.com wrote: Here is the ulimit -a output:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 179963
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32769
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 14
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro yago.rive...@gmail.com wrote: Hi, I have the same issue too, and the deployment in that thread is almost exactly like mine: http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862 With some concurrency and batches of 10, Solr apparently hits some deadlock while distributing updates. Can you dump the ulimit configuration on your servers? Some people have had the same issue because they reached the ulimit maximums defined for file descriptors and processes. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote: Hello All, I have the following Solr Cloud setup: * Solr version 4.3.1 * 3-node Solr Cloud + replication factor 2 * 3 ZooKeepers * a load balancer in front of the 3 Solr nodes I am seeing strange behavior when indexing a large number of documents (10 million). When I have more than 3-5 threads sending documents (in batches of 20) to Solr, sometimes Solr goes into a hung state, after which all update requests time out.
What we see via AppDynamics (a performance monitoring tool) is that a number of threads are stalled; the stack trace for one of them is shown below. The cluster has to be restarted to recover. When I reduce the concurrency to 1, 2, or 3 threads, the indexing goes through smoothly. Any pointers as to what could be wrong here? We send the updates to one of the nodes in the Solr cloud through a load balancer. Thanks Vinay

Thread Name: qtp2141131052-78  ID: 78  Time: Fri Jun 21 23:20:22 GMT 2013  State: WAITING  Priority: 5
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
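For reference, auto-commit settings like the ones Vinay mentions at the top of this thread (soft commit at 1 second, hard commit at 30 seconds) live in solrconfig.xml. A sketch of the equivalent configuration, with only the values from the message assumed and the rest as common defaults:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes the transaction log to stable storage;
       openSearcher=false keeps it from affecting search visibility -->
  <autoCommit>
    <maxTime>30000</maxTime>      <!-- 30 seconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: makes newly indexed documents visible to searches -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>       <!-- 1 second -->
  </autoSoftCommit>
</updateHandler>
```

A 1-second soft commit is aggressive under heavy indexing, since each soft commit opens a new searcher; it is one of the usual knobs to relax when diagnosing indexing stalls.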
Re: [solr cloud 4.1] Issue with order in a batch of commands
Thanks for the reply. In my case, the order is definitely critical. It would be great if this can be fixed. And yes, even SolrJ deals with deletes first and then the adds/updates, and that was the reason why I switched from SolrJ to plain HTTP. There is a ticket with SolrJ as well: https://issues.apache.org/jira/browse/SOLR-1162. Looks like it had some traction and then dropped off. I can work around this for the moment, but it would definitely be great if this can be fixed. Do you want me to create a JIRA with Solr or would you be doing that? Thanks Vinay On Wed, Feb 20, 2013 at 6:40 AM, Mark Miller markrmil...@gmail.com wrote: It's because of how we currently handle batched requests - we buffer a different number of deletes than we do adds and flush them separately - mainly because the size of each is likely to be so different; at one point we would buffer a lot more deletes. So currently, you want to break these up into multiple requests if order is critical. Might make a JIRA issue to look at this again - I think at this point we are buffering the same number of each anyway - we should probably just treat them the same and put them in the same buffer in the right order. Even then, though, I don't think SolrJ update requests order deletes and adds within the same request either, so that would also need to be addressed. Pretty sure SolrJ will do the adds then the deletes. - Mark On Feb 19, 2013, at 2:23 PM, Vinay Pothnis vinay.poth...@gmail.com wrote: Hello, I have the following set up: * Solr Cloud 4.1.0 * 2 shards with embedded ZooKeeper * plain HTTP to communicate with Solr I am testing a scenario where I am batching multiple commands and sending them to Solr. Since this is the Solr Cloud setup, I am always sending the updates to one of the nodes in the cloud.
e.g.: http://localhost:8983/solr/sample/update

Example set of commands:

{"add": {"doc": {"field-1": 1359591340025, "field-2": 1361301249330, "doc_id": "e.1.78"}},
 "add": {"doc": {"field-1": 1360089709282, "field-2": 1361301249377, "doc_id": "e.1.78"}},
 "delete": {"id": "e.1.78"}}

When I include deletes and updates in the batch, sometimes the order of the commands is not maintained. Specifically, if the document does not belong to the shard that I am communicating with (let's say shard-1), then shard-1 sends the commands to shard-2. In this case, the deletes are sent first and then the updates. This changes the order that I originally sent. Any inputs on why the order is not maintained? Thanks! Vinay
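Mark's workaround above (break a mixed batch into multiple requests when order matters) can be sketched as a small helper that splits the command list into consecutive single-type runs; each run would then be sent as its own /update request. The function name and (op, payload) command shape are illustrative, not part of Solr or SolrJ:

```python
def split_into_ordered_runs(commands):
    """Split a mixed list of (op, payload) update commands into consecutive
    runs that each contain only one operation type, preserving order.
    Sending each run as a separate /update request sidesteps the problem of
    deletes and adds being buffered and flushed separately within one batch."""
    runs = []
    for op, payload in commands:
        if runs and runs[-1][0] == op:
            runs[-1][1].append(payload)   # extend the current same-type run
        else:
            runs.append((op, [payload]))  # start a new run for a new op type
    return runs

# Example: two adds followed by a delete keep their relative order,
# but land in two separate requests (an "add" run, then a "delete" run).
batch = [
    ("add", {"doc_id": "e.1.78", "field-1": 1359591340025}),
    ("add", {"doc_id": "e.1.78", "field-1": 1360089709282}),
    ("delete", {"id": "e.1.78"}),
]
runs = split_into_ordered_runs(batch)
```

Each element of `runs` maps naturally onto one HTTP POST, which is exactly the "multiple requests" shape Mark suggests.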
[solr cloud 4.1] Issue with order in a batch of commands
Hello, I have the following set up: * Solr Cloud 4.1.0 * 2 shards with embedded ZooKeeper * plain HTTP to communicate with Solr I am testing a scenario where I am batching multiple commands and sending them to Solr. Since this is the Solr Cloud setup, I am always sending the updates to one of the nodes in the cloud. e.g.: http://localhost:8983/solr/sample/update

Example set of commands:

{"add": {"doc": {"field-1": 1359591340025, "field-2": 1361301249330, "doc_id": "e.1.78"}},
 "add": {"doc": {"field-1": 1360089709282, "field-2": 1361301249377, "doc_id": "e.1.78"}},
 "delete": {"id": "e.1.78"}}

When I include deletes and updates in the batch, sometimes the order of the commands is not maintained. Specifically, if the document does not belong to the shard that I am communicating with (let's say shard-1), then shard-1 sends the commands to shard-2. In this case, the deletes are sent first and then the updates. This changes the order that I originally sent. Any inputs on why the order is not maintained? Thanks! Vinay
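A batch like the one in this post can be sent with plain curl. This is a sketch against the localhost endpoint mentioned above (quoting restored, fields trimmed to just the id); it validates the body locally before the actual send, which is left commented out:

```shell
# Build a batched update body and check it is well-formed JSON before
# sending it to Solr; the curl line is a sketch, not a verified command.
BODY='{"add": {"doc": {"doc_id": "e.1.78", "field-1": 1359591340025}},
       "delete": {"id": "e.1.78"}}'
echo "$BODY" | python3 -m json.tool > /dev/null && echo "body is valid JSON"
# curl -s -X POST -H 'Content-Type: application/json' \
#   --data-binary "$BODY" 'http://localhost:8983/solr/sample/update?commit=true'
```

Validating the payload first rules out quoting mistakes as a cause when the server-side behavior looks wrong.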
Re: [solr cloud 4.1] Issue with order in a batch of commands
Thanks for the reply, Erick. * I am not using SolrJ * I am using plain HTTP (Apache HttpClient) to send a batch of commands. * As I mentioned below, the JSON payload I am sending is like this (some of the fields have been removed for brevity) * POST http://localhost:8983/solr/sample/update * POST BODY:

{"add": {"doc": {"field-1": 1359591340025, "field-2": 1361301249330, "doc_id": "e.1.78"}},
 "add": {"doc": {"field-1": 1360089709282, "field-2": 1361301249377, "doc_id": "e.1.78"}},
 "delete": {"id": "e.1.78"}}

The evidence is from the logs on the two shards. The following is the log on shard 1:

INFO: [sample] webapp=/solr path=/update params={} {add=[e.1.80, e.1.80, e.1.80, e.1.80, e.1.80, e.1.80, e.1.80],delete=[e.1.80]} 0 48

The following is the log on shard 2:

INFO: [sample] webapp=/solr path=/update params={update.distrib=TOLEADER&wt=javabin&version=2} {delete=[e.1.80 (-1427453640312881152)]} 0 2
Feb 19, 2013 6:04:34 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [sample] webapp=/solr path=/update params={distrib.from=http://10.10.76.23:8983/solr/ch-madden/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[e.1.80 (1427453640314978304), e.1.80 (1427453640338046976), e.1.80 (1427453640342241280), e.1.80 (1427453640346435584), e.1.80 (1427453640349581312), e.1.80 (1427453640351678464), e.1.80 (1427453640353775616)]} 0 41

As you can see, shard 2 gets the delete command first and then the add/update commands. I am sure I have waited until the commit happens. Besides, I am also using softAutoCommit at 1 second, so the query results should be updated quite quickly. Any pointers would be very helpful. Thanks! Vinay On Tue, Feb 19, 2013 at 5:57 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, this would surprise me unless the add and delete were going to separate machines. How are you sending them? SolrJ? And in a single server.add(doclist) format or with individual adds?
Individual commands being sent can come 'round out of sequence; that's what the whole optimistic locking bit is about. I guess my other question is: what's your evidence that this isn't working? Are you just querying your index and looking at the results? If so, are you sure you're waiting until after any autocommit intervals? Best Erick On Tue, Feb 19, 2013 at 2:23 PM, Vinay Pothnis vinay.poth...@gmail.com wrote: Hello, I have the following set up: * Solr Cloud 4.1.0 * 2 shards with embedded ZooKeeper * plain HTTP to communicate with Solr I am testing a scenario where I am batching multiple commands and sending them to Solr. Since this is the Solr Cloud setup, I am always sending the updates to one of the nodes in the cloud. e.g.: http://localhost:8983/solr/sample/update

Example set of commands:

{"add": {"doc": {"field-1": 1359591340025, "field-2": 1361301249330, "doc_id": "e.1.78"}},
 "add": {"doc": {"field-1": 1360089709282, "field-2": 1361301249377, "doc_id": "e.1.78"}},
 "delete": {"id": "e.1.78"}}

When I include deletes and updates in the batch, sometimes the order of the commands is not maintained. Specifically, if the document does not belong to the shard that I am communicating with (let's say shard-1), then shard-1 sends the commands to shard-2. In this case, the deletes are sent first and then the updates. This changes the order that I originally sent. Any inputs on why the order is not maintained? Thanks! Vinay
Re: [solr cloud 4.1] Issue with order in a batch of commands
Also, I was referring to this wiki page: http://wiki.apache.org/solr/UpdateJSON#Update_Commands Thanks Vinay On Tue, Feb 19, 2013 at 6:12 PM, Vinay Pothnis vinay.poth...@gmail.com wrote: Thanks for the reply, Erick. * I am not using SolrJ * I am using plain HTTP (Apache HttpClient) to send a batch of commands. * As I mentioned below, the JSON payload I am sending is like this (some of the fields have been removed for brevity) * POST http://localhost:8983/solr/sample/update * POST BODY:

{"add": {"doc": {"field-1": 1359591340025, "field-2": 1361301249330, "doc_id": "e.1.78"}},
 "add": {"doc": {"field-1": 1360089709282, "field-2": 1361301249377, "doc_id": "e.1.78"}},
 "delete": {"id": "e.1.78"}}

The evidence is from the logs on the two shards. The following is the log on shard 1:

INFO: [sample] webapp=/solr path=/update params={} {add=[e.1.80, e.1.80, e.1.80, e.1.80, e.1.80, e.1.80, e.1.80],delete=[e.1.80]} 0 48

The following is the log on shard 2:

INFO: [sample] webapp=/solr path=/update params={update.distrib=TOLEADER&wt=javabin&version=2} {delete=[e.1.80 (-1427453640312881152)]} 0 2
Feb 19, 2013 6:04:34 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [sample] webapp=/solr path=/update params={distrib.from=http://10.10.76.23:8983/solr/ch-madden/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[e.1.80 (1427453640314978304), e.1.80 (1427453640338046976), e.1.80 (1427453640342241280), e.1.80 (1427453640346435584), e.1.80 (1427453640349581312), e.1.80 (1427453640351678464), e.1.80 (1427453640353775616)]} 0 41

As you can see, shard 2 gets the delete command first and then the add/update commands. I am sure I have waited until the commit happens. Besides, I am also using softAutoCommit at 1 second, so the query results should be updated quite quickly. Any pointers would be very helpful. Thanks! Vinay On Tue, Feb 19, 2013 at 5:57 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, this would surprise me unless the add and delete were going to separate machines.
How are you sending them? SolrJ? And in a single server.add(doclist) format or with individual adds? Individual commands being sent can come 'round out of sequence; that's what the whole optimistic locking bit is about. I guess my other question is: what's your evidence that this isn't working? Are you just querying your index and looking at the results? If so, are you sure you're waiting until after any autocommit intervals? Best Erick On Tue, Feb 19, 2013 at 2:23 PM, Vinay Pothnis vinay.poth...@gmail.com wrote: Hello, I have the following set up: * Solr Cloud 4.1.0 * 2 shards with embedded ZooKeeper * plain HTTP to communicate with Solr I am testing a scenario where I am batching multiple commands and sending them to Solr. Since this is the Solr Cloud setup, I am always sending the updates to one of the nodes in the cloud. e.g.: http://localhost:8983/solr/sample/update

Example set of commands:

{"add": {"doc": {"field-1": 1359591340025, "field-2": 1361301249330, "doc_id": "e.1.78"}},
 "add": {"doc": {"field-1": 1360089709282, "field-2": 1361301249377, "doc_id": "e.1.78"}},
 "delete": {"id": "e.1.78"}}

When I include deletes and updates in the batch, sometimes the order of the commands is not maintained. Specifically, if the document does not belong to the shard that I am communicating with (let's say shard-1), then shard-1 sends the commands to shard-2. In this case, the deletes are sent first and then the updates. This changes the order that I originally sent. Any inputs on why the order is not maintained? Thanks! Vinay