Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Shawn,

Thanks for your inputs. The 12GB is for Solr.
I did read through your wiki page, and your recommended G1-related settings are
already included. I tried a lower memory config (7G) as well, and it did not
yield better results.

Right now, I am in the process of changing the updates to use the SolrJ
CloudSolrServer and testing it.
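
For anyone reading this in the archives, a minimal sketch of that change with the 4.x-era SolrJ API (the ZooKeeper address, collection name, field names, and batch size are placeholders, not taken from this thread):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexerSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble; CloudSolrServer routes documents to the correct leader.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {   // illustrative batch size (~1,000, as suggested further down in these threads)
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("payload_s", "small document " + i);
            batch.add(doc);
        }
        server.add(batch);   // one request per batch
        // Rely on autoCommit/commitWithin on the server side rather than committing per batch.
        server.shutdown();
    }
}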

Thanks
Vinay

On 4 May 2015 at 16:09, Shawn Heisey apa...@elyograg.org wrote:

 On 5/4/2015 2:36 PM, Vinay Pothnis wrote:
  But nonetheless, we will give the latest solrJ client + cloudSolrServer a
  try.
 
  * Yes, the documents are pretty small.
  * We are using G1 collector and there are no major GCs, but however,
 there
  are a lot of minor GCs sometimes going upto 2s per minute overall.
  * We are allocating 12G of memory.
  * Query rate: 3750 TPS (transactions per second)
  * I need to get the exact rate for insert/updates.
 
  I will make the solrJ client change first and give it a test.

 Whether that 12GB heap size is for Solr itself or for your client code,
 with a heap that large, you should be doing more tuning than simply
 turning on G1GC.  I have spent quite a lot of time working on GC tuning
 for Solr, and the results of that work can be found here:

 http://wiki.apache.org/solr/ShawnHeisey

 I cannot claim that these are the best options you can find for Solr,
 but they've worked well for me, and for others.

 Thanks,
 Shawn




Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Erick,

Thanks for your inputs.

A while back, we made a conscious decision to skip the SolrJ client
and use plain HTTP. I think it might have been because, at the time, the SolrJ
client was queueing updates in its memory or something.

Nonetheless, we will give the latest SolrJ client + CloudSolrServer a
try.

* Yes, the documents are pretty small.
* We are using the G1 collector and there are no major GCs; however, there
are a lot of minor GCs, sometimes going up to 2s per minute overall.
* We are allocating 12G of memory.
* Query rate: 3750 TPS (transactions per second)
* I need to get the exact rate for insert/updates.

I will make the SolrJ client change first and give it a test.

Thanks
Vinay

On 3 May 2015 at 09:37, Erick Erickson erickerick...@gmail.com wrote:

 First, you shouldn't be using HttpSolrClient, use CloudSolrServer
 (CloudSolrClient in 5.x). That takes
 the ZK address and routes the docs to the leader, reducing the network
 hops docs have to go
 through. AFAIK, in cloud setups it is in every way superior to http.

 I'm guessing your docs aren't huge. You haven't really told us what
 high indexing rates and
 high query rates are in your environment, so it's hard to say much.
 For comparison I get
 2-3K docs/sec on my laptop (no query load though).

 The most frequent problem for nodes going into recovery in this
 scenario is the ZK timeout
 being exceeded. This is often triggered by excessive GC pauses, some
 more details would
 help here:

 How much memory are you allocating to Solr? Have you turned on GC
 logging to see whether
 you're getting stop the world GC pauses? What rates _are_ you seeing?
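
For reference, a minimal set of JVM flags that turn on GC logging on the Java 7/8 JVMs of this era (the log path is a placeholder):

-verbose:gc
-Xloggc:/var/log/solr/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime

With PrintGCApplicationStoppedTime enabled, stop-the-world pauses show up in the log as "Total time for which application threads were stopped" lines.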

 Personally, I'd concentrate on the nodes going into recovery before
 anything else. Until that's
 fixed any other things you do will not be predictive of much.

 BTW, I typically start with batch sizes of 1,000 FWIW. Sometimes
 that's too big, sometimes
 too small but it seems pretty reasonable most of the time.

 Best,
 Erick

 On Thu, Apr 30, 2015 at 12:20 PM, Vinay Pothnis poth...@gmail.com wrote:
  Hello,
 
  I have a usecase with the following characteristics:
 
   - High index update rate (adds/updates)
   - High query rate
   - Low index size (~800MB for 2.4Million docs)
   - The documents that are created at the high rate eventually expire
 and
  are deleted regularly at half hour intervals
 
  I currently have a solr cloud set up with 1 shard and 4 replicas.
   * My index updates are sent to a VIP/loadbalancer (round robins to one
 of
  the 4 solr nodes)
   * I am using http client to send the updates
   * Using batch size of 100 and 8 to 10 threads sending the batch of
 updates
  to solr.
 
  When I try to run tests to scale out the indexing rate, I see the
 following:
   * solr nodes go into recovery
   * updates are taking really long to complete.
 
  As I understand, when a node receives an update:
   * If it is the leader, it forwards the update to all the replicas and
  waits until it receives the reply from all of them before replying back
 to
  the client that sent the reply.
   * If it is not the leader, it forwards the update to the leader, which
  THEN does the above steps mentioned.
 
  How do I go about scaling the index updates:
   * As I add more replicas, my updates would get slower and slower?
   * Is there a way I can configure the leader to wait for say N out of M
  replicas only?
   * Should I be targeting the updates to only the leader?
   * Any other approach i should be considering?
 
  Thanks
  Vinay



Optimal configuration for high throughput indexing

2015-04-30 Thread Vinay Pothnis
Hello,

I have a use case with the following characteristics:

 - High index update rate (adds/updates)
 - High query rate
 - Low index size (~800MB for 2.4 million docs)
 - The documents that are created at this high rate eventually expire and
are deleted regularly at half-hour intervals

I currently have a SolrCloud setup with 1 shard and 4 replicas.
 * My index updates are sent to a VIP/load balancer (which round-robins to one of
the 4 Solr nodes)
 * I am using an HTTP client to send the updates
 * I am using a batch size of 100, with 8 to 10 threads sending the batches of updates
to Solr.

When I try to run tests to scale out the indexing rate, I see the following:
 * Solr nodes go into recovery
 * updates take a really long time to complete.

As I understand it, when a node receives an update:
 * If it is the leader, it forwards the update to all the replicas and
waits until it receives the reply from all of them before replying back to
the client that sent the update.
 * If it is not the leader, it forwards the update to the leader, which
THEN performs the steps above.

How do I go about scaling the index updates?
 * As I add more replicas, will my updates get slower and slower?
 * Is there a way I can configure the leader to wait for, say, N out of M
replicas only?
 * Should I be targeting the updates at only the leader?
 * Any other approach I should be considering?

Thanks
Vinay


clarification on index-to-ram ratio

2014-06-19 Thread Vinay Pothnis
Hello All,

The documentation and general feedback on the mailing list suggest the
following:

*... Let's say that you have a Solr index size of 8GB. If your OS, Solr's
Java heap, and all other running programs require 4GB of memory, then
an ideal memory size for that server is at least 12GB ...*

http://wiki.apache.org/solr/SolrPerformanceProblems#General_information

So, when we say index size, does it include ALL the replicas or just one
of the replicas? For example, if the Solr instance had 2 replicas, each
of size 8GB, should we consider 16GB as our index size or just 8GB - for
the above index-to-RAM-ratio consideration?

Thanks
Vinay


Re: clarification on index-to-ram ratio

2014-06-19 Thread Vinay Pothnis
Thanks!
And yes, the replica belongs to a different shard - not the same data.

-Vinay


On 19 June 2014 11:21, Toke Eskildsen t...@statsbiblioteket.dk wrote:

 Vinay Pothnis [poth...@gmail.com] wrote:
  *... Let's say that you have a Solr index size of 8GB. If your OS,
 Solr's
  Java heap, and all other running programs require 4GB of memory, then
  an ideal memory size for that server is at least 12GB ...*

  So, when we say index size does it include ALL the replicas or just one
  of the replica? Say for example, if the solr instance had 2 replicas each
  of size 8GB, should we consider 16GB as our index size or just 8GB - for
  the above index-ram-ratio consideration?

 16GB, according to the above principle. Enough RAM to hold all index data
 on storage.

 Two things though,

 1) If you have replicas of the same data on the same machine, I hope that
 you have them on separate physical drives. If not, it is just wasted disk
 cache with no benefits.

 2) The general advice is only really usable when we're either talking
 fairly small indexes on spinning drives or there is a strong need for the
 absolute lowest latency possible. As soon as we scale up and do not have
 copious amounts of money, solid state drives provide much better bang for
 the buck than a spinning-drive + RAM combination.

 - Toke Eskildsen



Re: deleting large amount data from solr cloud

2014-04-17 Thread Vinay Pothnis
Thanks a lot Shalin!


On 16 April 2014 21:26, Shalin Shekhar Mangar shalinman...@gmail.comwrote:

 You can specify maxSegments parameter e.g. maxSegments=5 while optimizing.
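
For example, such a request can be sent either as a URL parameter or in the XML message (host, port, and collection name are placeholders):

curl 'http://host:port/solr/coll-name1/update?optimize=true&maxSegments=5'
curl -H 'Content-Type: text/xml' --data '<optimize maxSegments="5"/>' 'http://host:port/solr/coll-name1/update'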


 On Thu, Apr 17, 2014 at 6:46 AM, Vinay Pothnis poth...@gmail.com wrote:

  Hello,
 
  Couple of follow up questions:
 
  * When the optimize command is run, looks like it creates one big segment
  (forceMerge = 1). Will it get split at any point later? Or will that big
  segment remain?
 
  * Is there anyway to maintain the number of segments - but still merge to
  reclaim the deleted documents space? In other words, can I issue
  forceMerge=20? If so, how would the command look like? Any examples for
  this?
 
  Thanks
  Vinay
 
 
 
  On 16 April 2014 07:59, Vinay Pothnis poth...@gmail.com wrote:
 
   Thank you Erick!
   Yes - I am using the expunge deletes option.
  
   Thanks for the note on disk space for the optimize command. I should
 have
   enough space for that. What about the heap space requirement? I hope it
  can
   do the optimize with the memory that is allocated to it.
  
   Thanks
   Vinay
  
  
   On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com
 wrote:
  
   The optimize should, indeed, reduce the index size. Be aware that it
   may consume 2x the disk space. You may also try expungedeletes, see
   here: https://wiki.apache.org/solr/UpdateXmlMessages
  
   Best,
   Erick
  
   On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com
   wrote:
Another update:
   
I removed the replicas - to avoid the replication doing a full
 copy. I
   am
able delete sizeable chunks of data.
But the overall index size remains the same even after the deletes.
 It
   does
not seem to go down.
   
I understand that Solr would do this in background - but I don't
 seem
  to
see the decrease in overall index size even after 1-2 hours.
I can see a bunch of .del files in the index directory, but the it
   does
not seem to get cleaned up. Is there anyway to monitor/follow the
   progress
of index compaction?
   
Also, does triggering optimize from the admin UI help to compact
 the
index size on disk?
   
Thanks
Vinay
   
   
On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:
   
Some update:
   
I removed the auto warm configurations for the various caches and
   reduced
the cache sizes. I then issued a call to delete a day's worth of
 data
   (800K
documents).
   
There was no out of memory this time - but some of the nodes went
  into
recovery mode. Was able to catch some logs this time around and
 this
  is
what i see:
   

*WARN  [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync]
PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr
http://host1:8983/solr too many updates received since start -
startingUpdates no longer overlaps with our currentUpdates*
*INFO  [2014-04-14 18:11:00.476]
   [org.apache.solr.cloud.RecoveryStrategy]
PeerSync Recovery was not successful - trying replication.
core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.476]
   [org.apache.solr.cloud.RecoveryStrategy]
Starting Replication Recovery. core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.535]
   [org.apache.solr.cloud.RecoveryStrategy]
Begin buffering updates. core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.536]
   [org.apache.solr.cloud.RecoveryStrategy]
Attempting to replicate from
   http://host2:8983/solr/core1_shard1_replica1/
http://host2:8983/solr/core1_shard1_replica1/.
   core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.536]
[org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new
 http
client,
   
  
  config:maxConnections=128maxConnectionsPerHost=32followRedirects=false*
*INFO  [2014-04-14 18:11:01.964]
[org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new
 http
client,
   
  
 
 config:connTimeout=5000socketTimeout=2allowCompression=falsemaxConnections=1maxConnectionsPerHost=1*
*INFO  [2014-04-14 18:11:01.969]
 [org.apache.solr.handler.SnapPuller]
No
value set for 'pollInterval'. Timer Task not started.*
*INFO  [2014-04-14 18:11:01.973]
 [org.apache.solr.handler.SnapPuller]
Master's generation: 1108645*
*INFO  [2014-04-14 18:11:01.973]
 [org.apache.solr.handler.SnapPuller]
Slave's generation: 1108627*
*INFO  [2014-04-14 18:11:01.973]
 [org.apache.solr.handler.SnapPuller]
Starting replication process*
*INFO  [2014-04-14 18:11:02.007]
 [org.apache.solr.handler.SnapPuller]
Number of files in latest index in master: 814*
*INFO  [2014-04-14 18:11:02.007]
[org.apache.solr.core.CachingDirectoryFactory] return new directory
  for
/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007*
*INFO  [2014-04-14 18:11:02.008]
 [org.apache.solr.handler.SnapPuller]
Starting download

Re: deleting large amount data from solr cloud

2014-04-17 Thread Vinay Pothnis
Thanks Erick!


On 17 April 2014 08:35, Erick Erickson erickerick...@gmail.com wrote:

 bq: Will it get split at any point later?

 Split is a little ambiguous here. Will it be copied into two or more
 segments? No. Will it disappear? Possibly. Eventually this segment
 will be merged if you add enough documents to the system. Consider
 this scenario:
 you add 1M docs to your system and it results in 10 segments (numbers
 made up). Then you optimize, and you have 1M docs in 1 segment. Fine
 so far.

 Now you add 750K of those docs over again, which will delete them from
 the 1 big segment. Your merge policy will, at some point, select this
 segment to merge and it'll disappear...

 FWIW,
 er...@pedantic.com

 On Thu, Apr 17, 2014 at 7:24 AM, Vinay Pothnis poth...@gmail.com wrote:
  Thanks a lot Shalin!
 
 
  On 16 April 2014 21:26, Shalin Shekhar Mangar shalinman...@gmail.com
 wrote:
 
  You can specify maxSegments parameter e.g. maxSegments=5 while
 optimizing.
 
 
  On Thu, Apr 17, 2014 at 6:46 AM, Vinay Pothnis poth...@gmail.com
 wrote:
 
   Hello,
  
   Couple of follow up questions:
  
   * When the optimize command is run, looks like it creates one big
 segment
   (forceMerge = 1). Will it get split at any point later? Or will that
 big
   segment remain?
  
   * Is there anyway to maintain the number of segments - but still
 merge to
   reclaim the deleted documents space? In other words, can I issue
   forceMerge=20? If so, how would the command look like? Any examples
 for
   this?
  
   Thanks
   Vinay
  
  
  
   On 16 April 2014 07:59, Vinay Pothnis poth...@gmail.com wrote:
  
Thank you Erick!
Yes - I am using the expunge deletes option.
   
Thanks for the note on disk space for the optimize command. I should
  have
enough space for that. What about the heap space requirement? I
 hope it
   can
do the optimize with the memory that is allocated to it.
   
Thanks
Vinay
   
   
On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com
  wrote:
   
The optimize should, indeed, reduce the index size. Be aware that
 it
may consume 2x the disk space. You may also try expungedeletes, see
here: https://wiki.apache.org/solr/UpdateXmlMessages
   
Best,
Erick
   
On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com
 
wrote:
 Another update:

 I removed the replicas - to avoid the replication doing a full
  copy. I
am
 able delete sizeable chunks of data.
 But the overall index size remains the same even after the
 deletes.
  It
does
 not seem to go down.

 I understand that Solr would do this in background - but I don't
  seem
   to
 see the decrease in overall index size even after 1-2 hours.
 I can see a bunch of .del files in the index directory, but
 the it
does
 not seem to get cleaned up. Is there anyway to monitor/follow the
progress
 of index compaction?

 Also, does triggering optimize from the admin UI help to
 compact
  the
 index size on disk?

 Thanks
 Vinay


 On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:

 Some update:

 I removed the auto warm configurations for the various caches
 and
reduced
 the cache sizes. I then issued a call to delete a day's worth of
  data
(800K
 documents).

 There was no out of memory this time - but some of the nodes
 went
   into
 recovery mode. Was able to catch some logs this time around and
  this
   is
 what i see:

 
 *WARN  [2014-04-14 18:11:00.381]
 [org.apache.solr.update.PeerSync]
 PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr
 http://host1:8983/solr too many updates received since start
 -
 startingUpdates no longer overlaps with our currentUpdates*
 *INFO  [2014-04-14 18:11:00.476]
[org.apache.solr.cloud.RecoveryStrategy]
 PeerSync Recovery was not successful - trying replication.
 core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.476]
[org.apache.solr.cloud.RecoveryStrategy]
 Starting Replication Recovery. core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.535]
[org.apache.solr.cloud.RecoveryStrategy]
 Begin buffering updates. core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.536]
[org.apache.solr.cloud.RecoveryStrategy]
 Attempting to replicate from
http://host2:8983/solr/core1_shard1_replica1/
 http://host2:8983/solr/core1_shard1_replica1/.
core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.536]
 [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new
  http
 client,

   
  
 config:maxConnections=128maxConnectionsPerHost=32followRedirects=false*
 *INFO  [2014-04-14 18:11:01.964]
 [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new
  http
 client,

   
  
 
 config:connTimeout=5000socketTimeout

Re: deleting large amount data from solr cloud

2014-04-16 Thread Vinay Pothnis
Thank you Erick!
Yes - I am using the expunge deletes option.
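
For reference, a sketch of how such an expungeDeletes commit is typically issued (host, port, and collection name are placeholders):

curl 'http://host:port/solr/coll-name1/update?commit=true&expungeDeletes=true'
curl -H 'Content-Type: text/xml' --data '<commit expungeDeletes="true"/>' 'http://host:port/solr/coll-name1/update'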

Thanks for the note on disk space for the optimize command. I should have
enough space for that. What about the heap space requirement? I hope it can
do the optimize with the memory that is allocated to it.

Thanks
Vinay


On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com wrote:

 The optimize should, indeed, reduce the index size. Be aware that it
 may consume 2x the disk space. You may also try expungedeletes, see
 here: https://wiki.apache.org/solr/UpdateXmlMessages

 Best,
 Erick

 On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com wrote:
  Another update:
 
  I removed the replicas - to avoid the replication doing a full copy. I am
  able delete sizeable chunks of data.
  But the overall index size remains the same even after the deletes. It
 does
  not seem to go down.
 
  I understand that Solr would do this in background - but I don't seem to
  see the decrease in overall index size even after 1-2 hours.
  I can see a bunch of .del files in the index directory, but the it does
  not seem to get cleaned up. Is there anyway to monitor/follow the
 progress
  of index compaction?
 
  Also, does triggering optimize from the admin UI help to compact the
  index size on disk?
 
  Thanks
  Vinay
 
 
  On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:
 
  Some update:
 
  I removed the auto warm configurations for the various caches and
 reduced
  the cache sizes. I then issued a call to delete a day's worth of data
 (800K
  documents).
 
  There was no out of memory this time - but some of the nodes went into
  recovery mode. Was able to catch some logs this time around and this is
  what i see:
 
  
  *WARN  [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync]
  PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr
  http://host1:8983/solr too many updates received since start -
  startingUpdates no longer overlaps with our currentUpdates*
  *INFO  [2014-04-14 18:11:00.476]
 [org.apache.solr.cloud.RecoveryStrategy]
  PeerSync Recovery was not successful - trying replication.
  core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.476]
 [org.apache.solr.cloud.RecoveryStrategy]
  Starting Replication Recovery. core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.535]
 [org.apache.solr.cloud.RecoveryStrategy]
  Begin buffering updates. core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.536]
 [org.apache.solr.cloud.RecoveryStrategy]
  Attempting to replicate from
 http://host2:8983/solr/core1_shard1_replica1/
  http://host2:8983/solr/core1_shard1_replica1/.
 core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.536]
  [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
  client,
 
 config:maxConnections=128maxConnectionsPerHost=32followRedirects=false*
  *INFO  [2014-04-14 18:11:01.964]
  [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
  client,
 
 config:connTimeout=5000socketTimeout=2allowCompression=falsemaxConnections=1maxConnectionsPerHost=1*
  *INFO  [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller]
  No
  value set for 'pollInterval'. Timer Task not started.*
  *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
  Master's generation: 1108645*
  *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
  Slave's generation: 1108627*
  *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
  Starting replication process*
  *INFO  [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller]
  Number of files in latest index in master: 814*
  *INFO  [2014-04-14 18:11:02.007]
  [org.apache.solr.core.CachingDirectoryFactory] return new directory for
  /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007*
  *INFO  [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller]
  Starting download to
  NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@
 /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe;
  maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true*
 
  
 
 
  So, it looks like the number of updates is too huge for the regular
  replication and then it goes into full copy of index. And since our
 index
  size is very huge (350G), this is causing the cluster to go into
 recovery
  mode forever - trying to copy that huge index.
 
  I also read in some thread
 
 http://lucene.472066.n3.nabble.com/Recovery-too-many-updates-received-since-start-td3935281.htmlthatthere
  is a limit of 100 documents.
 
  I wonder if this has been updated to make that configurable since that
  thread. If not, the only option I see is to do a trickle delete of 100
  documents per second or something.
 
  Also - the other suggestion of using distributed=false might not help
  because the issue currently is that the replication is going to full
 copy

Re: deleting large amount data from solr cloud

2014-04-16 Thread Vinay Pothnis
Hello,

Couple of follow up questions:

* When the optimize command is run, it looks like it creates one big segment
(forceMerge = 1). Will it get split at any point later? Or will that big
segment remain?

* Is there any way to maintain the number of segments - but still merge to
reclaim the deleted documents' space? In other words, can I issue
forceMerge=20? If so, what would the command look like? Any examples of
this?

Thanks
Vinay



On 16 April 2014 07:59, Vinay Pothnis poth...@gmail.com wrote:

 Thank you Erick!
 Yes - I am using the expunge deletes option.

 Thanks for the note on disk space for the optimize command. I should have
 enough space for that. What about the heap space requirement? I hope it can
 do the optimize with the memory that is allocated to it.

 Thanks
 Vinay


 On 16 April 2014 04:52, Erick Erickson erickerick...@gmail.com wrote:

 The optimize should, indeed, reduce the index size. Be aware that it
 may consume 2x the disk space. You may also try expungedeletes, see
 here: https://wiki.apache.org/solr/UpdateXmlMessages

 Best,
 Erick

 On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis poth...@gmail.com
 wrote:
  Another update:
 
  I removed the replicas - to avoid the replication doing a full copy. I
 am
  able delete sizeable chunks of data.
  But the overall index size remains the same even after the deletes. It
 does
  not seem to go down.
 
  I understand that Solr would do this in background - but I don't seem to
  see the decrease in overall index size even after 1-2 hours.
  I can see a bunch of .del files in the index directory, but the it
 does
  not seem to get cleaned up. Is there anyway to monitor/follow the
 progress
  of index compaction?
 
  Also, does triggering optimize from the admin UI help to compact the
  index size on disk?
 
  Thanks
  Vinay
 
 
  On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:
 
  Some update:
 
  I removed the auto warm configurations for the various caches and
 reduced
  the cache sizes. I then issued a call to delete a day's worth of data
 (800K
  documents).
 
  There was no out of memory this time - but some of the nodes went into
  recovery mode. Was able to catch some logs this time around and this is
  what i see:
 
  
  *WARN  [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync]
  PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr
  http://host1:8983/solr too many updates received since start -
  startingUpdates no longer overlaps with our currentUpdates*
  *INFO  [2014-04-14 18:11:00.476]
 [org.apache.solr.cloud.RecoveryStrategy]
  PeerSync Recovery was not successful - trying replication.
  core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.476]
 [org.apache.solr.cloud.RecoveryStrategy]
  Starting Replication Recovery. core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.535]
 [org.apache.solr.cloud.RecoveryStrategy]
  Begin buffering updates. core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.536]
 [org.apache.solr.cloud.RecoveryStrategy]
  Attempting to replicate from
 http://host2:8983/solr/core1_shard1_replica1/
  http://host2:8983/solr/core1_shard1_replica1/.
 core=core1_shard1_replica2*
  *INFO  [2014-04-14 18:11:00.536]
  [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
  client,
 
 config:maxConnections=128maxConnectionsPerHost=32followRedirects=false*
  *INFO  [2014-04-14 18:11:01.964]
  [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
  client,
 
 config:connTimeout=5000socketTimeout=2allowCompression=falsemaxConnections=1maxConnectionsPerHost=1*
  *INFO  [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller]
  No
  value set for 'pollInterval'. Timer Task not started.*
  *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
  Master's generation: 1108645*
  *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
  Slave's generation: 1108627*
  *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
  Starting replication process*
  *INFO  [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller]
  Number of files in latest index in master: 814*
  *INFO  [2014-04-14 18:11:02.007]
  [org.apache.solr.core.CachingDirectoryFactory] return new directory for
  /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007*
  *INFO  [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller]
  Starting download to
  NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@
 /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe;
  maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true*
 
  
 
 
  So, it looks like the number of updates is too huge for the regular
  replication and then it goes into full copy of index. And since our
 index
  size is very huge (350G), this is causing the cluster to go into
 recovery
  mode forever - trying to copy that huge

Re: deleting large amount data from solr cloud

2014-04-15 Thread Vinay Pothnis
Another update:

I removed the replicas - to avoid the replication doing a full copy. I am
able to delete sizeable chunks of data.
But the overall index size remains the same even after the deletes. It does
not seem to go down.

I understand that Solr would do this in the background - but I don't seem to
see a decrease in the overall index size even after 1-2 hours.
I can see a bunch of .del files in the index directory, but it does
not seem to get cleaned up. Is there any way to monitor/follow the progress
of index compaction?
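
One low-tech way to watch merge progress is to keep an eye on the index directory itself; the path below is only an example, modeled on the paths that appear in the logs later in these threads:

# total on-disk size of the index
du -sh /opt/data/solr/core1_shard1_replica2/data/index
# .del files disappear only when the segments they belong to are merged away
ls -lh /opt/data/solr/core1_shard1_replica2/data/index | grep '\.del'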

Also, does triggering optimize from the admin UI help to compact the
index size on disk?

Thanks
Vinay


On 14 April 2014 12:19, Vinay Pothnis poth...@gmail.com wrote:

 Some update:

 I removed the auto warm configurations for the various caches and reduced
 the cache sizes. I then issued a call to delete a day's worth of data (800K
 documents).

 There was no out of memory this time - but some of the nodes went into
 recovery mode. Was able to catch some logs this time around and this is
 what i see:

 
 *WARN  [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync]
 PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr
 http://host1:8983/solr too many updates received since start -
 startingUpdates no longer overlaps with our currentUpdates*
 *INFO  [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy]
 PeerSync Recovery was not successful - trying replication.
 core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy]
 Starting Replication Recovery. core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy]
 Begin buffering updates. core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy]
 Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/
 http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2*
 *INFO  [2014-04-14 18:11:00.536]
 [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
 client,
 config:maxConnections=128maxConnectionsPerHost=32followRedirects=false*
 *INFO  [2014-04-14 18:11:01.964]
 [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
 client,
 config:connTimeout=5000socketTimeout=2allowCompression=falsemaxConnections=1maxConnectionsPerHost=1*
 *INFO  [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller]  No
 value set for 'pollInterval'. Timer Task not started.*
 *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
 Master's generation: 1108645*
 *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
 Slave's generation: 1108627*
 *INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
 Starting replication process*
 *INFO  [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller]
 Number of files in latest index in master: 814*
 *INFO  [2014-04-14 18:11:02.007]
 [org.apache.solr.core.CachingDirectoryFactory] return new directory for
 /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007*
 *INFO  [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller]
 Starting download to
 NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe;
 maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true*

 


 So, it looks like the number of updates is too huge for the regular
 replication and then it goes into full copy of index. And since our index
 size is very huge (350G), this is causing the cluster to go into recovery
 mode forever - trying to copy that huge index.

 I also read in some thread
 http://lucene.472066.n3.nabble.com/Recovery-too-many-updates-received-since-start-td3935281.htmlthat
  there is a limit of 100 documents.

 I wonder if this has been updated to make that configurable since that
 thread. If not, the only option I see is to do a trickle delete of 100
 documents per second or something.

 Also - the other suggestion of using distributed=false might not help
 because the issue currently is that the replication is going to full copy.

 Any thoughts?

 Thanks
 Vinay







 On 14 April 2014 07:54, Vinay Pothnis poth...@gmail.com wrote:

 Yes, that is our approach. We did try deleting a day's worth of data at a
 time, and that resulted in OOM as well.

 Thanks
 Vinay


 On 14 April 2014 00:27, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 I mean you can divide the range (i.e. one week at each delete instead of
 one month) and try to check whether you still get an OOM or not.

 Thanks;
 Furkan KAMACI


 2014-04-14 7:09 GMT+03:00 Vinay Pothnis poth...@gmail.com:

  Aman,
  Yes - Will do!
 
  Furkan,
  How do you mean by 'bulk delete'?
 
  -Thanks
  Vinay
 
 
  On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote:
 
   Hi;
  
   Do you get any problems when you index

Re: Tipping point of solr shards (Num of docs / size)

2014-04-15 Thread Vinay Pothnis
You could look at this link to understand the factors that affect
SolrCloud performance: http://wiki.apache.org/solr/SolrPerformanceProblems

In particular, see the sections about RAM and disk cache. If the index grows too
big for one node, it can lead to performance issues. From the looks of it,
500 million docs per shard may already be pushing it. How much does that
translate to in terms of index size on disk per shard?

-vinay


On 15 April 2014 21:44, Mukesh Jha me.mukesh@gmail.com wrote:

 Hi Gurus,

 In my Solr cluster I have multiple shards, each shard containing
 ~500,000,000 documents, with a total index size of ~1 TB.

 I was just wondering how much more I can keep on adding to a shard before
 we reach a tipping point and the performance starts to degrade?

 Also, as a best practice, what is the recommended number of docs / size for a shard?

 Txz in advance :)

 --
 Thanks  Regards,

 *Mukesh Jha me.mukesh@gmail.com*



Re: deleting large amount data from solr cloud

2014-04-14 Thread Vinay Pothnis
Yes, that is our approach. We did try deleting a day's worth of data at a
time, and that resulted in OOM as well.

Thanks
Vinay


On 14 April 2014 00:27, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 I mean you can divide the range (i.e. one week at each delete instead of
 one month) and try to check whether you still get an OOM or not.

 Thanks;
 Furkan KAMACI


 2014-04-14 7:09 GMT+03:00 Vinay Pothnis poth...@gmail.com:

  Aman,
  Yes - Will do!
 
  Furkan,
  How do you mean by 'bulk delete'?
 
  -Thanks
  Vinay
 
 
  On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote:
 
   Hi;
  
   Do you get any problems when you index your data? On the other hand
   deleting as bulks and reducing the size of documents may help you not
 to
   hit OOM.
  
   Thanks;
   Furkan KAMACI
  
  
   2014-04-12 8:22 GMT+03:00 Aman Tandon amantandon...@gmail.com:
  
Vinay please share your experience after trying this solution.
   
   
On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis poth...@gmail.com
   wrote:
   
 The query is something like this:


 *curl -H 'Content-Type: text/xml' --data
 'deletequeryparam1:(val1
   OR
 val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO
 138516480]/query/delete'
 'http://host:port/solr/coll-name1/update?commit=true'*

 Trying to restrict the number of documents deleted via the date
parameter.

 Had not tried the distrib=false option. I could give that a try.
   Thanks
 for the link! I will check on the cache sizes and autowarm values.
  Will
try
 and disable the caches when I am deleting and give that a try.

 Thanks Erick and Shawn for your inputs!

 -Vinay



 On 11 April 2014 15:28, Shawn Heisey s...@elyograg.org wrote:

  On 4/10/2014 7:25 PM, Vinay Pothnis wrote:
 
  When we tried to delete the data through a query - say 1
  day/month's
 worth
  of data. But after deleting just 1 month's worth of data, the
  master
 node
  is going out of memory - heap space.
 
  Wondering is there any way to incrementally delete the data
  without
  affecting the cluster adversely.
 
 
  I'm curious about the actual query being used here.  Can you
 share
   it,
or
  a redacted version of it?  Perhaps there might be a clue there?
 
  Is this a fully distributed delete request?  One thing you might
  try,
  assuming Solr even supports it, is sending the same delete
 request
 directly
  to each shard core with distrib=false.
 
  Here's a very incomplete list about how you can reduce Solr heap
  requirements:
 
  http://wiki.apache.org/solr/SolrPerformanceProblems#
  Reducing_heap_requirements
 
  Thanks,
  Shawn
 
 

   
   
   
--
With Regards
Aman Tandon
   
  
 



Re: deleting large amount data from solr cloud

2014-04-14 Thread Vinay Pothnis
Some update:

I removed the auto warm configurations for the various caches and reduced
the cache sizes. I then issued a call to delete a day's worth of data (800K
documents).
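
For readers following along, a rough solrconfig.xml sketch of that kind of change; the sizes are illustrative only, and the key part is autowarmCount="0":

<!-- inside the <query> section of solrconfig.xml -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>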

There was no out-of-memory error this time - but some of the nodes went into
recovery mode. I was able to catch some logs this time around, and this is
what I see:


*WARN  [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync]
PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr
http://host1:8983/solr too many updates received since start -
startingUpdates no longer overlaps with our currentUpdates*
*INFO  [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy]
PeerSync Recovery was not successful - trying replication.
core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy]
Starting Replication Recovery. core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy]
Begin buffering updates. core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy]
Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/
http://host2:8983/solr/core1_shard1_replica1/. core=core1_shard1_replica2*
*INFO  [2014-04-14 18:11:00.536]
[org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
client,
config:maxConnections=128maxConnectionsPerHost=32followRedirects=false*
*INFO  [2014-04-14 18:11:01.964]
[org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
client,
config:connTimeout=5000socketTimeout=2allowCompression=falsemaxConnections=1maxConnectionsPerHost=1*
*INFO  [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller]  No
value set for 'pollInterval'. Timer Task not started.*
*INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
Master's generation: 1108645*
*INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
Slave's generation: 1108627*
*INFO  [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
Starting replication process*
*INFO  [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller]
Number of files in latest index in master: 814*
*INFO  [2014-04-14 18:11:02.007]
[org.apache.solr.core.CachingDirectoryFactory] return new directory for
/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007*
*INFO  [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller]
Starting download to
NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe;
maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true*




So, it looks like the number of updates is too large for regular
peer-sync recovery, and it then falls back to a full copy of the index. And since our index
is very large (350G), this causes the cluster to stay in recovery
mode forever - trying to copy that huge index.

I also read in some thread
(http://lucene.472066.n3.nabble.com/Recovery-too-many-updates-received-since-start-td3935281.html)
that there is a limit of 100 documents.

I wonder if this has been updated to make that configurable since that
thread. If not, the only option I see is to do a trickle delete of 100
documents per second or something.

Also - the other suggestion of using distrib=false might not help,
because the issue currently is that the replication is doing a full copy.

Any thoughts?

Thanks
Vinay







On 14 April 2014 07:54, Vinay Pothnis poth...@gmail.com wrote:

 Yes, that is our approach. We did try deleting a day's worth of data at a
 time, and that resulted in OOM as well.

 Thanks
 Vinay


 On 14 April 2014 00:27, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 I mean you can divide the range (i.e. one week at each delete instead of
 one month) and try to check whether you still get an OOM or not.

 Thanks;
 Furkan KAMACI


 2014-04-14 7:09 GMT+03:00 Vinay Pothnis poth...@gmail.com:

  Aman,
  Yes - Will do!
 
  Furkan,
  How do you mean by 'bulk delete'?
 
  -Thanks
  Vinay
 
 
  On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote:
 
   Hi;
  
   Do you get any problems when you index your data? On the other hand
   deleting as bulks and reducing the size of documents may help you not
 to
   hit OOM.
  
   Thanks;
   Furkan KAMACI
  
  
   2014-04-12 8:22 GMT+03:00 Aman Tandon amantandon...@gmail.com:
  
Vinay please share your experience after trying this solution.
   
   
On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis poth...@gmail.com
   wrote:
   
 The query is something like this:


 *curl -H 'Content-Type: text/xml' --data
 'deletequeryparam1:(val1
   OR
 val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO
 138516480]/query/delete'
 'http://host:port/solr/coll-name1/update?commit=true'*

 Trying to restrict the number of documents deleted via the date
parameter

Re: deleting large amount data from solr cloud

2014-04-13 Thread Vinay Pothnis
Aman,
Yes - Will do!

Furkan,
How do you mean by 'bulk delete'?

-Thanks
Vinay


On 12 April 2014 14:49, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 Do you get any problems when you index your data? On the other hand
 deleting as bulks and reducing the size of documents may help you not to
 hit OOM.

 Thanks;
 Furkan KAMACI


 2014-04-12 8:22 GMT+03:00 Aman Tandon amantandon...@gmail.com:

  Vinay please share your experience after trying this solution.
 
 
  On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis poth...@gmail.com
 wrote:
 
   The query is something like this:
  
  
   *curl -H 'Content-Type: text/xml' --data 'deletequeryparam1:(val1
 OR
   val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO
   138516480]/query/delete'
   'http://host:port/solr/coll-name1/update?commit=true'*
  
   Trying to restrict the number of documents deleted via the date
  parameter.
  
   Had not tried the distrib=false option. I could give that a try.
 Thanks
   for the link! I will check on the cache sizes and autowarm values. Will
  try
   and disable the caches when I am deleting and give that a try.
  
   Thanks Erick and Shawn for your inputs!
  
   -Vinay
  
  
  
   On 11 April 2014 15:28, Shawn Heisey s...@elyograg.org wrote:
  
On 4/10/2014 7:25 PM, Vinay Pothnis wrote:
   
When we tried to delete the data through a query - say 1 day/month's
   worth
of data. But after deleting just 1 month's worth of data, the master
   node
is going out of memory - heap space.
   
Wondering is there any way to incrementally delete the data without
affecting the cluster adversely.
   
   
I'm curious about the actual query being used here.  Can you share
 it,
  or
a redacted version of it?  Perhaps there might be a clue there?
   
Is this a fully distributed delete request?  One thing you might try,
assuming Solr even supports it, is sending the same delete request
   directly
to each shard core with distrib=false.
   
Here's a very incomplete list about how you can reduce Solr heap
requirements:
   
http://wiki.apache.org/solr/SolrPerformanceProblems#
Reducing_heap_requirements
   
Thanks,
Shawn
   
   
  
 
 
 
  --
  With Regards
  Aman Tandon
 



Re: deleting large amount data from solr cloud

2014-04-11 Thread Vinay Pothnis
Sorry - yes, I meant to say leader.
Each JVM has 16G of memory.


On 10 April 2014 20:54, Erick Erickson erickerick...@gmail.com wrote:

 First, there is no master node, just leaders and replicas. But that's a
 nit.

 No real clue why you would be going out of memory. Deleting a
 document, even by query should just mark the docs as deleted, a pretty
 low-cost operation.

 how much memory are you giving the JVM?

 Best,
 Erick

 On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis poth...@gmail.com wrote:
  [solr version 4.3.1]
 
  Hello,
 
  I have a solr cloud (4 nodes - 2 shards) with a fairly large amount
  documents (~360G of index per shard). Now, a major portion of the data is
  not required and I need to delete those documents. I would need to delete
  around 75% of the data.
 
  One of the solutions could be to drop the index completely re-index. But
  this is not an option at the moment.
 
  When we tried to delete the data through a query - say 1 day/month's
 worth
  of data. But after deleting just 1 month's worth of data, the master node
  is going out of memory - heap space.
 
  Wondering is there any way to incrementally delete the data without
  affecting the cluster adversely.
 
  Thank!
  Vinay



Re: deleting large amount data from solr cloud

2014-04-11 Thread Vinay Pothnis
I tried to increase the memory to 24G, but that wasn't enough either.
I agree that the index has now grown too large, and that we should have monitored
this and taken action much earlier.

The search operations seem to run OK with 16G - mainly because the bulk of
the data that we are trying to delete is not getting searched. So, right now we are
basically in salvage mode.

Does the number of documents deleted at a time have any impact? If I
'trickle delete' - say 50K documents at a time - would that make a
difference?
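
A rough sketch of what such a trickle delete could look like, reusing the curl form quoted elsewhere in these threads and walking the date range in small windows (host, collection, field names, bounds, and window size are all placeholders):

#!/bin/sh
# Walk the date range in small windows so each delete touches a limited number of documents.
FROM=138395520
END=138516480
STEP=360            # window size is illustrative only
while [ "$FROM" -lt "$END" ]; do
  TO=$((FROM + STEP))
  curl -H 'Content-Type: text/xml' --data \
    "<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[$FROM TO $TO]</query></delete>" \
    "http://host:port/solr/coll-name1/update?commit=true"
  FROM=$TO
  sleep 5           # give the cluster a chance to absorb each chunk before sending the next
done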

When I delete, does Solr try to bring the whole index into memory? I am trying to
understand what happens under the hood.

Thanks
Vinay


On 11 April 2014 13:53, Erick Erickson erickerick...@gmail.com wrote:

 Using 16G for a 360G index is probably pushing things. A lot. I'm
 actually a bit surprised that the problem only occurs when you delete
 docs

 The simplest thing would be to increase the JVM memory. You should be
 looking at your index to see how big it is, be sure to subtract out
 the *.fdt and *.fdx files, those are used for verbatim copies of the
 raw data and don't really count towards the memory requirements.

 I suspect you're just not giving enough memory to your JVM and this is
 just the first OOM you've hit. Look on the Solr admin page and see how
 much is being reported, if it's near the limit of your 16G that's the
 smoking gun...

 Best,
 Erick

 On Fri, Apr 11, 2014 at 7:45 AM, Vinay Pothnis poth...@gmail.com wrote:
  Sorry - yes, I meant to say leader.
  Each JVM has 16G of memory.
 
 
  On 10 April 2014 20:54, Erick Erickson erickerick...@gmail.com wrote:
 
  First, there is no master node, just leaders and replicas. But that's
 a
  nit.
 
  No real clue why you would be going out of memory. Deleting a
  document, even by query should just mark the docs as deleted, a pretty
  low-cost operation.
 
  how much memory are you giving the JVM?
 
  Best,
  Erick
 
  On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis poth...@gmail.com
 wrote:
   [solr version 4.3.1]
  
   Hello,
  
   I have a solr cloud (4 nodes - 2 shards) with a fairly large amount
   documents (~360G of index per shard). Now, a major portion of the
 data is
   not required and I need to delete those documents. I would need to
 delete
   around 75% of the data.
  
   One of the solutions could be to drop the index completely re-index.
 But
   this is not an option at the moment.
  
   When we tried to delete the data through a query - say 1 day/month's
  worth
   of data. But after deleting just 1 month's worth of data, the master
 node
   is going out of memory - heap space.
  
   Wondering is there any way to incrementally delete the data without
   affecting the cluster adversely.
  
   Thank!
   Vinay
 



Re: deleting large amount data from solr cloud

2014-04-11 Thread Vinay Pothnis
The query is something like this:


*curl -H 'Content-Type: text/xml' --data '<delete><query>param1:(val1 OR
val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO
138516480]</query></delete>'
'http://host:port/solr/coll-name1/update?commit=true'*

Trying to restrict the number of documents deleted via the date parameter.

I had not tried the distrib=false option; I could give that a try. Thanks
for the link! I will check on the cache sizes and autowarm values, and will try
disabling the caches while I am deleting as well.

Thanks Erick and Shawn for your inputs!

-Vinay



On 11 April 2014 15:28, Shawn Heisey s...@elyograg.org wrote:

 On 4/10/2014 7:25 PM, Vinay Pothnis wrote:

 When we tried to delete the data through a query - say 1 day/month's worth
 of data. But after deleting just 1 month's worth of data, the master node
 is going out of memory - heap space.

 Wondering is there any way to incrementally delete the data without
 affecting the cluster adversely.


 I'm curious about the actual query being used here.  Can you share it, or
 a redacted version of it?  Perhaps there might be a clue there?

 Is this a fully distributed delete request?  One thing you might try,
 assuming Solr even supports it, is sending the same delete request directly
 to each shard core with distrib=false.
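
A sketch of what that per-core request might look like, following the pattern of the query at the top of this message (the host, core name, and field are placeholders, not confirmed for this setup):

curl -H 'Content-Type: text/xml' --data '<delete><query>date_param:[138395520 TO 138516480]</query></delete>' \
  'http://host1:8983/solr/coll-name1_shard1_replica1/update?commit=true&distrib=false'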

 Here's a very incomplete list about how you can reduce Solr heap
 requirements:

 http://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements

 Thanks,
 Shawn




deleting large amount data from solr cloud

2014-04-10 Thread Vinay Pothnis
[solr version 4.3.1]

Hello,

I have a SolrCloud setup (4 nodes - 2 shards) with a fairly large number of
documents (~360G of index per shard). Now, a major portion of the data is
not required and I need to delete those documents. I would need to delete
around 75% of the data.

One of the solutions could be to drop the index completely and re-index. But
this is not an option at the moment.

We tried to delete the data through a query - say, 1 day's or 1 month's worth
of data at a time. But after deleting just 1 month's worth of data, the master node
is going out of memory - heap space.

Wondering if there is any way to incrementally delete the data without
affecting the cluster adversely.

Thanks!
Vinay


Re: Solr + SPDY

2013-10-26 Thread Vinay Pothnis
Hi Otis,

While the main goal of SPDY is to reduce page load times - I think we could
benefit from it in the Solr context as well.
The transport layer is still TCP - but SPDY allows multiplexing of
requests. It also uses compression and reduces the overhead of HTTP
headers.

An excerpt from http://webtide.intalio.com/2012/03/spdy-support-in-jetty/

SPDY reduces roundtrips with the server, reduces the HTTP verboseness by
compressing HTTP headers, improves the utilization of the TCP connection,
multiplexes requests into a single TCP connection (instead of using a
limited number of connections, each serving only one request),

1. Users who are using an HTTP client to communicate with Solr, for
sending updates or for searching, could benefit from SPDY
optimizations. They could make use of the Jetty HTTP client and set up Solr on
Jetty to enable communication over SPDY.

2. As far as SolrCloud inter-node communication is concerned - I am not very
sure how beneficial it would be. I brought this up because, in the SolrCloud
context, there's a lot of inter-node chatter happening to facilitate
distributed search/distributed indexing. So - I was wondering if anyone
else has given this any thought.

Cheers
Vinay

Some references:
http://www.chromium.org/spdy/spdy-whitepaper
http://webtide.intalio.com/2012/03/spdy-support-in-jetty/
http://www.eclipse.org/jetty/documentation/current/spdy.html


On Fri, Oct 25, 2013 at 12:22 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 I'm rusty on SPDY. Can you summarize the benefits in Solr context?  Thanks.

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Oct 25, 2013 10:46 AM, Vinay Pothnis poth...@gmail.com wrote:

  Hello,
 
  Couple of questions related to using SPDY with solr.
 
  1. Does anybody have experience running Solr on Jetty 9 with SPDY
 support -
  and using Jetty Client (SPDY capable client) to talk to Solr over SPDY?
 
  2. This is related to Solr - Cloud - inter node communication. This might
  not be a user-list question - nonetheless, I was wondering if there would
  be some way to enable the use of SPDY for inter-node communication in a
  Solr Cloud set up. Is this something that the solr team might look at?
 
  Thanks
  Vinay
 



Solr + SPDY

2013-10-25 Thread Vinay Pothnis
Hello,

A couple of questions related to using SPDY with Solr.

1. Does anybody have experience running Solr on Jetty 9 with SPDY support -
and using the Jetty client (a SPDY-capable client) to talk to Solr over SPDY?

2. This is related to SolrCloud inter-node communication. This might
not be a user-list question - nonetheless, I was wondering if there would
be some way to enable the use of SPDY for inter-node communication in a
SolrCloud setup. Is this something that the Solr team might look at?

Thanks
Vinay


Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-26 Thread Vinay Pothnis
Thank you Erick!

Will look at all these suggestions.

-Vinay


On Wed, Jun 26, 2013 at 6:37 AM, Erick Erickson erickerick...@gmail.comwrote:

 Right, unfortunately this is a gremlin lurking in the weeds, see:
 http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock

 There are a couple of ways to deal with this:
 1 go ahead and up the limit and re-compile, if you look at
 SolrCmdDistributor the semaphore is defined there.

 2 https://issues.apache.org/jira/browse/SOLR-4816 should
 address this as well as improve indexing throughput. I'm totally sure
 Joel (the guy working on this) would be thrilled if you were able to
 verify that these two points, I'd ask him (on the JIRA) whether he thinks
 it's ready to test.

 3 Reduce the number of threads you're indexing with

 4 index docs in small packets, perhaps even one and just rack
 together a zillion threads to get throughput.

 FWIW,
 Erick

 On Tue, Jun 25, 2013 at 8:55 AM, Vinay Pothnis poth...@gmail.com wrote:
  Jason and Scott,
 
  Thanks for the replies and pointers!
  Yes, I will consider the 'maxDocs' value as well. How do i monitor the
  transaction logs during the interval between commits?
 
  Thanks
  Vinay
 
 
  On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman 
  jhell...@innoventsolutions.com wrote:
 
  Scott,
 
  My comment was meant to be a bit tongue-in-cheek, but my intent in the
  statement was to represent hard failure along the lines Vinay is seeing.
   We're talking about OutOfMemoryException conditions, total cluster
  paralysis requiring restart, or other similar and disastrous conditions.
 
  Where that line is is impossible to generically define, but trivial to
  accomplish.  What any of us running Solr has to achieve is a realistic
  simulation of our desired production load (probably well above peak)
 and to
  see what limits are reached.  Armed with that information we tweak.  In
  this case, we look at finding the point where data ingestion reaches a
  natural limit.  For some that may be JVM GC, for others memory buffer
 size
  on the client load, and yet others it may be I/O limits on multithreaded
  reads from a database or file system.
 
  In old Solr days we had a little less to worry about.  We might play
 with
  a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial
  commits and rollback recoveries.  But with 4.x we now have more durable
  write options and NRT to consider, and SolrCloud begs to use this.  So
 we
  have to consider transaction logs, the file handles they leave open
 until
  commit operations occur, and how we want to manage writing to all cores
  simultaneously instead of a more narrow master/slave relationship.
 
  It's all manageable, all predictable (with some load testing) and all
  filled with many possibilities to meet our specific needs.  Considering
 hat
  each person's data model, ingestion pipeline, request processors, and
 field
  analysis steps will be different, 5 threads of input at face value
 doesn't
  really contemplate the whole problem.  We have to measure our actual
 data
  against our expectations and find where the weak chain links are to
  strengthen them.  The symptoms aren't necessarily predictable in
 advance of
  this testing, but they're likely addressable and not difficult to
 decipher.
 
  For what it's worth, SolrCloud is new enough that we're still
 experiencing
  some uncharted territory with unknown ramifications but with continued
  dialog through channels like these there are fewer territories without
 good
  cartography :)
 
  Hope that's of use!
 
  Jason
 
 
 
  On Jun 24, 2013, at 7:12 PM, Scott Lundgren 
  scott.lundg...@carbonblack.com wrote:
 
   Jason,
  
   Regarding your statement push you over the edge- what does that
 mean?
   Does it mean uncharted territory with unknown ramifications or
  something
   more like specific, known symptoms?
  
   I ask because our use is similar to Vinay's in some respects, and we
 want
   to be able to push the capabilities of write perf - but not over the
  edge!
   In particular, I am interested in knowing the symptoms of failure, to
  help
   us troubleshoot the underlying problems if and when they arise.
  
   Thanks,
  
   Scott
  
   On Monday, June 24, 2013, Jason Hellman wrote:
  
   Vinay,
  
   You may wish to pay attention to how many transaction logs are being
   created along the way to your hard autoCommit, which should truncate
 the
   open handles for those files.  I might suggest setting a maxDocs
 value
  in
   parallel with your maxTime value (you can use both) to ensure the
 commit
   occurs at either breakpoint.  30 seconds is plenty of time for 5
  parallel
   processes of 20 document submissions to push you over the edge.
  
   Jason
  
   On Jun 24, 2013, at 2:21 PM, Vinay Pothnis poth...@gmail.com
 wrote:
  
   I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30
 seconds.
  
   On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman 
   jhell...@innoventsolutions.com

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-25 Thread Vinay Pothnis
Jason and Scott,

Thanks for the replies and pointers!
Yes, I will consider the 'maxDocs' value as well. How do I monitor the
transaction logs during the interval between commits?

Thanks
Vinay


On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman 
jhell...@innoventsolutions.com wrote:

 Scott,

 My comment was meant to be a bit tongue-in-cheek, but my intent in the
 statement was to represent hard failure along the lines Vinay is seeing.
  We're talking about OutOfMemoryException conditions, total cluster
 paralysis requiring restart, or other similar and disastrous conditions.

 Where that line is is impossible to generically define, but trivial to
 accomplish.  What any of us running Solr has to achieve is a realistic
 simulation of our desired production load (probably well above peak) and to
 see what limits are reached.  Armed with that information we tweak.  In
 this case, we look at finding the point where data ingestion reaches a
 natural limit.  For some that may be JVM GC, for others memory buffer size
 on the client load, and yet others it may be I/O limits on multithreaded
 reads from a database or file system.

 In old Solr days we had a little less to worry about.  We might play with
 a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial
 commits and rollback recoveries.  But with 4.x we now have more durable
 write options and NRT to consider, and SolrCloud begs to use this.  So we
 have to consider transaction logs, the file handles they leave open until
 commit operations occur, and how we want to manage writing to all cores
 simultaneously instead of a more narrow master/slave relationship.

 It's all manageable, all predictable (with some load testing) and all
 filled with many possibilities to meet our specific needs.  Considering that
 each person's data model, ingestion pipeline, request processors, and field
 analysis steps will be different, 5 threads of input at face value doesn't
 really contemplate the whole problem.  We have to measure our actual data
 against our expectations and find where the weak chain links are to
 strengthen them.  The symptoms aren't necessarily predictable in advance of
 this testing, but they're likely addressable and not difficult to decipher.

 For what it's worth, SolrCloud is new enough that we're still experiencing
 some uncharted territory with unknown ramifications but with continued
 dialog through channels like these there are fewer territories without good
 cartography :)

 Hope that's of use!

 Jason



 On Jun 24, 2013, at 7:12 PM, Scott Lundgren 
 scott.lundg...@carbonblack.com wrote:

  Jason,
 
  Regarding your statement push you over the edge- what does that mean?
  Does it mean uncharted territory with unknown ramifications or
 something
  more like specific, known symptoms?
 
  I ask because our use is similar to Vinay's in some respects, and we want
  to be able to push the capabilities of write perf - but not over the
 edge!
  In particular, I am interested in knowing the symptoms of failure, to
 help
  us troubleshoot the underlying problems if and when they arise.
 
  Thanks,
 
  Scott
 
  On Monday, June 24, 2013, Jason Hellman wrote:
 
  Vinay,
 
  You may wish to pay attention to how many transaction logs are being
  created along the way to your hard autoCommit, which should truncate the
  open handles for those files.  I might suggest setting a maxDocs value
 in
  parallel with your maxTime value (you can use both) to ensure the commit
  occurs at either breakpoint.  30 seconds is plenty of time for 5
 parallel
  processes of 20 document submissions to push you over the edge.
 
  Jason
 
  On Jun 24, 2013, at 2:21 PM, Vinay Pothnis poth...@gmail.com wrote:
 
  I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds.
 
  On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman 
  jhell...@innoventsolutions.com wrote:
 
  Vinay,
 
  What autoCommit settings do you have for your indexing process?
 
  Jason
 
  On Jun 24, 2013, at 1:28 PM, Vinay Pothnis poth...@gmail.com wrote:
 
  Here is the ulimit -a output:
 
  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 179963
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 32769
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 10240
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 14
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited
 
  On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro 
 yago.rive...@gmail.com
  wrote:
 
  Hi,
 
  I have the same issue too, and your deployment is almost exactly like
  mine,
 
 
 
 http://lucene.472066.n3

[solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Vinay Pothnis
Hello All,

I have the following set up of solr cloud.

* solr version 4.3.1
* 3 node solr cloud + replication factor 2
* 3 zoo keepers
* load balancer in front of the 3 solr nodes

I am seeing this strange behavior when I am indexing a large number of
documents (10 mil). When I have more than 3-5 threads sending documents (in
batches of 20) to solr, sometimes solr goes into a hung state. After this all
the update requests get timed out. What we see via AppDynamics (a
performance monitoring tool) is that there are a number of threads that are
stalled. The stack trace for one of the threads is shown below.

The cluster has to be restarted to recover from this. When I reduce the
concurrency to 1, 2, 3 threads, then the indexing goes through smoothly.
Any pointers as to what could be wrong here?

We send the updates to one of the nodes in the solr cloud through a load
balancer.

Thanks
Vinay

Thread Name:qtp2141131052-78
ID:78
Time:Fri Jun 21 23:20:22 GMT 2013
State:WAITING
Priority:5

sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
org.eclipse.jetty.server.Server.handle(Server.java:445)
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
java.lang.Thread.run(Thread.java:722)


Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Vinay Pothnis
Here is the ulimit -a output:

  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 179963
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 32769
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 10240
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 14
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited
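
If the file-descriptor or process limits that Yago asks about below had turned
out to be the ceiling, they are usually raised per user in
/etc/security/limits.conf. A sketch only; the user name and values are
illustrative placeholders, not a recommendation from this thread:

# /etc/security/limits.conf -- example entries for the account running Solr
solr    soft    nofile    65536
solr    hard    nofile    65536
solr    soft    nproc     65536
solr    hard    nproc     65536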

On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro yago.rive...@gmail.com wrote:

 Hi,

 I have the same issue too, and your deployment is almost exactly like mine,
 http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862

 With some concurrency and batches of 10, Solr apparently hits a deadlock
 while distributing updates.

 Can you dump the ulimit configuration on your servers? Some people have had
 the same issue because they reached the ulimit maximums defined for file
 descriptors and processes.

 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote:

  Hello All,
 
  I have the following set up of solr cloud.
 
  * solr version 4.3.1
  * 3 node solr cloud + replication factor 2
  * 3 zoo keepers
  * load balancer in front of the 3 solr nodes
 
  I am seeing this strange behavior when I am indexing a large number of
  documents (10 mil). When I have more than 3-5 threads sending documents
 (in
  batch of 20) to solr, sometimes solr goes into a hung state. After this
 all
  the update requests get timed out. What we see via AppDynamics (a
  performance monitoring tool) is that there are a number of threads that
 are
  stalled. The stack trace for one of the threads is shown below.
 
  The cluster has to be restarted to recover from this. When I reduce the
  concurrency to 1, 2, 3 threads, then the indexing goes through smoothly.
  Any pointers as to what could be wrong here?
 
  We send the updates to one of the nodes in the solr cloud through a load
  balancer.
 
  Thanks
  Vinay
 
  Thread Name:qtp2141131052-78
  ID:78
  Time:Fri Jun 21 23:20:22 GMT 2013
  State:WAITING
  Priority:5
 
  sun.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
  java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
 
 org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Vinay Pothnis
I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds.
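
In solrconfig.xml those settings look roughly like the sketch below; the
maxDocs cap follows Jason's earlier suggestion in this thread, and its value
and the openSearcher flag are illustrative assumptions rather than settings
taken from this setup:

<!-- inside the <updateHandler> section -->
<autoCommit>
  <maxTime>30000</maxTime>         <!-- hard commit every 30 seconds -->
  <maxDocs>10000</maxDocs>         <!-- or every 10,000 docs, whichever comes first -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>          <!-- soft commit every 1 second -->
</autoSoftCommit>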

On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman 
jhell...@innoventsolutions.com wrote:

 Vinay,

 What autoCommit settings do you have for your indexing process?

 Jason

 On Jun 24, 2013, at 1:28 PM, Vinay Pothnis poth...@gmail.com wrote:

  Here is the ulimit -a output:
 
  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 179963
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 32769
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 10240
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 14
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited
 
  On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro yago.rive...@gmail.com
 wrote:
 
  Hi,
 
  I have the same issue too, and your deployment is almost exactly like mine,
 
 http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
 
  With some concurrency and batches of 10, Solr apparently hits a deadlock
  while distributing updates.

  Can you dump the ulimit configuration on your servers? Some people have had
  the same issue because they reached the ulimit maximums defined for file
  descriptors and processes.
 
  --
  Yago Riveiro
  Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
  On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote:
 
  Hello All,
 
  I have the following set up of solr cloud.
 
  * solr version 4.3.1
  * 3 node solr cloud + replication factor 2
  * 3 zoo keepers
  * load balancer in front of the 3 solr nodes
 
  I am seeing this strange behavior when I am indexing a large number of
  documents (10 mil). When I have more than 3-5 threads sending documents
  (in
  batch of 20) to solr, sometimes solr goes into a hung state. After this
  all
  the update requests get timed out. What we see via AppDynamics (a
  performance monitoring tool) is that there are a number of threads that
  are
  stalled. The stack trace for one of the threads is shown below.
 
  The cluster has to be restarted to recover from this. When I reduce the
  concurrency to 1, 2, 3 threads, then the indexing goes through
 smoothly.
  Any pointers as to what could be wrong here?
 
  We send the updates to one of the nodes in the solr cloud through a
 load
  balancer.
 
  Thanks
  Vinay
 
  Thread Name:qtp2141131052-78
  ID:78
  Time:Fri Jun 21 23:20:22 GMT 2013
  State:WAITING
  Priority:5
 
  sun.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
 
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
 
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
  java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
 
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
 
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
 
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
 
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
 
 
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
 
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
 
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
 
 
 org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
 
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
 
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
 
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
 
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
 
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
 
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564

Re: [solr cloud 4.1] Issue with order in a batch of commands

2013-02-20 Thread Vinay Pothnis
Thanks for the reply.

In my case, the order is definitely critical. It would be great if this can
be fixed. And yes, even SolrJ deals with deletes first and then the
add/updates. And that was the reason why I switched from SolrJ to plain
http.
There is a ticket with SolrJ as well
https://issues.apache.org/jira/browse/SOLR-1162. Looks like it had some
traction and then dropped off.

I can work around this for the moment, but would definitely be great if
this can be fixed.
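
For reference, a rough sketch of the workaround Mark suggests below (breaking
the mixed batch into separate, ordered requests), assuming Apache HttpClient
4.x; the URL, field values, and document id are placeholders:

import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public class OrderedUpdates {
    public static void main(String[] args) throws Exception {
        DefaultHttpClient client = new DefaultHttpClient();
        String url = "http://localhost:8983/solr/sample/update";

        // 1) send the adds/updates and wait for the response...
        post(client, url,
            "{\"add\": {\"doc\": {\"doc_id\": \"e.1.78\", \"field-1\": 1359591340025}}}");
        // 2) ...then send the delete in a separate request, so a forwarding
        //    shard cannot reorder it ahead of the adds
        post(client, url, "{\"delete\": {\"id\": \"e.1.78\"}}");

        client.getConnectionManager().shutdown();
    }

    private static void post(DefaultHttpClient client, String url, String json) throws Exception {
        HttpPost req = new HttpPost(url);
        req.setEntity(new StringEntity(json, ContentType.APPLICATION_JSON));
        EntityUtils.consume(client.execute(req).getEntity());
    }
}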

Do you want me to create a JIRA with Solr or would you be doing that?

Thanks
Vinay



On Wed, Feb 20, 2013 at 6:40 AM, Mark Miller markrmil...@gmail.com wrote:

 It's because of how we currently handle batched requests - we buffer a
 different number of deletes than we do adds and flush them separately -
 mainly because the size of each is likely to be so different, at one point
 we would buffer a lot more deletes.

 So currently, you want to break these up to multiple requests if order is
 critical.

 Might make a JIRA issue to look at this again - I think at this point we
 are buffering the same number of each any way - we should probably just
 treat them the same, and put them in the same buffer in the right order.

 Even then though, I don't think SolrJ update requests order deletes and
 adds in the same request either, so that would also need to be addressed.
 Pretty sure solrj will do the adds then the deletes.

 - Mark

 On Feb 19, 2013, at 2:23 PM, Vinay Pothnis vinay.poth...@gmail.com
 wrote:

  Hello,
 
  I have the following set up:
 
  * solr cloud 4.1.0
  * 2 shards with embedded zookeeper
  * plain http to communicate with solr
 
  I am testing a scenario where i am batching multiple commands and sending
  to solr. Since this is the solr cloud setup, I am always sending the
  updates to one of the nodes in the cloud.
 
  e.g.: http://localhost:8983/solr/sample/update
 
  *example set of commands:*
  {add: {doc:
  {field-1:1359591340025,field-2:1361301249330,doc_id:e.1.78}
  },add: {doc:
  {field-1:1360089709282,field-2:1361301249377,doc_id:e.1.78}
  },delete: { id: e.1.78 }}
 
  When I include deletes and updates in the batch, sometimes, the order of
  the commands is not maintained.
 
  Specifically, if the document does not belong to the shard that I am
  communicating with (lets say shard-1), then shard-1 sends the commands to
  shard-2. In this case, the deletes are sent first and then the
 updates.
  This changes the order that I originally sent.
 
  Any inputs on why the order is not maintained?
 
  Thanks!
  Vinay




[solr cloud 4.1] Issue with order in a batch of commands

2013-02-19 Thread Vinay Pothnis
Hello,

I have the following set up:

* solr cloud 4.1.0
* 2 shards with embedded zookeeper
* plain http to communicate with solr

I am testing a scenario where i am batching multiple commands and sending
to solr. Since this is the solr cloud setup, I am always sending the
updates to one of the nodes in the cloud.

e.g.: http://localhost:8983/solr/sample/update

*example set of commands:*
{"add": {"doc":
{"field-1":1359591340025,"field-2":1361301249330,"doc_id":"e.1.78"}
},"add": {"doc":
{"field-1":1360089709282,"field-2":1361301249377,"doc_id":"e.1.78"}
},"delete": { "id": "e.1.78" }}

When I include deletes and updates in the batch, sometimes, the order of
the commands is not maintained.

Specifically, if the document does not belong to the shard that I am
communicating with (let's say shard-1), then shard-1 sends the commands to
shard-2. In this case, the deletes are sent first and then the updates.
This changes the order that I originally sent.

Any inputs on why the order is not maintained?

Thanks!
Vinay


Re: [solr cloud 4.1] Issue with order in a batch of commands

2013-02-19 Thread Vinay Pothnis
Thanks for the reply, Erick.

* I am not using SolrJ
* I am using plain http (apache http client) to send a batch of commands.
* As I mentioned below, the json payload I am sending is like this (some of
the fields have been removed for brevity)
* POST http://localhost:8983/solr/sample/update
* POST BODY
  {"add": {"doc":
{"field-1":1359591340025,"field-2":1361301249330,"doc_id":"e.1.78"}
},"add": {"doc":
{"field-1":1360089709282,"field-2":1361301249377,"doc_id":"e.1.78"}
},"delete": { "id": "e.1.78" }}

The evidence is from the logs on the 2 shards.

The following is the log on shard 1:
*INFO: [sample] webapp=/solr path=/update params={} {add=[e.1.80, e.1.80,
e.1.80, e.1.80, e.1.80, e.1.80, e.1.80],delete=[e.1.80]} 0 48*
*
*
The following is the log on shard 2:
*INFO: [sample] webapp=/solr path=/update
params={update.distrib=TOLEADER&wt=javabin&version=2} {delete=[e.1.80
(-1427453640312881152)]} 0 2*
*Feb 19, 2013 6:04:34 PM
org.apache.solr.update.processor.LogUpdateProcessor finish*
*INFO: [sample] webapp=/solr path=/update params={distrib.from=
http://10.10.76.23:8983/solr/ch-madden/&update.distrib=TOLEADER&wt=javabin&version=2}
{add=[e.1*
*.80 (1427453640314978304), e.1.80 (1427453640338046976), e.1.80
(1427453640342241280), e.1.80 (1427453640346435584), e.1.80
(1427453640349581312), e.1.80 (14274*
*53640351678464), e.1.80 (1427453640353775616)]} 0 41*

As you can see, shard 2 gets the delete command first and then the
add/update commands.
I am sure I have waited until the commit happens. And besides, I am also
using the softAutoCommit at 1 second. So, the query results should be
updated quite quickly.

Any pointers would be very helpful.

Thanks!
Vinay


On Tue, Feb 19, 2013 at 5:57 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, this would surprise me unless the add and delete were going to
 separate machines. how are you sending them? SolrJ? and in a single
 server.add(doclist) format or with individual adds?

 Individual commands being sent can come 'round out of sequence, that's what
 the whole optimistic locking bit is about.

 I guess my other question is what's your evidence that this isn't working?
 Are you just querying your index and looking at the results? If so, are you
 sure you're waiting until after any autocommit intervals?

 Best
 Erick


 On Tue, Feb 19, 2013 at 2:23 PM, Vinay Pothnis vinay.poth...@gmail.com
 wrote:

  Hello,
 
  I have the following set up:
 
  * solr cloud 4.1.0
  * 2 shards with embedded zookeeper
  * plain http to communicate with solr
 
  I am testing a scenario where i am batching multiple commands and sending
  to solr. Since this is the solr cloud setup, I am always sending the
  updates to one of the nodes in the cloud.
 
  e.g.: http://localhost:8983/solr/sample/update
 
  *example set of commands:*
  {add: {doc:
  {field-1:1359591340025,field-2:1361301249330,doc_id:e.1.78}
  },add: {doc:
  {field-1:1360089709282,field-2:1361301249377,doc_id:e.1.78}
  },delete: { id: e.1.78 }}
 
  When I include deletes and updates in the batch, sometimes, the order of
  the commands is not maintained.
 
  Specifically, if the document does not belong to the shard that I am
  communicating with (lets say shard-1), then shard-1 sends the commands to
  shard-2. In this case, the deletes are sent first and then the
 updates.
  This changes the order that I originally sent.
 
  Any inputs on why the order is not maintained?
 
  Thanks!
  Vinay
 



Re: [solr cloud 4.1] Issue with order in a batch of commands

2013-02-19 Thread Vinay Pothnis
Also, I was referring to this wiki page:
http://wiki.apache.org/solr/UpdateJSON#Update_Commands

Thanks
Vinay


On Tue, Feb 19, 2013 at 6:12 PM, Vinay Pothnis vinay.poth...@gmail.com wrote:

 Thanks for the reply, Erick.

 * I am not using SolrJ
 * I am using plain http (apache http client) to send a batch of commands.
 * As I mentioned below, the json payload I am sending is like this (some
 of the fields have been removed for brevity)
 * POST http://localhost:8983/solr/sample/update
 * POST BODY
   {add: {doc:
 {field-1:1359591340025,field-2:1361301249330,doc_id:e.1.78}
 },add: {doc:
 {field-1:1360089709282,field-2:1361301249377,doc_id:e.1.78}
 },delete: { id: e.1.78 }}

 The evidence is from the logs on the 2 shards.

 The following is the log on shard 1:
 *INFO: [sample] webapp=/solr path=/update params={} {add=[e.1.80, e.1.80,
 e.1.80, e.1.80, e.1.80, e.1.80, e.1.80],delete=[e.1.80]} 0 48*
 *
 *
 The following is the log on shard 2:
 *INFO: [sample] webapp=/solr path=/update
 params={update.distrib=TOLEADER&wt=javabin&version=2} {delete=[e.1.80
 (-1427453640312881152)]} 0 2*
 *Feb 19, 2013 6:04:34 PM
 org.apache.solr.update.processor.LogUpdateProcessor finish*
 *INFO: [sample] webapp=/solr path=/update params={distrib.from=
 http://10.10.76.23:8983/solr/ch-madden/&update.distrib=TOLEADER&wt=javabin&version=2}
 {add=[e.1*
 *.80 (1427453640314978304), e.1.80 (1427453640338046976), e.1.80
 (1427453640342241280), e.1.80 (1427453640346435584), e.1.80
 (1427453640349581312), e.1.80 (14274*
 *53640351678464), e.1.80 (1427453640353775616)]} 0 41*

 As you can see, shard 2 gets the delete command first and then the
 add/update commands.
 I am sure I have waited until the commit happens. And besides, I am also
 using the softAutoCommit at 1 second. So, the query results should be
 updated quite quickly.

 Any pointers would be very helpful.

 Thanks!
 Vinay


 On Tue, Feb 19, 2013 at 5:57 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Hmmm, this would surprise me unless the add and delete were going to
 separate machines. how are you sending them? SolrJ? and in a single
 server.add(doclist) format or with individual adds?

 Individual commands being sent can come 'round out of sequence, that's
 what
 the whole optimistic locking bit is about.

 I guess my other question is what's your evidence that this isn't working?
 Are you just querying your index and looking at the results? If so, are
 you
 sure you're waiting until after any autocommit intervals?

 Best
 Erick


 On Tue, Feb 19, 2013 at 2:23 PM, Vinay Pothnis vinay.poth...@gmail.com
 wrote:

  Hello,
 
  I have the following set up:
 
  * solr cloud 4.1.0
  * 2 shards with embedded zookeeper
  * plain http to communicate with solr
 
  I am testing a scenario where i am batching multiple commands and
 sending
  to solr. Since this is the solr cloud setup, I am always sending the
  updates to one of the nodes in the cloud.
 
  e.g.: http://localhost:8983/solr/sample/update
 
  *example set of commands:*
  {add: {doc:
  {field-1:1359591340025,field-2:1361301249330,doc_id:e.1.78}
  },add: {doc:
  {field-1:1360089709282,field-2:1361301249377,doc_id:e.1.78}
  },delete: { id: e.1.78 }}
 
  When I include deletes and updates in the batch, sometimes, the order of
  the commands is not maintained.
 
  Specifically, if the document does not belong to the shard that I am
  communicating with (lets say shard-1), then shard-1 sends the commands
 to
  shard-2. In this case, the deletes are sent first and then the
 updates.
  This changes the order that I originally sent.
 
  Any inputs on why the order is not maintained?
 
  Thanks!
  Vinay