Re: 4.3.1 SC - IndexWriter issues causing replication + failures

2014-02-06 Thread Tim Vaillancourt
Some more info to provide:

- Replication almost never completes following the "this IndexWriter is
closed" stacktraces.
- When replication begins after the "this IndexWriter is closed" error, over
a few hours the replica eventually fills the disk to 100% with index files
under data/. There are so many files in the data directory that it can't be
listed and takes a very long time to delete. It seems the frequent
replications are filling the disk with new files whose sum is roughly 3
times larger than the real index. Is it leaking filehandles or forgetting
it has downloaded something? (A rough sketch of how to check is below.)
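
For anyone measuring the same symptom, a rough shell sketch (the data dir path below is an example, not our real layout):

  # total size and file count of the replica's data dir (avoids 'ls' on a huge dir)
  DATA_DIR=/var/solr/data/collection1_shard1_replica1/data
  du -sh "$DATA_DIR"
  find "$DATA_DIR" -type f | wc -l
  # replication recovery can leave extra index.<timestamp> dirs behind; normally only one exists
  ls -d "$DATA_DIR"/index* 2>/dev/null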

Is this a better question for the lucene list? It seems (see below) that
this stacktrace is occurring in the Lucene layer rather than Solr, but maybe
someone could confirm?

ERROR [2014-01-27 18:28:49.368] [org.apache.solr.common.SolrException]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.DocumentsWriter.ensureOpen(DocumentsWriter.java:199)
at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:338)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:419)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1508)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:519)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:655)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
... chopped

Thanks!

Tim


On 5 February 2014 13:04, Tim Vaillancourt t...@elementspace.com wrote:

 Hey guys,

 I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2
 shards over 4 Solr instances (which results in 1 core per Solr instance).

 After some time in Production without issues, we are seeing errors related
 to the IndexWriter all over our logs and an infinite loop of failing
 replication from Leader on our 2 replicas.

 We see a flood of "org.apache.lucene.store.AlreadyClosedException: this
 IndexWriter is closed" stacktraces, then the Solr replica tries to
 replicate/recover, then fails replication, and then the following 2 errors
 show up:

 1) SolrIndexWriter was not closed prior to finalize(), indicates a bug --
 POSSIBLE RESOURCE LEAK!!!
 2) Error closing IndexWriter, trying rollback (which results in a
 null-pointer exception).

 I'm guessing the best way forward would be to upgrade to latest, but that
 is an undertaking that will take significant time/testing. In the meantime,
 is there anything I can do to mitigate or understand the issue more?

 Does anyone know what the IndexWriter errors refer to?

 Below is a URL to a .txt file with summarized portions of my solr.log. Any
 help is really appreciated as always!!

 http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt

 Thanks all,

 Tim



4.3.1 SC - IndexWriter issues causing replication + failures

2014-02-05 Thread Tim Vaillancourt
Hey guys,

I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2
shards over 4 Solr instances (which results in 1 core per Solr instance).

After some time in Production without issues, we are seeing errors related
to the IndexWriter all over our logs and an infinite loop of failing
replication from Leader on our 2 replicas.

We see a flood of "org.apache.lucene.store.AlreadyClosedException: this
IndexWriter is closed" stacktraces, then the Solr replica tries to
replicate/recover, then fails replication, and then the following 2 errors
show up:

1) SolrIndexWriter was not closed prior to finalize(), indicates a bug --
POSSIBLE RESOURCE LEAK!!!
2) Error closing IndexWriter, trying rollback (which results in a
null-pointer exception).

I'm guessing the best way forward would be to upgrade to latest, but that
is an undertaking that will take significant time/testing. In the meantime,
is there anything I can do to mitigate or understand the issue more?

Does anyone know what the IndexWriter errors refer to?

Below is a URL to a .txt file with summarized portions of my solr.log. Any
help is really appreciated as always!!

http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt

Thanks all,

Tim


Re: Perl Client for SolrCloud

2014-01-10 Thread Tim Vaillancourt
I'm pretty interested in taking a stab at a Perl CPAN for SolrCloud that 
is Zookeeper-aware; it's the least I can do for Solr as a non-Java 
developer. :)


A quick question though: how would I write the shard logic to behave 
similar to Java's Zookeeper-aware client? I'm able to get the hash/hex 
needed for each shard from clusterstate.json, but how do I know which 
field to hash on?


I'm guessing I also need to read the collection's schema.xml from 
Zookeeper to get uniqueKey, and then use that for sharding, or does the 
Java client take the sharding field as input? Looking for ideas here.
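
As a starting point, both pieces a ZK-aware client needs can be read straight from ZooKeeper with zkCli.sh (the config name "myconf" is just an example):

  ./zkCli.sh -server zk1:2181 get /clusterstate.json                     # shard hash ranges + replica URLs
  ./zkCli.sh -server zk1:2181 get /configs/myconf/schema.xml | grep uniqueKey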


Thanks!

Tim

On 08/01/14 09:35 AM, Chris Hostetter wrote:

:  I couldn't find anyone which can connect to SolrCloud similar to SolrJ's
:  CloudSolrServer.
:
: Since I have a load balancer in front of 8 nodes, WebService::Solr[1] still
: works fine.

Right -- just because SolrJ is ZooKeeper aware doesn't mean you can *only*
talk to SolrCloud with SolrJ -- you can still use any HTTP client of your
choice to connect to your Solr nodes in a round robin fashion (or via a
load balancer) if you wish -- just like with a non-SolrCloud deployment
using something like master/slave.

What you might want to consider, is taking a look at something like
Net::ZooKeeper to have a ZK aware perl client layer that could wrap
WebService::Solr.


-Hoss
http://www.lucidworks.com/


Re: Redis as Solr Cache

2014-01-02 Thread Tim Vaillancourt
This is a neat idea, but could be too close to lucene/etc.

You could jump up one level in the stack and use Redis/memcache as a
distributed HTTP cache in conjunction with Solr's HTTP caching and a proxy.
I tried doing this myself with Nginx, but I forget what issue I hit - I
think cache misses needed logic outside of nginx - and I didn't spend too
much time on it.
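
Roughly what I mean, assuming httpCaching is enabled (never304="false") in solrconfig.xml - Solr returns a validator that a proxy/Redis layer could key on:

  curl -sD - -o /dev/null 'http://localhost:8983/solr/collection1/select?q=*:*&rows=0' | grep -i etag
  # replaying with that validator should yield a 304, which is what the cache layer would serve
  curl -s -o /dev/null -w '%{http_code}\n' -H 'If-None-Match: "<etag-from-above>"' \
    'http://localhost:8983/solr/collection1/select?q=*:*&rows=0'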

Tim


On 2 January 2014 07:51, Alexander Ramos Jardim 
alexander.ramos.jar...@gmail.com wrote:

 You touched an interesting point. I am really questioning whether a quick-win
 scenario is even possible. But what would be the advantage of using Redis
 to keep the Solr cache if each node would keep its own Redis cache?


 2013/12/29 Upayavira u...@odoko.co.uk

  On Sun, Dec 29, 2013, at 02:35 PM, Alexander Ramos Jardim wrote:
    While researching Solr caching options and interesting cases, I bumped
    into this: https://github.com/dfdeshom/solr-redis-cache. Does anyone have
    any experience with this setup? Using Redis as Solr cache.
   
    I see a lot of advantage in having a distributed cache for Solr. One Solr
    node benefiting from the cache generated on another one would be beautiful.
   
    I see problems too. Performance-wise, I don't know if it would be viable
    for Solr to write its cache through the network to the Redis master node.
   
    And what about if I have Solr nodes with different index versions looking
    at the same cache?
   
    IMO, Redis is only useful here if it gives a distributed cache; otherwise I
    don't think it's possible to get better performance using it.
 
  This idea makes assumptions about how a Solr/Lucene index operates.
  Certainly, in a SolrCloud setup, each node is responsible for its own
  committing, and its caches exist for the timespan between commits. Thus,
  the cache one node will need will not necessarily be the same as the one
  that is needed by another node, which might have a commit interval
  slightly out of sync with the first.
 
  So, whilst this may be possible, and may give some benefits, I'd reckon
  that it would be a rather substantial engineering exercise, rather than
  the quick win you seem to be assuming it might be.
 
  Upayavira
 



 --
 Alexander Ramos Jardim



Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
Very good point. I've seen this issue occur once before when I was playing
with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
news - we are just behind.

For anyone that is curious, on my earlier mention that
Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
Zookeeper has no issues taking set/creates to clusterstate.json (or any
znode), just this one node seemed to stay stuck as state: active while it
was very inconsistent for reasons unknown, potentially just bugs.

The good news is this will be resolved today with a create/destroy of the
bad replica.

Thanks all!

Tim


On 4 December 2013 16:50, Mark Miller markrmil...@gmail.com wrote:

 Keep in mind, there have been a *lot* of bug fixes since 4.3.1.

 - Mark

 On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote:

  Hey all,
 
  Now that I am getting correct results with distrib=false, I've
 identified that 1 of my nodes has just 1/3rd of the total data set and
 totally explains the flapping in results. The fix for this is obvious
 (rebuild replica) but the cause is less obvious.
 
  There is definitely more than one issue going on with this SolrCloud
 (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
 /clusterstate.json doesn't seem to get updated when nodes are brought
 down/up is the reason why this replica remained in the distributed request
 chain without recovering/re-replicating from leader.
 
  I imagine my Zookeeper ensemble is having some problems unrelated to
 Solr that is the real root cause.
 
  Thanks!
 
  Tim
 
  On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
  Chris, this is extremely helpful and it's silly I didn't think of this
 sooner! Thanks a lot, this makes the situation make much more sense.
 
  I will gather some proper data with your suggestion and get back to the
 thread shortly.
 
  Thanks!!
 
  Tim
 
  On 04/12/13 02:57 PM, Chris Hostetter wrote:
  :
  : I may be incorrect here, but I assumed when querying a single core of a
  : SolrCloud collection, the SolrCloud routing is bypassed and I am talking
  : directly to a plain/non-SolrCloud core.

  No ... every query received from a client by solr is handled by a single
  core -- if that core knows it's part of a SolrCloud collection then it
  will do a distributed search across a random replica from each shard in
  that collection.

  If you want to bypass the distributed search logic, you have to say so
  explicitly...

  To ask an arbitrary replica to only search itself add distrib=false to
  the request.

  Alternatively: you can ask that only certain shard names (or certain
  explicit replicas) be included in a distributed request...

  https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
 
 
 
  -Hoss
  http://www.lucidworks.com/




Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
I spoke too soon, my plan for fixing this didn't quite work.

I've moved this issue into a new thread/topic: No /clusterstate.json
updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE.

Thanks all for the help on this one!

Tim


On 5 December 2013 11:37, Tim Vaillancourt t...@elementspace.com wrote:

 Very good point. I've seen this issue occur once before when I was playing
 with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
 news - we are just behind.

 For anyone that is curious, on my earlier mention that
 Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
 Zookeeper has no issues taking set/creates to clusterstate.json (or any
 znode), just this one node seemed to stay stuck as state: active while it
 was very inconsistent for reasons unknown, potentially just bugs.

 The good news is this will be resolved today with a create/destroy of the
 bad replica.

 Thanks all!

 Tim


 On 4 December 2013 16:50, Mark Miller markrmil...@gmail.com wrote:

 Keep in mind, there have been a *lot* of bug fixes since 4.3.1.

 - Mark

 On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com
 wrote:

  Hey all,
 
  Now that I am getting correct results with distrib=false, I've
 identified that 1 of my nodes has just 1/3rd of the total data set and
 totally explains the flapping in results. The fix for this is obvious
 (rebuild replica) but the cause is less obvious.
 
   There is definitely more than one issue going on with this SolrCloud
 (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
 /clusterstate.json doesn't seem to get updated when nodes are brought
 down/up is the reason why this replica remained in the distributed request
 chain without recovering/re-replicating from leader.
 
  I imagine my Zookeeper ensemble is having some problems unrelated to
 Solr that is the real root cause.
 
  Thanks!
 
  Tim
 
  On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
  Chris, this is extremely helpful and it's silly I didn't think of this
 sooner! Thanks a lot, this makes the situation make much more sense.
 
  I will gather some proper data with your suggestion and get back to
 the thread shortly.
 
  Thanks!!
 
  Tim
 
  On 04/12/13 02:57 PM, Chris Hostetter wrote:
  :
  : I may be incorrect here, but I assumed when querying a single core of a
  : SolrCloud collection, the SolrCloud routing is bypassed and I am talking
  : directly to a plain/non-SolrCloud core.

  No ... every query received from a client by solr is handled by a single
  core -- if that core knows it's part of a SolrCloud collection then it
  will do a distributed search across a random replica from each shard in
  that collection.

  If you want to bypass the distributed search logic, you have to say so
  explicitly...

  To ask an arbitrary replica to only search itself add distrib=false to
  the request.

  Alternatively: you can ask that only certain shard names (or certain
  explicit replicas) be included in a distributed request...

  https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
 
 
 
  -Hoss
  http://www.lucidworks.com/





No /clusterstate.json updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE

2013-12-05 Thread Tim Vaillancourt
Hey guys,

I've been having an issue with 1 of my 4 replicas having an inconsistent
replica, and have been trying to fix it. At the core of this issue, I've
noticed /clusterstate.json doesn't seem to be receiving updates when cores
get unhealthy, or even added/removed.

Today I decided I would remove the bad replica from the SolrCloud and
force a sync of a new clean replica, so I ran a
'/admin/cores?command=UNLOAD&name=name' to drop it. After this, on the
instance with the bad replica, the core was removed from solr.xml but
strangely NOT the /clusterstate.json in Zookeeper - it remained in
Zookeeper unchanged, still with state: active :(.

So, I then manually edited the clusterstate.json with a Perl script,
removing the json data for the bad replica. I checked all nodes saw the
change themselves, things looked good. Then I brought the node up/down to
check that it was properly adding/removing itself from /live_nodes znode in
Zookeeper. That all worked perfectly, too.

Here is the really odd part: when I created a new replica on this node (to
replace the bad replica), the core was created on the node, and NO update
was made to /clusterstate.json. At this point this node had no cores, no
cores with state in /clusterstate.json, and all data dirs deleted, so this
is quite confusing.

Upon checking ACLs on /clusterstate.json, it is world/anyone accessible:

[zk: localhost:2181(CONNECTED) 18] getAcl /clusterstate.json
'world,'anyone
: cdrwa

Also, keep in mind my external Perl script had no issue updating
/clusterstate.json. Can anyone make any suggestions why /clusterstate.json
isn't getting updated when I create this new core?

One other thing I checked was the health of the Zookeeper ensemble, and all
3 Zookeepers have the same mZxid, ctime, mtime, etc for /clusterstate.json
and receive updates no problem, just this node isn't updating Zookeeper
somehow.
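
For anyone wanting to repeat that check, a rough sketch (ZK hostnames are examples):

  for zk in zk1 zk2 zk3; do
    echo "== $zk =="
    ./zkCli.sh -server $zk:2181 stat /clusterstate.json 2>/dev/null | egrep 'mZxid|mtime|dataLength'
  done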

Any thoughts are much appreciated!

Thanks!

Tim


Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of the 
SolrCloud collection on each instance, only to notice the same problem - 
the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of a 
SolrCloud collection, the SolrCloud routing is bypassed and I am talking 
directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are no 
deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core directly?


An interesting observation is when I do an /admin/cores call to see the 
docCount of the core's index, it does not fluctuate, only the query result.
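
The core-level call I mean is along these lines (exact JSON field names may vary slightly by version):

  curl -s 'http://backend:8983/solr/admin/cores?action=STATUS&core=app_shard2_replica2&wt=json' \
    | egrep -o '"(numDocs|maxDoc)":[0-9]*'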


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg: 
q=key:timvaillancourt), not just the q=*:* I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is 
remaining state: active in my /clusterstate.json - something is really 
wrong with this cloud! Would a Zookeeper issue explain my varied results 
when querying a core directly?


Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of 
the SolrCloud collection on each instance, only to notice the same 
problem - the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of 
a SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are 
no deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core directly?


An interesting observation is when I do an /admin/cores call to see 
the docCount of the core's index, it does not fluctuate, only the 
query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Thanks Markus,

I'm not sure if I'm encountering the same issue. This JIRA mentions 10s 
of docs difference, I'm seeing differences in the multi-millions of 
docs, and even more strangely it very predictably flaps between a 123M 
value and an 87M value, a 30M+ doc difference.


Secondly, I'm not comparing values from 2 instances (Leader to Replica), 
I'm currently performing the same curl call to the same core directly 
and am seeing flapping results each time I perform the query, so this is 
currently happening within a single instance/core unless I am 
misunderstanding how to directly query a core.


Cheers,

Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:

https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-

From:Tim Vaillancourtt...@elementspace.com
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg:
q=key:timvaillancourt), not just the q=*:* I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is
remaining state: active in my /clusterstate.json - something is really
wrong with this cloud! Would a Zookeeper issue explain my varied results
when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when
performing the same simple /select query many times to our collection.
Almost every other query the numFound count (and the returned data)
jumps between two very different values.

Initially I suspected a replica in a shard of the collection was
inconsistent (and every other request hit that node) and started
performing the same /select query direct to the individual cores of
the SolrCloud collection on each instance, only to notice the same
problem - the count jumps between two very different values!

I may be incorrect here, but I assumed when querying a single core of
a SolrCloud collection, the SolrCloud routing is bypassed and I am
talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection
fluctuates wildly, and is only receiving updates and no deletes to
explain the jumps:

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a
single core would return inconsistent, flapping results if there are
no deletes issued in my app to cause such jumps? Am I incorrect in my
assumption that I am querying the core directly?

An interesting observation is when I do an /admin/cores call to see
the docCount of the core's index, it does not fluctuate, only the
query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to the 
thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distributed search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distributed request...

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/
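
For reference, a minimal curl sketch of both options Hoss describes (host and core/shard names follow the examples earlier in this thread):

  # ask one replica to search only itself
  curl 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&rows=0&wt=json&distrib=false'
  # or restrict the distributed request to a named shard
  curl 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&rows=0&wt=json&shards=shard2'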


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey all,

Now that I am getting correct results with distrib=false, I've 
identified that 1 of my nodes has just 1/3rd of the total data set and 
totally explains the flapping in results. The fix for this is obvious 
(rebuild replica) but the cause is less obvious.


There is definitely more than one issue going on with this SolrCloud 
(but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that 
/clusterstate.json doesn't seem to get updated when nodes are brought 
down/up is the reason why this replica remained in the distributed 
request chain without recovering/re-replicating from leader.


I imagine my Zookeeper ensemble is having some problems unrelated to 
Solr that is the real root cause.


Thanks!

Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to 
the thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distributed search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distributed request...

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: difference between apache tomcat vs Jetty

2013-10-25 Thread Tim Vaillancourt
I (jokingly) propose we take it a step further and drop Java :)! I'm 
getting tired of trying to scale GC'ing JVMs!


Tim

On 25/10/13 09:02 AM, Mark Miller wrote:

Just to add to the “use jetty for Solr” argument - Solr 5.0 will no longer
consider itself a webapp and will treat the fact that Jetty is used as an
implementation detail.

We won’t necessarily make it impossible to use a different container, but the 
project won’t condone it or support it and may do some things that assume 
Jetty. Solr is taking over this layer in 5.0.

- Mark

On Oct 25, 2013, at 11:18 AM, Cassandra Targett casstarg...@gmail.com wrote:


In terms of adding or fixing documentation, the Installing Solr page
(https://cwiki.apache.org/confluence/display/solr/Installing+Solr)
includes a yellow box that says:

Solr ships with a working Jetty server, with optimized settings for
Solr, inside the example directory. It is recommended that you use the
provided Jetty server for optimal performance. If you absolutely must
use a different servlet container then continue to the next section on
how to install Solr.

So, it's stated, but maybe not in a way that makes it clear to most
users. And maybe it needs to be repeated in another section.
Suggestions?

I did find this page,
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty,
which pretty much contradicts the previous text. I'll fix that now.

Other recommendations for where doc could be more clear are welcome.

On Thu, Oct 24, 2013 at 7:14 PM, Tim Vaillancourt t...@elementspace.com wrote:

Hmm, that's an interesting move. I'm on the fence on that one but it surely
simplifies some things. Good info, thanks!

Tim


On 24 October 2013 16:46, Anshum Gupta ans...@anshumgupta.net wrote:


Thought you may want to have a look at this:

https://issues.apache.org/jira/browse/SOLR-4792

P.S: There are no timelines for 5.0 for now, but it's the future
nevertheless.



On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt t...@elementspace.com

wrote:
I agree with Jonathan (and Shawn on the Jetty explanation), I think the
docs should make this a bit more clear - I notice many people choosing
Tomcat and then learning these details after, possibly regretting it.

I'd be glad to modify the docs but I want to be careful how it is worded.
Is it fair to go as far as saying Jetty is 100% THE recommended

container

for Solr, or should a recommendation be avoided, and maybe just a list of
pros/cons?

Cheers,

Tim




--

Anshum Gupta
http://www.anshumgupta.net



Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
I agree with Jonathan (and Shawn on the Jetty explanation), I think the
docs should make this a bit more clear - I notice many people choosing
Tomcat and then learning these details after, possibly regretting it.

I'd be glad to modify the docs but I want to be careful how it is worded.
Is it fair to go as far as saying Jetty is 100% THE recommended container
for Solr, or should a recommendation be avoided, and maybe just a list of
pros/cons?

Cheers,

Tim


Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
Hmm, that's an interesting move. I'm on the fence on that one but it surely
simplifies some things. Good info, thanks!

Tim


On 24 October 2013 16:46, Anshum Gupta ans...@anshumgupta.net wrote:

 Thought you may want to have a look at this:

 https://issues.apache.org/jira/browse/SOLR-4792

 P.S: There are no timelines for 5.0 for now, but it's the future
 nevertheless.



 On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt t...@elementspace.com
 wrote:

  I agree with Jonathan (and Shawn on the Jetty explanation), I think the
  docs should make this a bit more clear - I notice many people choosing
  Tomcat and then learning these details after, possibly regretting it.
 
  I'd be glad to modify the docs but I want to be careful how it is worded.
  Is it fair to go as far as saying Jetty is 100% THE recommended
 container
  for Solr, or should a recommendation be avoided, and maybe just a list of
  pros/cons?
 
  Cheers,
 
  Tim
 



 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: Skipping caches on a /select

2013-10-17 Thread Tim Vaillancourt

Thanks Yonik,

Does cache=false apply to all caches? The docs make it sound like it 
is for filterCache only, but I could be misunderstanding.


When I force a commit and perform a /select query many times with 
cache=false, I notice my query still gets cached - my guess is in the 
queryResultCache. At first the query takes 500ms+, then all subsequent 
requests take 0-1ms. I'll confirm this queryResultCache assumption today.
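
A rough way to confirm it, by watching the cache counters between requests (core name is an example):

  curl -s 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json' \
    | egrep -o '"(lookups|inserts|hits)":[0-9]*'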


Cheers,

Tim

On 16/10/13 06:33 PM, Yonik Seeley wrote:

On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourt t...@elementspace.com wrote:

I am debugging some /select queries on my Solr tier and would like to see
if there is a way to tell Solr to skip the caches on a given /select query
if it happens to ALREADY be in the cache. Live queries are being inserted
and read from the caches, but I want my debug queries to bypass the cache
entirely.

I do know about the cache=false param (that causes the results of a
select to not be INSERTED in to the cache), but what I am looking for
instead is a way to tell Solr to not read the cache at all, even if there
actually is a cached result for my query.

Yeah, cache=false for q or fq should already not use the cache at
all (read or write).

-Yonik


Re: Skipping caches on a /select

2013-10-17 Thread Tim Vaillancourt


  
  
Awesome, this makes a lot of sense now. Thanks a lot guys.

Currently the only mention of this setting in the docs is under
filterQuery on the "SolrCaching" page as:

" Solr3.4 Adding the
localParam flag of {!cache=false} to a query will prevent
the filterCache from being consulted for that query. "

I will update the docs sometime soon to reflect that this can apply
to any query (q or fq).

Cheers,

Tim

On 17/10/13 01:44 PM, Chris Hostetter wrote:

  

: Does "cache=false" apply to all caches? The docs make it sound like it is for
: filterCache only, but I could be misunderstanding.

it's per *query* -- not per cache, or per request...

 /select?q={!cache=true}foo&fq={!cache=false}bar&fq={!cache=true}baz

...should cause 1 lookup/insert in the filterCache (baz) and 1 
lookup/insert into the queryResultCache (for the main query with its 
associated filters & pagination)



-Hoss


  



Skipping caches on a /select

2013-10-16 Thread Tim Vaillancourt
Hey guys,

I am debugging some /select queries on my Solr tier and would like to see
if there is a way to tell Solr to skip the caches on a given /select query
if it happens to ALREADY be in the cache. Live queries are being inserted
and read from the caches, but I want my debug queries to bypass the cache
entirely.

I do know about the cache=false param (that causes the results of a
select to not be INSERTED in to the cache), but what I am looking for
instead is a way to tell Solr to not read the cache at all, even if there
actually is a cached result for my query.

Is there a way to do this (without disabling my caches in solrconfig.xml),
or is this feature request?

Thanks!

Tim Vaillancourt


Re: SolrCloud on SSL

2013-10-16 Thread Tim Vaillancourt
Not important, but I'm also curious why you would want SSL on Solr (adds
overhead, complexity, harder-to-troubleshoot, etc)?

To avoid the overhead, could you put Solr on a separate VLAN (with ACLs to
client servers)?

Cheers,

Tim


On 12 October 2013 17:30, Shawn Heisey s...@elyograg.org wrote:

 On 10/11/2013 9:38 AM, Christopher Gross wrote:
  On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey s...@elyograg.org
 wrote:
 
  On 10/11/2013 8:17 AM, Christopher Gross wrote: 
  Is there a spot in a Solr configuration that I can set this up to use
  HTTPS?
 
  From what I can tell, not yet.
 
  https://issues.apache.org/jira/browse/SOLR-3854
  https://issues.apache.org/jira/browse/SOLR-4407
  https://issues.apache.org/jira/browse/SOLR-4470
 
 
  Dang.

 Christopher,

 I was just looking through Solr source code for a completely different
 issue, and it seems that there *IS* a way to do this in your configuration.

 If you were to use "https://hostname" or "https://ipaddress" as the
 host parameter in your solr.xml file on each machine, it should do
 what you want.  The parameter is described here, but not the behavior
 that I have discovered:

 http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params

 Boring details: In the org.apache.solr.cloud package, there is a
 ZkController class.  The getHostAddress method is where I discovered
 that you can do this.

 If you could try this out and confirm that it works, I will get the wiki
 page updated and look into the Solr reference guide as well.

 Thanks,
 Shawn




fq caching question

2013-10-14 Thread Tim Vaillancourt

Hey guys,

Sorry for such a simple question, but I am curious as to the differences 
in caching between a combined filter query, and many separate filter 
queries.


Here are 2 example queries, one with combined fq, one separate:

1) /select?q=*:*&fq=type:bid&fq=user_id:3
2) /select?q=*:*&fq=(type:bid%20AND%20user_id:3)

For query #1: am I correct that the first query will keep 2 independent 
entries in the filterCache for type:bid and user_id:3?
For query #2: is it correct that the 2nd query will keep 1 entry in the 
filterCache that satisfies all conditions?


Lastly, is it a fair statement that under general query patterns, many 
separate filter queries are more cacheable than 1 combined one? Eg, if I 
performed query #2 (now in the filterCache) and then changed the user_id, 
nothing about my new query is cacheable, correct (whereas if I used 2 
separate filter queries, 1 of the 2 would still be cached)?
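
To make that last point concrete with a couple of follow-up queries (host and core name are placeholders):

  # after query #1, fq=type:bid is reused from the filterCache; only user_id:4 is a new entry
  curl 'http://localhost:8983/solr/collection1/select?q=*:*&fq=type:bid&fq=user_id:4'
  # after query #2, changing user_id makes the whole combined fq a cache miss
  curl 'http://localhost:8983/solr/collection1/select?q=*:*&fq=(type:bid%20AND%20user_id:4)'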


Cheers,

Tim Vaillancourt


Re: fq caching question

2013-10-14 Thread Tim Vaillancourt

Thanks Koji!

Cheers,

Tim

On 14/10/13 03:56 PM, Koji Sekiguchi wrote:

Hi Tim,

(13/10/15 5:22), Tim Vaillancourt wrote:

Hey guys,

Sorry for such a simple question, but I am curious as to the 
differences in caching between a

combined filter query, and many separate filter queries.

Here are 2 example queries, one with combined fq, one separate:

1) /select?q=*:*&fq=type:bid&fq=user_id:3
2) /select?q=*:*&fq=(type:bid%20AND%20user_id:3)

For query #1: am I correct that the first query will keep 2 
independent entries in the filterCache

for type:bid and user_id:3?


Correct.

For query #2: is it correct that the 2nd query will keep 1 entry in 
the filterCache that satisfies

all conditions?


Correct.

Lastly, is it a fair statement that under general query patterns, 
many separate filter queries are
more-cacheable than 1 combined one? Eg, if I performed query #2 (in 
the filterCache) and then
changed the user_id, nothing about my new query is cacheable, 
correct (but if I used 2 separate

filter queries than 1 of 2 is still cached)?


Yes, it is.

koji


Re: {soft}Commit and cache flusing

2013-10-09 Thread Tim Vaillancourt
Apologies all. I think the suggestion that I was replying to get noticed
is what erked me, otherwise I would have moved on. I'll follow this advice.

Cheers,

Tim


On 9 October 2013 05:20, Erick Erickson erickerick...@gmail.com wrote:

 Tim:

 I think you're mis-interpreting. By replying to a post with the subject:

 {soft}Commit and cache flushing

 but going in a different direction, it's easy for people to think I'm
 not interested in that
 thread, I'll ignore it, thereby missing the fact that you're asking a
 somewhat different
 question that they might have information about. It's not about whether
 you're
 doing anything particularly wrong with the question. It's about making
 it easy for
 people to help.

 See http://people.apache.org/~hossman/#threadhijack

 Best,
 Erick

 On Tue, Oct 8, 2013 at 6:23 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
  I have a genuine question with substance here. If anything this
  nonconstructive, rude response was to get noticed. Thanks for
  contributing to the discussion.
 
  Tim
 
 
  On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote:
 
  Tim,
  I suggest you open a new thread and not reply to this one to get
 noticed.
  Dmitry
 
 
  On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com
  wrote:
 
   Is there a way to make autoCommit only commit if there are pending
  changes,
   ie: if there are 0 adds pending commit, don't autoCommit
 (open-a-searcher
   and wipe the caches)?
  
   Cheers,
  
   Tim
  
  
   On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote:
  
right. We've got the autoHard commit configured only atm. The
   soft-commits
are controlled on the client. It was just easier to implement the
 first
version of our internal commit policy that will commit to all solr
instances at once. This is where we have noticed the reported
 behavior.
   
   
On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu
   wrote:
   
 if there are no modifications to an index and a softCommit or
   hardCommit
 issued, then solr flushes the cache.


 Indeed. The easiest way to work around this is by disabling auto
   commits
 and only commit when you have to.

   
  
 



Re: solr cpu usage

2013-10-08 Thread Tim Vaillancourt
Yes, you've saved us all lots of time with this article. I'm about to do
the same for the old Jetty or Tomcat? container question ;).

Tim


On 7 October 2013 18:55, Erick Erickson erickerick...@gmail.com wrote:

 Tim:

 Thanks! Mostly I wrote it to have something official looking to hide
 behind when I didn't have a good answer to the hardware sizing question
 :).

 On Mon, Oct 7, 2013 at 2:48 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
  Fantastic article!
 
  Tim
 
 
  On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote:
 
  From my perspective, your question is almost impossible to
  answer, there are too many variables. See:
 
 
 http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
 
  Best,
  Erick
 
  On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic
  otis.gospodne...@gmail.com wrote:
   Hi,
  
   More CPU cores means more concurrency.  This is good if you need to
  handle
   high query rates.
  
   Faster cores mean lower query latency, assuming you are not
 bottlenecked
  by
   memory or disk IO or network IO.
  
   So what is ideal for you depends on your concurrency and latency
 needs.
  
   Otis
   Solr & ElasticSearch Support
   http://sematext.com/
   On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote:
  
   hi
   We're building a spec for a machine to purchase.
   We're going to buy 10 machines.
   we aren't sure yet how many proccesses we will run per machine.
   the question is  -should we buy faster cpu with less cores or slower
 cpu
   with more cores?
   in any case we will have 2 cpus in each machine.
   should we buy 2.6Ghz cpu with 8 cores or 3.5Ghz cpu with 4 cores?
  
   what will we gain by having many cores?
  
   what kinds of usages would make cpu be the bottleneck?
  
  
  
  
   --
   View this message in context:
   http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 



Re: {soft}Commit and cache flusing

2013-10-08 Thread Tim Vaillancourt
I have a genuine question with substance here. If anything this
nonconstructive, rude response was to get noticed. Thanks for
contributing to the discussion.

Tim


On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote:

 Tim,
 I suggest you open a new thread and not reply to this one to get noticed.
 Dmitry


 On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com
 wrote:

  Is there a way to make autoCommit only commit if there are pending
 changes,
  ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher
  and wipe the caches)?
 
  Cheers,
 
  Tim
 
 
  On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote:
 
   right. We've got the autoHard commit configured only atm. The
  soft-commits
   are controlled on the client. It was just easier to implement the first
   version of our internal commit policy that will commit to all solr
   instances at once. This is where we have noticed the reported behavior.
  
  
   On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu
  wrote:
  
if there are no modifications to an index and a softCommit or
  hardCommit
issued, then solr flushes the cache.
   
   
Indeed. The easiest way to work around this is by disabling auto
  commits
and only commit when you have to.
   
  
 



Re: {soft}Commit and cache flusing

2013-10-07 Thread Tim Vaillancourt
Is there a way to make autoCommit only commit if there are pending changes,
ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher
and wipe the caches)?
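
One workaround along the lines Bram suggests below: drop autoCommit and have the client commit explicitly, only when it actually sent updates since the last commit - a rough sketch (the URL and the pending-updates counter are placeholders):

  if [ "${PENDING_UPDATES:-0}" -gt 0 ]; then
    curl 'http://localhost:8983/solr/collection1/update?commit=true'
  fi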

Cheers,

Tim


On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote:

 right. We've got the autoHard commit configured only atm. The soft-commits
 are controlled on the client. It was just easier to implement the first
 version of our internal commit policy that will commit to all solr
 instances at once. This is where we have noticed the reported behavior.


 On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote:

  if there are no modifications to an index and a softCommit or hardCommit
  issued, then solr flushes the cache.
 
 
  Indeed. The easiest way to work around this is by disabling auto commits
  and only commit when you have to.
 



Re: solr cpu usage

2013-10-07 Thread Tim Vaillancourt
Fantastic article!

Tim


On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote:

 From my perspective, your question is almost impossible to
 answer, there are too many variables. See:

 http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

 Best,
 Erick

 On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic
 otis.gospodne...@gmail.com wrote:
  Hi,
 
  More CPU cores means more concurrency.  This is good if you need to
 handle
  high query rates.
 
  Faster cores mean lower query latency, assuming you are not bottlenecked
 by
  memory or disk IO or network IO.
 
  So what is ideal for you depends on your concurrency and latency needs.
 
  Otis
  Solr & ElasticSearch Support
  http://sematext.com/
  On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote:
 
  hi
  We're building a spec for a machine to purchase.
  We're going to buy 10 machines.
  we aren't sure yet how many proccesses we will run per machine.
  the question is  -should we buy faster cpu with less cores or slower cpu
  with more cores?
  in any case we will have 2 cpus in each machine.
  should we buy 2.6Ghz cpu with 8 cores or 3.5Ghz cpu with 4 cores?
 
  what will we gain by having many cores?
 
  what kinds of usages would make cpu be the bottleneck?
 
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: App server?

2013-10-02 Thread Tim Vaillancourt
Jetty should be sufficient, and is the more-common container for Solr.
Also, Solr tests are written for Jetty.

Lastly, I'd argue Jetty is just-as enterprise as Tomcat. Google App
Engine (running lots of enterprise), is Jetty-based, for example.

Cheers,

Tim


On 2 October 2013 15:44, Mark static.void@gmail.com wrote:

 Is Jetty sufficient for running Solr or should I go with something a
 little more enterprise like tomcat?

 Any others?


Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Tim Vaillancourt
  ...
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:445)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
  at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
  at java.lang.Thread.run(Thread.java:724)

On your live_nodes question, I don't have historical data on this from when
the crash occurred, which I guess is what you're looking for. I could add
this to our monitoring for future tests, however. I'd be glad to continue
further testing, but I think first more monitoring is needed to understand
this further. Could we come up with a list of metrics that would be useful
to see following another test and successful crash?

Metrics needed:

1) # of live_nodes.
2) Full stack traces.
3) CPU used by Solr's JVM specifically (instead of system-wide).
4) Solr's JVM thread count (already done)
5) ?
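
A rough sketch of collecting 1-4 on each node during the next soak (the pid lookup and ZK host are placeholders):

  # 1) list (and count) of live_nodes as ZooKeeper sees it
  ./zkCli.sh -server zk1:2181 ls /live_nodes 2>/dev/null | tail -1
  SOLR_PID=$(pgrep -f start.jar | head -1)      # assumes the stock Jetty start.jar
  # 2) full stack traces
  jstack "$SOLR_PID" > "solr-stack-$(date +%s).txt"
  # 3 and 4) CPU% and thread count for the Solr JVM only
  ps -p "$SOLR_PID" -o %cpu=,nlwp=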

Cheers,

Tim Vaillancourt


On 6 September 2013 13:11, Mark Miller markrmil...@gmail.com wrote:


Did you ever get to index that long before without hitting the deadlock?

There really isn't anything negative the patch could be introducing, other
than allowing for some more threads to possibly run at once. If I had to
guess, I would say it's likely this patch fixes the deadlock issue and
you're seeing another issue - which looks like the system cannot keep up
with the requests or something for some reason - perhaps due to some OS
networking settings or something (more guessing). Connection refused
happens generally when there is nothing listening on the port.

Do you see anything interesting change with the rest of the system? CPU
usage spikes or something like that?

Clamping down further on the overall number of threads might help (which
would require making something configurable). How many nodes are listed in
zk under live_nodes?

Mark

Sent from my iPhone

On Sep 6, 2013, at 12:02 PM, Tim Vaillancourt t...@elementspace.com wrote:


Hey guys,

(copy of my post to SOLR-5216)

We tested this patch and unfortunately encountered some serious issues
after a few hours of 500 update-batches/sec. Our update batch is 10 docs,
so we are writing about 5000 docs/sec total, using autoCommit to commit the
updates (no explicit commits).

Our environment:

   Solr 4.3.1 w/SOLR-5216 patch.
   Jetty 9, Java 1.7.
   3 solr instances, 1 per physical server.
   1 collection.
   3 shards.
   2 replicas (each instance is a leader and a replica).
   Soft autoCommit is 1000ms.
   Hard autoCommit is 15000ms.

After about 6 hours of stress-testing this patch, we see many of these
stalled transactions (below), and the Solr instances start to see each
other as down, flooding our Solr logs with Connection Refused exceptions,
and otherwise no obviously-useful logs that I could see.

I did notice some stalled transactions on both /select and /update,
however. This never occurred without this patch.

Stack /select seems stalled on: http://pastebin.com/Y1NCrXGC
Stack /update seems stalled on: http://pastebin.com/cFLbC8Y9

Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak.
My script normalizes the ERROR-severity stack traces and returns them in
order of occurrence.

Summary of my solr.log: http://pastebin.com/pBdMAWeb

Thanks!

Tim Vaillancourt


On 6 September 2013 07:27, Markus Jelsma markus.jel...@openindex.io wrote:

Thanks!

-Original message-

From: Erick Erickson erickerick...@gmail.com
Sent: Friday 6th September 2013 16:20
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.x hangs under high update volume

Markus:

See: https://issues.apache.org/jira/browse/SOLR-5216


On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma
markus.jel...@openindex.io wrote:


Hi Mark,

Got an issue to watch?

Thanks,
Markus

Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Tim Vaillancourt
Lol, at breaking during a demo - always the way it is! :) I agree, we are
just tip-toeing around the issue, but waiting for 4.5 is definitely an
option if we get by for now in testing; patched Solr versions seem to
make people uneasy sometimes :).

Seeing as there seems to be some danger to SOLR-5216 (in some ways it blows
up worse due to fewer limitations on threads), I'm guessing only SOLR-5232
and SOLR-4816 are making it into 4.5? I feel those 2 in combination will
make a world of difference!

Thanks so much again guys!

Tim



On 12 September 2013 03:43, Erick Erickson erickerick...@gmail.com wrote:

 Fewer client threads updating makes sense, and going to 1 core also seems
 like it might help. But it's all a crap-shoot unless the underlying cause
 gets fixed up. Both would improve things, but you'll still hit the problem
 sometime, probably when doing a demo for your boss ;).

 Adrien has branched the code for SOLR 4.5 in preparation for a release
 candidate tentatively scheduled for next week. You might just start working
 with that branch if you can rather than apply individual patches...

 I suspect there'll be a couple more changes to this code (looks like
 Shikhar already raised an issue for instance) before 4.5 is finally cut...

 FWIW,
 Erick



 On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt t...@elementspace.com
 wrote:

  Thanks Erick!
 
  Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
  patch. I think that is a very, very useful patch by the way. SOLR-5232
  seems promising as well.
 
  I see your point on the more-shards idea, this is obviously a
  global/instance-level lock. If I really had to, I suppose I could run
 more
  Solr instances to reduce locking then? Currently I have 2 cores per
  instance and I could go 1-to-1 to simplify things.
 
  The good news is we seem to be more stable since changing to a bigger
  client-solr batch-size and fewer client threads updating.
 
  Cheers,
 
  Tim
 
  On 11/09/13 04:19 AM, Erick Erickson wrote:
 
  If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent
  copy of the 4x branch. By recent, I mean like today, it looks like
 Mark
  applied this early this morning. But several reports indicate that this
  will
  solve your problem.
 
  I would expect that increasing the number of shards would make the
 problem
  worse, not
  better.
 
  There's also SOLR-5232...
 
  Best
  Erick
 
 
  On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt t...@elementspace.com wrote:
 
   Hey guys,
 
  Based on my understanding of the problem we are encountering, I feel
  we've
  been able to reduce the likelihood of this issue by making the
 following
  changes to our app's usage of SolrCloud:
 
  1) We increased our document batch size to 200 from 10 - our app
 batches
  updates to reduce HTTP requests/overhead. The theory is increasing the
  batch size reduces the likelihood of this issue happening.
  2) We reduced to 1 application node sending updates to SolrCloud - we
  write
  Solr updates to Redis, and have previously had 4 application nodes
  pushing
  the updates to Solr (popping off the Redis queue). Reducing the number
 of
  nodes pushing to Solr reduces the concurrency on SolrCloud.
  3) Less threads pushing to SolrCloud - due to the increase in batch
 size,
  we were able to go down to 5 update threads on the update-pushing-app
  (from
  10 threads).
 
  To be clear the above only reduces the likelihood of the issue
 happening,
  and DOES NOT actually resolve the issue at hand.
 
  If we happen to encounter issues with the above 3 changes, the next
 steps
  (I could use some advice on) are:
 
  1) Increase the number of shards (2x) - the theory here is this reduces
  the
  locking on shards because there are more shards. Am I onto something
  here,
  or will this not help at all?
  2) Use CloudSolrServer - currently we have a plain-old least-connection
  HTTP VIP. If we go direct to what we need to update, this will reduce
  concurrency in SolrCloud a bit. Thoughts?
 
  Thanks all!
 
  Cheers,
 
  Tim
 
 
  On 6 September 2013 14:47, Tim Vaillancourt t...@elementspace.com wrote:
 
   Enjoy your trip, Mark! Thanks again for the help!
 
  Tim
 
 
  On 6 September 2013 14:18, Mark Miller markrmil...@gmail.com wrote:
 
   Okay, thanks, useful info. Getting on a plane, but ill look more at
  this
  soon. That 10k thread spike is good to know - that's no good and
 could
  easily be part of the problem. We want to keep that from happening.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt t...@elementspace.com wrote:
 
   Hey Mark,
 
  The farthest we've made it at the same batch size/volume was 12 hours
  without this patch, but that isn't consistent. Sometimes we would only
  get to 6 hours or less.
 
  During the crash I can see an amazing spike in threads to 10k which
 is
  essentially our ulimit

Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Tim Vaillancourt
That makes sense, thanks Erick and Mark for you help! :)

I'll see if I can find a place to assist with the testing of SOLR-5232.

Cheers,

Tim



On 12 September 2013 11:16, Mark Miller markrmil...@gmail.com wrote:

 Right, I don't see SOLR-5232 making 4.5 unfortunately. It could perhaps
 make a 4.5.1 - it does resolve a critical issue - but 4.5 is in motion and
 SOLR-5232 is not quite ready - we need some testing.

 - Mark

 On Sep 12, 2013, at 2:12 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  My take on it is this, assuming I'm reading this right:
  1 SOLR-5216 - probably not going anywhere, 5232 will take care of it.
  2 SOLR-5232 - expected to fix the underlying issue no matter whether
  you're using CloudSolrServer from SolrJ or sending lots of updates from
  lots of clients.
  3 SOLR-4816 - use this patch and CloudSolrServer from SolrJ in the
  meantime.
 
  I don't quite know whether SOLR-5232 will make it in to 4.5 or not, it
  hasn't been committed anywhere yet. The Solr 4.5 release is imminent, RC0
  is looking like it'll be ready to cut next week so it might not be
 included.
 
  Best,
  Erick
 
 
  On Thu, Sep 12, 2013 at 1:42 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
 
  Lol, at breaking during a demo - always the way it is! :) I agree, we
 are
  just tip-toeing around the issue, but waiting for 4.5 is definitely an
  option if we get-by for now in testing; patched Solr versions seem to
  make people uneasy sometimes :).
 
  Seeing there seems to be some danger to SOLR-5216 (in some ways it
 blows up
  worse due to fewer limitations on threads), I'm guessing only SOLR-5232
 and
  SOLR-4816 are making it into 4.5? I feel those 2 in combination will
 make a
  world of difference!
 
  Thanks so much again guys!
 
  Tim
 
 
 
  On 12 September 2013 03:43, Erick Erickson erickerick...@gmail.com
  wrote:
 
  Fewer client threads updating makes sense, and going to 1 core also
 seems
  like it might help. But it's all a crap-shoot unless the underlying
 cause
  gets fixed up. Both would improve things, but you'll still hit the
  problem
  sometime, probably when doing a demo for your boss ;).
 
  Adrien has branched the code for SOLR 4.5 in preparation for a release
  candidate tentatively scheduled for next week. You might just start
  working
  with that branch if you can rather than apply individual patches...
 
  I suspect there'll be a couple more changes to this code (looks like
  Shikhar already raised an issue for instance) before 4.5 is finally
  cut...
 
  FWIW,
  Erick
 
 
 
  On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt 
 t...@elementspace.com
  wrote:
 
  Thanks Erick!
 
  Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
  patch. I think that is a very, very useful patch by the way. SOLR-5232
  seems promising as well.
 
  I see your point on the more-shards idea, this is obviously a
  global/instance-level lock. If I really had to, I suppose I could run
  more
  Solr instances to reduce locking then? Currently I have 2 cores per
  instance and I could go 1-to-1 to simplify things.
 
  The good news is we seem to be more stable since changing to a bigger
  client-solr batch-size and fewer client threads updating.
 
  Cheers,
 
  Tim
 
  On 11/09/13 04:19 AM, Erick Erickson wrote:
 
  If you use CloudSolrServer, you need to apply SOLR-4816 or use a
  recent
  copy of the 4x branch. By recent, I mean like today, it looks like
  Mark
  applied this early this morning. But several reports indicate that
  this
  will
  solve your problem.
 
  I would expect that increasing the number of shards would make the
  problem
  worse, not
  better.
 
  There's also SOLR-5232...
 
  Best
  Erick
 
 
  On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt t...@elementspace.com wrote:
 
  Hey guys,
 
  Based on my understanding of the problem we are encountering, I feel
  we've
  been able to reduce the likelihood of this issue by making the
  following
  changes to our app's usage of SolrCloud:
 
  1) We increased our document batch size to 200 from 10 - our app
  batches
  updates to reduce HTTP requests/overhead. The theory is increasing
  the
  batch size reduces the likelihood of this issue happening.
  2) We reduced to 1 application node sending updates to SolrCloud -
 we
  write
  Solr updates to Redis, and have previously had 4 application nodes
  pushing
  the updates to Solr (popping off the Redis queue). Reducing the
  number
  of
  nodes pushing to Solr reduces the concurrency on SolrCloud.
  3) Less threads pushing to SolrCloud - due to the increase in batch
  size,
  we were able to go down to 5 update threads on the
 update-pushing-app
  (from
  10 threads).
 
  To be clear the above only reduces the likelihood of the issue
  happening,
  and DOES NOT actually resolve the issue at hand.
 
  If we happen to encounter issues with the above 3 changes, the next
  steps
  (I could use some advice on) are:
 
  1) Increase

Re: SolrCloud 4.x hangs under high update volume

2013-09-10 Thread Tim Vaillancourt
Hey guys,

Based on my understanding of the problem we are encountering, I feel we've
been able to reduce the likelihood of this issue by making the following
changes to our app's usage of SolrCloud:

1) We increased our document batch size to 200 from 10 - our app batches
updates to reduce HTTP requests/overhead. The theory is increasing the
batch size reduces the likelihood of this issue happening.
2) We reduced to 1 application node sending updates to SolrCloud - we write
Solr updates to Redis, and have previously had 4 application nodes pushing
the updates to Solr (popping off the Redis queue). Reducing the number of
nodes pushing to Solr reduces the concurrency on SolrCloud.
3) Less threads pushing to SolrCloud - due to the increase in batch size,
we were able to go down to 5 update threads on the update-pushing-app (from
10 threads).

To be clear the above only reduces the likelihood of the issue happening,
and DOES NOT actually resolve the issue at hand.

If we happen to encounter issues with the above 3 changes, the next steps
(I could use some advice on) are:

1) Increase the number of shards (2x) - the theory here is this reduces the
locking on shards because there are more shards. Am I onto something here,
or will this not help at all?
2) Use CloudSolrServer - currently we have a plain-old least-connection
HTTP VIP. If we go direct to what we need to update, this will reduce
concurrency in SolrCloud a bit. Thoughts?

Thanks all!

Cheers,

Tim


On 6 September 2013 14:47, Tim Vaillancourt t...@elementspace.com wrote:

 Enjoy your trip, Mark! Thanks again for the help!

 Tim


 On 6 September 2013 14:18, Mark Miller markrmil...@gmail.com wrote:

 Okay, thanks, useful info. Getting on a plane, but ill look more at this
 soon. That 10k thread spike is good to know - that's no good and could
 easily be part of the problem. We want to keep that from happening.

 Mark

 Sent from my iPhone

 On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt t...@elementspace.com
 wrote:

  Hey Mark,
 
  The farthest we've made it at the same batch size/volume was 12 hours
  without this patch, but that isn't consistent. Sometimes we would only
 get
  to 6 hours or less.
 
  During the crash I can see an amazing spike in threads to 10k which is
  essentially our ulimit for the JVM, but I strangely see no OutOfMemory:
  cannot open native thread errors that always follow this. Weird!
 
  We also notice a spike in CPU around the crash. The instability caused
 some
  shard recovery/replication though, so that CPU may be a symptom of the
  replication, or is possibly the root cause. The CPU spikes from about
  20-30% utilization (system + user) to 60% fairly sharply, so the CPU,
 while
  spiking isn't quite pinned (very beefy Dell R720s - 16 core Xeons,
 whole
  index is in 128GB RAM, 6xRAID10 15k).
 
  More on resources: our disk I/O seemed to spike about 2x during the
 crash
  (about 1300kbps written to 3500kbps), but this may have been the
  replication, or ERROR logging (we generally log nothing due to
  WARN-severity unless something breaks).
 
  Lastly, I found this stack trace occurring frequently, and have no idea
  what it is (may be useful or not):
 
  java.lang.IllegalStateException :
   at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)
   at org.eclipse.jetty.server.Response.sendError(Response.java:325)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
   at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
   at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
   at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
   at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
   at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
   at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
   at
 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
   at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
   at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
   at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
   at
 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
   at
 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
   at
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:445

Re: SolrCloud 4.x hangs under high update volume

2013-09-06 Thread Tim Vaillancourt
Hey guys,

(copy of my post to SOLR-5216)

We tested this patch and unfortunately encountered some serious issues a
few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are
writing about 5000 docs/sec total, using autoCommit to commit the updates
(no explicit commits).

Our environment:

Solr 4.3.1 w/SOLR-5216 patch.
Jetty 9, Java 1.7.
3 solr instances, 1 per physical server.
1 collection.
3 shards.
2 replicas (each instance is a leader and a replica).
Soft autoCommit is 1000ms.
Hard autoCommit is 15000ms.

After about 6 hours of stress-testing this patch, we see many of these
stalled transactions (below), and the Solr instances start to see each
other as down, flooding our Solr logs with Connection Refused exceptions,
and otherwise no obviously-useful logs that I could see.

I did notice some stalled transactions on both /select and /update,
however. This never occurred without this patch.

Stack /select seems stalled on: http://pastebin.com/Y1NCrXGC
Stack /update seems stalled on: http://pastebin.com/cFLbC8Y9

Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak.
My script normalizes the ERROR-severity stack traces and returns them in
order of occurrence.

Summary of my solr.log: http://pastebin.com/pBdMAWeb

Thanks!

Tim Vaillancourt


On 6 September 2013 07:27, Markus Jelsma markus.jel...@openindex.io wrote:

 Thanks!

 -Original message-
  From:Erick Erickson erickerick...@gmail.com
  Sent: Friday 6th September 2013 16:20
  To: solr-user@lucene.apache.org
  Subject: Re: SolrCloud 4.x hangs under high update volume
 
  Markus:
 
  See: https://issues.apache.org/jira/browse/SOLR-5216
 
 
  On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma
  markus.jel...@openindex.io wrote:
 
   Hi Mark,
  
   Got an issue to watch?
  
   Thanks,
   Markus
  
   -Original message-
From:Mark Miller markrmil...@gmail.com
Sent: Wednesday 4th September 2013 16:55
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.x hangs under high update volume
   
I'm going to try and fix the root cause for 4.5 - I've suspected
 what it
   is since early this year, but it's never personally been an issue, so
 it's
   rolled along for a long time.
   
Mark
   
Sent from my iPhone
   
On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
   wrote:
   
 Hey guys,

 I am looking into an issue we've been having with SolrCloud since
 the
 beginning of our testing, all the way from 4.1 to 4.3 (haven't
 tested
   4.4.0
 yet). I've noticed other users with this same issue, so I'd really
   like to
 get to the bottom of it.

 Under a very, very high rate of updates (2000+/sec), after 1-12
 hours
   we
 see stalled transactions that snowball to consume all Jetty
 threads in
   the
 JVM. This eventually causes the JVM to hang with most threads
 waiting
   on
 the condition/stack provided at the bottom of this message. At this
   point
 SolrCloud instances then start to see their neighbors (who also
 have
   all
 threads hung) as down w/Connection Refused, and the shards become
   down
 in state. Sometimes a node or two survives and just returns 503s
 no
   server
 hosting shard errors.

 As a workaround/experiment, we have tuned the number of threads
 sending
 updates to Solr, as well as the batch size (we batch updates from
   client -
 solr), and the Soft/Hard autoCommits, all to no avail. Turning off
 Client-to-Solr batching (1 update = 1 call to Solr), which also
 did not
 help. Certain combinations of update threads and batch sizes seem
 to
 mask/help the problem, but not resolve it entirely.

 Our current environment is the following:
 - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
 - 3 x Zookeeper instances, external Java 7 JVM.
 - 1 collection, 3 shards, 2 replicas (each node is a leader of 1
 shard
   and
 a replica of 1 shard).
 - Log4j 1.2 for Solr logs, set to WARN. This log has no movement
 on a
   good
 day.
 - 5000 max jetty threads (well above what we use when we are
 healthy),
 Linux-user threads ulimit is 6000.
 - Occurs under Jetty 8 or 9 (many versions).
 - Occurs under Java 1.6 or 1.7 (several minor versions).
 - Occurs under several JVM tunings.
 - Everything seems to point to Solr itself, and not a Jetty or Java
   version
 (I hope I'm wrong).

 The stack trace that is holding up all my Jetty QTP threads is the
 following, which seems to be waiting on a lock that I would very
 much
   like
 to understand further:

 java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x0007216e68d8 (a
 java.util.concurrent.Semaphore$NonfairSync)
at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:186

Re: SolrCloud 4.x hangs under high update volume

2013-09-06 Thread Tim Vaillancourt
Hey Mark,

The farthest we've made it at the same batch size/volume was 12 hours
without this patch, but that isn't consistent. Sometimes we would only get
to 6 hours or less.

During the crash I can see an amazing spike in threads to 10k which is
essentially our ulimit for the JVM, but I strangely see no OutOfMemory:
cannot open native thread errors that always follow this. Weird!

We also notice a spike in CPU around the crash. The instability caused some
shard recovery/replication though, so that CPU may be a symptom of the
replication, or is possibly the root cause. The CPU spikes from about
20-30% utilization (system + user) to 60% fairly sharply, so the CPU, while
spiking isn't quite pinned (very beefy Dell R720s - 16 core Xeons, whole
index is in 128GB RAM, 6xRAID10 15k).

More on resources: our disk I/O seemed to spike about 2x during the crash
(about 1300kbps written to 3500kbps), but this may have been the
replication, or ERROR logging (we generally log nothing due to
WARN-severity unless something breaks).

Lastly, I found this stack trace occurring frequently, and have no idea
what it is (may be useful or not):

java.lang.IllegalStateException :
  at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)
  at org.eclipse.jetty.server.Response.sendError(Response.java:325)
  at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
  at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
  at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
  at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
  at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
  at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
  at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
  at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
  at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
  at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
  at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
  at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
  at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:445)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
  at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
  at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
  at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
  at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
  at java.lang.Thread.run(Thread.java:724)

On your live_nodes question, I don't have historical data on this from when
the crash occurred, which I guess is what you're looking for. I could add
this to our monitoring for future tests, however. I'd be glad to continue
further testing, but I think first more monitoring is needed to understand
this further. Could we come up with a list of metrics that would be useful
to see following another test and successful crash?

Metrics needed:

1) # of live_nodes.
2) Full stack traces.
3) CPU used by Solr's JVM specifically (instead of system-wide).
4) Solr's JVM thread count (already done)
5) ?
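
If it helps, a rough sampler for the above that could run from cron during the
next soak; the ZooKeeper address, paths and the way the Solr PID is found are
all guesses:

#!/bin/bash
# Rough once-a-minute sampler for the metrics above (sketch only).
SOLR_PID=$(pgrep -f start.jar | head -1)
TS=$(date +%Y%m%d-%H%M%S)

# 1) number of live nodes registered in ZooKeeper
/opt/zookeeper/bin/zkCli.sh -server zk1:2181 ls /live_nodes \
  > /var/tmp/live_nodes.${TS} 2>&1

# 2) full thread dump
jstack -l ${SOLR_PID} > /var/tmp/jstack.${TS}

# 3) CPU of the Solr JVM only, and 4) its thread count
echo "${TS} $(ps -p ${SOLR_PID} -o %cpu=,nlwp=)" >> /var/tmp/solr_jvm.log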

Cheers,

Tim Vaillancourt


On 6 September 2013 13:11, Mark Miller markrmil...@gmail.com wrote:

 Did you ever get to index that long before without hitting the deadlock?

 There really isn't anything negative the patch could be introducing, other
 than allowing for some more threads to possibly run at once. If I had to
 guess, I would say its likely this patch fixes the deadlock issue and your
 seeing another issue - which looks like the system cannot keep up with the
 requests or something for some reason - perhaps due to some OS networking
 settings or something (more guessing). Connection refused happens generally
 when there is nothing listening on the port.

 Do you see anything interesting change with the rest of the system? CPU
 usage spikes or something like that?

 Clamping down further on the overall number of threads night help (which
 would require making something configurable). How many nodes are listed in
 zk under live_nodes?

 Mark

 Sent from my iPhone

 On Sep 6, 2013, at 12:02 PM, Tim Vaillancourt t...@elementspace.com
 wrote:

  Hey guys

Re: SolrCloud 4.x hangs under high update volume

2013-09-06 Thread Tim Vaillancourt
Enjoy your trip, Mark! Thanks again for the help!

Tim

On 6 September 2013 14:18, Mark Miller markrmil...@gmail.com wrote:

 Okay, thanks, useful info. Getting on a plane, but ill look more at this
 soon. That 10k thread spike is good to know - that's no good and could
 easily be part of the problem. We want to keep that from happening.

 Mark

 Sent from my iPhone

 On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt t...@elementspace.com wrote:

  Hey Mark,
 
  The farthest we've made it at the same batch size/volume was 12 hours
  without this patch, but that isn't consistent. Sometimes we would only
 get
  to 6 hours or less.
 
  During the crash I can see an amazing spike in threads to 10k which is
  essentially our ulimit for the JVM, but I strangely see no OutOfMemory:
  cannot open native thread errors that always follow this. Weird!
 
  We also notice a spike in CPU around the crash. The instability caused
 some
  shard recovery/replication though, so that CPU may be a symptom of the
  replication, or is possibly the root cause. The CPU spikes from about
  20-30% utilization (system + user) to 60% fairly sharply, so the CPU,
 while
  spiking isn't quite pinned (very beefy Dell R720s - 16 core Xeons,
 whole
  index is in 128GB RAM, 6xRAID10 15k).
 
  More on resources: our disk I/O seemed to spike about 2x during the crash
  (about 1300kbps written to 3500kbps), but this may have been the
  replication, or ERROR logging (we generally log nothing due to
  WARN-severity unless something breaks).
 
  Lastly, I found this stack trace occurring frequently, and have no idea
  what it is (may be useful or not):
 
  java.lang.IllegalStateException :
   at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)
   at org.eclipse.jetty.server.Response.sendError(Response.java:325)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
   at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
   at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
   at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
   at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
   at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
   at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
   at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
   at
  org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
   at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
   at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
   at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
   at
 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
   at
 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
   at
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:445)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
   at
 
 org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
   at
 
 org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
   at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
   at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
   at java.lang.Thread.run(Thread.java:724)
 
  On your live_nodes question, I don't have historical data on this from
 when
  the crash occurred, which I guess is what you're looking for. I could add
  this to our monitoring for future tests, however. I'd be glad to continue
  further testing, but I think first more monitoring is needed to
 understand
  this further. Could we come up with a list of metrics that would be
 useful
  to see following another test and successful crash?
 
  Metrics needed:
 
  1) # of live_nodes.
  2) Full stack traces.
  3) CPU used by Solr's JVM specifically (instead of system-wide).
  4) Solr's JVM thread count (already done)
  5) ?
 
  Cheers,
 
  Tim Vaillancourt
 
 
  On 6 September 2013 13:11, Mark Miller markrmil...@gmail.com wrote:
 
  Did you ever get to index that long before without hitting the deadlock?
 
  There really isn't anything negative the patch could be introducing,
 other
  than allowing for some more threads to possibly run at once. If I had to
  guess, I would say it's likely this patch fixes the deadlock issue and
 you're
  seeing another issue - which looks like the system

Re: solrcloud shards backup/restoration

2013-09-06 Thread Tim Vaillancourt
I wouldn't say I love this idea, but wouldn't it be safe to LVM snapshot
the Solr index? I think this may even work on a live server, depending on
some file I/O details. Has anyone tried this?

An in-Solr solution sounds more elegant, but considering the tlog concern
Shalin mentioned, I think this may work as an interim solution.
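
A rough sketch of the idea, assuming the index lives on its own logical volume
(the VG/LV names, mount points and core name below are made up):

# Make sure the on-disk index is current before snapshotting.
curl "http://localhost:8983/solr/collection1/update?commit=true"

# Point-in-time copy-on-write snapshot of the data volume.
lvcreate --snapshot --size 5G --name solr_snap /dev/vg_data/solr

# Mount it read-only, copy the index off-box, then drop the snapshot.
mount -o ro /dev/vg_data/solr_snap /mnt/solr_snap
rsync -a /mnt/solr_snap/ backuphost:/backups/solr-$(date +%Y%m%d)/
umount /mnt/solr_snap
lvremove -f /dev/vg_data/solr_snap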

Cheers!

Tim


On 6 September 2013 15:41, Aditya Sakhuja aditya.sakh...@gmail.com wrote:

 Thanks Shalin and Mark for your responses. I am on the same page about the
 conventions for taking the backup. However, I am less sure about the
 restoration of the index. Lets say we have 3 shards across 3 solrcloud
 servers.

 1. I am assuming we should take a backup from each of the shard leaders to
 get a complete collection. do you think that will get the complete index (
 not worrying about what is not hard committed at the time of backup ). ?

 2. How do we go about restoring the index in a fresh solrcloud cluster ?
 From the structure of the snapshot I took, I did not see any
 replication.properties or index.properties  which I see normally on a
 healthy solrcloud cluster nodes.
 if I have the snapshot named snapshot.20130905 does the snapshot.20130905/*
 go into data/index ?

 Thanks
 Aditya



 On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller markrmil...@gmail.com wrote:

  Phone typing. The end should not say don't hard commit - it should say
  do a hard commit and take a snapshot.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 6, 2013, at 7:26 AM, Mark Miller markrmil...@gmail.com wrote:
 
   I don't know that it's too bad though - its always been the case that
 if
  you do a backup while indexing, it's just going to get up to the last
 hard
  commit. With SolrCloud that will still be the case. So just make sure you
  do a hard commit right before taking the backup - yes, it might miss a
 few
  docs in the tran log, but if you are taking a back up while indexing, you
  don't have great precision in any case - you will roughly get a snapshot
  for around that time - even without SolrCloud, if you are worried about
  precision and getting every update into that backup, you want to stop
  indexing and commit first. But if you just want a rough snapshot for
 around
  that time, in both cases you can still just don't hard commit and take a
  snapshot.
  
   Mark
  
   Sent from my iPhone
  
   On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
  
   The replication handler's backup command was built for pre-SolrCloud.
   It takes a snapshot of the index but it is unaware of the transaction
   log which is a key component in SolrCloud. Hence unless you stop
   updates, commit your changes and then take a backup, you will likely
   miss some updates.
  
   That being said, I'm curious to see how peer sync behaves when you try
   to restore from a snapshot. When you say that you haven't been
   successful in restoring, what exactly is the behaviour you observed?
  
   On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja 
  aditya.sakh...@gmail.com wrote:
   Hello,
  
   I was looking for a good backup / recovery solution for the solrcloud
   indexes. I am more looking for restoring the indexes from the index
   snapshot, which can be taken using the replicationHandler's backup
  command.
  
   I am looking for something that works with solrcloud 4.3 eventually,
  but
   still relevant if you tested with a previous version.
  
   I haven't been successful in have the restored index replicate across
  the
   new replicas, after I restart all the nodes, with one node having the
   restored index.
  
   Is restoring the indexes on all the nodes the best way to do it ?
   --
   Regards,
   -Aditya Sakhuja
  
  
  
   --
   Regards,
   Shalin Shekhar Mangar.
 



 --
 Regards,
 -Aditya Sakhuja



Re: SolrCloud 4.x hangs under high update volume

2013-09-05 Thread Tim Vaillancourt
Update: It is a bit too soon to tell, but about 6 hours into testing there
are no crashes with this patch. :)

We are pushing 500 batches of 10 updates per second to a 3 node, 3 shard
cluster I mentioned above. 5000 updates per second total.

More tomorrow after a 24 hr soak!

Tim

On Wednesday, 4 September 2013, Tim Vaillancourt wrote:

 Thanks so much for the explanation Mark, I owe you one (many)!

 We have this on our high TPS cluster and will run it through its paces
 tomorrow. I'll provide any feedback I can, more soon! :D

 Cheers,

 Tim



Re: DIH + Solr Cloud

2013-09-04 Thread Tim Vaillancourt

Hey Alejandro,

I guess it depends on what you call more than one instance.

The request handlers are at the core-level, and not the Solr 
instance/global level, and within each of those cores you could have one 
or more data import handlers.


Most setups have 1 DIH per core at the handler location /dataimport, 
but I believe you could have several, ie: /dataimport2, /dataimport3 
if you had different DIH configs for each handler.


Within a single data import handler, you can have several entities, 
which are what explain to the DIH processes how to get/index the data. 
What you can do here is have several entities that construct your index, 
and execute those entities with several separate HTTP calls to the DIH, 
thus creating more than one instance of the DIH process within 1 core 
and 1 DIH handler.


ie:

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=suppliers"

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=parts"

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=companies"



http://wiki.apache.org/solr/DataImportHandler#Commands
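
You can also poll the handler to watch a running import (same hypothetical
core and handler as above):

curl "http://localhost:8983/solr/core1/dataimport?command=status"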

Cheers,

Tim

On 03/09/13 09:25 AM, Alejandro Calbazana wrote:

Hi,

Quick question about data import handlers in Solr cloud.  Does anyone use
more than one instance to support the DIH process?  Or is the typical setup
to have one box setup as only the DIH and keep this responsibility outside
of the Solr cloud environment?  I'm just trying to get a picture of how this
is typically deployed.

Thanks!

Alejandro



Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks guys! :)

Mark: this patch is much appreciated, I will try to test this shortly,
hopefully today.

For my curiosity/understanding, could someone explain to me quickly what
locks SolrCloud takes on updates? Was I on to something that more shards
decrease the chance for locking?

Secondly, I was wondering if someone could summarize what this patch
'fixes'? I'm not too familiar with Java and the solr codebase (working on
that though :D).

Cheers,

Tim



On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote:

 There is an issue if I remember right, but I can't find it right now.

 If anyone that has the problem could try this patch, that would be very
 helpful: http://pastebin.com/raw.php?i=aaRWwSGP

 - Mark


 On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io
 wrote:

  Hi Mark,
 
  Got an issue to watch?
 
  Thanks,
  Markus
 
  -Original message-
   From:Mark Miller markrmil...@gmail.com
   Sent: Wednesday 4th September 2013 16:55
   To: solr-user@lucene.apache.org
   Subject: Re: SolrCloud 4.x hangs under high update volume
  
   I'm going to try and fix the root cause for 4.5 - I've suspected what
 it
  is since early this year, but it's never personally been an issue, so
 it's
  rolled along for a long time.
  
   Mark
  
   Sent from my iPhone
  
   On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com
  wrote:
  
Hey guys,
   
I am looking into an issue we've been having with SolrCloud since the
beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
  4.4.0
yet). I've noticed other users with this same issue, so I'd really
  like to
get to the bottom of it.
   
Under a very, very high rate of updates (2000+/sec), after 1-12 hours
  we
see stalled transactions that snowball to consume all Jetty threads
 in
  the
JVM. This eventually causes the JVM to hang with most threads waiting
  on
the condition/stack provided at the bottom of this message. At this
  point
SolrCloud instances then start to see their neighbors (who also have
  all
threads hung) as down w/Connection Refused, and the shards become
  down
in state. Sometimes a node or two survives and just returns 503s no
  server
hosting shard errors.
   
As a workaround/experiment, we have tuned the number of threads
 sending
updates to Solr, as well as the batch size (we batch updates from
  client -
solr), and the Soft/Hard autoCommits, all to no avail. Turning off
Client-to-Solr batching (1 update = 1 call to Solr), which also did
 not
help. Certain combinations of update threads and batch sizes seem to
mask/help the problem, but not resolve it entirely.
   
Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1
 shard
  and
a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
  good
day.
- 5000 max jetty threads (well above what we use when we are
 healthy),
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java
  version
(I hope I'm wrong).
   
The stack trace that is holding up all my Jetty QTP threads is the
following, which seems to be waiting on a lock that I would very much
  like
to understand further:
   
java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007216e68d8 (a
java.util.concurrent.Semaphore$NonfairSync)
   at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
   at
   
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
   at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
   at
   
 
 org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
   at
   
 
 org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
   at
   
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks so much for the explanation Mark, I owe you one (many)!

We have this on our high TPS cluster and will run it through its paces
tomorrow. I'll provide any feedback I can, more soon! :D

Cheers,

Tim


SolrCloud 4.x hangs under high update volume

2013-09-03 Thread Tim Vaillancourt
)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:445)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
at java.lang.Thread.run(Thread.java:724)

Some questions I had were:
1) What exclusive locks does SolrCloud make when performing an update?
2) Keeping in mind I do not read or write java (sorry :D), could someone
help me understand what solr is locking in this case at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
when performing an update? That will help me understand where to look next.
3) It seems all threads in this state are waiting for 0x0007216e68d8,
is there a way to tell what 0x0007216e68d8 is?
4) Is there a limit to how many updates you can do in SolrCloud?
5) Wild-ass-theory: would more shards provide more locks (whatever they
are) on update, and thus more update throughput?
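
For 3), the only approach I can think of is grepping a full dump for the
address; since this is a java.util.concurrent lock rather than a plain
monitor, a holder (if reported at all) shows up under the "Locked ownable
synchronizers" sections that jstack -l adds. The PID lookup below is just a
guess at a typical Jetty start:

# Full thread dump including ownable synchronizers:
jstack -l $(pgrep -f start.jar | head -1) > solr-threads.txt

# Every thread/frame that references the address:
grep -n "0x0007216e68d8" solr-threads.txt

# Waiters show "parking to wait for" the address; a holder, if reported,
# is listed under "Locked ownable synchronizers:".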

To those interested, I've provided a stacktrace of 1 of 3 nodes at this URL
in gzipped form:
https://s3.amazonaws.com/timvaillancourt.com/tmp/solr-jstack-2013-08-23.gz

Any help/suggestions/ideas on this issue, big or small, would be much
appreciated.

Thanks so much all!

Tim Vaillancourt


Re: Sharing SolrCloud collection configs w/overrides

2013-09-01 Thread Tim Vaillancourt

Here you go Erick, feel free to update this.

I am unable to assign to you, but asked for someone to do so:

https://issues.apache.org/jira/browse/SOLR-5208

Cheers,

Tim

On 21/08/13 10:40 AM, Tim Vaillancourt wrote:
Well, the mention of DIH is a bit off-topic. I'll simplify and say all 
I need is the ability to set ANY variables in solrconfig.xml without 
having to make N number of copies of the same configuration to achieve 
that. Essentially I need 10+ collections to use the exact same config 
dir in Zookeeper with minor/trivial differences set in variables.


Your proposal of taking in values at core creation-time is a neat one 
and would be a very flexible solution for a lot of use cases. My only 
concern for my really-specific use cae is that I'd be setting DB 
user/passwords via plain-text HTTP calls, but having this feature is 
better than not.


In a perfect world I'd like to be able to include files in Zookeeper 
(like XInclude) that are outside the common config dir (eg: 
'/configs/sharedconfig') all the collections would be sharing. On the 
other hand, that sort of solution would open up the Zookeeper layout 
to arbitrary files and could end up in a nightmare if not done 
carefully, however.


Would it be possible for Solr to support specifying multiple configs 
at collection creation, that are merged or concatenated. This idea 
sounds terrible to me even at this moment, but I wonder if there is 
something in there..


Tim


Re: Sharing SolrCloud collection configs w/overrides

2013-08-21 Thread Tim Vaillancourt
Well, the mention of DIH is a bit off-topic. I'll simplify and say all I
need is the ability to set ANY variables in solrconfig.xml without having
to make N number of copies of the same configuration to achieve that.
Essentially I need 10+ collections to use the exact same config dir in
Zookeeper with minor/trivial differences set in variables.

Your proposal of taking in values at core creation-time is a neat one and
would be a very flexible solution for a lot of use cases. My only concern
for my really-specific use case is that I'd be setting DB user/passwords via
plain-text HTTP calls, but having this feature is better than not.

In a perfect world I'd like to be able to include files in Zookeeper (like
XInclude) that are outside the common config dir (eg:
'/configs/sharedconfig') all the collections would be sharing. On the other
hand, that sort of solution would open up the Zookeeper layout to arbitrary
files and could end up in a nightmare if not done carefully, however.

Would it be possible for Solr to support specifying multiple configs at
collection creation, that are merged or concatenated. This idea sounds
terrible to me even at this moment, but I wonder if there is something in
there..

Tim


Sharing SolrCloud collection configs w/overrides

2013-08-20 Thread Tim Vaillancourt
Hey guys,

I have a situation where I have a lot of collections that share the same
core config in Zookeeper. For each of my SolrCloud collections, 99.9% of
the config (schema.xml, solrcloud.xml) are the same, only the
DataImportHandler parameters are different for different database
names/credentials, per collection.

To provide the different DIH credentials per collection, I currently upload
many copies of the exact-same Solr config dir with 1 Xincluded file with
the 4-5 database parameters that are different alongside the schema.xml and
solrconfig.xml.

I don't feel this ideal and is wasting space in Zookeeper considering most
of my configs are duplicated.

At a high level, is there a way for me to share one config in Zookeeper
while having minor overrides to the variables?

Is there a way for me to XInclude a file outside of my Zookeeper config
dir, ie: could I XInclude arbitrary locations in Zookeeper so that I can
have the same config dir for all collections and a file in Zookeeper that
is external to the common config dir to apply the collection-specific
overrides?

To extend my question for Solr 4.4 core.properties files: am I stuck in the
same boat under Solr 4.4 if I have say 10 collections sharing one config,
but I want each to have a unique core.properties?
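
For reference, pointing many collections at one uploaded config is already
straightforward (the ZK host, config name, collection names and shard counts
below are illustrative); it's only the per-collection overrides that have no
clean answer:

# Upload the common config dir to ZooKeeper once:
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig \
  -confdir /opt/solr/common-conf -confname sharedconfig

# Point every collection at the same uploaded config:
for c in coll1 coll2 coll3; do
  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=${c}&numShards=3&replicationFactor=2&collection.configName=sharedconfig"
done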

Cheers!

Tim


Re: Problems installing Solr4 in Jetty9

2013-08-17 Thread Tim Vaillancourt

Try adding 'ext' to your OPTIONS= line for Jetty.

Tim

On 16/08/13 05:04 AM, Dmitry Kan wrote:

Hi,

I have the following jar in jetty/lib/ext:

log4j-1.2.16.jar
slf4j-api-1.6.6.jar
slf4j-log4j12-1.6.6.jar
jcl-over-slf4j-1.6.6.jar
jul-to-slf4j-1.6.6.jar

do you?

Dmitry


On Thu, Aug 8, 2013 at 12:49 PM, Spadez james_will...@hotmail.com wrote:


Apparently this is the error:

2013-08-08 09:35:19.994:WARN:oejw.WebAppContext:main: Failed startup of
context
o.e.j.w.WebAppContext@64a20878
{/solr,file:/tmp/jetty-0.0.0.0-8080-solr.war-_solr-any-/webapp/,STARTING}{/solr.war}
org.apache.solr.common.SolrException: Could not find necessary SLF4j
logging
jars. If using Jetty, the SLF4j logging jars need to go in the jetty
lib/ext
directory. For other containers, the corresponding directory should be
used.
For more information, see: http://wiki.apache.org/solr/SolrLogging



--
View this message in context:
http://lucene.472066.n3.nabble.com/Problems-installing-Solr4-in-Jetty9-tp4083209p4083224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud Load Balancer weight

2013-08-15 Thread Tim Vaillancourt

Soon ended up being a while :), feel free to add any thoughts.

https://issues.apache.org/jira/browse/SOLR-5166

Tim

On 07/06/13 03:07 PM, Vaillancourt, Tim wrote:

Cool!

Having those values influenced by stats is a neat idea too. I'll get on that 
soon.

Tim

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, June 03, 2013 5:07 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Load Balancer weight


On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt t...@elementspace.com wrote:


Should I JIRA this? Thoughts?

Yeah - it's always been in the back of my mind - it's come up a few times - 
eventually we would like nodes to report some stats to zk to influence load 
balancing.

- mark


Re: Adding Postgres and Mysql JDBC drivers to Solr

2013-08-11 Thread Tim Vaillancourt
Another option is defining the location of these jars in your 
solrconfig.xml and storing the libraries external to jetty, which has 
some advantages.


Eg: the MySQL connector is located at '/opt/mysql_connector' and you add this
to your solrconfig.xml alongside the other lib entries:


   <lib dir="/opt/mysql_connector/" regex=".*\.jar" />

Cheers,

Tim

On 06/08/13 08:02 AM, Spadez wrote:

Thank you very much



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-Postgres-and-Mysql-JDBC-drivers-to-Solr-tp4082806p4082832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Internal shard communication - performance?

2013-08-11 Thread Tim Vaillancourt
For me the biggest deal with increased chatter between SolrCloud is 
object creation and GCs.


The resulting CPU load from the increased GCing seems to affect 
performance for me in some load tests, but I'm still trying to gather 
hard numbers on it.
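
For anyone measuring the same thing, these are the sort of JVM flags I'd add
to the Solr JVM to get hard GC numbers (Java 7 flags; the log path is
illustrative):

# Append GC logging to whatever JAVA_OPTS Jetty/Solr is started with:
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/solr/gc.log"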


Cheers,

Tim

On 07/08/13 04:05 PM, Shawn Heisey wrote:

On 8/7/2013 2:45 PM, Torsten Albrecht wrote:

I would like to run zookeeper external at my old master server.

So I have two zookeeper to control my cloud. The third and fourth 
zookeeper will be a virtual machine.


For true HA with zookepeer, you need at least three instances on 
separate physical hardware.  If you want to use VMs, that would be 
fine, but you must ensure that you aren't running more than one 
instance on the same physical server.


For best results, use an odd number of ZK instances.  With three ZK 
instances, one can go down and everything still works.  With five, two 
can go down and everything still works.


If you've got a fully switched network that's at least gigabit speed, 
then the network latency involved in internal communication shouldn't 
really matter.


Thanks,
Shawn



Re: debian package for solr with jetty

2013-08-02 Thread Tim Vaillancourt

Hey guys,

It is by no means perfect or pretty, but I use this script below to 
build Solr into a .deb package that installs Solr to /opt/solr-VERSION 
with 'example' and 'docs' removed, and a symlink to /opt/solr. When 
building, the script wget's the tgz, builds it in a tmpdir within the 
cwd and makes a .deb.


There is no container included or anything, so this essentially builds a 
library-style package of Solr to be included by other packages, so it's 
probably not entirely what people are looking for here, but here goes:


solr-dpkg.sh:
#!/bin/bash

set -e

VERSION=$1
if test -z "${VERSION}";
then
  echo "Usage: $0 [SOLR VERSION]"
  exit 1
fi

NAME=solr
MIRROR_BASE="http://apache.mirror.iweb.ca"
PREFIX=/opt
PNAME=solr_${VERSION}
BUILD_BASE=$$
BUILD_DIR=${BUILD_BASE}/${PNAME}
START_DIR=${PWD}

# Clean build dir:
if test -e ${BUILD_DIR};
then
  rm -rf ${BUILD_DIR}
fi

# Wget solr:
SOLR_TAR=solr-${VERSION}.tgz
if test ! -e ${SOLR_TAR};
then
  wget -N ${MIRROR_BASE}/lucene/solr/${VERSION}/${SOLR_TAR}
fi

# Debian metadata:
mkdir -p ${BUILD_DIR} ${BUILD_DIR}/DEBIAN
cat <<EOF > ${BUILD_DIR}/DEBIAN/control
Package: solr
Priority: extra
Maintainer: Tim Vaillancourt t...@timvaillancourt.com
Section: libs
Homepage: http://lucene.apache.org/solr/
Version: ${VERSION}
Description: Apache Solr ${VERSION}
Architecture: all
EOF

# Unpack solr in correct location:
mkdir -p ${BUILD_DIR}${PREFIX}
tar xfz ${SOLR_TAR} -C ${BUILD_DIR}${PREFIX}
rm -rf ${BUILD_DIR}${PREFIX}/solr-${VERSION}/{docs,example}
ln -s ${PREFIX}/solr-${VERSION} ${BUILD_DIR}${PREFIX}/solr

# Package and cleanup after:
cd ${BUILD_BASE}
dpkg-deb -b ${PNAME} && \
  mv ${PNAME}.deb ${START_DIR}/${PNAME}.deb
cd ${START_DIR}
rm -rf ${BUILD_BASE}

exit 0


Usage example: ./solr-dpkg.sh 4.4.0

In my setup I have other packages pointing to this package's path as a 
library with solr, jetty and the 'instance-package' separated. These 
packages depend on the version of the solr 'library package' built by 
this script.


Enjoy!

Tim

On 01/08/13 08:14 PM, Yago Riveiro wrote:

Some time ago a found this 
https://github.com/LucidWorks/solr-fabric/blob/master/solr-fabric-guide.md , 
Instead of puppet or chef (I don't know if it is a requirement) it is developed 
with fabric.

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, August 2, 2013 at 3:32 AM, Alexandre Rafalovitch wrote:


Well, it is one of the requests with a couple of vote on the Solr Usability
Contest:
https://solrstart.uservoice.com/forums/216001-usability-contest/suggestions/4249809-puppet-chef-configuration-to-automatically-setup-s


So, if somebody with the knowledge of those tools could review the space
and figure out what the state of the art for this is, it would be great. If
somebody could identify the gap and fill in, it would be awesome. :-)

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)


On Thu, Aug 1, 2013 at 10:25 PM, Michael Della Bitta
michael.della.bi...@appinions.com (mailto:michael.della.bi...@appinions.com)  
wrote:


There should be at least a good Chef recipe, since Chef uses Solr
internally. I'm not using anything of theirs, since we've thus far been a
Tomcat shop. If nothing exists, I should whip something up.
On Aug 1, 2013 3:06 PM, Alexandre Rafalovitch arafa...@gmail.com 
(mailto:arafa...@gmail.com)
wrote:


And are there good chef/puppet/etc rules for the public use? I could not
find when I looked.

Regards,
Alex

On 1 Aug 2013 11:32, Michael Della Bitta
michael.della.bi...@appinions.com (mailto:michael.della.bi...@appinions.com)  
wrote:


Hi Manasi,

We use Chef for this type of thing here at my current job. Have you
considered something like it?

Other ones to look at are Puppet, CFEngine, Salt, and Ansible.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062 | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinionshttps://twitter.com/Appinions  | g+:
plus.google.com/appinions (http://plus.google.com/appinions)






https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts




w: appinions.comhttp://www.appinions.com/


On Wed, Jul 31, 2013 at 8:10 PM, smanadsma...@gmail.com 
(mailto:sma...@gmail.com)  wrote:


Hi,

I am trying to create a debian package for solr 4.3 (default installation
with jetty).
Is there anything already available?

Also, I need 3 different cores so plan to create corresponding packages for
each of them to create solr core using admin/cores or collections api.

I also want to use solrcloud setup with external zookeeper ensemble; what's
the best way to create a debian package for updating zookeeper config files
as well?

Please suggest. Any pointers will be helpful.

Thanks,
-Manasi

Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load

2013-07-27 Thread Tim Vaillancourt

Thanks for the reply Erick,

Hard Commit - 15000ms, openSearcher=false
Soft Commit - 1000ms, openSearcher=true

15sec hard commit was sort of a guess, I could try a smaller number. 
When you say getting too large what limit do you think it would be 
hitting: a ulimit (nofiles), disk space, number of changes, a limit in 
Solr itself?


By my math there would be 15 tlogs max per core, but I don't really know 
how it all works if someone could fill me in/point me somewhere.


Cheers,

Tim

On 27/07/13 07:57 AM, Erick Erickson wrote:

What is your autocommit limit? Is it possible that your transaction
logs are simply getting too large? tlogs are truncated whenever
you do a hard commit (autocommit) with openSearcher either
true for false it doesn't matter.

FWIW,
Erick

On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourt t...@elementspace.com wrote:

Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not
appearing after a previous crash. Every time I run this high-volume test
that produced my stack trace, I zero out the logs, Solr data and Zookeeper
data and start over from scratch with a brand new collection and zero'd out
logs.

The test is mostly high volume (2000-4000 updates/sec) and at the start the
SolrCloud runs decently for a good 20-60~ minutes, no errors in the logs at
all. Then that stack trace occurs on all 3 nodes (staggered), I immediately
get some replica down messages and then some cannot connect errors to all
other cluster nodes, who have all crashed the same way. The tlog error could
be a symptom of the problem of running out of threads perhaps.

Shawn: thanks so much for sharing those details! Yes, they seem to be nice
servers, for sure - I don't get to touch/see them but they're fast! I'll
look into firmwares for sure and will try again after updating them. These
Solr instances are not-bare metal and are actually KVM VMs so that's another
layer to look into, although it is consistent between the two clusters.

I am not currently increasing the 'nofiles' ulimit to above default like you
are, but does Solr use 10,000+ file handles? It won't hurt to try it I guess
:). To rule out Java 7, I'll probably also try Jetty 8 and Java 1.6 as an
experiment as well.

Thanks!

Tim


On 25/07/13 05:55 PM, Yonik Seeley wrote:

On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt t...@elementspace.com
wrote:

ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)


That itself isn't necessarily a problem (and why it says non fatal)
- it just means that most likely the a transaction log file was
truncated from a previous crash.  It may be unrelated to the other
issues you are seeing.

-Yonik
http://lucidworks.com


Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load

2013-07-27 Thread Tim Vaillancourt

Thanks Jack/Erick,

I don't know if this is true or not, but I've read there is a tlog per 
soft commit, which is then truncated by the hard commit. If this were 
true, a 15sec hard-commit with a 1sec soft-commit could generate around 
15~ tlogs, but I've never checked. I like Erick's scenario more if it is 
1 tlog/core though. I'll try to find out some more.



Another two test/things I really should try for sanity are:
- Java 1.6 and Jetty 8: just to rule things out (wouldn't actually 
launch this way).

- ulimit for 'nofiles': the default is pretty high but why not?
- Monitor size + # of tlogs.
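
(For the last one I'll probably just watch the tlog dir with something
like this - the path is an example, adjusted to the core's data dir:

  watch -n 5 'ls -lh /var/solr/collection1/data/tlog | tail -5; ls /var/solr/collection1/data/tlog | wc -l'

That prints the newest tlog files plus the total count every 5 seconds.)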


I'll be sure to share findings and really appreciate the help guys!


PS: This is asking a lot, but if anyone can take a look at that thread 
dump, or give me some pointers on what to look for in a 
stall/thread-pile up thread dump like this, I would really appreciate 
it. I'm quite weak at deciphering those (I use Thread Dump Analyzer) but 
I'm sure it would tell a lot.



Cheers,


Tim


On 27/07/13 02:24 PM, Erick Erickson wrote:

Tim:

15 seconds isn't unreasonable, I was mostly wondering if it was hours.

Take a look at the size of the tlogs as you're indexing; you should see them
truncate every 15 seconds or so. There'll be a varying number of tlogs kept
around, although under heavy indexing I'd only expect 1 or 2 inactive ones;
the internal rule is that there'll be enough tlogs kept around to
hold 100 docs.

There should only be 1 open tlog/core as I understand it. When a commit
happens (hard, openSearcher = true or false doesn't matter) the current
tlog is closed and a new one opened. Then some cleanup happens so there
are only enough tlogs kept around to hold 100 docs.

Strange, I'm kind of out of ideas.
Erick

On Sat, Jul 27, 2013 at 4:41 PM, Jack Krupanskyj...@basetechnology.com  wrote:

No hard numbers, but the general guidance is that you should set your hard
commit interval to match your expectations for how quickly nodes should come
up if they need to be restarted. Specifically, a hard commit assures that
all changes have been committed to disk and are ready for immediate access
on restart, but any and all soft commit changes since the last hard commit
must be replayed (reexecuted) on restart of a node.

How long does it take to replay the changes in the update log? No firm
numbers, but treat it as if all of those uncommitted updates had to be
resent and reprocessed by Solr. It's probably faster than that, but you get
the picture.

I would suggest thinking in terms of minutes rather than seconds for hard
commits: 5, 10, 15, 20, or 30 minutes.

Hard commits may result in kicking off segment merges, so too rapid a rate
of segment creation might cause problems or at least be counterproductive.

So, instead of 15 seconds, try 15 minutes.

OTOH, if you really need to handle 4,000 updates a second... you are clearly
in uncharted territory and should expect to do some heavy-duty
trial-and-error tuning on your own.

-- Jack Krupansky

-Original Message- From: Tim Vaillancourt
Sent: Saturday, July 27, 2013 4:21 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.3.1 - Failure to open existing log file (non
fatal) errors under high load


Thanks for the reply Erick,

Hard Commit - 15000ms, openSearcher=false
Soft Commit - 1000ms, openSearcher=true

15sec hard commit was sort of a guess, I could try a smaller number.
When you say "getting too large", what limit do you think it would be
hitting: a ulimit (nofiles), disk space, number of changes, a limit in
Solr itself?

By my math there would be 15 tlogs max per core, but I don't really know
how it all works if someone could fill me in/point me somewhere.

Cheers,

Tim

On 27/07/13 07:57 AM, Erick Erickson wrote:

What is your autocommit limit? Is it possible that your transaction
logs are simply getting too large? tlogs are truncated whenever
you do a hard commit (autocommit), with openSearcher either
true or false, it doesn't matter.

FWIW,
Erick

On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourtt...@elementspace.com
wrote:

Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not
appearing after a previous crash. Every time I run this high-volume test
that produced my stack trace, I zero out the logs, Solr data and
Zookeeper
data and start over from scratch with a brand new collection and zero'd
out
logs.

The test is mostly high volume (2000-4000 updates/sec) and at the start
the
SolrCloud runs decently for a good 20-60~ minutes, no errors in the logs
at
all. Then that stack trace occurs on all 3 nodes (staggered), I
immediately
get some replica down messages and then some cannot connect errors to
all
other cluster nodes, who have all crashed the same way. The tlog error
could
be a symptom of the problem of running out of threads perhaps.

Shawn: thanks so much for sharing those details! Yes, they seem to be
nice
servers, for sure - I don't get to touch/see them but they're fast

SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load

2013-07-25 Thread Tim Vaillancourt
Hey guys,

I am reaching out to the Solr list with a very vague issue: under high load
against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas (2
cores per instance), I eventually see failure messages related to
transaction logs, and shortly after these stacktraces occur the cluster
starts to fall apart.

To explain my setup:
- SolrCloud 4.3.1.
- Jetty 9.x.
- Oracle/Sun JDK 1.7.25 w/CMS.
- RHEL 6.x 64-bit.
- 3 instances, 1 per server.
- 3 shards.
- 2 replicas per shard.

The transaction log error I receive after about 10-30 minutes of load
testing is:

ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)
/opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
java.io.EOFException
at
org.apache.solr.update.TransactionLog.init(TransactionLog.java:182)
at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
at
org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:83)
at
org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:138)
at
org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:125)
at
org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
at
org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
at org.apache.solr.core.SolrCore.init(SolrCore.java:805)
at org.apache.solr.core.SolrCore.init(SolrCore.java:618)
at
org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:894)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:982)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
at
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
at
org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
at
org.apache.solr.update.TransactionLog.init(TransactionLog.java:160)
... 25 more


Eventually after a few of these stack traces, the cluster starts to lose
shards and replicas fail. Jetty then creates hung threads until hitting
OutOfMemory on native threads due to the maximum process ulimit.

I know this is quite a vague issue, so I'm not expecting a silver-bullet
answer, but I was wondering if anyone has suggestions on where to look
next? Does this sound Solr-related at all, or possibly system? Has anyone
seen this issue before, or has any hypothesis how to find out more?

I will reply shortly with a thread dump, taken from 1 locked-up node.

Thanks for any suggestions!

Tim


Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load

2013-07-25 Thread Tim Vaillancourt
Stack trace:

http://timvaillancourt.com.s3.amazonaws.com/tmp/solrcloud.nodeC.2013-07-25-16.jstack.gz

Cheers!

Tim


On 25 July 2013 16:44, Tim Vaillancourt t...@elementspace.com wrote:

 Hey guys,

 I am reaching out to the Solr list with a very vague issue: under high
 load against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas
 (2 cores per instance), I eventually see failure messages related to
 transaction logs, and shortly after these stacktraces occur the cluster
 starts to fall apart.

 To explain my setup:
 - SolrCloud 4.3.1.
 - Jetty 9.x.
 - Oracle/Sun JDK 1.7.25 w/CMS.
 - RHEL 6.x 64-bit.
 - 3 instances, 1 per server.
 - 3 shards.
 - 2 replicas per shard.

 The transaction log error I receive after about 10-30 minutes of load
 testing is:

 ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
 Failure to open existing log file (non fatal)
 /opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
 java.io.EOFException
 at
 org.apache.solr.update.TransactionLog.init(TransactionLog.java:182)
 at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
 at
 org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:83)
 at
 org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:138)
 at
 org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:125)
 at
 org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
 at
 org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:805)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:618)
 at
 org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:894)
 at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:982)
 at
 org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at
 org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.io.EOFException
 at
 org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
 at
 org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
 at
 org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
 at
 org.apache.solr.update.TransactionLog.init(TransactionLog.java:160)
 ... 25 more
 

 Eventually after a few of these stack traces, the cluster starts to lose
 shards and replicas fail. Jetty then creates hung threads until hitting
 OutOfMemory on native threads due to the maximum process ulimit.

 I know this is quite a vague issue, so I'm not expecting a silver-bullet
 answer, but I was wondering if anyone has suggestions on where to look
 next? Does this sound Solr-related at all, or possibly system? Has anyone
 seen this issue before, or has any hypothesis how to find out more?

 I will reply shortly with a thread dump, taken from 1 locked-up node.

 Thanks for any suggestions!

 Tim



Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load

2013-07-25 Thread Tim Vaillancourt
Thanks for the reply Shawn, I can always count on you :).

We are using 10GB heaps and have over 100GB of OS cache free to answer the
JVM question, Young has about 50% of the heap, all CMS. Our max number of
processes for the JVM user is 10k, which is where Solr dies when it blows
up with 'cannot create native thread'.

I also want to say this is system related, but I am seeing this occur on
all 3 servers, which are brand-new Dell R720s. I'm not saying this is
impossible, but I don't see much to suggest that, and it would need to be
one hell of a coincidence.

To add more confusion to the mix, we actually run a 2nd SolrCloud cluster
on the same Solr, Jetty and JVM versions that does not exhibit this issue,
albeit with a completely different schema, servers and access patterns,
though it is also high-TPS. That is some evidence to say the current
software stack is OK, or maybe this only occurs under an extreme load that
the 2nd cluster does not see, or lastly only with a certain schema.

Lastly, to add a bit more detail to my original description, so far I have
tried:

- Entirely rebuilding my cluster from scratch, reinstalling all deps,
configs, reindexing the data (in case I screwed up somewhere). The EXACT
same issue occurs under load about 20-45 minutes in.
- Moving to Java 1.7.0_21 from _25 due to some known bugs. Same issue
occurs after some load.
- Restarting SolrCloud / forcing rebuilds of cores. Same issue occurs after
some load.

Cheers,

Tim


On 25 July 2013 17:13, Shawn Heisey s...@elyograg.org wrote:

 On 7/25/2013 5:44 PM, Tim Vaillancourt wrote:

 The transaction log error I receive after about 10-30 minutes of load
 testing is:

 ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
 Failure to open existing log file (non fatal)
 /opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
 java.io.EOFException

 snip

  Caused by: java.io.EOFException
  at
 org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
  at
 org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
  at
 org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
  at
 org.apache.solr.update.TransactionLog.init(TransactionLog.java:160)
  ... 25 more
 


 This looks to me like a system problem.  RHEL should be pretty solid, I
 use CentOS without any trouble.  My initial guesses are a corrupt
 filesystem, failing hardware, or possibly a kernel problem with your
 specific hardware.

 I'm running Jetty 8, which is the version that the example uses.  Could
 Jetty 9 be a problem here?  I couldn't really say, though my initial guess
 is that it's not a problem.

 I'm running Oracle Java 1.7.0_13.  Normally later releases are better, but
 Java bugs do exist and do get introduced in later releases.  Because you're
 on the absolute latest, I'm guessing that you had the problem with an
 earlier release and upgraded to see if it went away.  If that's what
 happened, it is less likely that it's Java.

 My first instinct would be to do a 'yum distro-sync' followed by 'touch
 /forcefsck' and reboot with console access to the server, so that you can
 deal with any fsck problems.  Perhaps you've already tried that. I'm aware
 that this could be very very hard to get pushed through strict change
 management procedures.

 I did some searching.  SOLR-4519 is a different problem, but it looks like
 it has a similar underlying exception, with no resolution.  It was filed
 when Solr 4.1.0 was current.

 Could there be a resource problem - heap too small, not enough OS disk
 cache, etc?

 Thanks,
 Shawn




Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load

2013-07-25 Thread Tim Vaillancourt

Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not 
appearing after a previous crash. Every time I run this high-volume test 
that produced my stack trace, I zero out the logs, Solr data and 
Zookeeper data and start over from scratch with a brand new collection 
and zero'd out logs.


The test is mostly high volume (2000-4000 updates/sec) and at the start 
the SolrCloud runs decently for a good 20-60~ minutes, no errors in the 
logs at all. Then that stack trace occurs on all 3 nodes (staggered), I 
immediately get some replica down messages and then some cannot 
connect errors to all other cluster nodes, who have all crashed the 
same way. The tlog error could be a symptom of the problem of running 
out of threads perhaps.


Shawn: thanks so much for sharing those details! Yes, they seem to be 
nice servers, for sure - I don't get to touch/see them but they're fast! 
I'll look into firmwares for sure and will try again after updating 
them. These Solr instances are not-bare metal and are actually KVM VMs 
so that's another layer to look into, although it is consistent between 
the two clusters.


I am not currently increasing the 'nofiles' ulimit to above default like 
you are, but does Solr use 10,000+ file handles? It won't hurt to try it 
I guess :). To rule out Java 7, I'll probably also try Jetty 8 and Java 
1.6 as an experiment as well.


Thanks!

Tim

On 25/07/13 05:55 PM, Yonik Seeley wrote:

On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourtt...@elementspace.com  wrote:

ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)


That itself isn't necessarily a problem (and why it says non fatal)
- it just means that most likely a transaction log file was
truncated from a previous crash.  It may be unrelated to the other
issues you are seeing.

-Yonik
http://lucidworks.com


Re: preferred container for running SolrCloud

2013-07-13 Thread Tim Vaillancourt

We run Jetty 8 and 9 with Solr. No issues I can think of.

We use Jetty internally anyway, and it seemed to be the most common 
container out there for Solr (from reading this mailing list, articles, 
etc), so that made me feel a bit better if I needed advice or help from 
the community - not to say there isn't a lot of Tomcat + Solr knowledge 
on the list.


Performance-wise, years back I heard Jetty was the faster/lighter-on-RAM 
container compared to Tomcat, but recent benchmarks I've seen out 
there seem to indicate Tomcat is on par or possibly faster now, although 
I believe while using more RAM. Don't quote me here. I'd love it if someone 
could do a Solr-specific benchmark.


Another neat, but sort of unimportant tidbit is Google App Engine went 
with Jetty, which to me indicates the Jetty project isn't going away 
anytime soon. Who knows, Google may even submit back valuable 
improvements to the project. Live in hope!


Tim

On 11/07/13 08:14 PM, Saikat Kanjilal wrote:

One last thing, no issues with jetty.  The issues we did have was actually 
running separate zookeeper clusters.


From: sxk1...@hotmail.com
To: solr-user@lucene.apache.org
Subject: RE: preferred container for running SolrCloud
Date: Thu, 11 Jul 2013 20:13:27 -0700

Separate Zookeeper.


Date: Thu, 11 Jul 2013 19:27:18 -0700
Subject: Re: preferred container for running SolrCloud
From: docbook@gmail.com
To: solr-user@lucene.apache.org

With the embedded Zookeeper or separate Zookeeper? Also have run into any
issues with running SolrCloud on jetty?


On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilalsxk1...@hotmail.comwrote:


We're running under jetty.

Sent from my iPhone

On Jul 11, 2013, at 6:06 PM, Ali, Saqibdocbook@gmail.com  wrote:


1) Jboss
2) Jetty
3) Tomcat
4) Other..

?






Re: preferred container for running SolrCloud

2013-07-13 Thread Tim Vaillancourt

Very good point, Furkan.

The unit tests being run against Jetty is another very good reason to 
feel safer on Jetty, IMHO. I'm assuming the SolrCloud ChaosMonkey tests 
are run against Jetty as well?


Tim

On 13/07/13 02:46 PM, Furkan KAMACI wrote:

Of course you may have some reasons to use Tomcat or anything else (i.e.
your staff may have more experience with Tomcat, etc.). However, developers
generally run Jetty because it is the default for Solr, and I should point
out that the Solr unit tests run against Jetty (in fact, a specific version
of Jetty) and are well tested (if you search the mailing list you can find
some conversations about it). If you follow the Solr developer list you may
realize the value of using a well-tested container. For example:
https://issues.apache.org/jira/browse/SOLR-4716 and
https://issues.apache.org/jira/browse/SOLR-4584?focusedCommentId=13625276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13625276
can show that there may be some bugs for non-Jetty containers, and if you
choose any other container except for Jetty you can hit one of them.

If you want to look at the comparison of Jetty vs. Tomcat I suggest you
look at here:

http://www.openlogic.com/wazi/bid/257366/Power-Java-based-web-apps-with-Jetty-application-server

and here:

http://www.infoq.com/news/2009/08/google-chose-jetty



2013/7/13 Tim Vaillancourtt...@elementspace.com


We run Jetty 8 and 9 with Solr. No issues I can think of.

We use Jetty interally anyways, and it seemed to be the most common
container out there for Solr (from reading this mailinglist, articles,
etc), so that made me feel a bit better if I needed advice or help from the
community - not to say there isn't a lot of Tomcat + Solr knowledge on the
list.

Performance-wise, years back I heard Jetty was the faster/lighter-on-RAM
container in regards to Tomcat, but recent benchmarks I've seen out there
seem to indicate Tomcat is on par or possibly faster now, although I
believe while using more RAM. Don't quote me here. I'd love if someone
could do a Solr-specific benchmark.

Another neat, but sort of unimportant tidbit is Google App Engine went
with Jetty, which to me indicates the Jetty project isn't going away
anytime soon. Who knows, Google may even submit back valuable improvements
to the project. Live in hope!

Tim


On 11/07/13 08:14 PM, Saikat Kanjilal wrote:


One last thing, no issues with jetty.  The issues we did have was
actually running separate zookeeper clusters.

  From: sxk1...@hotmail.com

To: solr-user@lucene.apache.org
Subject: RE: preferred container for running SolrCloud
Date: Thu, 11 Jul 2013 20:13:27 -0700

Separate Zookeeper.

  Date: Thu, 11 Jul 2013 19:27:18 -0700

Subject: Re: preferred container for running SolrCloud
From: docbook@gmail.com
To: solr-user@lucene.apache.org

With the embedded Zookeeper or separate Zookeeper? Also have run into
any
issues with running SolrCloud on jetty?


On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilalsxk1...@hotmail.com

wrote:

  We're running under jetty.

Sent from my iPhone

On Jul 11, 2013, at 6:06 PM, Ali, Saqibdocbook@gmail.com
  wrote:

  1) Jboss

2) Jetty
3) Tomcat
4) Other..

?





Re: documentCache not used in 4.3.1?

2013-06-29 Thread Tim Vaillancourt

That's a good idea, I'll try that next week.

Thanks!

Tim

On 29/06/13 12:39 PM, Erick Erickson wrote:

Tim:

Yeah, this doesn't make much sense to me either since,
as you say, you should be seeing some metrics upon
occasion. But do note that the underlying cache only gets
filled when getting documents to return in query results;
since there's no autowarming going on, it may come and
go.

But you can test this pretty quickly by lengthening your
autocommit interval or just not indexing anything
for a while, then run a bunch of queries and look at your
cache stats. That'll at least tell you whether it works at all.
You'll have to have hard commits turned off (or openSearcher
set to 'false') for that check too.

Best
Erick


On Sat, Jun 29, 2013 at 2:48 PM, Vaillancourt, Timtvaillanco...@ea.comwrote:


Yes, we are softCommit'ing every 1000ms, but that should be enough time to
see metrics though, right? For example, I still get non-cumulative metrics
from the other caches (which are also thrown away). I've also curl/sampled
enough that I probably should have seen a value by now.

If anyone else can reproduce this on 4.3.1 I will feel less crazy :).

Cheers,

Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, June 29, 2013 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: documentCache not used in 4.3.1?

It's especially weird that the hit ratio is so high and you're not seeing
anything in the cache. Are you perhaps soft committing frequently? Soft
commits throw away all the top-level caches including documentCache I
think

Erick


On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourtt...@elementspace.com

wrote:
Thanks Otis,

Yeah I realized after sending my e-mail that doc cache does not warm,
however I'm still lost on why there are no other metrics.

Thanks!

Tim


On 28 June 2013 16:22, Otis Gospodneticotis.gospodne...@gmail.com
wrote:


Hi Tim,

Not sure about the zeros in 4.3.1, but in SPM we see all these
numbers are non-0, though I haven't had the chance to confirm with

Solr 4.3.1.

Note that you can't really autowarm document cache...

Otis
--
Solr  ElasticSearch Support -- http://sematext.com/ Performance
Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt
t...@elementspace.com
wrote:

Hey guys,

This has to be a stupid question/I must be doing something wrong,
but

after

frequent load testing with documentCache enabled under Solr 4.3.1
with autoWarmCount=150, I'm noticing that my documentCache metrics
are

always

zero for non-cumlative.

At first I thought my commit rate is fast enough I just never see
the non-cumlative result, but after 100s of samples I still always
get zero values.

Here is the current output of my documentCache from Solr's admin
for 1

core:



- documentCache

http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?en
try=documentCache

   - class:org.apache.solr.search.LRUCache
   - version:1.0
   - description:LRU Cache(maxSize=512, initialSize=512,
   autowarmCount=150, regenerator=null)
   - src:$URL: https:/
   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
   solr/core/src/java/org/apache/solr/search/LRUCache.java

https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/s
olr/core/src/java/org/apache/solr/search/LRUCache.java

$
   - stats:
  - lookups:0
  - hits:0
  - hitratio:0.00
  - inserts:0
  - evictions:0
  - size:0
  - warmupTime:0
  - cumulative_lookups:65198986
  - cumulative_hits:63075669
  - cumulative_hitratio:0.96
  - cumulative_inserts:2123317
  - cumulative_evictions:1010262
   

The cumulative values seem to rise, suggesting doc cache is
working,

but

at

the same time it seems I never see non-cumlative metrics, most

importantly

warmupTime.

Am I doing something wrong, is this normal/by-design, or is there
an

issue

here?

Thanks for helping with my silly question! Have a good weekend,

Tim


documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Hey guys,

This has to be a stupid question/I must be doing something wrong, but after
frequent load testing with documentCache enabled under Solr 4.3.1 with
autoWarmCount=150, I'm noticing that my documentCache metrics are always
zero for non-cumlative.

At first I thought my commit rate is fast enough I just never see the
non-cumlative result, but after 100s of samples I still always get zero
values.

Here is the current output of my documentCache from Solr's admin for 1 core:



   - 
documentCachehttp://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache
  - class:org.apache.solr.search.LRUCache
  - version:1.0
  - description:LRU Cache(maxSize=512, initialSize=512,
  autowarmCount=150, regenerator=null)
  - src:$URL: https:/
  /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
  
solr/core/src/java/org/apache/solr/search/LRUCache.javahttps://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java$
  - stats:
 - lookups:0
 - hits:0
 - hitratio:0.00
 - inserts:0
 - evictions:0
 - size:0
 - warmupTime:0
 - cumulative_lookups:65198986
 - cumulative_hits:63075669
 - cumulative_hitratio:0.96
 - cumulative_inserts:2123317
 - cumulative_evictions:1010262
  

The cumulative values seem to rise, suggesting doc cache is working, but at
the same time it seems I never see non-cumlative metrics, most importantly
warmupTime.

Am I doing something wrong, is this normal/by-design, or is there an issue
here?

Thanks for helping with my silly question! Have a good weekend,

Tim


Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
To answer some of my own question, Shawn H's great reply on this thread
explains why I see no autoWarming on doc cache:

http://www.marshut.com/iznwr/soft-commit-and-document-cache.html

It is still unclear to me why I see no other metrics, however.

Thanks Shawn,

Tim


On 28 June 2013 16:14, Tim Vaillancourt t...@elementspace.com wrote:

 Hey guys,

 This has to be a stupid question/I must be doing something wrong, but
 after frequent load testing with documentCache enabled under Solr 4.3.1
 with autoWarmCount=150, I'm noticing that my documentCache metrics are
 always zero for non-cumlative.

 At first I thought my commit rate is fast enough I just never see the
 non-cumlative result, but after 100s of samples I still always get zero
 values.

 Here is the current output of my documentCache from Solr's admin for 1
 core:

 

- 
 documentCachehttp://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache
   - class:org.apache.solr.search.LRUCache
   - version:1.0
   - description:LRU Cache(maxSize=512, initialSize=512,
   autowarmCount=150, regenerator=null)
   - src:$URL: https:/
   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
   
 solr/core/src/java/org/apache/solr/search/LRUCache.javahttps://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java$
   - stats:
  - lookups:0
  - hits:0
  - hitratio:0.00
  - inserts: 0
  - evictions:0
  - size:0
  - warmupTime:0
  - cumulative_lookups: 65198986
  - cumulative_hits:63075669
  - cumulative_hitratio:0.96
  - cumulative_inserts: 2123317
  - cumulative_evictions:1010262
   

 The cumulative values seem to rise, suggesting doc cache is working, but
 at the same time it seems I never see non-cumlative metrics, most
 importantly warmupTime.

 Am I doing something wrong, is this normal/by-design, or is there an issue
 here?

 Thanks for helping with my silly question! Have a good weekend,

 Tim






Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Thanks Otis,

Yeah I realized after sending my e-mail that doc cache does not warm,
however I'm still lost on why there are no other metrics.

Thanks!

Tim


On 28 June 2013 16:22, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

 Hi Tim,

 Not sure about the zeros in 4.3.1, but in SPM we see all these numbers
 are non-0, though I haven't had the chance to confirm with Solr 4.3.1.

 Note that you can't really autowarm document cache...

 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
  Hey guys,
 
  This has to be a stupid question/I must be doing something wrong, but
 after
  frequent load testing with documentCache enabled under Solr 4.3.1 with
  autoWarmCount=150, I'm noticing that my documentCache metrics are always
  zero for non-cumlative.
 
  At first I thought my commit rate is fast enough I just never see the
  non-cumlative result, but after 100s of samples I still always get zero
  values.
 
  Here is the current output of my documentCache from Solr's admin for 1
 core:
 
  
 
 - documentCache
 http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache
 
- class:org.apache.solr.search.LRUCache
- version:1.0
- description:LRU Cache(maxSize=512, initialSize=512,
autowarmCount=150, regenerator=null)
- src:$URL: https:/
/svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
solr/core/src/java/org/apache/solr/search/LRUCache.java
 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java
 $
- stats:
   - lookups:0
   - hits:0
   - hitratio:0.00
   - inserts:0
   - evictions:0
   - size:0
   - warmupTime:0
   - cumulative_lookups:65198986
   - cumulative_hits:63075669
   - cumulative_hitratio:0.96
   - cumulative_inserts:2123317
   - cumulative_evictions:1010262

 
  The cumulative values seem to rise, suggesting doc cache is working, but
 at
  the same time it seems I never see non-cumlative metrics, most
 importantly
  warmupTime.
 
  Am I doing something wrong, is this normal/by-design, or is there an
 issue
  here?
 
  Thanks for helping with my silly question! Have a good weekend,
 
  Tim



Re: Dataless nodes in SolrCloud?

2013-06-10 Thread Tim Vaillancourt
To answer Otis' question of whether or not this would be useful, the
trouble is, I don't know! :) It very well could be useful for my use case.

Is there any way to determine the impact of result merging (time spent?
Etc?) aside from just 'trying it'?

Cheers,

Tim


On 10 June 2013 14:48, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

 I think it would be useful.  I know people using ElasticSearch use it
 relatively often.

   Is aggregation expensive enough to warrant a separate box?

 I think it can get expensive if X in rows=X is highish.  We've seen
 this reported here on the Solr ML before.
 So to make sorting/merging of N result sets from N data nodes on this
 aggregator node you may want to get all the CPU you can get and not
 have the CPU simultaneously also try to handle incoming queries.

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Mon, Jun 10, 2013 at 5:32 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
  No, there's no such notion in SolrCloud. Each node that is part of a
  collection/shard is a replica and will handle indexing/querying. Even
  though you can send a request to a node containing a different
 collection,
  the request would just be forwarded to the right node and will be
 executed
  there.
 
  That being said, do people find such a feature useful? Is aggregation
  expensive enough to warrant a separate box? In a distributed search, the
  local index is used. One would just be adding a couple of extra network
  requests if you don't have a local index.
 
 
  On Sun, Jun 9, 2013 at 11:18 AM, Otis Gospodnetic 
  otis.gospodne...@gmail.com wrote:
 
  Hi,
 
  Is there a notion of a data-node vs. non-data node in SolrCloud?
  Something a la
 http://www.elasticsearch.org/guide/reference/modules/node/
 
 
  Thanks,
  Otis
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.



Re: Lucene/Solr Filesystem tunings

2013-06-07 Thread Tim Vaillancourt
I figured as much for atime, thanks Otis!

I haven't ran benchmarks just yet, but I'll be sure to share whatever I
find. I plan to try ext4 vs xfs.

I am also curious what effect disabling journaling (ext2) would have,
relying on SolrCloud to manage 'consistency' over many instances vs FS
journaling. Anyone have opinions there? If I test I'll share the results.
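
For the atime piece, I'm thinking of mount options along these lines in
/etc/fstab (a sketch; the device and mount point are just examples):

  /dev/sdb1  /opt/solr/data  ext4  defaults,noatime,nodiratime  0 0

followed by a remount, e.g. 'mount -o remount /opt/solr/data'.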

Cheers,

Tim


On 4 June 2013 16:11, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

 Hi,

 You can use noatime, nodiratime, nothing in Solr depends on that as
 far as I know.  We tend to use ext4.  Some people love xfs.  Want to
 run some benchmarks and publish the results? :)

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Tue, Jun 4, 2013 at 6:48 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
  Hey all,
 
  Does anyone have any advice or special filesytem tuning to share for
  Lucene/Solr, and which file systems they like more?
 
  Also, does Lucene/Solr care about access times if I turn them off (I
  think it doesn't care)?
 
  A bit unrelated: What are people's opinions on reducing some consistency
  things like filesystem journaling, etc (ext2?) due to SolrCloud's
 additional
  HA with replicas? How about RAID 0 x 3 replicas or so?
 
  Thanks!
 
  Tim Vaillancourt



Re: Two instances of solr - the same datadir?

2013-06-07 Thread Tim Vaillancourt
If it makes you feel better, I also considered this approach when I was in
the same situation with a separate indexer and searcher on one physical
Linux machine.

My main concern was re-using the FS cache between both instances - If I
replicated to myself there would be two independent copies of the index,
FS-cached separately.

I like the suggestion of using autoCommit to reload the index. If I'm
reading that right, you'd set an autoCommit on 'zero docs changing', or
just 'every N seconds'? Did that work?
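
(For reference, the "empty commit" Peter describes further down is just a
commit with no documents issued against the searcher instance, something
like this - host, port and core are placeholders:

  curl 'http://localhost:8984/solr/collection1/update?commit=true'
)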

Best of luck!

Tim


On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote:

 So here it is for a record how I am solving it right now:

 Write-master is started with: -Dmontysolr.warming.enabled=false
 -Dmontysolr.write.master=true -Dmontysolr.read.master=
 http://localhost:5005
 Read-master is started with: -Dmontysolr.warming.enabled=true
 -Dmontysolr.write.master=false


 solrconfig.xml changes:

 1. all index changing components have this bit,
 enable="${montysolr.master:true}" - ie.

 <updateHandler class="solr.DirectUpdateHandler2"
  enable="${montysolr.master:true}">

 2. for cache warming de/activation

 <listener event="newSearcher"
   class="solr.QuerySenderListener"
   enable="${montysolr.enable.warming:true}">...

 3. to trigger refresh of the read-only-master (from write-master):

 <listener event="postCommit"
   class="solr.RunExecutableListener"
   enable="${montysolr.master:true}">
   <str name="exe">curl</str>
   <str name="dir">.</str>
   <bool name="wait">false</bool>
   <arr name="args"><str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
 </listener>

 This works, I still don't like the reload of the whole core, but it seems
 like the easiest thing to do now.

 -- roman


 On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com
 wrote:

  Hi Peter,
 
  Thank you, I am glad to read that this usecase is not alien.
 
  I'd like to make the second instance (searcher) completely read-only, so
 I
  have disabled all the components that can write.
 
  (being lazy ;)) I'll probably use
  http://wiki.apache.org/solr/CollectionDistribution to call the curl
 after
  commit, or write some IndexReaderFactory that checks for changes
 
   The problem with calling the 'core reload' is that it seems like a lot of
  work
   for just opening a new searcher, eeekkk...somewhere I read that it is
  cheap
   to reload a core, but re-opening the index searcher must definitely be
   cheaper...
 
  roman
 
 
  On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com
 wrote:
 
  Hi,
  We use this very same scenario to great effect - 2 instances using the
  same
  dataDir with many cores - 1 is a writer (no caching), the other is a
  searcher (lots of caching).
  To get the searcher to see the index changes from the writer, you need
 the
  searcher to do an empty commit - i.e. you invoke a commit with 0
  documents.
  This will refresh the caches (including autowarming), [re]build the
  relevant searchers etc. and make any index changes visible to the RO
  instance.
   Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
  ensure the two instances don't try to commit at the same time.
  There are several ways to trigger a commit:
  Call commit() periodically within your own code.
  Use autoCommit in solrconfig.xml.
  Use an RPC/IPC mechanism between the 2 instance processes to tell the
  searcher the index has changed, then call commit when called (more
 complex
  coding, but good if the index changes on an ad-hoc basis).
  Note, doing things this way isn't really suitable for an NRT
 environment.
 
  HTH,
  Peter
 
 
 
  On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com
  wrote:
 
   Replication is fine, I am going to use it, but I wanted it for
 instances
   *distributed* across several (physical) machines - but here I have one
   physical machine, it has many cores. I want to run 2 instances of solr
   because I think it has these benefits:
  
   1) I can give less RAM to the writer (4GB), and use more RAM for the
   searcher (28GB)
   2) I can deactivate warming for the writer and keep it for the
 searcher
   (this considerably speeds up indexing - each time we commit, the
 server
  is
   rebuilding a citation network of 80M edges)
   3) saving disk space and better OS caching (OS should be able to use
  more
   RAM for the caching, which should result in faster operations - the
 two
   processes are accessing the same index)
  
   Maybe I should just forget it and go with the replication, but it
  doesn't
   'feel right' IFF it is on the same physical machine. And Lucene
   specifically has a method for discovering changes and re-opening the
  index
   (DirectoryReader.openIfChanged)
  
   Am I not seeing something?
  
   roman
  
  
  
   On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman 
   jhell...@innoventsolutions.com wrote:
  
Roman,
   
Could you be more specific as to 

Lucene/Solr Filesystem tunings

2013-06-04 Thread Tim Vaillancourt

Hey all,

Does anyone have any advice or special filesytem tuning to share for 
Lucene/Solr, and which file systems they like more?


Also, does Lucene/Solr care about access times if I turn them off (I 
think I doesn't care)?


A bit unrelated: What are people's opinions on reducing some consistency 
things like filesystem journaling, etc (ext2?) due to SolrCloud's 
additional HA with replicas? How about RAID 0 x 3 replicas or so?


Thanks!

Tim Vaillancourt


SolrCloud Load Balancer weight

2013-06-03 Thread Tim Vaillancourt
Hey guys,

I have recently looked into an issue with my SolrCloud related to very high
load when performing a full-import with DIH.

While some work could be done to improve my queries, etc. in DIH, this led
me to a new feature idea in Solr: weighted internal load balancing.

Basically, I can think of two uses cases, and how a weight on load
balancing could help:

1) My situation from above - I'm doing a huge import and want SolrCloud to
direct fewer queries to the node handling the DIH full-import, say weight
10/100 (10%) instead of 100/100.
2) Mixed hardware - Although I wouldn't recommend doing this, some people
may have mixed hardware, some capable of handling more or less traffic.

These weights wouldn't be expected to be exact, just a best-effort way to
generally influence load on nodes inside the cluster. They of course
would only matter on reads (/get, /select, etc).

A full blown approach would have weight awareness in the Zookeeper-aware
client implementation, and on inter-node replica requests.

Should I JIRA this? Thoughts?

Tim


Re: seeing lots of autowarming messages in log during DIH indexing

2013-05-25 Thread Tim Vaillancourt

Interesting.

In your scenario would you use commit=true, or commit=false, and do you use 
auto soft/hard commits?

Secondly, if you did use auto-soft/hard commits, how would they affect this 
scenario? I'm guessing even with commit=false, the autoCommits would be 
triggered either by time or max docs, which opens a searcher anyways. A total 
guess though.

I'm interested in doing full-imports without committing/opening new searchers 
until it is complete.
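
Roughly what I have in mind (an untested sketch; host and core names are
placeholders): kick off the import without committing, then issue one
explicit commit once DIH reports idle:

  curl 'http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true&commit=false'
  # poll http://localhost:8983/solr/mycore/dataimport until status is "idle", then:
  curl 'http://localhost:8983/solr/mycore/update?commit=true&openSearcher=true'

Of course, any autoCommit in solrconfig.xml would still fire in the
meantime, which is exactly the part I'm unsure about.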

Cheers!

Tim

On 20/05/13 03:59 PM, shreejay wrote:

geeky2 wrote

you mean i would add this switch to my script that kicks of the
dataimport?

exmaple:


OUTPUT=$(curl -v
http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport
-F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F
optimize=${OPTIMIZE} -F openSearcher=false)

Yes. Thats correct



geeky2 wrote

what needs to be done _AFTER_ the DIH finishes (if anything)?

eg, does this need to be turned back on after the DIH has finished?

Yes. You need to open the searcher to be able to search. Just run another
commit with openSearcher=true once your indexing process finishes.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649p4064768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: protect solr pages

2013-05-17 Thread Tim Vaillancourt
A lot of people (including me) are asking for this type of support in this
JIRA:

https://issues.apache.org/jira/browse/SOLR-4470

Although brought up frequently on the list, the effort doesn't seem to be
moving too much. I can confirm that the most recent patch on this JIRA will
work with the specific revision of 4.2.x though.

Cheers,

Tim


On 17 May 2013 13:11, gpssolr2020 psgoms...@gmail.com wrote:

 Hi,

 i want implement security through jetty realm in solr4. So i configured
 related stuffs in realm.properties ,jetty.xml, webdefault.xml under
 /solrhome/example/etc. But still it is not working. Please advise.



 Thanks.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/protect-solr-pages-tp4064274.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Does solr cloud support rename or swap function for collection?

2013-04-15 Thread Tim Vaillancourt

I added a brief description on CREATEALIAS here, feel free to tweak:

http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API

Tim

On 07/04/13 05:29 PM, Mark Miller wrote:

It's pretty simple - just as Brad said, it's just

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2,…

You also have action=DELETEALIAS

CREATEALIAS will create and update.

For update requests, you only want a 1to1 alias. For read requests, you can map 
1to1 or 1toN.

I've also started work on shard level aliases, but I've yet to get back to 
finishing it.

- Mark

On Apr 7, 2013, at 5:10 PM, Tim Vaillancourtt...@elementspace.com  wrote:


I aim to use this feature in more in testing soon. I'll be sure to doc what I 
can.

Cheers,

Tim

On 07/04/13 12:28 PM, Mark Miller wrote:

On Apr 7, 2013, at 9:44 AM, bradhill99bradhil...@yahoo.com   wrote:


Thanks Mark for this great feature but I suggest you can update the wiki
too.

Yeah, I've stopped updating the wiki for a while now looking back - paralysis 
on how to handle versions (I didn't want to do the std 'this applies to 4.1', 
'this applied to 4.0' all over the page) and the current likely move to a new 
Confluence wiki with Docs based on documentation LucidWorks recently donated to 
the project.

That's all a lot of work away still I guess.

I'll try and add some basic doc for this to the SolrCloud wiki page soon.

- Mark


Re: Storing Solr Index on NFS

2013-04-15 Thread Tim Vaillancourt
If centralization of storage is your goal by choosing NFS, iSCSI works 
reasonably well with SOLR indexes, although good local-storage will 
always be the overall winner.


I noticed a near 5% degradation in overall search performance (casual 
testing, nothing scientific) when moving a 40-50GB index to iSCSI 
(10GbE network) from a 4x7200rpm RAID 10 local SATA disk setup.


Tim

On 15/04/13 09:59 AM, Walter Underwood wrote:

Solr 4.2 does have field compression which makes smaller indexes. That will 
reduce the amount of network traffic. That probably does not help much, because 
I think the latency of NFS is what causes problems.

wunder

On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:


Hello Walter,

Thanks for the response. That has been my experience in the past as well.
But I was wondering if there new are things in Solr 4 and NFS 4.1 that make
the storing of indexes on a NFS mount feasible.

Thanks,
Saqib


On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwoodwun...@wunderwood.orgwrote:


On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:


Greetings,

Are there any issues with storing Solr Indexes on a NFS share? Also any
recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS.

It can be very slow, I measured indexing as 100X slower on NFS a few years
ago.

It is not safe to share Solr index files between two Solr servers, so
there is no benefit to NFS.

wunder
--
Walter Underwood
wun...@wunderwood.org





--
Walter Underwood
wun...@wunderwood.org






Re: Basic auth on SolrCloud /admin/* calls

2013-04-14 Thread Tim Vaillancourt
I've thought about this too, and have heard of some people running a 
lightweight http proxy upstream of Solr.


With the right network restrictions (only way for a client to reach solr 
is via a proxy + the nodes can still talk to each other), you could 
achieve the same thing SOLR-4470 is doing, with the drawback of 
additional proxy and firewall components to maintain, plus added 
overhead on HTTP calls.


A benefit though is a lightweight proxy ahead of Solr could implement 
HTTP caching, taking some load off of Solr.
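
A rough sketch of what I mean, using nginx as the lightweight proxy (nginx
is just my example choice; ports, paths and the htpasswd file are
placeholders):

  server {
    listen 80;
    location /solr/ {
      auth_basic           "Solr";
      auth_basic_user_file /etc/nginx/solr.htpasswd;
      proxy_pass           http://127.0.0.1:8983/solr/;
    }
  }

With Solr itself bound or firewalled to localhost/the internal network, and
the Solr ports only open between the cluster nodes themselves.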


In a perfect world, I'd say rolling out SOLR-4470 is the best solution, 
but again, it seems to be losing momentum (please Vote/support the 
discussion!). While proxies can achieve this, I think enough people have 
pondered about this to implement this as a feature in Solr.


Tim

On 14/04/13 12:32 AM, adfel70 wrote:

Did anyone try blocking access to the ports in the firewall level, and
allowing all the solr servers in the cluster+given control-machines?
Assuming that search request to solr run though a proxy..





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Basic-auth-on-SolrCloud-admin-calls-tp4052266p4055868.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Basic auth on SolrCloud /admin/* calls

2013-04-13 Thread Tim Vaillancourt

This JIRA covers a lot of what you're asking:

https://issues.apache.org/jira/browse/SOLR-4470

I am also trying to get this sort of solution in place, but it seems to 
be dying off a bit. Hopefully we can get some interest on this again, 
this question comes up every few weeks, it seems.


I can confirm the latest patch from this JIRA works as expected, 
although my primary concern is the credentials appear in the JVM 
command, and I'd like to move that to a file.


Cheers,

Tim

On 11/04/13 10:41 AM, Michael Della Bitta wrote:

It's fairly easy to lock down Solr behind basic auth using just the
servlet container it's running in, but the problem becomes letting
services that *should* be able to access Solr in. I've rolled with
basic auth in some setups, but certain deployments such as Solr Cloud
or sharded setups don't play well with auth because there's no good
way to configure them to use it.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 11, 2013 at 1:19 PM, Raymond Wikerrwi...@gmail.com  wrote:

On Apr 11, 2013, at 17:12 , adfel70adfe...@gmail.com  wrote:

Hi
I need to implement security in solr as follows:
1. prevent unauthorized users from accessing to solr admin pages.
2. prevent unauthorized users from performing solr operations - both /admin
and /update.


Is the conclusion of this thread is that this is not possible at the moment?


The obvious solution (to me, at least) would be to (1) restrict access to solr to 
localhost, and (2) use a reverse proxy (e.g, apache) on the same node to provide 
authenticated  restricted access to solr. I think I've seen recipes for (1), somewhere, 
and I've used (2) fairly extensively for similar purposes.


CSS appearing in Solr 4.2.1 logs

2013-04-12 Thread Tim Vaillancourt
Hey guys,

This sounds crazy, but does anyone see strange CSS/HTML in their Solr 4.2.x
logs?

Often I am finding entire CSS documents (likely from Solr's Admin) in my
jetty's stderrout log.

Example:

2013-04-12 00:23:20.363:WARN:oejh.HttpGenerator:Ignoring extra content /**
 * @license RequireJS order 1.0.5 Copyright (c) 2010-2011, The Dojo
Foundation All Rights Reserved.
 * Available via the MIT or new BSD license.
 * see: http://github.com/jrburke/requirejs for details
 */
/*jslint nomen: false, plusplus: false, strict: false */
/*global require: false, define: false, window: false, document: false,
  setTimeout: false */

//Specify that requirejs optimizer should wrap this code in a closure that
//maps the namespaced requirejs API to non-namespaced local variables.
/*requirejs namespace: true */

(function () {

//Sadly necessary browser inference due to differences in the way
//that browsers load and execute dynamically inserted javascript
//and whether the script/cache method works when ordered execution is
//desired. Currently, Gecko and Opera do not load/fire onload for
scripts with
//type=script/cache but they execute injected scripts in order
//unless the 'async' flag is present.
//However, this is all changing in latest browsers implementing HTML5
//spec. With compliant browsers .async true by default, and
//if false, then it will execute in order. Favor that test first for
forward
//compatibility.
var testScript = typeof document !== undefined 
 typeof window !== undefined 
 document.createElement(script),

supportsInOrderExecution = testScript  (testScript.async ||
   ((window.opera 

Object.prototype.toString.call(window.opera) === [object Opera]) ||
   //If Firefox 2 does not have to be
supported, then
   //a better check may be:
   //('mozIsLocallyAvailable' in
window.navigator)
   (MozAppearance in
document.documentElement.style))),



Due to this, my logs are getting really huge, and sometimes it breaks my tail
-F commands on the logs, printing what looks like binary, so there is
possibly some other junk in my logs aside from CSS.

I am running Jetty 8.1.10 and Solr 4.2.1 (stable build).

Cheers!

Tim Vaillancourt


/admin/stats.jsp in SolrCloud

2013-04-10 Thread Tim Vaillancourt
Hey guys,

This feels like a silly question already, here goes:

In SolrCloud it doesn't seem obvious to me where one can grab stats
regarding caches for a given core using an http call (JSON/XML). Those
values are available in the web-based app, but I am looking for a http call
that would return this same data.

In 3.x this was located at /admin/stats.jsp, and I used a script to grab
the data, but in SolrCloud I am unclear and would like to add that to the
docs below:

http://wiki.apache.org/solr/SolrCaching#Overview
http://wiki.apache.org/solr/SolrAdminStats

Thanks!

Tim


Re: /admin/stats.jsp in SolrCloud

2013-04-10 Thread Tim Vaillancourt
There we go, Thanks Stefan!

You're right, 3.x has this as well, I guess I missed it. I'll add this to
the docs for SolrCaching.
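
For anyone searching later, the call looks roughly like this (host and core
name are examples; wt=json is optional):

  curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json'

The cache entries, including documentCache, show up under the CACHE
category in that output.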

Cheers!

Tim



On 10 April 2013 13:19, Stefan Matheis matheis.ste...@gmail.com wrote:

 Hey Tim

 SolrCloud-Mode or not does not really matter for this fact .. in 4.x (and
 afaik as well in 3.x) you can find the stats here: 
 http://host:port/solr/admin/mbeans?stats=true
 in xml or json (setting the responsewriter with wt=json) - as you like

 HTH
 Stefan



 On Wednesday, April 10, 2013 at 9:53 PM, Tim Vaillancourt wrote:

  Hey guys,
 
  This feels like a silly question already, here goes:
 
  In SolrCloud it doesn't seem obvious to me where one can grab stats
  regarding caches for a given core using an http call (JSON/XML). Those
  values are available in the web-based app, but I am looking for a http
 call
  that would return this same data.
 
  In 3.x this was located at /admin/stats.php, and I used a script to grab
  the data, but in SolrCloud I am unclear and would like to add that to the
  docs below:
 
  http://wiki.apache.org/solr/SolrCaching#Overview
  http://wiki.apache.org/solr/SolrAdminStats
 
  Thanks!
 
  Tim




Re: Solr 4.2.1 Branch

2013-04-08 Thread Tim Vaillancourt
There is also this path for the SVN guys out there: 
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1


Cheers,

Tim

On 05/04/13 05:53 PM, Jagdish Nomula wrote:

That works out. Thanks for shooting the link.

On Fri, Apr 5, 2013 at 5:51 PM, Jack Krupanskyj...@basetechnology.comwrote:


You want the tagged branch:

https://github.com/apache/**lucene-solr/tree/lucene_solr_**4_2_1https://github.com/apache/lucene-solr/tree/lucene_solr_4_2_1


-- Jack Krupansky

-Original Message- From: Jagdish Nomula Sent: Friday, April 05,
2013 8:36 PM To: solr-user@lucene.apache.org Subject: Solr 4.2.1 Branch
Hello,

I was trying to get hold of solr 4.2.1 branch on github. I see
https://github.com/apache/**lucene-solr/tree/lucene_solr_**4_2https://github.com/apache/lucene-solr/tree/lucene_solr_4_2.
  I don't see
any branch for 4.2.1. Am i missing anything ?.

Thanks in advance for your help.

--
***Jagdish Nomula*

Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.comyourem...@simplyhired.com

www.simplyhired.com






Re: Does solr cloud support rename or swap function for collection?

2013-04-07 Thread Tim Vaillancourt
I aim to use this feature in more in testing soon. I'll be sure to doc 
what I can.


Cheers,

Tim

On 07/04/13 12:28 PM, Mark Miller wrote:

On Apr 7, 2013, at 9:44 AM, bradhill99bradhil...@yahoo.com  wrote:


Thanks Mark for this great feature but I suggest you can update the wiki
too.


Yeah, I've stopped updating the wiki for a while now looking back - paralysis 
on how to handle versions (I didn't want to do the std 'this applies to 4.1', 
'this applied to 4.0' all over the page) and the current likely move to a new 
Confluence wiki with Docs based on documentation LucidWorks recently donated to 
the project.

That's all a lot of work away still I guess.

I'll try and add some basic doc for this to the SolrCloud wiki page soon.

- Mark


Re: Zookeeper dataimport.properties node

2013-04-04 Thread Tim Vaillancourt
If it's in your SolrCloud-based collection's config, it won't be on disk,
only in Zookeeper.


What I did was use the XInclude feature to include a file with my 
dataimport handler properties, so I'm assuming you're doing the same. 
Use a relative path to the config dir in Zookeeper, ie: no path and just 
'dataimport.properties', unless it is in a subdir of your config, then 
'subdir/dataimport.properties'.
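
Roughly what mine looks like in solrconfig.xml (a sketch; the included file
name is just an example of an XML fragment sitting in the same config dir):

  <xi:include href="dataimport-defaults.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>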


I have a deployment system template the properties file before it is 
inserted into Zookeeper.


Tim

On 03/04/13 08:48 PM, Nathan Findley wrote:

 - Is dataimport.properties ever written to the filesystem? (Trying to
 determine if I have a permissions error because I don't see it
 anywhere on disk). - How do you manually edit dataimport.properties?
 My system is periodically pulling in new data. If that process has
 issues, I want to be able to reset to an earlier known good timestamp
 value.

 Regards, Nate