Re: documentCache not used in 4.3.1?

2013-07-02 Thread Daniel Collins
, it sounds like you're sorting by score. But none of that is worthwhile if you're getting good enough results as it stands. Best Erick On Mon, Jul 1, 2013 at 12:28 PM, Daniel Collins danwcoll...@gmail.com wrote: Regrettably, visibility is key for us :( Documents must be searchable as soon

Re: documentCache not used in 4.3.1?

2013-07-01 Thread Daniel Collins
We see similar results; again, we softCommit every 1s (trying to get as NRT as we can), and we very rarely get any hits in our caches. As an unscheduled test last week, we did shut down indexing and noticed about an 80% hit rate in caches (and average query time dropped from ~1s to 100ms!), so I think
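
A minimal SolrJ sketch (not from the thread; the URL and field names are hypothetical) of what a soft commit looks like from the client side. Each new searcher it opens is what keeps invalidating the caches being discussed; the thread itself uses server-side autoSoftCommit every 1s in solrconfig.xml rather than explicit client commits:

    import java.io.IOException;

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SoftCommitSketch {
        public static void main(String[] args) throws SolrServerException, IOException {
            // Hypothetical core URL; adjust for your own deployment.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title", "soft commit example");
            server.add(doc);

            // waitFlush=true, waitSearcher=true, softCommit=true:
            // opens a new searcher (throwing away the old caches) without a hard commit to disk.
            server.commit(true, true, true);

            server.shutdown();
        }
    }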

Re: documentCache not used in 4.3.1?

2013-07-01 Thread Daniel Collins
. FWIW, Erick On Mon, Jul 1, 2013 at 3:40 AM, Daniel Collins danwcoll...@gmail.com wrote: We see similar results; again, we softCommit every 1s (trying to get as NRT as we can), and we very rarely get any hits in our caches. As an unscheduled test last week, we did shut down indexing

Re: shard failure, leader transition took 11s (seems high?)

2013-06-27 Thread Daniel Collins
returned an error code so we know the updates have failed, but at our application level, there is nothing else we can do, surely? Solr has to send to the leader, but the leader isn't available, so shouldn't the cloud be handling that? On 24 June 2013 14:58, Daniel Collins danwcoll...@gmail.com

Re: Shard identification

2013-06-26 Thread Daniel Collins
When you say you moved to different machines, did you copy the zoo_data from your old setup, or did you just start up zookeeper and your shards one by one? Also, did you use the collection API to create the collection, or just start up your cores and let them attach to ZK? I believe the ZK rules for

How to truncate a particular field, LimitTokenCountAnalyzer or LimitTokenCountFilter?

2013-06-26 Thread Daniel Collins
We have a requirement to grab the first N words in a particular field and weight them differently for scoring purposes. So I thought to use a copyField and have some extra filter on the destination to truncate it down (post tokenization). Did a quick search and found both a
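
Either class can do this at the Lucene level. Below is a minimal, hypothetical sketch (not from the thread) of wrapping an analyzer with LimitTokenCountAnalyzer so only the first N tokens of a field are kept; the rough Solr-side equivalent would be a LimitTokenCountFilterFactory on the copyField destination's analysis chain:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class FirstNWordsSketch {
        public static void main(String[] args) throws Exception {
            int maxTokens = 10; // keep only the first N tokens of the copied field
            Analyzer base = new StandardAnalyzer(Version.LUCENE_43);
            Analyzer limited = new LimitTokenCountAnalyzer(base, maxTokens);

            TokenStream ts = limited.tokenStream("body_prefix",
                    new StringReader("a long piece of text whose tail we want to ignore for scoring ..."));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString()); // prints at most maxTokens terms
            }
            ts.end();
            ts.close();
        }
    }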

Re: OOM killer script woes

2013-06-26 Thread Daniel Collins
Ooh, I guess Jetty is trapping that java.lang.OutOfMemoryError, and throwing it/packaging it as a java.lang.RuntimeException. The -XX option assumes that the application doesn't handle the Errors and so they would reach the JVM and thus invoke the handler. Since Jetty has an exception handler

Re: Solr Security

2013-06-24 Thread Daniel Collins
To change Solr's default port number, just pass -Djetty.port= on the command line; it works a treat. As Solr is deployed as a web-app, it is assumed that the administrator is familiar with web apps, servlet containers and their security; if not, then that is something you need to

shard failure, leader transition took 11s (seems high?)

2013-06-24 Thread Daniel Collins
Just had an odd scenario in our current Solr system (4.3.0 + SOLR-4829 patch): 4 shards, 2 replicas (leader + 1 other) per shard, spread across 8 machines. We sent all our updates into a single instance, and we shut down a leader for maintenance, expecting it to fail over to the other replica. What

Re: shard failure, leader transition took 11s (seems high?)

2013-06-24 Thread Daniel Collins
. - Mark On Jun 24, 2013, at 8:25 AM, Daniel Collins danwcoll...@gmail.com wrote: Just had an odd scenario in our current Solr system (4.3.0 + SOLR-4829 patch), 4 shards, 2 replicas (leader + 1 other) per shard spread across 8 machines. We sent all our updates into a single instance, and we

Re: SolrCloud excluding certain files in conf from zookeeper

2013-06-14 Thread Daniel Collins
We had something similar: we had backup copies of files that were getting uploaded to ZK, and we didn't want them to be. The moral I learned from that was that the files for ZK don't need to live anywhere under the Solr deployment area; they can be in a totally separate directory structure (in

Re: Question on Copy field

2013-06-04 Thread Daniel Collins
I guess the longer answer is it depends on the analyzer chain of Field 2. As Gora mentions, fields are copied before analysis, so they are re-analysed/tokenized in the destination field. If the destination field has different analysis rules, then those will be applied. We explicitly use that

Re: Solr directories in 4.3

2013-06-04 Thread Daniel Collins
The example is just that: it's an example, not a cast-iron base to work from. The contexts, etc, lib, resources, solr-webapp and webapps directories are part of/related to the Jetty deployment. You might not need all the files in them, though. cloud-scripts is just some sample scripts for accessing ZK (optional

Re: 2 VM setup for SOLRCLOUD?

2013-06-01 Thread Daniel Collins
Document updates will fail with less than a quorum of ZKs, so you won't be able to index anything when 1 server is down. It's the one area that always seems counter-intuitive (to me at any rate); after all, you have your 2 instances on 1 server, so you have all the shard data, logically you
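
A quick back-of-the-envelope check (not from the thread) on why 2 ZooKeeper nodes cannot tolerate a failure: ZooKeeper stays writable only while a strict majority of the ensemble is up,

    \text{quorum}(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1,
    \qquad \text{quorum}(2) = 2, \qquad \text{quorum}(3) = 2

so a 2-node ensemble needs both nodes alive, while a 3-node ensemble can lose one.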

Re: Shard Keys and Distributed Search

2013-06-01 Thread Daniel Collins
Yes, it is doing a distributed search; SolrCloud will do that by default unless you say distrib=false. My understanding of Solr's load balancer is that it picks a random instance from the list of available instances serving each shard. So in your example: 1. Query comes in to Server 1,
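
A minimal SolrJ sketch (not from the thread; the core URL is hypothetical) of the distrib=false switch mentioned above, querying one core directly instead of letting SolrCloud fan the request out across shards:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class NonDistributedQuerySketch {
        public static void main(String[] args) throws SolrServerException {
            // Point directly at one shard's core (hypothetical URL).
            HttpSolrServer shard = new HttpSolrServer("http://server1:8983/solr/collection1_shard1_replica1");

            SolrQuery q = new SolrQuery("*:*");
            q.set("distrib", false); // search only this core, no fan-out to other shards

            QueryResponse rsp = shard.query(q);
            System.out.println("Hits on this shard only: " + rsp.getResults().getNumFound());

            shard.shutdown();
        }
    }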

zk disconnects and failure to retry?

2013-05-24 Thread Daniel Collins
Had a scenario on a dev system here that has me confused. We have a simple Solr cloud (dev) system running 4.3, 4 shards, running on 2 machines (2 instances per machine), 2 ZKs (external) and no replicas (or 1 replica depending on your definition, we only have 1 instance of each shard!) Yes, we

Re: solr UI logging when using logback?

2013-05-21 Thread Daniel Collins
Ah, I vaguely remember seeing that when we first used logback (on 4.0). As Shawn says, I think the problem is: as logback is starting up, where can it log before it has configured its logging (a catch-22)? The answer is that it has to log to its own internal format. If memory serves we just disabled

clusterstate stores IP address instead of hostname now?

2013-05-20 Thread Daniel Collins
Just done an upgrade from Solr (cloud) 4.0 to 4.3 and noticed that clusterstate.json now contains the IP address instead of the hostname for each shard descriptor. Was this a conscious change? It caused us some pain when migrating and breaks our own admin tools, so just checking if this is

Re: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Daniel Collins
What actual error do you see in Solr? Is there an exception, and if so, can you post it? As I understand it, dataDir is set from the solrconfig.xml file, so either your instances are picking up the wrong file, or you have some override which is incorrect. Where do you set solr.data.dir, at

Re: Billion document index

2013-05-15 Thread Daniel Collins
Just from our experience: we have a large collection (350M documents, but 1.2TB in size, spread across 4 shards/machines and multiple replicas, and we may well need more), and the first thing we needed to do for size estimation was to work out how big a set number of documents would be on disk. So we
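
Using just the figures quoted in this snippet, a rough per-document footprint and a naive extrapolation to a billion documents (assuming similar document sizes, which is the kind of estimate the thread describes) work out to roughly:

    \frac{1.2\,\text{TB}}{3.5 \times 10^{8}\ \text{docs}} \approx 3.4\,\text{KB/doc},
    \qquad 10^{9}\ \text{docs} \times 3.4\,\text{KB/doc} \approx 3.4\,\text{TB}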

Re: SolrCloud Ping Status is 503

2013-05-14 Thread Daniel Collins
Normally whenever I see a 503, that means the Solr Cloud has one of its shards down, i.e. there isn't a full collection available. You can see them at index time if you have lost connection to zookeeper but searches should be ok. If you see them on searches, it (in my experience) means the
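
A minimal SolrJ sketch (not from the thread; the URL is hypothetical) of calling the ping handler, which is one place the 503 described above tends to surface:

    import java.io.IOException;

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.SolrPingResponse;
    import org.apache.solr.common.SolrException;

    public class PingSketch {
        public static void main(String[] args) throws IOException, SolrServerException {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            try {
                SolrPingResponse ping = server.ping();
                System.out.println("ping status=" + ping.getStatus()
                        + " qtime=" + ping.getQTime() + "ms");
            } catch (SolrException e) {
                // A failing healthcheck (e.g. a shard of the collection being down)
                // typically comes back as an HTTP error such as 503 rather than a normal response.
                System.out.println("ping failed with HTTP code " + e.code());
            }
            server.shutdown();
        }
    }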

Re: SolrCloud Master/Master

2013-02-22 Thread Daniel Collins
Yes, that's a good solution to this whole Cloud/DC issue (which seems to have cropped up several times), you have one of the ZK instances external to the cloud. You can lose any 1 machine, and the others are still ok. The next level would be a Cloud of 3 servers + 2 external ZKs, that would

Re: SolrCloud new zookeeper node on different IP / replicate between two clusters

2013-02-10 Thread Daniel Collins
The consensus here seems to be NOT to do that. Have 2 SolrClouds, one per DC and feed them the same data. That way each cloud has its own quorum of ZKs, and as long as both clouds are up, they should be in sync. You have true DR in that each site is separate and not dependent on any data from

Re: Manually assigning shard leader and replicas during initial setup on EC2

2013-01-23 Thread Daniel Collins
This is exactly the problem we are encountering as well, how to deal with the ZK Quorum when we have multiple DCs. Our index is spread so that each DC has a complete copy and *should* be able to survive on its own, but how to arrange ZK to deal with that. The problem with Quorum is we need an

Re: Problem querying collection in Solr 4.1

2013-01-23 Thread Daniel Collins
Interesting, that sounds like a bit of an issue really: the cloud is hiding the real error. Presumably the non ok status: 500 (buried at the bottom of your trace) was where the actual shard was returning the error (we've had issues with positional stuff before, and it normally says something

Re: Solr cloud recovery, why does restarting leader need replicas?

2012-11-29 Thread Daniel Collins
Hi Mark, I get that use case: if the non-leader dies, when it comes back it has to allow for recovery, and that makes perfect sense. I guess I was (naively!) assuming there was an optimized scenario where, if the leader dies and is the first one to come back (and is therefore still leader), there is no
