, it sounds like you're sorting
by score.
But none of that is worthwhile if you're getting
good enough results as it stands.
Best
Erick
On Mon, Jul 1, 2013 at 12:28 PM, Daniel Collins danwcoll...@gmail.com
wrote:
Regrettably, visibility is key for us :( Documents must be searchable as
soon
We see similar results; again, we softCommit every 1s (trying to get as NRT
as we can), and we very rarely get any hits in our caches. As an
unscheduled test last week, we shut down indexing and noticed about an 80%
hit rate in the caches (and average query time dropped from ~1s to 100ms!), so I
think …
FWIW,
Erick
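The 1s soft-commit setup described above is normally configured in solrconfig.xml. A minimal sketch (the interval values here are illustrative, not taken from the original mails):

```xml
<!-- solrconfig.xml sketch: near-real-time setup — frequent soft commits
     for visibility, infrequent hard commits for durability.
     Values are illustrative. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <maxTime>1000</maxTime>   <!-- soft commit every 1s: new docs become searchable -->
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60000</maxTime>  <!-- hard commit every 60s: flush to stable storage -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Note that every soft commit opens a new searcher and invalidates the searcher-level caches, which is consistent with the low cache hit rates described above while indexing is running.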
On Mon, Jul 1, 2013 at 3:40 AM, Daniel Collins danwcoll...@gmail.com
wrote:
We see similar results; again, we softCommit every 1s (trying to get as
NRT
as we can), and we very rarely get any hits in our caches. As an
unscheduled test last week, we shut down indexing
returned an error code so we know the updates have failed, but at
our application level, there is nothing else we can do, surely? Solr has to
send to the leader, but the leader isn't available, so shouldn't the cloud
be handling that?
On 24 June 2013 14:58, Daniel Collins danwcoll...@gmail.com
When you say you moved to different machines, did you copy the zoo_data from
your old setup, or did you just start up ZooKeeper and your shards one by
one? Also, did you use the Collections API to create the collection, or just
start up your cores and let them attach to ZK? I believe the ZK rules for
We have a requirement to grab the first N words in a particular field and
weight them differently for scoring purposes. So I thought I'd use a
copyField and add some extra filter on the destination to truncate it
down (post-tokenization).
Did a quick search and found both a
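One way to sketch that in schema.xml, using a token-count limit on the destination field's analyzer (field and type names here are hypothetical, and LimitTokenCountFilterFactory caps tokens rather than words exactly):

```xml
<!-- schema.xml sketch: copy the full field into a truncated companion
     field that can be boosted separately. Names are hypothetical. -->
<fieldType name="text_first_n" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- keep only the first 50 tokens of the copied text -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="50"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body"       type="text_general" indexed="true" stored="true"/>
<field name="body_start" type="text_first_n" indexed="true" stored="false"/>
<copyField source="body" dest="body_start"/>
```

The truncated field can then be weighted at query time, e.g. with edismax and `qf=body body_start^5`.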
Ooh, I guess Jetty is trapping that java.lang.OutOfMemoryError and
re-throwing it packaged as a java.lang.RuntimeException. The -XX option
assumes that the application doesn't handle Errors itself, so they would
reach the JVM and thus invoke the handler.
Since Jetty has an exception handler
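For reference, the HotSpot option in question. This launch line is a sketch (start.jar as in the stock Jetty example), and as noted above, the handler only fires if the Error actually reaches the JVM uncaught:

```shell
# run a handler command when an uncaught OutOfMemoryError reaches the JVM;
# %p expands to the pid of the dying process
java -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar
```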
To change Solr's default port number just pass -Djetty.port= on the
command line, works a treat.
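For example (port number illustrative; start.jar as shipped in the example distribution):

```shell
# start the example Solr with Jetty listening on 8984 instead of the default 8983
java -Djetty.port=8984 -jar start.jar
```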
As Solr is deployed as a web-app, it is assumed that the administrator
is familiar with web apps, servlet containers, and their security; if
not, then that is something you need to
Just had an odd scenario in our current Solr system (4.3.0 + SOLR-4829
patch), 4 shards, 2 replicas (leader + 1 other) per shard spread across 8
machines.
We sent all our updates into a single instance, and we shut down a leader
for maintenance, expecting it to fail over to the other replica. What
…
- Mark
On Jun 24, 2013, at 8:25 AM, Daniel Collins danwcoll...@gmail.com wrote:
Just had an odd scenario in our current Solr system (4.3.0 + SOLR-4829
patch), 4 shards, 2 replicas (leader + 1 other) per shard spread across 8
machines.
We sent all our updates into a single instance, and we
We had something similar: we had backup copies of files that were getting
uploaded to ZK when we didn't want them to be.
The moral I took from that was that the files for ZK don't need to live
anywhere under the Solr deployment area; they can be in a totally separate
directory structure (in
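A sketch of the idea, keeping the conf directory entirely outside the Solr tree (the host, path, and confname below are placeholders; zkcli.sh ships in cloud-scripts/ in the Solr 4.x distribution):

```shell
# upload a config directory that lives outside the Solr deployment area
# (host, path and confname are placeholders)
cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd upconfig -confdir /etc/myapp/solr-conf -confname myconf
```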
I guess the longer answer is it depends on the analyzer chain of Field 2.
As Gora mentions, fields are copied before analysis, so they are
re-analysed/tokenized in the destination field. If the destination field
has different analysis rules, then those will be applied.
We explicitly use that
The example is just that, it's an example, not a cast-iron base to work from.
contexts, etc, lib, resources, solr-webapp and webapps are part of/related
to the Jetty deployment. You might not need all the files in them though.
cloud-scripts is just some sample scripts for accessing ZK (optional
Document updates will fail with fewer than a quorum of ZKs, so you won't be
able to index anything when 1 server is down.
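The arithmetic behind that: ZooKeeper needs a strict majority of the ensemble up, i.e. floor(N/2) + 1 nodes. A quick sketch:

```shell
# quorum size for an ensemble of n ZooKeeper nodes is floor(n/2) + 1
for n in 2 3 5; do
  echo "$n ZKs -> quorum of $(( n / 2 + 1 ))"
done
```

So with 2 ZKs you can't lose either of them, with 3 you can lose one, and an even-sized ensemble tolerates no more failures than the next-smaller odd one.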
It's the one area that always seems counterintuitive (to me, at any rate);
after all, you have your 2 instances on 1 server, so you have all the shard
data; logically you
Yes, it is doing a distributed search; SolrCloud will do that by default
unless you say distrib=false.
My understanding of Solr's load balancer is that it picks a random instance
from the list of available instances serving each shard.
So in your example:
1. Query comes in to Server 1,
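For reference, a sketch of forcing a non-distributed query (host, port, and collection name are placeholders):

```shell
# query only the core the request is sent to, skipping the distributed
# fan-out across shards (placeholders: localhost:8983, collection1)
curl 'http://localhost:8983/solr/collection1/select?q=*:*&distrib=false'
```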
Had a scenario on a dev system here that has me confused.
We have a simple SolrCloud (dev) system running 4.3: 4 shards running on
2 machines (2 instances per machine), 2 ZKs (external), and no replicas (or
1 replica depending on your definition; we only have 1 instance of each
shard!)
Yes, we
Ah, I vaguely remember seeing that when we first used logback (on 4.0). As
Shawn says, I think it's the problem that, while logback is starting up, where can
it log before it has configured its logging (a catch-22)? The answer: it has to log
in its own internal format.
If memory serves, we just disabled
Just done an upgrade from Solr (cloud) 4.0 to 4.3 and noticed that
clusterstate.json now contains the IP address instead of the hostname for
each shard descriptor.
Was this a conscious change? It caused us some pain when migrating and
breaks our own admin tools, so just checking if this is
What actual error do you see in Solr? Is there an exception and, if so, can
you post it? As I understand it, dataDir is set from the solrconfig.xml
file, so either your instances are picking up the wrong file, or you have
some override which is incorrect. Where do you set solr.data.dir, at
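For reference, the stock example solrconfig.xml sets dataDir from a property, which is one place an override can go wrong. A sketch of that default:

```xml
<!-- solrconfig.xml: dataDir falls back to "data" under the instance dir
     when the solr.data.dir property is unset -->
<dataDir>${solr.data.dir:}</dataDir>
```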
Just on our experiences: we have a large collection (350M documents, but
1.2TB in size, spread across 4 shards/machines and multiple replicas; we may
well need more), and the first thing we needed to do for size estimation was
to work out how big a set number of documents would be on disk. So we
Normally, whenever I see a 503, it means the SolrCloud has one of its
shards down, i.e. there isn't a full collection available.
You can see them at index time if you have lost connection to ZooKeeper, but
searches should be OK.
If you see them on searches, it (in my experience) means the
Yes, that's a good solution to this whole Cloud/DC issue (which seems to
have cropped up several times), you have one of the ZK instances external
to the cloud. You can lose any 1 machine, and the others are still ok.
The next level would be a Cloud of 3 servers + 2 external ZKs, that would
The consensus here seems to be NOT to do that. Have 2 SolrClouds, one per
DC and feed them the same data. That way each cloud has its own quorum of
ZKs, and as long as both clouds are up, they should be in sync. You have
true DR in that each site is separate and not dependent on any data from
This is exactly the problem we are encountering as well, how to deal with
the ZK Quorum when we have multiple DCs. Our index is spread so that each
DC has a complete copy and *should* be able to survive on its own, but how
to arrange ZK to deal with that. The problem with Quorum is we need an
Interesting, that sounds like a bit of an issue really: the cloud is
hiding the real error. Presumably the non-OK status: 500 (buried at the
bottom of your trace) was where the actual shard was returning the error
(we've had issues with positional stuff before, and it normally says
something
Hi Mark,
I get that use case, if the non-leader dies, when it comes back it has to
allow for recovery, that makes perfect sense.
I guess I was (naively!) assuming there was an optimized scenario if the
leader dies, and is the first one to come back (is still therefore leader),
there is no