https://github.com/whitepages/solrcloud_manager was designed to make some
common kinds of cluster operations easier.
It hasn’t been tested with 6.0 though, so if you try it, please let me know
your experience.
On 5/23/16, 6:28 AM, "Tom Evans"
The PingRequestHandler contains support for a file check, which allows you to
control whether the ping request succeeds based on the presence/absence of a
file on disk on the node.
http://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/handler/PingRequestHandler.html
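The file check described above is simple enough to sketch: when a health-check file is configured, the ping succeeds only while that file exists on disk, and the handler's enable/disable actions create or remove it. A minimal stdlib sketch of that logic (the filename is illustrative):

```python
import os
import tempfile

def ping_status(healthcheck_file: str) -> str:
    """Mimic PingRequestHandler's file check: the ping succeeds
    only while the health-check file exists on disk."""
    return "OK" if os.path.exists(healthcheck_file) else "FAIL"

# "action=enable" creates the file and "action=disable" removes it,
# which lets you drain a node from a load balancer without stopping Solr.
with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "server-enabled")
    print(ping_status(f))   # FAIL: file absent, node drained
    open(f, "w").close()    # the effect of action=enable
    print(ping_status(f))   # OK: node back in rotation
```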
I suppose you could
My first thought is that you haven’t indexed such that all values of the field
you’re grouping on are found in the same cores.
See the end of the article here: (Distributed Result Grouping Caveats)
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
And the “Document Routing”
That case related to consistency after a ZK outage or network connectivity
issue. Your case is standard operation, so I’m not sure that’s really the same
thing. I’m aware of a few issues that can happen if ZK connectivity goes wonky,
that I hope are fixed in SOLR-8697.
This one might be a
I have a solr 5.4 cluster with three collections, A, B, C.
Nodes either host replicas for collection A, or B and C. Collections B and C
are not currently used - no inserts or queries. Collection A is getting
significant query traffic, but no insert traffic, and queries are only directed
to
have replicas B and C.
>
>What the "something" is that sends requests I'm not quite sure, but
>that's a place
>to start.
>
>Best,
>Erick
>
>On Mon, May 16, 2016 at 11:08 AM, Jeff Wartes <jwar...@whitepages.com> wrote:
>>
>> I have a solr 5.4 clus
An ID lookup is a very simple and fast query, for one ID. Or’ing a lookup for
80k ids though is basically 80k searches as far as Solr is concerned, so it’s
not altogether surprising that it takes a while. Your complaint seems to be
that the query planner doesn’t know in advance that should be
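One common way to make large ID-set lookups cheaper than a giant OR clause is the terms query parser, which matches a set of values in a single pass instead of evaluating 80k separate clauses. A sketch of building such a query (the field name and request shape are illustrative):

```python
# Build a single {!terms} query instead of OR'ing thousands of clauses.
# The terms parser skips per-clause scoring and just matches the set.
def terms_query(field, ids):
    return "{!terms f=%s}%s" % (field, ",".join(ids))

params = {
    "q": terms_query("id", ["doc1", "doc2", "doc3"]),
    "rows": "3",
}
print(params["q"])  # {!terms f=id}doc1,doc2,doc3
```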
I have no numbers to back this up, but I’d expect Atomic Updates to be slightly
slower than a full update, since the atomic approach has to retrieve the fields
you didn't specify before it can write the new (updated) document.
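The difference is easy to see in the update payloads themselves. A full update sends the whole document; an atomic update sends only the changed field, but Solr must first fetch the existing stored fields to rebuild the complete document before writing it (field names here are illustrative):

```python
import json

# A full update replaces the stored document outright:
full_update = {"id": "doc1", "title": "new title", "body": "unchanged text"}

# An atomic update sends only the changed field; Solr retrieves the
# rest of the stored fields itself before writing the new document:
atomic_update = {"id": "doc1", "title": {"set": "new title"}}

print(json.dumps(atomic_update))
```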
On 4/19/16, 11:54 AM, "Tim Robertson"
If you’re already using java, just use the CloudSolrClient.
If you’re using the default router, (CompositeId) it’ll figure out the leaders
and send documents to the right place for you.
If you’re not using java, then I’d still look there for hints on how to
duplicate the functionality.
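The core of what CompositeId routing does can be sketched roughly: hash the routing key (the part of the id before "!", or the whole id if there's no prefix) and map it onto one of the shards. Note this is only the shape of the idea — Solr actually uses MurmurHash3 over explicit per-shard hash ranges; md5 below is just a stdlib stand-in:

```python
import hashlib

# Sketch of the CompositeId idea: hash the routing prefix and map it
# to one of N shards. Solr really uses MurmurHash3 over shard hash
# ranges; md5 is a stand-in here to show the mechanics.
def pick_shard(doc_id: str, num_shards: int) -> int:
    route_key = doc_id.split("!", 1)[0] if "!" in doc_id else doc_id
    h = int(hashlib.md5(route_key.encode()).hexdigest(), 16)
    return h % num_shards

# All docs sharing a "tenantA!" prefix land on the same shard:
shards = {pick_shard(f"tenantA!doc{i}", 4) for i in range(100)}
print(shards)  # a single shard number
```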
On
I’m all for finding another way to make something work, but I feel like this is
the wrong advice.
There are two options:
1) You are doing something wrong. In which case, you should probably invest in
figuring out what.
2) Solr is doing something wrong. In which case, you should probably invest
At the risk of thread hijacking, this is an area where I don’t know I fully
understand, so I want to make sure.
I understand the case where a node is marked “down” in the clusterstate, but
what if it’s down for less than the ZK heartbeat? That’s not unreasonable; I’ve
seen some
SolrCloud never creates replicas automatically, unless perhaps you’re using the
HDFS-only autoAddReplicas option. Start the new node using the same ZK, and
then use the Collections API
(https://cwiki.apache.org/confluence/display/solr/Collections+API) to
ADDREPLICA.
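For reference, the ADDREPLICA call is just a Collections API request. A sketch of assembling it (host, collection, and shard names are placeholders):

```python
from urllib.parse import urlencode

# Sketch: the Collections API call to add a replica of shard1 on the
# new node. Host/port, collection, and node names are placeholders.
params = urlencode({
    "action": "ADDREPLICA",
    "collection": "mycollection",
    "shard": "shard1",
    "node": "newhost:8983_solr",
})
url = "http://localhost:8983/solr/admin/collections?" + params
print(url)
```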
The replicationFactor you
data?
>
>Thanks!
>Kent
>
>2016-07-12 23:02 GMT+08:00 Jeff Wartes <jwar...@whitepages.com>:
>
>> Well, two thoughts:
>>
>>
>> 1. If you’re not using solrcloud, presumably you don’t have any replicas.
>> If you are, presumably you do. This makes fo
This isn’t really a question, although some validation would be nice. It’s more
of a warning.
Tldr is that the insert order of documents in my collection appears to have had
a huge effect on my query speed.
I have a very large (sharded) SolrCloud 5.4 index. One aspect of this index is
a
h routing:
https://sematext.com/blog/2015/09/29/solrcloud-large-tenants-and-routing/
Regards,
Emir
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
On 11.08.2016 19:39, Je
It sounds like the node-local version of the ZK clusterstate has diverged from
the ZK cluster state. You should check the contents of zookeeper and verify the
state there looks sane. I’ve had issues (v5.4) on a few occasions where leader
election got screwed up to the point where I had to
Well, two thoughts:
1. If you’re not using solrcloud, presumably you don’t have any replicas. If
you are, presumably you do. This makes for a biased comparison, because
SolrCloud won’t acknowledge a write until it’s been safely written to all
replicas. In short, solrcloud write time is
This might come a little late to be helpful, but I had a similar situation with
Solr 5.4 once.
We ended up finding a ZK snapshot we could restore, but we did also get the
cluster back up for most of the interim by taking the now-empty ZK cluster,
re-uploading the configs that the collections
A variation on #1 here - Use the same cluster, create a new collection, but use
the createNodeSet option to logically partition your cluster so no node has
both the old and new collection.
If your clients all reference a collection alias, instead of a collection name,
then all you need to do
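The two pieces of that pattern are both Collections API calls: CREATE with createNodeSet restricted to the "new" nodes, then CREATEALIAS to flip client traffic. A sketch of assembling them (collection, alias, and node names are placeholders):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/admin/collections?"

# 1. Create the new collection only on the new nodes, so no node
#    hosts both the old and new collection:
create = base + urlencode({
    "action": "CREATE",
    "name": "things_v2",
    "numShards": "4",
    "createNodeSet": "node3:8983_solr,node4:8983_solr",
})

# 2. Repoint the alias the clients use; the switch is atomic:
switch = base + urlencode({
    "action": "CREATEALIAS",
    "name": "things",
    "collections": "things_v2",
})
print(create)
print(switch)
```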
Sounds similar to a thread last year:
http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html
On 2/1/17, 7:49 AM, "tedsolr" wrote:
I have version 5.2.1. Short of an upgrade, are there any remedies?
Adding my anecdotes:
I’m using heavily tuned ParNew/CMS. This is a SolrCloud collection, but
per-node I’ve got a 28G heap and a 200G index. The large heap turned out to be
necessary because certain operations in Lucene allocate memory based on things
other than result size, (index size
Hah, interesting.
The fact that the CMS collector fails back to a *single-threaded* collection on
concurrent-mode-failure had me seriously considering trying the Parallel
collector a year or two ago. I figured out (and stopped) the queries that were
doing the sudden massive allocations that
https://issues.apache.org/jira/browse/SOLR-5894 had some pretty interesting
looking work on heuristic counts for facets, among other things.
Unfortunately, it didn’t get picked up, but if you don’t mind using Solr 4.10,
there’s a jar.
On 11/4/16, 12:02 PM, "John Davis"
Expanding on my comment on the ticket, I’m really quite happy with using
codahale/dropwizard metrics with Solr. I don’t know if I’m comfortable just
sharing a screenshot of the resulting grafana dashboard, but I’ve got, per-host:
- Percentile latencies and rates for GET vs POST (which in
I’ll also mention the choice to improve processing speed by allocating more
memory, which increases the importance of GC tuning. This bit me when I tried
using it on a larger index.
https://issues.apache.org/jira/browse/SOLR-9125
I don’t know if the result grouping feature shares the same
I’d prefer it if the alias was required to be removed, or pointed elsewhere,
before the collection could be deleted.
As a best practice, I encourage all SolrCloud users to configure an alias to
each collection, and use only the alias in their clients. This allows atomic
switching between
I found this, which intends to explore the usage of RoaringDocIdSet for solr:
https://issues.apache.org/jira/browse/SOLR-9008
This suggests Lucene’s filter cache already uses it, or did at one point:
https://issues.apache.org/jira/browse/LUCENE-6077
I was playing with id set implementations
Here’s an earlier post where I mentioned some GC investigation tools:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E
In my experience, there are many aspects of the Solr/Lucene memory allocation
model that scale
tldr: Recently, I tried moving an existing solrcloud configuration from a local
datacenter to EC2. Performance was roughly 1/10th what I’d expected, until I
applied a bunch of linux tweaks.
This should’ve been a straight port: one datacenter server -> one EC2 node.
Solr 5.4, Solrcloud, Ubuntu
It’s presumably not a small degradation - this guy very recently suggested it’s
77% slower:
https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/
The other reason that blog post is interesting to me is that his benchmark
utility showed the work of entering the kernel
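The kind of measurement in that post is easy to reproduce roughly from Python's stdlib: time a large number of clock_gettime calls and report nanoseconds per call. On hosts where the vDSO fast path works, the cost is a few nanoseconds; when the clocksource forces a real syscall (as on the Xen/EC2 setup described), it's far higher. A rough sketch, not a rigorous benchmark:

```python
import time

# Rough microbenchmark: ns per clock_gettime call. Differences between
# hosts mostly reflect whether the call stays in the vDSO fast path or
# has to enter the kernel.
N = 1_000_000
start = time.perf_counter()
for _ in range(N):
    time.clock_gettime(time.CLOCK_MONOTONIC)
elapsed = time.perf_counter() - start
print(f"{elapsed / N * 1e9:.0f} ns per call")
```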
I started with the same three-node 15-shard configuration I’d been used to, in
an RF1 cluster. (the index is almost 700G so this takes three r4.8xlarge’s if I
want to be entirely memory-resident) I eventually dropped down to a 1/3rd size
index on a single node (so 5 shards, 100M docs each) so I
Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS.
On 5/1/17, 7:22 PM, "Will Martin" <wmartin...@outlook.com> wrote:
Ubuntu 16.04 LTS - Xenial (HVM)
Is this your Xenial version?
On 5/1/2017 6:37 PM, Jeff Wartes wrote:
> I tri
with you having such different
performance between local and EC2
But thanks for telling us about this! It's totally baffling
Erick
On Fri, Apr 28, 2017 at 9:09 AM, Jeff Wartes <jwar...@whitepages.com> wrote:
>
> tldr: Recently, I tried moving an existing
We settled on the R4.2XL... The R series is labeled "High-Memory"
Which instance type did you end up using?
On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 4/28/2017 10:09 AM, Jeff Wartes wrote:
> > tldr: Recen
I’ve been messing around with the Solr 7.2 autoscaling framework this week.
Some things seem trivial, but I’m also running into questions and issues. If
anyone else has experience with this stuff, I’d be glad to hear it.
Specifically:
Context:
-One collection, consisting of 42 shards, where
lica": "<7", "node":"#ANY"} , means don't put more than 7
replicas of the collection (irrespective of the shards) in a given
node
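The quoted rule {"replica": "<7", "node": "#ANY"} is just a per-node cap. A sketch of checking a (hypothetical) replica layout against it:

```python
# Check a hypothetical per-node replica count for one collection
# against the rule {"replica": "<7", "node": "#ANY"}: fewer than 7
# replicas of the collection on any one node, irrespective of shard.
layout = {
    "node1": 6,
    "node2": 7,   # violates "<7"
    "node3": 3,
}

violations = [n for n, count in layout.items() if not count < 7]
print(violations)  # ['node2']
```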
what do you mean by distinct 'RF' ? I think we are mixing up the
terminology a bit here
On Wed, Feb 7, 2018
I have a large 7.2 index with nested documents and many shards.
For each result (parent doc) in a query, I want to gather a relevance-ranked
subset of the child documents. It seemed like the subquery transformer would be
ideal:
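For context, a [subquery] request has roughly this shape: for each parent hit, a child query runs and the top-ranked children come back attached under a label. The field names (_root_) and the exact child query here are assumptions from a common nested-docs setup, not taken from the original post:

```python
# Sketch of a [subquery] doc-transformer request: "children" is just
# the label the sub-results come back under, and $row.<field> lets the
# child query reference a field from each parent row. The _root_ field
# and child query are assumptions for illustration.
params = {
    "q": "type:parent",
    "fl": "id,children:[subquery]",
    "children.q": "{!terms f=_root_ v=$row.id}",
    "children.rows": "5",
    "children.sort": "score desc",
}
print(params["fl"])
```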
for the duration
of the restore
But the former isn't tenable if you're sharding due to space constraints, and
the latter can't be easily predicted.
On 3/28/18, 11:30 AM, "Shawn Heisey" <apa...@elyograg.org> wrote:
On 3/28/2018 10:34 AM, Jeff Wartes wrote:
> The backup/res
ere is a shared filesystem requirement. It would be nice if this
> Solr feature could be enhanced to have more options like backing up
> directly to another SolrCloud using replication/fetchIndex like your cool
> solrcloud_manager thing.
>
> On Wed, Mar 28, 2018 at
't a query so it isn't parsed. So I have no way to
dereference the "$row.[shard]".
On 3/27/18, 4:00 PM, "Jeff Wartes" <jwar...@whitepages.com> wrote:
I have a large 7.2 index with nested documents and many shards.
For each result (parent doc) in a query,
There're some edge cases around the response based on the timing. In case it's
useful:
Here's the bit from solrcloud-haft: (java)
The backup/restore still requires setting up a shared filesystem on all your
nodes though right?
I've been using the fetchindex trick in my solrcloud_manager tool for ages now:
https://github.com/whitepages/solrcloud_manager#cluster-commands
Some of the original features in that tool have been