Re: Unexpected Performance decrease when upgrading Solr 5.5.2 to 8.5.2

2020-09-16 Thread Toke Eskildsen
cValues changed with schema version 1.6 (https://issues.apache.org/jira/browse/SOLR-8220). Have you checked that the same number of fields are returned for the two setups? - Toke Eskildsen

Re: Solr Float/Double multivalues fields

2020-07-03 Thread Toke Eskildsen
es-vs-stored-fields-apache-solr-features-and-performance-smackdown.html BTW: The documentation should definitely mention that stored preserves order & duplicates. It is not obvious. - Toke Eskildsen, Royal Danish Library

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-03 Thread Toke Eskildsen
o be processed, it indicates that the cluster is overloaded. Increasing the timeout is just a band-aid. - Toke Eskildsen, Royal Danish Library

Re: multivalue faceting term optimization

2020-03-09 Thread Toke Eskildsen
* OR hash:03* OR hash:04* -> Facets for 1950K documents (100M/256 * 5) Prefix queries might prove to be too expensive, so you could also create fields with random values from 0-9, 0-99, 0-999 etc. and do exact match filtering on those to get the number of hits down. - Toke Eskildsen, Royal Danish Library
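
The random-value idea above can be sketched as follows. This is a minimal illustration, not the author's code: the field names `sample_10`, `sample_100` and `sample_1000` are hypothetical, and the documents are plain dictionaries that would still have to be sent to Solr by whatever indexing client is in use.

```python
import random

# Bucket sizes matching the "0-9, 0-99, 0-999" suggestion.
BUCKET_SIZES = (10, 100, 1000)


def add_sample_fields(doc, rng=random):
    """Return a copy of doc with one random-valued field per bucket size.

    The sample_<size> field names are assumptions made for this sketch.
    """
    enriched = dict(doc)
    for size in BUCKET_SIZES:
        enriched[f"sample_{size}"] = rng.randrange(size)
    return enriched


def sample_filter(size, value=0):
    """Build an exact-match filter that keeps roughly 1/size of the documents."""
    if size not in BUCKET_SIZES:
        raise ValueError(f"unknown bucket size: {size}")
    if not 0 <= value < size:
        raise ValueError(f"value must be in 0..{size - 1}")
    return f"sample_{size}:{value}"
```

Passing `sample_filter(100)` as an `fq` would restrict faceting to about 1% of the hits, and the facet counts can then be scaled up by the same factor to get an estimate.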

Re: Number of requested rows

2020-02-05 Thread Toke Eskildsen
eeding-up-core-search/ and there is https://issues.apache.org/jira/browse/LUCENE-8875 which takes care of the Sentinel thing in solr 8.2. - Toke Eskildsen, Royal Danish Library

Re: Solr 7.7 heap space is getting full

2020-01-22 Thread Toke Eskildsen
e problem. - Toke Eskildsen, Royal Danish Library

Re: How to block expensive solr queries

2019-10-08 Thread Toke Eskildsen
On Mon, 2019-10-07 at 10:18 -0700, Wei wrote: > /solr/mycollection/select?stats=true=unique_ids > cdistinct=true ... > Is there a way to block certain solr queries based on url pattern? > i.e. ignore the stats.calcdistinct request in this case. It sounds like it is possible for users to issue

Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Toke Eskildsen
d indexes maximizes throughput at the possible cost of latency, so that seems fitting for your requirements. - Toke Eskildsen, Royal Danish Library

Re: Incremental export of a huge collection

2019-09-09 Thread Toke Eskildsen
rFactory that is mentioned: http://lucene.apache.org/solr/7_2_1/solr-core/org/apache/solr/schema/DatePointField.html - Toke Eskildsen

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-09 Thread Toke Eskildsen
ead up on that and respond in that thread, to avoid hi-jacking this one. It probably won't be this week as Real Work is heating up. - Toke Eskildsen, Royal Danish Library

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-08 Thread Toke Eskildsen
last year we experienced similar > problems. The iterator-based DocValues implementation in Solr 7 has a performance issue with large segments, with symptoms akin to SOLR-8096. If you have not already solved your problems, Solr 8 (with an upgraded index) might help. - Toke Eskildsen

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-07 Thread Toke Eskildsen
ing grouping fully. It does not explain the difference between Solr 4 & 8, but I agree with David that we need to isolate what causes the overall slowdown first, before we can attempt to fix the Solr 4 vs 8 thing. - Toke Eskildsen

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-04 Thread Toke Eskildsen
uncompress in Solr 8 (but less IO)) * Do you have any response related defaults in your solrconfig.xml, such as faceting or grouping? (You might be doing heavy aggregation even if you don't explicitly ask for it) - Toke Eskildsen, Royal Danish Library

Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Toke Eskildsen
rformance setup with a budget for tuning: Matching terms and joining filters is core Solr (Lucene really) functionality. Plain query & filter-matching time tend to be dwarfed by aggregations (grouping, faceting, stats). - Toke Eskildsen

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-03 Thread Toke Eskildsen
large documents? How big is your index in bytes? - Toke Eskildsen

Re: Multi-lingual Search & Accent Marks

2019-08-31 Thread Toke Eskildsen
ght now and keep > going back and forth on whether we should preserve accent marks. Going with what we do, my answer would be: Yes, do preserve and also remove :-). You could even have 3 or more levels of normalisation, depending on how much time you have for polishing. - Toke Eskildsen

Re: SOLR 7+ / Lucene 7+ and performance issues with DelegatingCollector and PostFilter

2019-08-28 Thread Toke Eskildsen
rted for each collect call? Could you share your code somewhere? - Toke Eskildsen

Re: SOLR 7+ / Lucene 7+ and performance issues with DelegatingCollector and PostFilter

2019-08-27 Thread Toke Eskildsen
lr, so the safe (best performance) solution would be to implement something like the pseudo code I wrote earlier. - Toke Eskildsen, Royal Danish Library

Re: SOLR 7+ / Lucene 7+ and performance issues with DelegatingCollector and PostFilter

2019-08-27 Thread Toke Eskildsen
; isValid(dv.binaryValue().utf8ToString()) in your collect method. https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/index/DocValues.html#getSorted-org.apache.lucene.index.LeafReader-java.lang.String - If you want to speed it up further, you can use BytesRefs as keys in your customMap i

Re: Configuration recommendation for SolrCloud

2019-06-29 Thread Toke Eskildsen
s which cluster to use? Can it be divided further? - Toke Eskildsen

Re: Is Solr can do that ?

2019-06-22 Thread Toke Eskildsen
obs. Scaling this specialized setup to your corpus size would require about 3TB of SSD, 64MB RAM and 4 CPU-cores, divided among 4 shards. You are likely to need quite a lot more than that, so this is just to say that at this scale the use of the index matters _a lot_. - Toke Eskildsen

Re: not able to optimize

2019-06-04 Thread Toke Eskildsen
as worst case for storage usage during optimize is a total of 3*index size. - Toke Eskildsen, Royal Danish Library

Re: Graph query extremely slow

2019-05-22 Thread Toke Eskildsen
h query (any query really) and asking for 1M results to be returned. With that in mind, what do you set rows to? - Toke Eskildsen, Royal Danish Library

Re: Solr8.0.0 Performance Test

2019-05-21 Thread Toke Eskildsen
aceting-parameters - Toke Eskildsen, Royal Danish Library

Re: Graph query extremely slow

2019-05-20 Thread Toke Eskildsen
pache.org/jira/browse/SOLR-13013 If it is easy for you to test, you could try Solr 8 as that should work better for random access of DocValues. - Toke Eskildsen, Royal Danish Library

Re: Solr8.0.0 Performance Test

2019-05-20 Thread Toke Eskildsen
e indexes and/or setups where performance is very important. - Toke Eskildsen, Royal Danish Library

Re: Performance of /export requests

2019-05-11 Thread Toke Eskildsen
er a regression for DocValues that is very visible when using export. See https://issues.apache.org/jira/browse/SOLR-13013), so I would expect it to be slower than Solr 5. You could try with Solr 8 where this regression should be mitigated somewhat. - Toke Eskildsen

Re: Performance problems with extremely common terms in collection (Solr 7.4)

2019-04-08 Thread Toke Eskildsen
Instead you can look at Common Grams, where your high-frequency words get concatenated with surrounding words. This only works with phrases though. There's a nice article at https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2 - Toke Eskildsen, Royal Danish Library

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
me the work. - Toke Eskildsen, Royal Danish Library

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
Solr version. Currently it is not possible to tweak the docValues indexing parameters outside of code changes. Do note that we're still operating on guesses here. The cause for your regression might easily be elsewhere. - Toke Eskildsen, Royal Danish Library

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
e tiny, but mistakes happen. With that in mind, do you have DocValues enabled for a lot of your fields? Performance issues like this one are notoriously hard to debug remotely. Is it possible for you to share your setup and your test data? - Toke Eskildsen, Royal Danish Library

Re: Using copyFields

2019-03-28 Thread Toke Eskildsen
ld , the query > doesn't fetch results. You need to tell Solr which fields it should search: df=cfield https://lucene.apache.org/solr/guide/7_7/the-standard-query-parser.html#standard-query-parser-parameters - Toke Eskildsen, Royal Danish Library
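
As a small sketch of the `df` suggestion above: the helper below only assembles a request URL, and the host, collection name and query term used in the usage note are placeholders, not values from the thread.

```python
from urllib.parse import urlencode


def select_url(base, collection, query, default_field):
    """Build a /select URL that searches default_field when the query names no field."""
    params = {"q": query, "df": default_field}
    return f"{base}/{collection}/select?{urlencode(params)}"
```

For example, `select_url("http://localhost:8983/solr", "mycollection", "foo", "cfield")` yields a request where the bare term `foo` is matched against the copyField target `cfield`.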

Re: Solr index slow response

2019-03-18 Thread Toke Eskildsen
due to stop-the-world garbage collections. Try dialing Xmx _way_ down: If your batches are only 5MB each, try Xmx=20g or less. I know that the stats above say that Solr uses 111GB, but the JVM has a tendency to expand the heap quite a lot when it is getting hammered. If you want to check beforehand, you can see how much memory is freed from full GCs in the GC-log. - Toke Eskildsen, Royal Danish Library

Re: [ANNOUNCE] Apache Solr 8.0.0 released

2019-03-14 Thread Toke Eskildsen
On Thu, 2019-03-14 at 13:16 +0100, jim ferenczi wrote: > http://lucene.apache.org/solr/8_0_0/changes/Changes.html Thank you for the hard work of rolling the release! Looking forward to upgrading. - Toke Eskildsen, Royal Danish Library

Re: What is the benefit of stored="true" in *PointFields

2019-02-07 Thread Toke Eskildsen
val) doc values performance for indexes with many documents. - Toke Eskildsen, Royal Danish Library

Re: Infrastructure required for SOLR 7.5

2018-12-12 Thread Toke Eskildsen
hat you are unsure of. - Toke Eskildsen

Re: URL Case Sensitive/Insensitive

2018-12-11 Thread Toke Eskildsen
ves, but they do add up. For most practical purposes (URL-lookup & grouping, following links between archived pages, resolving embedded resources from pages) we use the heavily normalised URL. - Toke Eskildsen

Re: Moving Solr index from Staging to Production

2018-11-28 Thread Toke Eskildsen
Arunan Sugunakumar wrote: > https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html We (also?) prefer to keep our stage/build setup separate from production. Backup + restore works well for us. It is very fast, as it is basically just copying the segment files. - T

Re: Able to search with indexed=false and docvalues=true

2018-11-21 Thread Toke Eskildsen
dea: Issue a query with debug=sanity and get a report from checks on both the underlying index and the issued query for indicators of problems: https://github.com/tokee/lucene-solr/issues/54 - Toke Eskildsen, Royal Danish Library

Re: Able to search with indexed=false and docvalues=true

2018-11-20 Thread Toke Eskildsen
I'll just note that faceting on a DocValues=true indexed=false field on a multi-shard index also has a performance penalty as the field will be slow-searched (using the DocValues) in the secondary fine-counting phase. - Toke Eskildsen, Royal Danish Library

Re: Median in Solr json facet api

2018-11-14 Thread Toke Eskildsen
On Wed, 2018-11-14 at 17:53 +0530, Anil wrote: > I don;t see median aggregation in JSON facet api documentation. It's the 50 percentile: https://lucene.apache.org/solr/guide/7_5/json-facet-api.html#metrics-example - Toke Eskildsen, Royal Danish Library
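
A minimal sketch of what "the 50 percentile" looks like as a JSON Facet request body. The field name `price` in the usage note is a placeholder, and the body would still have to be POSTed to a collection's query endpoint.

```python
import json


def median_facet_request(field, query="*:*"):
    """Build a JSON request body that asks for the median of a numeric field.

    The median is expressed as the 50th percentile via percentile(field,50).
    """
    return {
        "query": query,
        "limit": 0,  # only the aggregation is wanted, not the documents
        "facet": {"median": f"percentile({field},50)"},
    }


def to_json(request):
    """Serialise the request body for sending over HTTP."""
    return json.dumps(request)
```

`median_facet_request("price")` produces a body whose `facet` section is `{"median": "percentile(price,50)"}`.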

Re: SolrCloud scaling/optimization for high request rate

2018-11-14 Thread Toke Eskildsen
ent with different amounts of concurrent requests to see what gives the optimum throughput. This also tells you how much extra hardware you need, if you decide you need to expand.. - Toke Eskildsen, Royal Danish Library

Re: SolrCloud scaling/optimization for high request rate

2018-11-08 Thread Toke Eskildsen
ully that would unearth very few problematic parts, such as regexp, function or prefix-wildcard queries. There might be ways to replace or tune those. - Toke Eskildsen, Royal Danish Library

Re: Re: SolrCloud scaling/optimization for high request rate

2018-11-05 Thread Toke Eskildsen
that is the problem. If using fl=id speeds up substantially, the next step would be to add fields gradually until (hopefully) there is a sharp performance decrease. - Toke Eskildsen, Royal Danish Library

Re: Index optimization takes too long

2018-11-04 Thread Toke Eskildsen
e bottleneck. Are you looking at overall CPU usage or single-core? When we run force merge, we have a single core at 100% while the rest are idle. NB: There is currently a thread "Static index, fastest way to do forceMerge" in the Lucene users mailing list, which seems to be quite parallel t

Re: Re: SolrCloud scaling/optimization for high request rate

2018-11-01 Thread Toke Eskildsen
ly how that affects performance? 1) Only request simple sorting by score 2) Reduce rows to 0 3) Increase rows to 100 4) Set fl=id only - Toke Eskildsen, Royal Danish Library
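
The four experiments listed above could be generated from one baseline request, so that each run changes exactly one thing. This is only a sketch: the baseline parameters in the usage note are made up, and timing the actual requests is left to the caller.

```python
def experiment_variants(baseline):
    """Return one parameter set per experiment, each differing from baseline in one setting."""

    def variant(**overrides):
        params = dict(baseline)  # copy, so the baseline stays untouched
        params.update(overrides)
        return params

    return {
        "sort_by_score": variant(sort="score desc"),  # 1) simple sorting by score
        "rows_0": variant(rows=0),                    # 2) reduce rows to 0
        "rows_100": variant(rows=100),                # 3) increase rows to 100
        "fl_id_only": variant(fl="id"),               # 4) fl=id only
    }
```

With a baseline such as `{"q": "foo", "rows": 10, "sort": "price asc", "fl": "*"}`, comparing response times across the four variants indicates whether sorting, result size or field retrieval dominates.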

Re: SolrCloud scaling/optimization for high request rate

2018-10-29 Thread Toke Eskildsen
uring (which of course also takes resources, this time in the form of work hours). My rough suggestion of a factor 10 for your system is guesswork erring on the side of a high number. - Toke Eskildsen, Royal Danish Library

Re: SolrCloud scaling/optimization for high request rate

2018-10-27 Thread Toke Eskildsen
with a max amount of concurrent connections and a sensible queue. Preferably after a bit of testing to locate where the highest throughput is. It won't make you hit your overall goal, but it can move you closer to it. - Toke Eskildsen
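
One way to picture "a max amount of concurrent connections and a sensible queue" is a small gate in front of the code that talks to Solr. The class below is a hypothetical sketch, not something from the thread: `RequestGate`, its parameters and the choice to reject overflow with an exception are all assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RequestGate:
    """Run at most max_concurrent calls at once and queue at most queue_size more."""

    def __init__(self, max_concurrent, queue_size):
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        # One slot per running or queued request; anything beyond that is rejected.
        self._slots = threading.BoundedSemaphore(max_concurrent + queue_size)

    def submit(self, fn, *args):
        """Schedule fn(*args), or raise RuntimeError if the queue is already full."""
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("queue full, request rejected")
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot when the call finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self):
        """Wait for outstanding calls and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

The two numbers are the knobs to vary during testing: raise `max_concurrent` until throughput stops improving, and keep `queue_size` small enough that queued requests do not wait longer than clients are willing to.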

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Toke Eskildsen
lues=true, Solr treats all existing documents as having docValues enabled for that field. As there is no docValue content, DocValues-aware functionality such as sorting and faceting will not work for that field, until the documents have been re-indexed. - Toke Eskildsen

Re: SolrCloud scaling/optimization for high request rate

2018-10-26 Thread Toke Eskildsen
p with the patch. - Toke Eskildsen

Re: Device I/O trouble with solr 7.5

2018-10-22 Thread Toke Eskildsen
ivity (LUCENE-8374). With that in mind, could you tell me * How many documents you have in your index? * Whether you use stored or docValues for the fields that you retrieve as part of the search result? * If you perform heavy faceting, grouping or stats? Maybe provide a sample query, if you are able? Than

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread Toke Eskildsen
here. If you have an index that you update very rarely, it can save memory and processing power. If you have a live index where you add and delete documents, it will probably be a bad idea. One strategy used with time series data is to have old and immutable data in dedicated collections, which can then be optimized. - Toke Eskildsen

Re: Solr Streaming Queries Performance Issues [v7.2.1]

2018-09-28 Thread Toke Eskildsen
up, so I would expect streaming to do the same. I would not expect a 30% increase to cause something serious on that account though. How many documents in your index? - Toke Eskildsen, Royal Danish Library

Re: OOM Solr 4.8.1

2018-09-18 Thread Toke Eskildsen
ive traffic in short bursts, not for a sustained high traffic level. This advice is independent of Shawn's BTW. You could increase your server capabilities 10-fold and it would still apply. - Toke Eskildsen, Royal Danish Library

Re: Multiple solr instances per host vs Multiple cores in same solr instance

2018-09-03 Thread Toke Eskildsen
ind, have you considered posting a write-up of your hard work somewhere? It seems a shame only to have it as an input on this mailing list. - Toke Eskildsen, Royal Danish Library

Re: Solr admin client crash - caused by too many fields

2018-08-14 Thread Toke Eskildsen
ow why the Admin GUI has a timeout at all. It seems to me that anyone capable of using that GUI is also capable of pressing reload if Solr takes too long to respond. But I digress. What Shawn & Erick say still stands: Having 60K fields is an outlier in Solr Land and as such warrants cautio

Re: Solr Server crashes when requesting a result with too large resultRows

2018-08-08 Thread Toke Eskildsen
the result set. I would argue your OOM with small result sets and huge rows is a good thing: You encounter the problem immediately, instead of hitting it at some random time when a match-a-lot query is issued by a user. - Toke Eskildsen, Royal Danish Library

Re: facet.method=uif not working in solr cloud?

2018-02-08 Thread Toke Eskildsen
g/jira/browse/SOLR-8988 Try setting facet.distrib.mco=true - Toke Eskildsen, Royal Danish Library

Re: With 100% CPU usage giving out of memory exception and solr is not responding

2017-12-29 Thread Toke Eskildsen
eting on a high-cardinality field. If the query above is representative of your general queries, I'll guess it's the many docs + large filterCache one. It's fairly easy to check: * What is your Xmx? * How many documents in your index? * What is your filterCache size? - Toke Eskildsen
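
The three questions above matter because a filterCache entry can be a bitset with one bit per document in the index, so the worst case grows with both the document count and the cache size. A rough estimate, assuming every entry is a full bitset (sparse entries are stored more compactly, so real usage is often lower):

```python
def filter_cache_worst_case_bytes(max_doc, cache_size):
    """Upper-bound estimate: one bit per document for every cached filter."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * cache_size
```

With illustrative numbers, 100 million documents and a filterCache size of 512 give 12.5 MB per entry and 6.4 GB in total, which is worth comparing against the Xmx setting.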

Re: OOM spreads to other replica's/HA when OOM

2017-12-19 Thread Toke Eskildsen
ueries from the same user and then blacklisting the user? But what if the query is a link shared on a forum? And so forth. Hardening by blacklisting is a game that is hard to win. So to paraphrase Shawn: Make sure your users cannot issue OOMing queries. - Toke Eskildsen, Royal Danish Library - Aarhus

Re: JVM GC Issue

2017-12-02 Thread Toke Eskildsen
ncluding all the parameters? - Toke Eskildsen

Re: JVM GC Issue

2017-12-01 Thread Toke Eskildsen
7:00-19:00 and 2017-12-01 08:00-12:00. If you cannot share, please check if you have excessive traffic around that time or if there is a lot of UnInverting going on (triggered by faceting on non-DocValues String fields). I know your post implies that you have already done so, so this is more of a

Re: OutOfMemoryError in 6.5.1

2017-11-29 Thread Toke Eskildsen
ch, I would probably still use Threads (wrapped as Futures) as they are easy to work with. Getting into thousands of connections in Solr seems like a danger sign to me, whether they are done async or not. - Toke Eskildsen

Re: OutOfMemoryError in 6.5.1

2017-11-29 Thread Toke Eskildsen
olr can create. I guess you have ~20 shards in your cloud? The issue of the default 10K limit is an old one: https://issues.apache.org/jira/browse/SOLR-7344 I suggest you put a proxy in front of your Solr-cloud to handle queueing of incoming requests. - Toke Eskildsen

Re: Huge Query execution time for multiple ORs

2017-11-28 Thread Toke Eskildsen
p filter-queries for all the different groups so that the users do not pay the first-call penalty. This requires your filter-cache to be large enough to hold all the author lists. - Toke Eskildsen, Royal Danish Library

Re: Solr7: Very High number of threads on aggregator node

2017-11-25 Thread Toke Eskildsen
ize too. Could you check if you have any "Overlapping onDeckSearchers" in your solr.log? - Toke Eskildsen

Re: Solr7: Very High number of threads on aggregator node

2017-11-20 Thread Toke Eskildsen
s of whether the previous queries have finished or not? If it is the latter, one explanation could be that your Solr 7 setup is simply slower on average to respond than your Solr 4 setup, to the point where it cannot keep up with the influx of queries. - Toke Eskildsen

Re: Solr7: Very High number of threads on aggregator node

2017-11-18 Thread Toke Eskildsen
mple, 200 concurrent requests and a queue to hold the rest? Even with an overprovisioning of 4 requests/CPU-core to get them running close to 100% we're talking 1000 CPU-cores in your system. - Toke Eskildsen

Re: Faceting Word Count

2017-11-09 Thread Toke Eskildsen
ent search criteria: Do they all take ~1 minute or just the first? - Toke Eskildsen, Royal Danish Library 

Re: Facets based on sampling

2017-10-24 Thread Toke Eskildsen
tracking idea at https://sbdevel.wordpress.com/2014/03/17/fast-faceting-with-high-cardinality-and-small-result-set/ - Toke Eskildsen

Re: Solr deep paging queries run very slow due to redundant q param

2017-10-24 Thread Toke Eskildsen
is a design decision. In order to provide pagination without recomputing the result set, you would need a guaranteed server-side state. Solr does not implement that pattern and thanks for that. - Toke Eskildsen, Royal Danish Library

Re: Really slow facet performance in 6.6

2017-10-23 Thread Toke Eskildsen
ds have > docValues=false since they are multi-valued). Debug info below. docValues works fine with multi-values (at least for Strings). - Toke Eskildsen

Re: 3 color jvm memory usage bar

2017-10-23 Thread Toke Eskildsen
find it very usable for observing and tweaking heap size. The GC-log is better. - Toke Eskildsen, Royal Danish Library

Re: Solr staying constant on popularity indexes

2017-10-10 Thread Toke Eskildsen
complicated syntax Solr > uses. I think V2 APIs are coming to address this, but they did come a > bit late in the game. I guess you mean JSON APIs? Anyway, I fully agree that the old Solr syntax is extremely clunky as soon as we move beyond the simple "just supply a few search terms&

Re: Doubt about facet with dates

2017-10-06 Thread Toke Eskildsen
h-dates.html#WorkingwithDates-DateMath Your query would be something like mydate:[* TO NOW/DAY] AND mydate:[NOW+1DAY/DAY TO *] - Toke Eskildsen, Royal Danish Library

Re: FilterCache size should reduce as index grows?

2017-10-06 Thread Toke Eskildsen
ooking much further ahead, the whole caching system would benefit from having constraints that encompass all the shards & collections served in the same Solr. Unfortunately it is a daunting task just to figure out the overall principles in this. - Toke Eskildsen, Royal Danish Library

Re: FilterCache size should reduce as index grows?

2017-10-05 Thread Toke Eskildsen
32. Best solution: Use maxSizeMB (if it works) Second best solution: Reduce to 32 or less Third best, but often used, solution: Hope that most of the entries are sparse and will remain so - Toke Eskildsen, Royal Danish Library

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Toke Eskildsen
you indexing while you search? If so, you need to set auto-warm or state a few explicit warmup-queries. If not, your measuring will not be representative as it will be on first-searches, which are always slower than warmed-searches. - Toke Eskildsen, Royal Danish Library

Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-19 Thread Toke Eskildsen
shard. Only ask for the number you need. Same goes for rows BTW. - Toke Eskildsen

Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-19 Thread Toke Eskildsen
hope the heap size will continue to sustain for the index size.  You can check the memory usage in the admin GUI. - Toke Eskildsen, Royal Danish Library

Re: Freeze Index

2017-09-14 Thread Toke Eskildsen
2GB JVM or something like that. One of the symptoms of having too large a memory allocation for the JVM is occasional long pauses due to garbage collection. However, you should not lose anything - it is just a pause. Can you describe in more detail what you mean by freeze and losing data? -

Re: slow solr facet processing

2017-09-05 Thread Toke Eskildsen
. A fairly easy optimization would be to replace the BytesRef[] indexedTermsArray with a BytesRefArray. - Toke Eskildsen, Royal Danish Library

Re: slow solr facet processing

2017-09-04 Thread Toke Eskildsen
memory? What I am aiming at is if this is primarily a "many relatively slow random access"-thing or more due to the way DocValues are represented in the segments (the codec). - Toke Eskildsen, Royal Danish Library

Re: How many collections in a solrcloud are too many, how to determine this?

2017-08-09 Thread Toke Eskildsen
n-trivial overhead going from 1 to more than 1 shard. If your collections are not too large, chances are that you will lower your hardware requirements (and/or improve response times) by using only 1 shard/collection. - Toke Eskildsen, Royal Danish Library

Re: Solr 6.5.1 crashing when too many queries with error or high memory usage are queried

2017-07-03 Thread Toke Eskildsen
Solr, but even then you might want to have a hard limit, just to avoid the occasional "cat steps on F5 and the browser issues a gazillion requests"-scenario. --  Toke Eskildsen, Royal Danish Library

Re: Questions about typical/simple clustered Solr software and hardware architecture

2017-06-24 Thread Toke Eskildsen
on Solr instead. - Toke Eskildsen

Re: Will Solr support google like organic search ?

2017-06-09 Thread Toke Eskildsen
could say. Out-of-the-box Solr is pure relevance ranked. By the definition in the Wikipedia-article, it is already Organic Search. I think you need to go back to your client and ask what the client thinks "Organic Search" is. --  Toke Eskildsen, Royal Danish Library

Re: Slow inserting with SolrCloud when increasing replicas

2017-06-07 Thread Toke Eskildsen
single physical machine that could be an explanation. What is your hardware-setup? -- Toke Eskildsen, Royal Danish Library

Re: solr 6 at scale

2017-05-26 Thread Toke Eskildsen
ot an expert in segment merge mechanics). We're also using a 1 Solr/shard setup, but with SolrCloud. Our initial rationale for 1 Solr/shard was to avoid long GC-pauses due to large heaps, but that does not seem to be a problem here. Now we stick to it as it works fine and makes for simple logisti

Re: solr 6 at scale

2017-05-24 Thread Toke Eskildsen
Nawab Zada Asad Iqbal wrote: > @Toke, I stumbled upon your page last week but it seems that your huge > index doesn't receive a lot of query traffic. It switches between two kinds of usage: Everyday use is very low traffic by researchers using it interactively: 1-2

Re: solr 6 at scale

2017-05-24 Thread Toke Eskildsen
Shawn Heisey <apa...@elyograg.org> wrote: > On 5/24/2017 3:44 AM, Toke Eskildsen wrote: >> It is relatively easy to downgrade to an earlier release within the >> same major version. We have not switched to 6.5.1 simply because we >> have no pressing need for it -

Re: solr 6 at scale

2017-05-24 Thread Toke Eskildsen
works well for us. I guess it depends quite a bit on your need for stability. We are a library and uptime is only "best effort". --  Toke Eskildsen, Royal Danish Library

Re: Recommended index-size per core

2017-05-10 Thread Toke Eskildsen
ning) ought to ensure a fully cached index. - Toke Eskildsen

Re: Solr Query Performance benchmarking

2017-04-28 Thread Toke Eskildsen
Shawn Heisey wrote: > Adding more shards as Toke suggested *might* help,[...] I seem to have phrased my suggestion poorly. What I meant to suggest was a switch to a single shard (with 4 replicas) setup, instead of the current 2 shards (with 2 replicas). - Toke

Re: Solr Query Performance benchmarking

2017-04-28 Thread Toke Eskildsen
use q instead of fq for the part of your request that changes? -- Toke Eskildsen, Royal Danish Library

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-15 Thread Toke Eskildsen
Chetas Joshi wrote: > Thanks for the insights into the memory requirements. Looks like cursor > approach is going to require a lot of memory for millions of documents. Sorry, that is a premature conclusion from your observations. > If I run a query that returns only 500K

Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-11 Thread Toke Eskildsen
? Does it mean solr will serve stale data( i.e. > send stale data to the slaves) ignoring the changes from the second > commit? [...] Sorry, I am not that familiar with the details of master-slave-setups. -- Toke Eskildsen, Royal Danish Library

Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-06 Thread Toke Eskildsen
e two problems may be linked. Quick sanity check: Look for "Overlapping onDeckSearchers" in your solr.log to see if your memory problems are caused by multiple open searchers: https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F -- Toke Eskildsen, Royal Danish Library

Re: SOLR Data Locality

2017-03-17 Thread Toke Eskildsen
h-webscale/ I don't understand the expected gain of adding replicas, if the data are remote. Why can't the replica Solrs run on the nodes with the data? Do you have very CPU-intensive search? - Toke Eskildsen

Re: Indexing CPU performance

2017-03-15 Thread Toke Eskildsen
e. You can get a detailed breakdown by doing VisualVM profiling and doing a snapshot instead of sampling, but be prepared to restart your Solr afterwards as that is quite intrusive. Another (and simpler) option would be to check how much IO-wait there is with 'top' from a shell. - Toke Eskildse
