Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
sing something else. It should not make a difference (as your non-truncated queries are fast), but could you try to reduce the slow request to the simplest possible form? No grouping, faceting or other special processing, just q=network se* - Toke Eskildsen, State and University Library, Denmark

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
ected is going on while you test? > How can I disable replication(as it is implicitly enabled) permanently as > in our case we are not using it but can see warnings related to leader > election? If you are using spinning drives and only have 32GB of RAM in total in each machine, you are probably st

Re: Performance degradation with two collection on same sole instance

2015-10-30 Thread Toke Eskildsen
tabilizes. - Toke Eskildsen, State and University Library, Denmark

Re: Solr Pagination

2015-10-21 Thread Toke Eskildsen
done #segments benchmarking for your huge datasets? Only informally. However, the guys at UKWA run a similar scale index and have done multiple segment-count-oriented tests. They have not published a report, but there are measurements & graphs at https://github.com/ukwa/shine/tree/master/pytho

Re: LIX readability index calculation by solr

2015-10-21 Thread Toke Eskildsen
tternReplaceCharFilter, matching on something like ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper}) and replacing with 'capital' (the regexp above probably fails - it was just from memory). - Toke Eskildsen
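For reference, LIX is commonly defined as LIX = A/B + 100·C/A, where A is the number of words, B the number of sentences (periods) and C the number of words longer than six letters. Below is a minimal sketch of that arithmetic outside Solr, with naive tokenization. The original definition also counts capitalised words as period markers (presumably what the capital-matching regexp above is aiming at); that refinement is left out here.

```python
import re

def lix(text: str) -> float:
    """Compute LIX = A/B + 100*C/A for a plain-text string.

    A = number of words, B = number of sentences, C = words longer than 6 letters.
    Assumptions: words are runs of letters; sentences end with . : ! or ?
    (capitalised words are NOT counted as period markers in this sketch).
    """
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0
    # Treat text without any terminator as a single sentence.
    sentences = max(1, len(re.findall(r"[.:!?]+", text)))
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / sentences + 100.0 * long_words / len(words)
```

For example, "The cat sat. Readability matters." has 5 words, 2 sentences and 2 long words, giving 2.5 + 40 = 42.5.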

Re: Anyone users IBM J9 JVM with 32G max heap ? Tuning recommendations?

2015-10-19 Thread Toke Eskildsen
VM works the same as the Oracle one in this aspect, but for the Oracle one, it is important to set Xmx _below_ 32GB instead of at exactly 32GB: https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/ You might want to try the program at that page to check where the IBM l
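The 32GB threshold comes from compressed object pointers: a 32-bit reference, scaled by the object alignment (8 bytes by default on the Oracle/HotSpot JVM), can address 2^32 × 8 bytes = 32GiB. In practice the JVM turns compressed pointers off at an -Xmx of 32GB, and the usable cutoff lies somewhat below it. A minimal sketch of the arithmetic, assuming 8-byte alignment; whether J9 follows the same scheme is what the linked program would reveal:

```python
def compressed_oops_max_heap(alignment_bytes: int = 8) -> int:
    """Largest heap (bytes) addressable by 32-bit references scaled by the object alignment."""
    return (2 ** 32) * alignment_bytes

GIB = 2 ** 30
print(compressed_oops_max_heap() // GIB)  # 32
```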

Re: Solr Pagination

2015-10-12 Thread Toke Eskildsen
th sufficiently long pauses between index updates. Nightly index updates with few active users at that time could be an example. - Toke Eskildsen, State and University Library, Denmark

Re: Solr Pagination

2015-10-09 Thread Toke Eskildsen
tarted AFTER 29 seconds. Any logic behind > what I am seeing here? It shows that the shard-searches themselves are not what is slowing you down. Are the returned documents very large? Try setting fl=id,score and see if it brings response times below 1 second. - Toke Eskildsen

Re: Solr Pagination

2015-10-09 Thread Toke Eskildsen
s. A manual process that requires clicking next 1000 times is a severe indicator that something can be done differently. - Toke Eskildsen

Re: Solr Pagination

2015-10-09 Thread Toke Eskildsen
rt parameter, the difference is small as long as you stay below a start of 1000. 10K might also work for you. Do your users page beyond that? - Toke Eskildsen
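In a distributed (multi-shard) search, each shard has to return its top start+rows hits so the coordinating node can merge them, which is why the cost grows with start. A back-of-the-envelope sketch; the shard count and page size are hypothetical:

```python
def docs_collected(start: int, rows: int, shards: int) -> int:
    """Entries the coordinator merges for one page: every shard sends its top start+rows."""
    return shards * (start + rows)

# Hypothetical setup: 4 shards, 10 rows per page.
for start in (0, 1_000, 10_000):
    print(start, docs_collected(start, 10, 4))
# 0 40
# 1000 4040
# 10000 40040
```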

Re: Pressed optimize and now SOLR is not indexing while optimize is going on

2015-10-07 Thread Toke Eskildsen
ve the amount of memory available for new processes. If you start a new and memory-hungry process, it will take the memory from the free pool first, then from the disk cache. - Toke Eskildsen, State and University Library, Denmark

Re: Pressed optimize and now SOLR is not indexing while optimize is going on

2015-10-06 Thread Toke Eskildsen
on the machine, only 1 CPU is running at full tilt. There is always a bottleneck. What might help is that the SSD (probably) does not get bogged down by the process, so it should be much better at handling other requests while the optimization is running. - Toke Eskildsen, State and University L

Re: Facet queries blow out the filterCache

2015-10-02 Thread Toke Eskildsen
way, operating under the assumption that the single-core facet request for some reason acts as a distributed call, the key to avoiding the fine-counting is to ensure that _all_ possibly relevant term counts have been returned in the first facet phase. Try setting both facet.mincount=0 and facet.limit=-1. - T
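A toy model of why that helps (not Solr's actual faceting code). A term that ends up in the merged top list, but is absent from a shard's truncated first-phase response, has an unknown count on that shard and must be fine-counted. A shard that returned all of its terms (facet.limit=-1, with facet.mincount=0 so nothing is filtered out) leaves nothing unknown:

```python
from typing import Dict, List, Set

def terms_to_refine(shard_counts: List[Dict[str, int]],
                    shard_truncated: List[bool],
                    top_terms: List[str]) -> Set[str]:
    """Return top terms whose count is unknown on at least one truncated shard.

    shard_counts[i]: term counts shard i returned in the first phase.
    shard_truncated[i]: True if shard i cut its list at facet.limit.
    """
    need: Set[str] = set()
    for counts, truncated in zip(shard_counts, shard_truncated):
        if not truncated:
            continue  # complete list: a missing term really has count 0 here
        for term in top_terms:
            if term not in counts:
                need.add(term)
    return need
```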

Re: [poll] virtualization platform for SOLR

2015-10-01 Thread Toke Eskildsen
the virtualized instances, we only use local SSDs to hold our index data. That might affect the trade-off, as even slight delays in IO become visible when storage access times are < 0.1ms instead of > 1ms. I suspect the relative impact of virtualization is less with spinning drives or networ

Re: Regression tests and evaluate quality of results

2015-09-30 Thread Toke Eskildsen
On Wed, 2015-09-30 at 06:58 -0700, marotosg wrote: > b) Based on full data. I would like to run queries and see if the results > are good enough. That's the part I am not sure if makes sense or how to do > it. Seems like an exact match for http://quepid.com/ (I am not affil

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-28 Thread Toke Eskildsen
hers. It was a bad idea for us. - Toke Eskildsen, State and University Library, Denmark

Re: Solr facets implementation question

2015-09-23 Thread Toke Eskildsen
reaming faceting work well if one wants to export the full result set. For top-X requests it seems that there is a lot of overhead resolving terms that will not be used in the final result. But my understanding of Solr streams is very shaky. - Toke Eskildsen, State and University Library, Denmark

Re: Solr facets implementation question

2015-09-22 Thread Toke Eskildsen
ng. It is basically 'original_query AND facet_field:fine_count_term'. Quite fast for a few terms, but if there is a need for resolving tens or hundreds of terms for a non-trivial index, the fine-counting phase can take longer than the initial faceting phase. - Toke Eskildsen (sorry for the

Re: Does more shards in core improve performance?

2015-09-21 Thread Toke Eskildsen
Thank you for the verification, Toke Eskildsen, State and University Library, Denmark

Re: Does more shards in core improve performance?

2015-09-17 Thread Toke Eskildsen
it is substantially less than 100%, then feed Solr from more than one thread at a time. - Toke Eskildsen, State and University Library, Denmark
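A minimal sketch of feeding from several threads, assuming a hypothetical `send_batch` function that posts one batch of documents to Solr with an HTTP client of your choice and returns the batch size. The thread count of 4 is an arbitrary starting point to tune while watching CPU utilization:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def feed_parallel(docs: Iterable[dict],
                  send_batch: Callable[[List[dict]], int],
                  threads: int = 4,
                  batch_size: int = 1000) -> int:
    """Send batches concurrently and return the total number of documents sent.

    send_batch is hypothetical: it should post one batch and return its size.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send_batch, batches(docs, batch_size)))
```

Note that `Executor.map` submits all batches up front, so for very large feeds a bounded queue between reader and senders would keep memory in check.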

Re: Does more shards in core improve performance?

2015-09-17 Thread Toke Eskildsen
hat the CPU-cores are nicely utilized with our low queries/second usage pattern. - Toke Eskildsen, State and University Library, Denmark

Re: Solr facets implementation question

2015-09-08 Thread Toke Eskildsen
e number of unique Terms might mean that the disk cache is not large enough. Blatant plug: I have spent a fair amount of time trying to make some of this faster: http://tokee.github.io/lucene-solr/ - Toke Eskildsen

Re: Strange interpretation of invalid ISO date strings

2015-09-07 Thread Toke Eskildsen
ld be to use lenient=false as the default, and to allow overriding it in solrconfig.xml for backwards compatibility. - Toke Eskildsen, State and University Library, Denmark
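To illustrate the difference, a minimal sketch in Python (not Solr's date parsing code). A strict parser rejects an impossible date, while a lenient calendar, like Java's with leniency enabled, silently rolls the overflow into the next month:

```python
from datetime import date, timedelta

def parse_strict(s: str) -> date:
    """lenient=false behaviour: reject impossible dates such as 2015-02-30."""
    return date.fromisoformat(s)  # raises ValueError on out-of-range fields

def parse_lenient(s: str) -> date:
    """Lenient behaviour: roll overflowing fields forward, so 2015-02-30 becomes 2015-03-02."""
    year, month, day = (int(part) for part in s.split("-"))
    year += (month - 1) // 12      # month 13 -> January of the next year
    month = (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)
```

Silent rollover of this kind is why an invalid date string can end up indexed as a valid but different date.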

Re: SOLR last modified different than filesystem last modified

2015-09-07 Thread Toke Eskildsen
ifference. Changing the time zone on the machine might have triggered that, but then we're entering random guessing. - Toke Eskildsen, State and University Library, Denmark

Re: SOLR last modified different than filesystem last modified

2015-09-04 Thread Toke Eskildsen
Linux, I would suspect it to be very easy. If you use the built-in graphical file explorer, I suspect the only way to do so is by adjusting the timezone settings for the whole system. Etc. - Toke Eskildsen

Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Toke Eskildsen
dated. So if I check the index > folder, it will not be accurately reflexing the last time the index files > are updated. Just watch index/segments.gen. That is precise as it tracks when the logical index was last updated, whereas segment files currently being written are with later tim
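A minimal sketch of checking that file's timestamp from a script, returned in UTC to match Solr's own timestamps. The marker file name is an assumption to verify for your version: Lucene 4.x writes segments.gen, while later versions may only write the segments_N files:

```python
import os
from datetime import datetime, timezone

def last_index_update(index_dir: str, marker: str = "segments.gen") -> datetime:
    """Return the modification time (UTC) of the commit marker file in an index folder.

    Assumption: `marker` names a file that is rewritten on every commit.
    """
    mtime = os.path.getmtime(os.path.join(index_dir, marker))
    return datetime.fromtimestamp(mtime, tz=timezone.utc)
```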

Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Toke Eskildsen
Renee Sun wrote: > But I did a test with heavy indexing on going, and observed the index file > in [core]/index with a latest updated timestamp keep growing for about 7 > minutes... That is not a file, but the folder that holds the immutable segment files. What you observe is segments being writ

Re: Re: Re: Re: Re: concept and choice: custom sharding or auto sharding?

2015-09-03 Thread Toke Eskildsen
it does not support multiple index threads. - Toke Eskildsen

Re: Re: Re: Re: concept and choice: custom sharding or auto sharding?

2015-09-03 Thread Toke Eskildsen
e that does not help if you are already doing that. Also, sanity-check that you are not committing all the time. - Toke Eskildsen

Re: SOLR last modified different than filesystem last modified

2015-09-03 Thread Toke Eskildsen
). I guess your local timezone is UTC+2 and that your country is using daylight saving? Solr uses UTC only for timestamps, which is fairly unambiguous. If you want the filesystem dates to match, you can normalise them to UTC in your viewer - how to do that depends on your system. - Toke Eskildsen
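A minimal sketch of the normalisation, assuming a fixed offset of UTC+2 (summer time). A real viewer would use the system's time zone rules, which handle the daylight saving switch automatically:

```python
from datetime import datetime, timedelta, timezone

def to_utc(local: datetime, utc_offset_hours: int) -> datetime:
    """Attach a fixed UTC offset to a naive local timestamp and convert it to UTC."""
    offset = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=offset).astimezone(timezone.utc)

# 14:30 local summer time (UTC+2) is 12:30 in the UTC that Solr reports.
print(to_utc(datetime(2015, 9, 3, 14, 30), 2).isoformat())  # 2015-09-03T12:30:00+00:00
```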

Re: Re: concept and choice: custom sharding or auto sharding?

2015-09-03 Thread Toke Eskildsen
nal hiccup, so we'll be switching to SolrCloud at some point. - Toke Eskildsen, State and University Library, Denmark

Re: Re: concept and choice: custom sharding or auto sharding?

2015-09-03 Thread Toke Eskildsen
blems with 10M documents calls for locating the bottlenecks, before trying to scale the problem away. - Toke Eskildsen, State and University Library, Denmark

Re: Difference between Legacy Facets and JSON Facets

2015-09-02 Thread Toke Eskildsen
With a large field, this map cannot be in the fast caches. Combine this with a gazillion references and it makes sense that JSON Facets is slower in this scenario. A factor of 20 sounds like way too much, though. I would have expected maybe 2. - Toke Eskildsen

Re: Solr 5.2.1 versus Solr 4.7.0 performance

2015-08-27 Thread Toke Eskildsen
limit you requested or if it is higher (default formula is limit * 1.5 + 10). The rest of your questions are too far outside of my knowledge for me to try and answer. - Toke Eskildsen, State and University Library, Denmark
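The over-request formula as a small function, for checking what each shard is actually asked for. The defaults mirror the "limit * 1.5 + 10" from the post. Newer Solr versions document these values as facet.overrequest.ratio and facet.overrequest.count, but check the documentation for your version:

```python
def shard_facet_limit(limit: int, ratio: float = 1.5, extra: int = 10) -> int:
    """Per-shard facet limit when over-requesting: limit * ratio + extra."""
    return int(limit * ratio) + extra

print(shard_facet_limit(10))   # 25
print(shard_facet_limit(100))  # 160
```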

Re: Solr 5.2.1 versus Solr 4.7.0 performance

2015-08-27 Thread Toke Eskildsen
en 4 & 5 that you are observing. Are you doing faceting as part of your test? - Toke Eskildsen, State and University Library, Denmark

Re: Solr performance is slow with just 1GB of data indexed

2015-08-26 Thread Toke Eskildsen
as little to do with Solr and a lot to do with Carrot (assuming here that Carrot is the bottleneck). You might have more success asking in a Carrot forum? - Toke Eskildsen, State and University Library, Denmark

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Toke Eskildsen
s not working as > it says 'Page Not Found'. That is because it is too long for a single line. Try copy-pasting it: https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-Configuration - Toke Eskildsen, State and University Library, Denmark

Re: Solr performance is slow with just 1GB of data indexed

2015-08-25 Thread Toke Eskildsen
ed something else? Plain faceting perhaps? Or maybe enrichment of the documents with some sort of entity extraction? - Toke Eskildsen, State and University Library, Denmark

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Toke Eskildsen
t. It would be great if someone with a bit of time did some experiments with Solr on this issue. Locally we sidestep it a bit, as we are able to get by with a 30GB heap for our largest installation and do not need more than 10GB for the rest. - Toke Eskildsen, State and University Library, Denmark

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Toke Eskildsen
u are running a lot of requests in parallel. Have you considered using a queue instead? If you currently use hundreds of parallel requests to a single machine, chances are you will get higher throughput by limiting that. As a bonus, it will require less heap. - Toke Eskildsen, State and Uni
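A minimal sketch of such a queue, assuming a hypothetical `search` callable that sends one request to Solr. `ThreadPoolExecutor` keeps at most `max_concurrent` requests in flight and lets the rest wait in its internal queue. The limit of 8 is a placeholder to be found by measuring:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def run_queued(queries: Sequence[str],
               search: Callable[[str], object],
               max_concurrent: int = 8) -> List[object]:
    """Run all queries, never more than max_concurrent at the same time.

    Requests beyond the limit wait client-side instead of piling up on the server.
    `search` is hypothetical: it sends one query and returns its response.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(search, queries))
```

Results come back in the same order as the queries, so callers do not need to re-match responses.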

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Toke Eskildsen
he Solr part) or the clustering itself (the Carrot part) that is the bottleneck. - Toke Eskildsen

Re: How to find the ordinal for a numeric doc value

2015-08-19 Thread Toke Eskildsen
ues upon first call. > I assume my fallback is to not index with doc values, and use an uninverting > reader to get the field data. Is there a better approach? You could index your integers as DocValued Strings, left-padded with zeroes so they all have the same length and sort in proper integer order. - Toke Eskildsen
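A minimal sketch of the padding trick, assuming non-negative integers and a fixed width of 10 digits (enough for any positive 32-bit int). With equal-length digit strings, lexicographic order equals numeric order:

```python
def pad_int(value: int, width: int = 10) -> str:
    """Encode a non-negative int as a fixed-width string so string order == numeric order."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    text = str(value).zfill(width)
    if len(text) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return text

print(sorted(pad_int(v) for v in [42, 7, 1000, 315]))
# ['0000000007', '0000000042', '0000000315', '0000001000']
```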

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Toke Eskildsen
Toke Eskildsen wrote > Use more than one cloud. Make them fully independent. > As I suggested when you asked 4 days ago. That would > also make it easy to scale: Just measure how much a > single setup can take and do the math. The goal is 250K documents/second. I tried modifying t

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Toke Eskildsen
Use more than one cloud. Make them fully independent. As I suggested when you asked 4 days ago. That would also make it easy to scale: Just measure how much a single setup can take and do the math. - Toke Eskildsen
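Doing the math, assuming throughput scales linearly with the number of fully independent setups (which holds best when they share nothing). The 250K documents/second goal is from the thread; the per-setup figure is a made-up placeholder for your own measurement:

```python
import math

def setups_needed(target_docs_per_sec: float, measured_docs_per_sec: float) -> int:
    """Independent setups needed to reach the target, assuming linear scaling."""
    return math.ceil(target_docs_per_sec / measured_docs_per_sec)

# 40K docs/s per setup is hypothetical: replace it with your measurement.
print(setups_needed(250_000, 40_000))  # 7
```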

Re: Performance issue with FILTER QUERY

2015-08-18 Thread Toke Eskildsen
and the two bitsets are merged. Next time you use the same fq, it should be cached (if you have caching enabled) and be a lot faster. Also, if you ran your two tests right after each other, the second one benefits from disk caching. If you had executed them in reverse order, the q+fq might have
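A toy model of what is described above, using Python integers as bitsets (bit i set means document i matches) and a dict standing in for the filterCache. This illustrates the idea only; Solr's implementation uses its own DocSet classes:

```python
from typing import Callable, Dict

filter_cache: Dict[str, int] = {}  # fq string -> bitset of matching documents

def fq_bitset(fq: str, compute: Callable[[str], int]) -> int:
    """Return the cached bitset for fq, computing and storing it on a miss."""
    if fq not in filter_cache:
        filter_cache[fq] = compute(fq)
    return filter_cache[fq]

def search(q_bits: int, fq: str, compute: Callable[[str], int]) -> int:
    """Documents matching both q and fq: the two bitsets ANDed together."""
    return q_bits & fq_bitset(fq, compute)
```

The second request with the same fq skips `compute` entirely, which is the speed-up seen once the cache is warm.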

Re: Query time out. Solr node goes down.

2015-08-18 Thread Toke Eskildsen
ory. If you can paste a problematic query, it is easier to see what is happening. - Toke Eskildsen, State and University Library, Denmark

Re: Query time out. Solr node goes down.

2015-08-18 Thread Toke Eskildsen
rement. - Toke Eskildsen, State and University Library, Denmark

Re: Query term matches

2015-08-16 Thread Toke Eskildsen
Scott Derrick wrote: > Is there a way to get the list of terms that matched in a query response? Add debug=query to your request: https://wiki.apache.org/solr/CommonQueryParameters#debug You might also want to try http://splainer.io/ - Toke Eskildsen

Re: Big SolrCloud cluster with a lot of collections

2015-08-16 Thread Toke Eskildsen
nderstand correctly), one of our 256GB machines holds 6 billion documents in 20TB of index data. You might want to investigate that option. Some details at https://sbdevel.wordpress.com/net-archive-search/ - Toke Eskildsen

Re: Big SolrCloud cluster with a lot of collections

2015-08-16 Thread Toke Eskildsen
u do with your data. Most of the time, IO is the bottleneck for Solr and for those cases it is probably more bang-for-the-buck to buy machines with 256GB of RAM (or maybe the 148GB you have currently) as it minimizes the overhead per box. - Toke Eskildsen

Re: Index very large number of documents from large number of clients

2015-08-15 Thread Toke Eskildsen
. > 3) How many shards / replicas per collection should I use? > 4) Do I need multiple Solr servers? Not enough data about index usage to say. Between 1 and 50, not kidding. - Toke Eskildsen

Re: Big SolrCloud cluster with a lot of collections

2015-08-15 Thread Toke Eskildsen
ler collections have better performance than fewer larger collections? > (I also have cross customers queries) If you make independent setups, that could be solved by querying them independently and doing the merging yourself. - Toke Eskildsen
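A minimal sketch of client-side merging, assuming each independent setup returns its hits as `(score, id)` pairs sorted by descending score. One caveat: scores from separate indexes are computed from different term statistics, so merging by raw score is an approximation. Merging on a sort field such as a date does not have that problem:

```python
import heapq
import itertools
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (score, document id)

def merge_top(results: Iterable[List[Hit]], rows: int) -> List[Hit]:
    """Merge per-setup hit lists (each sorted by descending score) and keep the top `rows`."""
    merged = heapq.merge(*results, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, rows))
```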

Re: are facets or MatchAllDocsQuery not cached?

2015-08-06 Thread Toke Eskildsen
ging from a single-shard setup to a multi-shard one. As always, measure. - Toke Eskildsen

Re: are facets or MatchAllDocsQuery not cached?

2015-08-06 Thread Toke Eskildsen
ink an all_parameters -> complete_response cache is possible? > It could be initialized right before or during warmup and would not take to > much memory. Sorry, I don't know much of the mechanics of handlers in Solr and cannot say how the in-theory-simple caching would fit. - Toke Eskildsen, State and University Library, Denmark

Re: are facets or MatchAllDocsQuery not cached?

2015-08-06 Thread Toke Eskildsen
If that is not the case, your best bet would probably be to cache the match-all outside of Solr. > My assumption is that the queryResultCache is catching such a > MatchAllDocsQuery(*:*). It only stores the docIDs. I don't know why there is no all_parameters -> complete_response
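A minimal sketch of such an external cache, assuming a hypothetical `search` function that takes single-valued request parameters and returns the complete response. Sorting the parameters makes the key independent of parameter order. The cache must be emptied whenever a commit makes new data visible:

```python
from typing import Callable, Dict, Mapping, Tuple

class ResponseCache:
    """all_parameters -> complete_response cache kept outside Solr (hypothetical helper)."""

    def __init__(self, search: Callable[[Mapping[str, str]], dict]):
        self._search = search
        self._cache: Dict[Tuple[Tuple[str, str], ...], dict] = {}

    def get(self, params: Mapping[str, str]) -> dict:
        key = tuple(sorted(params.items()))  # parameter order must not matter
        if key not in self._cache:
            self._cache[key] = self._search(params)
        return self._cache[key]

    def invalidate(self) -> None:
        """Drop all cached responses, e.g. after a commit opens a new searcher."""
        self._cache.clear()
```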

Re: Limits in individual filter sub queries

2015-08-06 Thread Toke Eskildsen
ow how can I force to fetch 50 Indian & 50 Iran records using a > single SOLR query? q=*:*&fq=(country:india) OR (country:iran)&group=true&group.field=country&group.limit=50 https://cwiki.apache.org/confluence/display/solr/Result+Grouping - Toke Eskildsen, State and University Library, Denmark
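When sending a grouped request like this from code, let a URL encoder handle the special characters (`*`, `:`, spaces). A sketch with Python's standard library; the host and collection name are hypothetical:

```python
from urllib.parse import urlencode

params = [
    ("q", "*:*"),                             # match all documents
    ("fq", "country:india OR country:iran"),  # restrict to the two countries
    ("group", "true"),
    ("group.field", "country"),               # one group per country
    ("group.limit", "50"),                    # up to 50 documents per group
]
# Hypothetical host and collection name.
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```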

Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-20 Thread Toke Eskildsen
, which cannot be done on String fields. > Would you please help me to solve this problem? With the information we have, it does not seem to be easy to solve: It seems like you want to facet on all terms in your index. As they need to be String (to use docValues), you would have to do all the

Re: Best way to facets with value preprocessing (w/ docValues)

2015-07-12 Thread Toke Eskildsen
due to 2 analyzed-but-single-token text fields with 10-20M values that we use for faceting. I am not a committer and on vacation anyway, so this is just a thumbs up to the initiative. - Toke Eskildsen

Re: DocValues: Which format is better Default or Memory?

2015-07-02 Thread Toke Eskildsen
e UnInverted structure has a speed edge due to being directly accessible as standard on-heap memory structures. The difference is likely to vary a great deal depending on concrete corpus & hardware. - Toke Eskildsen

Re: Using Facets to Limit the Scope of a Search

2015-07-01 Thread Toke Eskildsen
Paden wrote: > How would I perform a http request that would say return the documents of > previous query but ONLY the documents where author = (author with 31 > documents) The simplest thing is to add it as a filter query: q=fairy+tales&fq=author:"H. C. Andersen" - Toke Eskildsen

Re: Some guidance on memory requirements/usage/tuning

2015-06-30 Thread Toke Eskildsen
t people are seeing. There might be a perfectly fine reason for those response times, but I suggest we sanity check them: Could you show us a typical query and tell us how many concurrent queries you normally serve? - Toke Eskildsen, State and University Library, Denmark

Re: optimize status

2015-06-29 Thread Toke Eskildsen
cy? I have zero experience with that: We build the shards one at a time and don't touch them after that. 90% of our building power goes to Tika analysis, so there hasn't been an apparent need for tuning Solr's indexing. - Toke Eskildsen

Re: optimize status

2015-06-29 Thread Toke Eskildsen
be better. Turning it around: To minimize the risk of occasional performance-degrading large merges, one might want an index where all the shards are below a certain size. Splitting larger shards into smaller ones would in that case also be an optimization, just towards a different goal. - Toke Eskildsen

Re: Limit indexed documents.

2015-06-19 Thread Toke Eskildsen
illion documents, divided across 1000 shards. - Toke Eskildsen

Re: solr/lucene index merge and optimize performance improvement

2015-06-17 Thread Toke Eskildsen
ng, would work for us. Switching to a new controlling layer is not trivial, so the win by better utilization during the optimization phase is not enough in itself to pay the cost. - Toke Eskildsen, State and University Library, Denmark

Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Toke Eskildsen
t I do not know how hard it would be to do so. - Toke Eskildsen

Re: indexing issue

2015-06-04 Thread Toke Eskildsen
On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote: > I have some indexing issue . While indexing IOwait is high in solr server > and load also. Might be because you commit too frequently. How often do you do that? - Toke Eskildsen, State and University Library, Denmark

Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread Toke Eskildsen
p will slowly fill up as more and more users perform faceted queries on their content. - Toke Eskildsen

Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread Toke Eskildsen
nd even if your 1 million facet fields all had just 1 value, represented by 1 bit, it would still require 10M * 1M * 1 bits in memory, which is 10 terabits, or about 1.25 terabytes, of RAM. - Toke Eskildsen
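The arithmetic behind the estimate, spelled out: 10M documents × 1M fields × 1 bit is 10^13 bits, which is 10 terabits or 1.25 terabytes after dividing by 8 bits per byte:

```python
docs = 10_000_000      # documents in the index
fields = 1_000_000     # facet fields, each needing 1 bit per document
bits = docs * fields
print(bits / 1e12, "terabit")       # 10.0 terabit
print(bits / 8 / 1e12, "terabyte")  # 1.25 terabyte
```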

Re: HW requirements

2015-05-27 Thread Toke Eskildsen
o with the data, how much the machine(s) will be used while indexing and your speed requirements. See https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ - Toke Eskildsen

Re: solr 5.x on glassfish/tomcat instead of jetty

2015-05-20 Thread Toke Eskildsen
point: Require Solr to be run as an application instead of in a generic container) might be an idea? - Toke Eskildsen

Re: Unable to identify why faceting is taking so much time

2015-05-13 Thread Toke Eskildsen
hits. > Also subsequent calls are not fast: > First call time: 297572 > Second call time (made with in 2 sec): 249287 Are you indexing while searching? Each time the index is changed, the UnInversion will have to be re-done. facet.method=fcs seems a better choice with an often-changing

Re: Unable to identify why faceting is taking so much time

2015-05-11 Thread Toke Eskildsen
just calculate facets of 137 records? 6½ minutes is a long time, even for the first call. Do you have tens to hundreds of millions of documents in your index? Or do you have a similar number of unique values in your facet? Either way, subsequent faceting calls should be much faster and a switch to D

Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread Toke Eskildsen
judging from your previous post "problem with facets - out of memory exception", you are doing non-trivial faceting. Are you using DocValues, as Marc suggested? - Toke Eskildsen, State and University Library, Denmark

Re: Storing SolrCloud index data in Amazon S3

2015-05-04 Thread Toke Eskildsen
s, but it seems like a lot of work for a special case. - Toke Eskildsen, State and University Library, Denmark

Re: Confusing SOLR 5 memory usage

2015-04-21 Thread Toke Eskildsen
wasn't using any > RAM... wasn't getting any requests. No problem at all. On the contrary, thank you for closing the issue. - Toke Eskildsen

Re: SolrCloud 4.8.0 upgrade

2015-04-17 Thread Toke Eskildsen
wbacks? Support for the Disk-format for DocValues was removed after 4.8, so you should check if you use that: docValuesFormat="Disk" for the field in the schema, if I remember correctly. - Toke Eskildsen

Re: Memory Leak in solr 4.8.1

2015-04-08 Thread Toke Eskildsen
Do you have a large and active filter cache? Each entry is 30MB, so it does not take many entries to fill an 8GB heap. That would match the description of ever-running GC. - Toke Eskildsen, State and University Library, Denmark
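For scale: a filterCache entry stored as a bitset takes one bit per document in the index, so 30MB per entry corresponds to an index of roughly 250M documents (an inferred figure, not from the post). A sketch of the arithmetic; note that Solr may store small filter results more compactly than a full bitset:

```python
def bitset_entry_bytes(max_doc: int) -> int:
    """Size of one filterCache entry stored as a bitset: one bit per document."""
    return (max_doc + 7) // 8

def entries_to_fill(heap_bytes: int, entry_bytes: int) -> int:
    """How many such entries fit in the heap, ignoring everything else on it."""
    return heap_bytes // entry_bytes

MIB, GIB = 2 ** 20, 2 ** 30
print(bitset_entry_bytes(250_000_000) / MIB)  # ~29.8 MiB for 250M documents
print(entries_to_fill(8 * GIB, 30 * MIB))     # 273
```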

Re: Facet

2015-04-05 Thread Toke Eskildsen
g) would probably be a lot higher with just a single shard. - Toke Eskildsen

Re: Facet

2015-04-05 Thread Toke Eskildsen
n each field, it just means that multiple fields are processed in parallel. - Toke Eskildsen

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-04 Thread Toke Eskildsen
y guess is that \u0001 matches it. So something like regex="^([^\u0001]*)\u0001([^\u0001]*)\u0001([^\u0001]*)\u0001...$"? Untested and all. But why not use the CSV import handler? That seems like the best fit. - Toke Eskildsen

Re: DOcValues

2015-04-03 Thread Toke Eskildsen
oks like DocValues now and it seems (guessing quite a bit here) that the old 16M-limitation is gone. - Toke Eskildsen

Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-03 Thread Toke Eskildsen
like this: regex="^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),...([^,]*)$" The match speed for 28 groups with that regexp was about 0.002ms (average over 1000 matches). - Toke Eskildsen
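A sketch of generating such a pattern instead of typing 28 groups by hand, using Python's `re`. The DataImportHandler's RegexTransformer uses Java regular expressions, but this simple pattern means the same in both. The `sep` parameter covers both the comma case and the `\u0001` case:

```python
import re

def row_pattern(columns: int, sep: str = ",") -> "re.Pattern[str]":
    """Build ^([^sep]*)sep([^sep]*)...sep([^sep]*)$ with one capture group per column."""
    s = re.escape(sep)
    group = f"([^{s}]*)"
    return re.compile("^" + s.join([group] * columns) + "$")

line = ",".join(f"v{i}" for i in range(28))
print(row_pattern(28).match(line).group(28))  # v27
```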

Re: DOcValues

2015-04-03 Thread Toke Eskildsen
unique values per shard for docValues. I would like to see that go away, but that's just part of an ongoing mission to get Solr to break free from the old "2 billion should be enough for everyone"-design. - Toke Eskildsen

Re: sort on facet.index?

2015-04-02 Thread Toke Eskildsen
ide, there seems to be renewed interest for it. - Toke Eskildsen

Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Toke Eskildsen
100GB index from a same-size machine. The one hardware advice I will give is to start with SSDs and scale from there. With present day price/performance, using spinning drives for anything IO-intensive makes little sense. - Toke Eskildsen

RE: What's the need for copyField> when you have "fq"

2015-03-31 Thread Toke Eskildsen
t can be accomplished by having differently analyzed versions of the same logical field: Having a single catch-all is just easy to do. Another reason can be performance: fq-matching against all fields is heavier than matching against a few fields and the catch-all. - Toke Eskildsen

Re: rough maximum cores (shards) per machine?

2015-03-25 Thread Toke Eskildsen
ution to work, that would be the preferable solution. - Toke Eskildsen, State and University Library, Denmark

RE: rough maximum cores (shards) per machine?

2015-03-24 Thread Toke Eskildsen
ost makes a lot more sense. I will not argue against that. - Toke Eskildsen

RE: rough maximum cores (shards) per machine?

2015-03-24 Thread Toke Eskildsen
Jack Krupansky [jack.krupan...@gmail.com] wrote: > Don't confuse customers and tenants. Perhaps you could explain what you mean by multi-tenant in the context of Ian's setup? It is not clear to me what the distinction is in this case. - Toke Eskildsen

Re: index duplicate records from data source into 1 document

2015-03-20 Thread Toke Eskildsen
that update processor ...but this capability is not available out of the > box. I have not tried it at all, but I thought https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents was doing exactly what you describe? - Toke Eskildsen, State and University Library, Denmark

Re: Best way to dump out entire solr content?

2015-03-13 Thread Toke Eskildsen
practically no difference in speed between page 1 and page 5,000. I say practically because on paper, requesting page 5,000 will be a smidgen faster (there are fewer inserts into the priority queue), but I doubt it can be measured in real-world setups. - Toke Eskildsen
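A toy model of why a later page can be slightly cheaper, assuming the comparison is with cursor-style paging (in Solr, the cursorMark parameter, which requires a sort that includes the uniqueKey field). Only documents sorting after the cursor are candidates for the priority queue. Documents are assumed to be sorted by ascending integer id; this is an illustration, not Solr's code:

```python
import heapq
from typing import List, Optional, Tuple

def next_page(doc_ids: List[int], rows: int,
              cursor: Optional[int]) -> Tuple[List[int], Optional[int], int]:
    """Return (page, new_cursor, candidates_considered) for docs sorted by ascending id.

    Only documents after the cursor are considered, so later pages push
    fewer entries through the priority queue than earlier ones.
    """
    candidates = [d for d in doc_ids if cursor is None or d > cursor]
    page = heapq.nsmallest(rows, candidates)
    new_cursor = page[-1] if page else cursor
    return page, new_cursor, len(candidates)
```

Paging 100 documents 10 at a time considers 100 candidates for the first page and only 10 for the last.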

SSD endurance

2015-03-12 Thread Toke Eskildsen
segments being immutable, the bird's eye view is that Lucene creates and deletes large files, which makes it possible for the SSD's wear-leveler to select the least-used flash sectors for new writes: The write pattern over time is not too far from the one that The Tech Report tested wit

Re: Performance on faceting using docValues

2015-03-05 Thread Toke Eskildsen
cet.mincount=1&sort=score+desc How large is your index in bytes, how many documents does it contain and is it single-shard or cloud? Could you paste the loglines containing "UnInverted field", which describes the number of unique values and size of your facet fields? - Toke Eskildsen, State and University Library, Denmark

Re: Cores and and ranking (search quality)

2015-03-05 Thread Toke Eskildsen
search wiki > (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does > the search and result merging (all I have to do is issue a search), is > this correct? Yes. From a user-perspective, searches are no different. - Toke Eskildsen, State and University Library, Denmark

RE: how to debug solr performance degradation

2015-02-25 Thread Toke Eskildsen
ields? My next step would be to disable parts of the query (highlight, faceting and collapsing one at a time) to check which part is the heaviest. - Toke Eskildsen From: Tang, Rebecca [rebecca.t...@ucsf.edu] Sent: 25 February 2015 20:44 To: solr
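A minimal sketch of generating those one-feature-off variants of a request, assuming single-valued parameters in a dict and that collapsing is done with an `fq={!collapse ...}` filter. Timing each variant against the same Solr instance shows which feature dominates; the field name in the example is hypothetical:

```python
from typing import Dict, List, Tuple

def ablation_variants(params: Dict[str, str]) -> List[Tuple[str, Dict[str, str]]]:
    """Return copies of the request with one heavy feature switched off at a time."""
    switches = {
        "highlighting": {"hl": "false"},
        "faceting": {"facet": "false"},
    }
    variants = [("baseline", dict(params))]
    for name, override in switches.items():
        variants.append((f"no {name}", {**params, **override}))
    # Assumption: collapsing is expressed as a single fq={!collapse ...}; drop that filter.
    if params.get("fq", "").startswith("{!collapse"):
        variants.append(("no collapsing", {k: v for k, v in params.items() if k != "fq"}))
    return variants

request = {"q": "example", "hl": "true", "facet": "true", "fq": "{!collapse field=signature}"}
for name, variant in ablation_variants(request):
    print(name, variant)
```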

RE: Do Multiprocessing on Solr to search?

2015-02-25 Thread Toke Eskildsen
mend running 10,000 concurrent searches, as it leads to congestion. You will probably get a higher throughput by queueing your requests and processing them with 100 concurrent searches or so. Do test. - Toke Eskildsen

RE: how to debug solr performance degradation

2015-02-24 Thread Toke Eskildsen
ich should tell you where the time is spent resolving the queries. If it is IOWait, then ensure a lot of free memory for disk cache and/or improve your storage speed (SSDs instead of spinning drives, local storage instead of remote). - Toke Eskildsen, State and University Library, Denmark.

Re: Solrcloud sizing

2015-02-18 Thread Toke Eskildsen
a small index (in bytes) and a high query rate, that probably won't help your throughput. - Toke Eskildsen, State and University Library, Denmark

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Toke Eskildsen
Solr or JVM? Can it > only be explained by the mass indexing? What is worrisome is that the > 4.10.2 shard reserves 8x times it uses. If you set your Xmx to a lot less, the JVM will probably favour more frequent garbage collections over extra heap allocation. - Toke Eskildsen, State and University Library, Denmark
