Solr index size statistics

2017-12-02 Thread John Davis
Hello, Is there a way to get index size statistics for a given Solr instance? For example, broken down by each stored or indexed field. The only approaches I know of are running du on the index data files and getting counts per indexed/stored field; however, fields can differ considerably in size. Thanks, John
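
A minimal sketch of the "du on the index data files" approach mentioned above, grouping sizes by file extension instead of reporting one total. The function name size_by_extension and the example path are hypothetical and not from the thread. This gives only a coarse per-file-type view (for example, .fdt files hold stored field data and .dvd files hold docValues data); it does not produce a per-field breakdown.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Sum file sizes in bytes under index_dir, grouped by file extension.

    Files without an extension (such as segments_N) are grouped
    under the key "(none)".
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def format_report(totals):
    """Return report lines sorted by size, largest first."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ["%-8s %d bytes" % (ext, size) for ext, size in ordered]
```

Usage, assuming a hypothetical core data directory: print("\n".join(format_report(size_by_extension("/var/solr/data/my_core/data/index")))).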

Re: JVM GC Issue

2017-12-02 Thread S G
I am a bit curious about the docValues implementation. I understand that docValues do not use JVM memory and instead make use of the OS cache, which is why they are more performant. But to return any response from the docValues, the values in the docValues' column-oriented structures would need to be brough

Re: Having trouble indexing nested docs using "split" feature.

2017-12-02 Thread Shawn Heisey
On 12/2/2017 12:55 PM, David Lee wrote: { "responseHeader":{ "status":0, "QTime":798}} Though the status indicates there was no error, when I try to query the data using *:*, I get this: curl 'http://localhost:8983/solr/my_collection/select?q=*:*' { "responseHeader":{

Re: Solr JVM best pratices

2017-12-02 Thread Shawn Heisey
On 12/2/2017 8:43 AM, Dominique Bejean wrote: > I would like some advice on best practices related to heap size, MMap, direct memory, GC algorithm and OS swap. For the most part, there is no generic advice we can give you for these things. What you need is going to be highly dependent

Re: Having trouble indexing nested docs using "split" feature.

2017-12-02 Thread David Lee
Sorry about the formatting for the first part, hope this is clearer: {     "book_id": "1234",     "book_title": "The Martian Chronicles", "author": "Ray Bradbury", "reviews": [     { "reviewer": "John Smith",     "reviewer_background": { "highest_ra

Having trouble indexing nested docs using "split" feature.

2017-12-02 Thread David Lee
Hi all, I've been trying for some time now to find a suitable way to deal with JSON documents that have nested data. By suitable, I mean being able to index them and retrieve them in the same structure as when they were indexed. I'm using version 7.1 under Linux Mint 18.3 with Oracle

Re: Solr JVM best pratices

2017-12-02 Thread Walter Underwood
We decided to go with modern technology for the new cluster. CMS has been around for a very long time, maybe more than ten years. These are the GC settings where we still use CMS. Instead of setting up a lot of ratios, I specify the sizes of the GC areas. That seems a lot clearer to me. We d

Re: Solr JVM best pratices

2017-12-02 Thread Dominique Bejean
Hi Walter, Thank you for this response. Did you use CMS before G1? Were there any GC issues fixed by G1? Dominique On Sat, Dec 2, 2017 at 17:13, Walter Underwood wrote: > We use an 8G heap and G1 with Shawn Heisey’s settings. Java 8, update 131. > > This has been solid in production with a

Re: Solr JVM best pratices

2017-12-02 Thread Walter Underwood
We use an 8G heap and G1 with Shawn Heisey’s settings. Java 8, update 131. This has been solid in production with a 32 node Solr Cloud cluster. We do not do faceting. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 2, 2017, at 7:43 AM, Dominiqu

Re: JVM GC Issue

2017-12-02 Thread Dominique Bejean
Hi Toke, Nearly 30% of the requests set facet.limit=200. Across 42000 requests, the number of times each field is used for faceting is: $ grep 'facet=true' select.log | grep -oP 'facet.field=([^&])*' | sort | uniq -c | sort -r 23119 facet.field=category_path 8643 facet.field=EUR_0_price_de
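
The grep pipeline above can be sketched in Python as follows. This is an illustrative equivalent, not something posted in the thread: the function name count_facet_fields is hypothetical, and the code assumes each log line carries the request parameters as &-separated key=value pairs. Unlike the grep pattern, the regular expression here also stops at whitespace and at a closing brace, on the assumption that the parameter block is wrapped in braces.

```python
import re
from collections import Counter

# Capture the value of each facet.field parameter, stopping at '&',
# whitespace, or '}' (assumed delimiters of the parameter block).
FACET_FIELD = re.compile(r"facet\.field=([^&\s}]*)")


def count_facet_fields(lines):
    """Count facet.field usage across log lines that contain facet=true."""
    counts = Counter()
    for line in lines:
        if "facet=true" not in line:
            continue  # same filter as the first grep in the pipeline
        counts.update(FACET_FIELD.findall(line))
    return counts


def top_fields(path, n=10):
    """Read a log file and return the n most frequently faceted fields."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_facet_fields(handle).most_common(n)
```

Usage, with the log file name taken from the command above: top_fields("select.log") returns a list of (field, count) pairs, most frequent first.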

Solr JVM best pratices

2017-12-02 Thread Dominique Bejean
Hi, I would like some advice on best practices related to heap size, MMap, direct memory, GC algorithm and OS swap. This is a vast subject, and sorry for the long question, but all these items are linked when it comes to having a stable Solr environment. My understanding and questions. About

Re: JVM GC Issue

2017-12-02 Thread Toke Eskildsen
Dominique Bejean wrote: > Hi, Thank you for the explanations about faceting. I was thinking the hit > count had the biggest impact on facet memory lifecycle. Only if you have a very high facet.limit. Could you provide us with a typical query, including all the parameters? - Toke Eskildsen

Re: JVM GC Issue

2017-12-02 Thread Dominique Bejean
Hi, Thank you for the explanations about faceting. I was thinking the hit count had the biggest impact on facet memory lifecycle. Regardless of the hit count, there is a query peak at the time the issue occurs. This is relative to what Solr is supposed to be able to handle, but this should be suffi