Re: Long string in fq value parameter, more than 2000000 chars
Daniel, is it worth saying that you have honkin' long queries and there
must be a simpler way? (I am a big fan of KISS: Keep It Simple, Stupid.)
I am not calling you names, just saying that this acronym comes up in
just about every project I work on. It is akin to the Peter Principle,
where design complexity inevitably increases to the breaking point, and
then I get cranky. And you probably can tell us a solid reason for
having the long queries.

Cheers -- Rick

On May 30, 2017 9:22:24 AM EDT, Susheel Kumar wrote:
> If you are able to load the gc logs into gcviewer when the OOM
> happens, it can give you an idea of whether it was a sudden OOM or the
> heap gets filled over a period of time. [...]

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: Long string in fq value parameter, more than 2000000 chars
If you are able to load the gc logs into gcviewer when the OOM happens,
it can give you an idea of whether it was a sudden OOM or the heap gets
filled over a period of time. This may help to nail down whether any
particular query is causing the problem or something else...

Thanks,
Susheel

On Sat, May 27, 2017 at 5:36 PM, Daniel Angelov wrote:
> Thanks for the support so far. I am going to analyze the logs in order
> to check the frequency of such queries. [...]
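A minimal sketch of the kind of check Susheel describes: scan the
post-collection heap occupancy in a GC log and see whether the heap
fills gradually or jumps all at once just before the OOM. The log line
format below is an assumption (the JDK 7/8 `PrintGCDetails` style
`before->after(capacity)` triple); adjust the regex to your actual
gc.log, or use GCViewer itself for the real analysis.

```python
import re

# Matches the "usedBefore->usedAfter(capacity)" triple, e.g.
# "400000K->210000K(614400K)", from JDK 7/8 style GC log lines.
HEAP = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")

def occupancy_after_gc(lines):
    """Heap occupancy (fraction of capacity) left after each collection."""
    out = []
    for line in lines:
        m = HEAP.search(line)
        if m:
            after, capacity = int(m.group(2)), int(m.group(3))
            out.append(after / capacity)
    return out

# Invented sample lines; a steady climb toward 1.0 suggests the heap
# fills over time, while a single late spike points at one bad query.
sample = [
    "[GC 400000K->210000K(614400K), 0.0412 secs]",
    "[GC 500000K->350000K(614400K), 0.0533 secs]",
    "[Full GC 610000K->590000K(614400K), 4.2110 secs]",
]
print(occupancy_after_gc(sample))
```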
Re: Long string in fq value parameter, more than 2000000 chars
Thanks for the support so far. I am going to analyze the logs in order
to check the frequency of such queries. BTW, I forgot to mention, the
soft and the hard commits are each 60 sec.

BR
Daniel

On 27.05.2017 at 22:57, "Erik Hatcher" wrote:
> Another technique to consider is {!join}. Index the cross ref id
> "sets" to another core and use a short and sweet join, if there are
> stable sets of ids. [...]
Re: Long string in fq value parameter, more than 2000000 chars
Another technique to consider is {!join}. Index the cross ref id "sets"
to another core and use a short and sweet join, if there are stable sets
of ids.

    Erik

> On May 27, 2017, at 11:39, Alexandre Rafalovitch wrote:
>
> On top of Shawn's analysis, I am also wondering how often those FQ
> queries are reused. [...]
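As a sketch of Erik's suggestion (the core and field names `idsets`,
`member_id`, `xref_id`, and `set_name` are invented for illustration,
not from the thread): each stable set of ids becomes a handful of small
documents in a side core, and the multi-megabyte `{!terms}` filter
shrinks to a short cross-core join in Solr's
`{!join from=... to=... fromIndex=...}` form.

```python
def idset_docs(set_name, member_ids):
    """Documents to index into the hypothetical 'idsets' side core,
    one document per member id, all tagged with the set's name."""
    return [{"set_name": set_name, "member_id": m} for m in member_ids]

def join_fq(set_name):
    """The short filter query that replaces a 2,000,000-char {!terms}:
    join from the side core's member_id onto the main core's xref_id."""
    return ("{!join fromIndex=idsets from=member_id to=xref_id}"
            f"set_name:{set_name}")

print(join_fq("batch_2017_05"))
```

The payoff is that the id list is indexed once, so each query sends a
few dozen bytes instead of re-transmitting and re-parsing 125,000 terms.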
Re: Long string in fq value parameter, more than 2000000 chars
On top of Shawn's analysis, I am also wondering how often those FQ
queries are reused, because they and the matching documents are getting
cached, so there might be quite a bit of space taken up with that too.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 27 May 2017 at 11:32, Shawn Heisey wrote:
> On 5/27/2017 9:05 AM, Shawn Heisey wrote:
>> [...]
>
> Followup after a little more thought:
>
> If we assume that the terms in your filter query are a generous 15
> characters each (plus a comma), that means there are in the ballpark
> of 125 thousand of them in a two million byte filter query. If they're
> smaller, then there would be more. Considering 56 bytes of overhead
> for each one, there's at least another 7 million bytes of memory for
> 125000 terms when the terms parser divides that filter into multiple
> String objects, plus memory required for the data in each of those
> small strings, which will be just a little bit less than the original
> four million bytes, because it will exclude the commas. A fair amount
> of garbage will probably also be generated in order to parse the
> filter ... and then once the query is done, the 15 megabytes (or more)
> of memory for the strings will also be garbage. This is going to
> repeat for every shard.
>
> I haven't even discussed what happens for memory requirements on the
> Lucene frange parser, because I don't have any idea what those are,
> and you didn't describe the function you're using. I also don't know
> how much memory Lucene is going to require in order to execute a terms
> filter with at least 125K terms. I don't imagine it's going to be
> small.
>
> Thanks,
> Shawn
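Shawn's back-of-envelope estimate can be reproduced directly. The
constants are the assumptions stated in the thread: 16 characters per
term (a 15-char term plus its comma), roughly 56 bytes of JVM object
overhead per String, and 2 bytes per character for Java's internal
UTF-16 representation.

```python
# Back-of-envelope reproduction of the memory estimate for a
# 2,000,000-character {!terms} filter query, per shard.
FILTER_CHARS = 2_000_000
CHARS_PER_TERM = 16        # 15-char term + separating comma (assumed)
STRING_OVERHEAD = 56       # approximate per-String overhead, bytes
BYTES_PER_CHAR = 2         # Java strings are UTF-16 internally

terms = FILTER_CHARS // CHARS_PER_TERM                # ~125,000 terms
overhead = terms * STRING_OVERHEAD                    # ~7 MB of headers
term_data = (FILTER_CHARS - terms) * BYTES_PER_CHAR   # data minus commas
original = FILTER_CHARS * BYTES_PER_CHAR              # the undivided fq

total = original + overhead + term_data
print(terms, overhead, total)   # ~15 MB of garbage per shard, per query
```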
Re: Long string in fq value parameter, more than 2000000 chars
On 5/27/2017 7:14 AM, Daniel Angelov wrote:
> I would like to ask, what could be the memory/cpu impact, if the fq
> parameter in many of the queries is a long string (fq={!terms
> f=...}..., ) around 2000000 chars. Most of the queries are like:
> "q={!frange l=Timestamp1 u=Timestamp2}... + some other criteria". This
> is with SolrCloud 4.1, on 10 hosts, 3 collections; in summary, all
> collections are around 1000 docs. The queries are over all 3
> collections.
>
> I sometimes have OOM exceptions, and I can see GC times are pretty
> long. The heap size is 64 GB on each host. The cache settings are the
> defaults.
>
> Is it possible for the long fq parameter in the requests to cause OOM
> exceptions?

A two million character string in Java will take just over four million
bytes of memory. This is because Java uses UTF-16 internally, and the
overhead on a String object is approximately 56 bytes.

With multiple shards, that string is going to get copied for each shard.
There might be other places in the Solr and Lucene code where the string
will also get copied multiple times. At four megabytes for each copy,
that's going to eat up memory quickly. It will also take a non-trivial
amount of time to accomplish each copy.

OOM exceptions on a 64GB heap? Even if we consider the info just
mentioned and there are several copies of the two million character
string floating around, it sounds like you are doing some massively
complex queries, or your index size is beyond gargantuan. I cannot
imagine needing a 64GB heap for 30 million documents unless the system
is handling some very unusual queries, and/or an enormous index, and/or
some *extremely* large Solr caches. I suspect there are many details
that we haven't heard yet.

I'm not even sure exactly what to ask for, so I'll ask for the moon. On
a per-server basis, can we see the following info?

* Total memory installed in the server.
* How many Solr instances are running on the server.
* The total amount of max heap memory allocated to Solr.
* A list of other things running on the server besides Solr.
* Total size of the solr home directory.
* How many documents does that solr home size represent? If there are
  multiple shards/replicas, all of them must be counted.
* solrconfig.xml and the schema would be useful.

More general questions: What does a typical query involve? If there are
facets, describe each field used in a facet -- term cardinality, typical
contents, analysis, etc.

If the system is running an OS with the "top" utility available, run top
(not htop or any other variety), press shift-M to sort by memory, grab a
screenshot, and put the information on the Internet somewhere we can
access it with a URL. If it's on Windows, similar information can be
obtained with Resource Monitor, sorted by "Working Set" on the Memory
tab.

Thanks,
Shawn
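The arithmetic behind Shawn's opening paragraphs, as a quick sketch: the
56-byte String overhead and the 2-byte UTF-16 characters are from his
reply, the 10 copies come from the thread's 10-host setup, and the
one-copy-per-shard behavior is Shawn's stated assumption rather than a
measured figure.

```python
# Memory for one Java String holding a 2,000,000-character fq value,
# then the worst case of one copy per shard across 10 hosts.
CHARS = 2_000_000
BYTES_PER_CHAR = 2      # Java's internal UTF-16 representation
OVERHEAD = 56           # approximate String object overhead, bytes

one_copy = CHARS * BYTES_PER_CHAR + OVERHEAD   # just over 4 MB
shards = 10                                    # one copy per shard (assumed)
print(one_copy, shards * one_copy)             # ~4 MB and ~40 MB
```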
Long string in fq value parameter, more than 2000000 chars
Hello,

I would like to ask, what could be the memory/cpu impact, if the fq
parameter in many of the queries is a long string (fq={!terms
f=...}..., ) around 2000000 chars. Most of the queries are like:
"q={!frange l=Timestamp1 u=Timestamp2}... + some other criteria". This
is with SolrCloud 4.1, on 10 hosts, 3 collections; in summary, all
collections are around 1000 docs. The queries are over all 3
collections.

I sometimes have OOM exceptions, and I can see GC times are pretty long.
The heap size is 64 GB on each host. The cache settings are the
defaults.

Is it possible for the long fq parameter in the requests to cause OOM
exceptions?

Thank you
Daniel
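To make the scale of the described request concrete, here is a sketch
that builds a filter of the shape Daniel mentions. The field names and
the id format are invented for illustration; the point is that roughly
125,000 fifteen-character ids joined by commas is where the two million
characters in the subject line come from.

```python
# Construct a {!terms} fq over a large invented id list and measure it.
ids = [f"DOC{n:012d}" for n in range(125_000)]   # 15-char ids (invented)
fq = "{!terms f=xref_id}" + ",".join(ids)

# The main query shape from the question, shown for context only.
q = "{!frange l=Timestamp1 u=Timestamp2}ms(NOW,timestamp_field)"

print(len(fq))   # roughly the 2,000,000 characters from the subject line
```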