Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
In this case, for example:
http://host1:8983/solr/collName/admin/mbeans?stats=true
returns stats in the context of the shard of "collName" living on host1,
doesn't it?

BR
Daniel
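
Each replica can also be addressed by its core name directly, so the stats for
one specific core can be checked with a request like the following sketch (the
core name is illustrative and depends on how the collection was created;
cat=CACHE restricts the output to the caches):

http://host1:8983/solr/collName_shard1_replica1/admin/mbeans?stats=true&cat=CACHE&wt=json

The filterCache entry in the response reports lookups, hits, inserts,
evictions, and size for that single core only.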


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
Sorry for the typos in the previous mail, "fg" should be "fq"



Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
This means that querying an alias NNN pointing to 3 collections, each with 10
shards and each shard with 2 replicas, using a query with a very long fq value,
say a 2000000 char string: the first query with that fq will cache all 2000000
chars 30 times (3 x 10 cores). The next query with the same fq may not hit the
same cores as the first one, i.e. it could allocate more memory in the replicas
unused by the first query. And in my case the soft commit is every 60 sec; this
means a lot of GC, doesn't it?

BR
Daniel

On 02.06.2017 17:45, "Erick Erickson" <erickerick...@gmail.com> wrote:

> bq: This means that if we have a collection with 2 replicas, there is a
> chance that 2 queries with identical fq values can be served from different
> replicas of the same shards, which means that the second query will not use
> the cached set from the first query, doesn't it?
>
> Yes. In practice autowarming is often used to pre-warm the caches, but
> again that's local to each replica, i.e. the fqs used to autowarm
> replica1 of shard1 may be different from the ones used to autowarm
> replica2 of shard1. What tends to happen is that the replicas "level
> out". Any fq clause that's common enough to be useful eventually hits
> all the replicas. And the most common ones are run during autowarming
> since it's an LRU queue.
>
> To understand why there isn't a common cache, consider that the
> filterCache is conceptually a map. The key is the fq clause and the
> value is a bitset where each bit corresponds to the _internal_ Lucene
> document ID which is just an integer 0-maxDoc. There are two critical
> points here:
>
> 1> the internal ID changes when segments are merged
> 2> different replicas will have different _internal_ ids for the same
> document. By "same" here I mean documents that have the same <uniqueKey>.
>
> So completely sidestepping the question of the propagation delays of
> trying to consult some kind of central filterCache, the nature of that
> cache is such that you couldn't share it between replicas anyway.
>
> Best,
> Erick
>
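
A minimal sketch in Java of the structure Erick describes, assuming nothing
about Solr's actual cache classes (the class and method names below are
illustrative): each core owns its own map from fq clause to a bitset over that
core's internal doc IDs, so identical fq keys on two replicas still refer to
incompatible bitsets.

import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual per-core filterCache -- not Solr's real implementation.
// Every core (replica) holds its own instance; the BitSet is sized to
// that core's maxDoc and indexed by its private internal Lucene doc IDs.
class ConceptualFilterCache {
    private final int maxDoc;
    private final Map<String, BitSet> cache;

    ConceptualFilterCache(final int maxDoc, final int maxEntries) {
        this.maxDoc = maxDoc;
        // an access-ordered LinkedHashMap gives a simple LRU eviction policy
        this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxEntries;
            }
        };
    }

    BitSet getOrCompute(String fq) {
        // key = the fq clause; value = one bit per matching internal doc ID
        return cache.computeIfAbsent(fq, this::runFilterQuery);
    }

    private BitSet runFilterQuery(String fq) {
        // placeholder: a real core executes fq against its own index and
        // sets bits 0..maxDoc-1 for the matching documents
        return new BitSet(maxDoc);
    }
}

Because the internal IDs change on merges and differ between replicas, a
bitset computed by one core is meaningless to every other core, which is why
the cache cannot be shared.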


Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
Thanks for the answer!
This means that if we have a collection with 2 replicas, there is a chance
that 2 queries with identical fq values can be served from different
replicas of the same shards, which means that the second query will not use
the cached set from the first query, doesn't it?

Thanks
Daniel

On 02.06.2017 15:32, "Susheel Kumar" <susheel2...@gmail.com> wrote:

> Thanks for the correction, Shawn. Yes, it's only the heap allocation
> settings that are per host/JVM.
>
> On Fri, Jun 2, 2017 at 9:23 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 6/1/2017 11:40 PM, Daniel Angelov wrote:
> > > Is the filter cache separate for each host and then for each
> > > collection and then for each shard and then for each replica in
> > > SolrCloud? For example, on host1 we have coll1 shard1 replica1 and
> > > coll2 shard1 replica1; on host2 we have coll1 shard2 replica2 and
> > > coll2 shard2 replica2. Does this mean that we have 4 filter caches,
> > > i.e. separate memory for each core? If they are separate and, for
> > > example, query1 is handled by coll1 shard1 replica1 and 1 sec later
> > > the same query is handled by coll2 shard1 replica1, this means
> > > that the later query will not use the result set cached from the first
> > > query...
> >
> > That is correct.
> >
> > General notes about SolrCloud terminology: SolrCloud is organized around
> > collections.  Collections are made up of one or more shards.  Shards are
> > made up of one or more replicas.  Each replica is a Solr core.  A core
> > contains one Lucene index.  It is not correct to say that a shard has no
> > replicas.  The leader *is* a replica.  If you have a leader and one
> > follower, the shard has two replicas.
> >
> > Solr caches (including filterCache) exist at the core level; they have
> > no knowledge of other replicas, other shards, or the collection as a
> > whole.  Susheel says that the caches are per host/JVM -- that's not
> > correct.  Every Solr core in a JVM has separate caches, if they are
> > defined in the configuration for that core.
> >
> > Your query scenario has even more separation -- it asks about querying
> > two completely different collections, which don't use the same cores.
> >
> > Thanks,
> > Shawn
> >
> >
>
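
As Shawn says, a core only has the caches that its own configuration defines.
A typical filterCache definition in a core's solrconfig.xml looks like the
following sketch (the sizes are illustrative; a non-zero autowarmCount is what
drives the autowarming Erick described):

<query>
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="32"/>
</query>

Each core reading this config gets its own independent cache instance of the
configured size.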


Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-01 Thread Daniel Angelov
Is the filter cache separate for each host and then for each collection and
then for each shard and then for each replica in SolrCloud?
For example, on host1 we have coll1 shard1 replica1 and coll2 shard1
replica1; on host2 we have coll1 shard2 replica2 and coll2 shard2
replica2. Does this mean that we have 4 filter caches, i.e. separate
memory for each core?
If they are separate and, for example, query1 is handled by coll1 shard1
replica1 and 1 sec later the same query is handled by coll2 shard1
replica1, this means that the later query will not use the result set
cached from the first query...

BR
Daniel


Re: Long string in fq value parameter, more than 2000000 chars

2017-05-27 Thread Daniel Angelov
Thanks for the support so far.
I am going to analyze the logs in order to check the frequency of such
queries. BTW, I forgot to mention that the soft and the hard commits are
every 60 sec.

BR
Daniel

On 27.05.2017 22:57, "Erik Hatcher" <erik.hatc...@gmail.com> wrote:

> Another technique to consider is {!join}.  Index the cross ref id "sets"
> to another core and use a short and sweet join, if there are stable sets of
> ids.
>
>Erik
>
> > On May 27, 2017, at 11:39, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
> >
> > On top of Shawn's analysis, I am also wondering how often those FQ
> > queries are reused. Since they and the matching documents are
> > getting cached, there might be quite a bit of space taken up by that
> > too.
> >
> > Regards,
> > Alex.
> >
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> >> On 27 May 2017 at 11:32, Shawn Heisey <apa...@elyograg.org> wrote:
> >>> On 5/27/2017 9:05 AM, Shawn Heisey wrote:
> >>>> On 5/27/2017 7:14 AM, Daniel Angelov wrote:
> >>>> I would like to ask what the memory/cpu impact could be if the fq
> >>>> parameter in many of the queries is a long string (fq={!terms
> >>>> f=...}..., ) of around 2000000 chars. Most of the queries are like:
> >>>> "q={!frange l=Timestamp1 u=Timestamp2}... + some other criteria".
> >>>> This is with SolrCloud 4.1, on 10 hosts, 3 collections; in total the
> >>>> collections hold around 1000 docs. The queries are over all 3
> >>>> collections.
> >>
> >> Followup after a little more thought:
> >>
> >> If we assume that the terms in your filter query are a generous 15
> >> characters each (plus a comma), that means there are in the ballpark of
> >> 125 thousand of them in a two million character filter query.  If they're
> >> smaller, then there would be more.  Considering 56 bytes of overhead for
> >> each one, there's at least another 7 million bytes of memory for 125000
> >> terms when the terms parser divides that filter into multiple String
> >> objects, plus memory required for the data in each of those small
> >> strings, which will be just a little bit less than the original four
> >> million bytes, because it will exclude the commas.  A fair amount of
> >> garbage will probably also be generated in order to parse the filter ...
> >> and then once the query is done, the 15 megabytes (or more) of memory
> >> for the strings will also be garbage.  This is going to repeat for every
> >> shard.
> >>
> >> I haven't even discussed what happens for memory requirements on the
> >> Lucene frange parser, because I don't have any idea what those are, and
> >> you didn't describe the function you're using.  I also don't know how
> >> much memory Lucene is going to require in order to execute a terms
> >> filter with at least 125K terms.  I don't imagine it's going to be
> small.
> >>
> >> Thanks,
> >> Shawn
> >>
>
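
For reference, the {!join} technique Erik suggests would replace the huge
terms filter with a short query against a side core holding the id sets. With
illustrative names (core xrefsets, fields member_id, xref_id, and set_name),
the two styles compare roughly like this:

fq={!terms f=xref_id}id1,id2,id3,...   (the current multi-million-char filter)
fq={!join fromIndex=xrefsets from=member_id to=xref_id}set_name:set42

Each stable id set is indexed once into the xrefsets core, and queries then
reference it by its short set name.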


Long string in fq value parameter, more than 2000000 chars

2017-05-27 Thread Daniel Angelov
Hello,

I would like to ask what the memory/cpu impact could be if the fq
parameter in many of the queries is a long string (fq={!terms
f=...}..., ) of around 2000000 chars. Most of the queries are like:
"q={!frange l=Timestamp1 u=Timestamp2}... + some other criteria". This is
with SolrCloud 4.1, on 10 hosts, 3 collections; in total the collections
hold around 1000 docs. The queries are over all 3 collections.

I sometimes get OOM exceptions, and I can see that GC times are pretty long.
The heap size is 64 GB on each host. The cache settings are the default.

Is it possible for the long fq parameter in the requests to cause OOM
exceptions?


Thank you

Daniel


Is it possible to set maximum indexed documents in solr?

2010-01-21 Thread Daniel Angelov
Is it possible to set a maximum of indexed documents in solr? For example, I want
to insert at most 5000 documents into solr; after that, solr must refuse inserting.


servlet forwarding solrj request/response

2010-01-21 Thread Daniel Angelov
Is it possible to make a servlet which collects some information/statistics about
the solrj requests/responses between another web application and the solr server?
For example, I have a JBOSS web app for adding/selecting documents from solr, but I
want to collect some information about these operations in another web app
under tomcat, where the solr war is in the same tomcat container.


Re: servlet forwarding solrj request/response

2010-01-21 Thread Daniel Angelov
thanks Erik,
"or perhaps a proxy in the middle that forwards requests on to Solr,
but captures however you like."
That is what I am looking for.
How can I implement this kind of proxy? I tried the RequestDispatcher
forward method of the servlet API, but when the jboss app requests docs via
solrj (from tomcat solr), the tomcat servlet does not capture the jboss app's
request. In fact, my goal is to control select/add operations in solr.

On Thu, Jan 21, 2010 at 1:45 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

 sure, you could put a servlet filter in Solr's web.xml to capture whatever
 you like.

 another option would be to hook into Solr's logging and fire events/data
 off elsewhere.

 or perhaps a proxy in the middle that forwards requests on to Solr, but
 captures however you like.

Erik
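
A minimal sketch of the servlet-filter option Erik mentions first, using the
stock javax.servlet API; the class name and the captured details are
illustrative:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

// Declared in Solr's web.xml ahead of the Solr dispatch filter so that
// every select/add request passes through it first.
public class SolrTrafficLogFilter implements Filter {
    public void init(FilterConfig cfg) throws ServletException {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        long start = System.currentTimeMillis();
        try {
            chain.doFilter(req, res);  // let Solr handle the request
        } finally {
            // capture whatever you like: URI, params, timing, caller, etc.
            System.out.println(http.getRequestURI() + "?" + http.getQueryString()
                    + " took " + (System.currentTimeMillis() - start) + " ms");
        }
    }

    public void destroy() {}
}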






Re: question

2010-01-21 Thread Daniel Angelov
Thanks Shalin,
your proposal is good. Could you give me some link where I can read some
documentation about your idea? If I write some class extending
UpdateRequestProcessor, where do I have to put it so that the requests to solr
go through that new class?

Daniel Angelov


 On Thu, Jan 21, 2010 at 3:05 PM, Shalin Shekhar Mangar
 <shalinman...@gmail.com> wrote:

  On Thu, Jan 21, 2010 at 1:21 PM, Daniel Angelov
  <dani.b.ange...@googlemail.com> wrote:

   Is it possible to set a maximum of indexed documents in solr? For example, I
  want to insert at most 5000 documents into solr; after that, solr must refuse
  inserting.
 

 No but you can do it in your indexing application or write a custom
 UpdateRequestProcessor to count the number of adds and throw an exception
 once the limit is reached. Though the latter gets slightly tricky when you
 delete by query or when you replace docs.

 --
 Regards,
 Shalin Shekhar Mangar.
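
A minimal sketch of the custom UpdateRequestProcessor Shalin describes; the
class name, the naive in-memory counter, and the 5000 limit are illustrative,
and as he notes, deletes and replaced docs are not accounted for. The factory
is registered in an updateRequestProcessorChain in solrconfig.xml:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.solr.common.SolrException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class MaxDocsProcessorFactory extends UpdateRequestProcessorFactory {
    // naive shared counter; it resets on restart and ignores deletes/replaces
    private static final AtomicLong added = new AtomicLong();
    private static final long MAX_DOCS = 5000;

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                if (added.incrementAndGet() > MAX_DOCS) {
                    throw new SolrException(SolrException.ErrorCode.FORBIDDEN,
                            "Index limit of " + MAX_DOCS + " docs reached");
                }
                super.processAdd(cmd);  // pass the add down the chain
            }
        };
    }
}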



Re: question

2010-01-21 Thread Daniel Angelov
My case is:
I have 2 web apps, the first in jboss, the second in tomcat.
The second knows what the max doc count is, but the first creates new docs, so
I wonder how I can control the indexing (from jboss) through the tomcat app.
The solr server is in tomcat.

thanks




question

2010-01-20 Thread Daniel Angelov
Is it possible to set a maximum of indexed documents in solr? For example, I want
to insert at most 5000 documents into solr; after that, solr must refuse inserting.