Re: Inconsistent results for facet queries

2017-10-12 Thread Chris Ulicny
I'm not sure that approach (reindexing and then fetching the whole
collection at once) is viable for us, but unless something inherent in
that process happens at the collection level, we could do it a few
shards at a time, since it is a multi-tenant setup.

I'll see if we can set up a small test in QA for this and try it out.
This facet issue is the only one we've noticed, and it can be worked
around, so we might end up just waiting until we reindex for version 7.x
to fix it permanently.

Thanks
Chris


Re: Inconsistent results for facet queries

2017-10-12 Thread Erick Erickson
(1) It doesn't matter whether it affects only the segments being merged.
You can't get accurate information if different segments have
different expectations.

(2) I strongly doubt it. The problem is that the "tainted" segments'
metadata is still read when merging. If a segment consisted of
_only_ deleted documents you'd probably lose it, but it'll be
re-merged long before it consists exclusively of deleted documents.

Really, you have to re-index to be sure. I suspect you can find some
way to do this faster than exploring undefined behavior and hoping.

If you can re-index _anywhere_ to a collection with the same number of
shards, you can get this done. It'll take some tricky dancing, but:

0> Copy one index directory from each shard someplace safe.
1> Reindex somewhere; a single replica per shard will do.
2> Delete all replicas except one per shard in your current collection.
3> Issue a fetchindex replication command to each remaining replica in the
old collection, pulling the index from the corresponding shard in the new
collection. It's important that only a single replica be active for
each shard at this point. The two collections do _not_ need to
be part of the same SolrCloud; the fetchindex command just takes the URL
of the core to fetch from.
4> Add the replicas back and let them replicate.

Of course, your installation would be unavailable for searching during steps 2-4.
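Step 3 could be scripted roughly as follows. This is a minimal sketch that only builds the request URLs, assuming Solr 6.x, where the ReplicationHandler takes the one-time source as masterUrl pointing at the source core's /replication handler (later releases renamed it leaderUrl; check the reference guide for your version). The host and core names are hypothetical placeholders, and the helper names are illustrative.

```python
from urllib.parse import urlencode


def fetchindex_url(old_core_url: str, new_core_url: str) -> str:
    """Build a one-time fetchindex request.

    old_core_url: the single surviving replica of a shard in the OLD collection;
                  this core does the pulling.
    new_core_url: the matching shard's core in the freshly re-indexed collection.
    """
    params = {
        "command": "fetchindex",
        # Assumption: Solr 6.x parameter name, pointing at the source
        # core's /replication handler.
        "masterUrl": new_core_url.rstrip("/") + "/replication",
    }
    return old_core_url.rstrip("/") + "/replication?" + urlencode(params)


def fetchindex_plan(shard_map: dict) -> list:
    """One request per shard, given {old_core_url: new_core_url}."""
    return [fetchindex_url(old, new) for old, new in sorted(shard_map.items())]


if __name__ == "__main__":
    # Hypothetical cores; substitute your own, one entry per shard.
    plan = fetchindex_plan({
        "http://oldhost:8983/solr/collection1_shard1_replica1":
            "http://newhost:8983/solr/collection2_shard1_replica1",
        "http://oldhost:8983/solr/collection1_shard2_replica1":
            "http://newhost:8983/solr/collection2_shard2_replica1",
    })
    for url in plan:
        print(url)
```

The requests only make sense after step 2, while each shard has exactly one active replica; step 4 then restores the other replicas.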

Best,
Erick

Re: Inconsistent results for facet queries

2017-10-12 Thread Chris Ulicny
We tested the query on all replicas for the given shard, and they all have
the same issue. So deleting and re-adding a replica won't fix the
problem, since the leader exhibits the behavior as well. I believe the
second replica was moved (new one added, old one deleted) between nodes, so
it was just a copy of the leader's index after the problematic merge
happened.

bq: Anything that didn't merge old segments, just threw them
away when empty (which was my idea) would possibly require as much
disk space as the index currently occupied, so doesn't help your
disk-constrained situation.

Something like this was what I originally thought might fix the issue. If
we reindex the data for the affected shard, it would possibly delete all
docs from the old segments, which could then just be dropped instead of
merged. As mentioned, though, you'd expect the problems to persist through
subsequent merges. So I've got two questions:

1) If the problem persists through merges, does it only affect the segments
being merged, so that Solr comes up empty when it goes looking for the
values in those segments? As opposed to all segments being affected by a
single merge they weren't a part of.

2) Is it expected that any large tainted segments will eventually merge
with clean segments, producing more tainted segments, once enough docs
have been deleted from the large segments?

Also, we aren't as disk-constrained as we were previously. Reindexing a
subset of docs is possible, but a full, clean collection reindex isn't.

Thanks,
Chris



Re: Inconsistent results for facet queries

2017-10-12 Thread Erick Erickson
Never mind. Anything that didn't merge old segments but just threw them
away when empty (which was my idea) could require as much
disk space as the index currently occupies, so it doesn't help your
disk-constrained situation.

Best,
Erick


Re: Inconsistent results for facet queries

2017-10-12 Thread Erick Erickson
If it's _only_ on a particular replica, here's what you could do:
just DELETEREPLICA it, then ADDREPLICA to bring it back. You can
specify the "node" parameter on ADDREPLICA to put it back on the same
node. The normal replication process would then pull the entire index
down from the leader.
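For reference, the two Collections API calls look roughly like this. A minimal sketch that only builds the URLs; the base URL, collection, shard, replica (core_node) and node names are hypothetical placeholders.

```python
from urllib.parse import urlencode


def collections_api(base_url: str, **params: str) -> str:
    """Build a Collections API URL, e.g. base_url='http://host:8983/solr'."""
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def replace_replica(base_url: str, collection: str, shard: str,
                    replica: str, node: str) -> list:
    """Return the two calls, in order: drop the replica, then re-add one on `node`.

    `replica` is the replica's core_node name as listed by CLUSTERSTATUS;
    `node` is the live node name, e.g. 'host:8983_solr'.
    """
    delete = collections_api(base_url, action="DELETEREPLICA",
                             collection=collection, shard=shard, replica=replica)
    add = collections_api(base_url, action="ADDREPLICA",
                          collection=collection, shard=shard, node=node)
    return [delete, add]


if __name__ == "__main__":
    for url in replace_replica("http://host:8983/solr", "collection1",
                               "shard1", "core_node3", "host:8983_solr"):
        print(url)
```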

My bet, though, is that this wouldn't really fix things. While it fixes the
particular case you've noticed, I'd guess others would pop up. You can
see what each replica returns by firing individual queries at the
replica in question with distrib=false, something like
solr_server:port/solr/collection1_shard1_replica1/query?distrib=false
plus the rest of your query parameters.
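A small helper along those lines: build the same facet request for each replica with distrib=false, then group replicas by the facet counts they return. A minimal sketch, assuming the JSON response writer with Solr's default flat facet_fields list (term, count, term, count, ...); fetching the responses is left out, and the core URLs and the field name "market" are assumptions.

```python
from urllib.parse import urlencode


def replica_facet_url(core_url: str, field: str) -> str:
    """Facet request answered by this core alone (distrib=false)."""
    params = {"q": "*:*", "rows": 0, "facet": "true",
              "facet.field": field, "distrib": "false", "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def facet_counts(response: dict, field: str) -> dict:
    """Turn Solr's flat list ["east", 4, "west", 2] into {"east": 4, "west": 2}."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def group_by_counts(responses: dict, field: str) -> list:
    """Group replica names that returned identical facet counts.

    `responses` maps replica name -> parsed JSON body. More than one
    group means the replicas disagree with each other.
    """
    groups = {}
    for name in sorted(responses):
        key = tuple(sorted(facet_counts(responses[name], field).items()))
        groups.setdefault(key, []).append(name)
    return list(groups.values())
```

A single group is not proof of correctness: if every replica of the shard is affected, they can all agree on the wrong answer, so comparing against a known-good method is still worthwhile.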


bq: It is exceedingly unfortunate that reindexing the data on that shard only
probably won't end up fixing the problem

Well, we've been working on the DWIM (Do What I Mean) feature for years,
but progress has stalled.

How would that work? You have two segments with vastly different
characteristics for a field. You could change the type, the multiValued-ness,
the analysis chain; there's no end to the things that could go wrong. Fixing
them actually _is_ impossible given how Lucene is structured.

Hmmm, you've now given me an idea I'll suggest on JIRA
after I talk to the dev list.

Consider indexed=true stored=false. After stemming, "running" can be
indexed as "run". At merge time you have no way of knowing that
"running" was the original term so you simply couldn't fix it on merge,
not to mention that the performance penalty would be...er...
severe.
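To make the "running" vs. "run" point concrete, here is a toy illustration. The suffix stripper below is deliberately crude and is not Lucene's Porter or KStem filter; it only shows that stemming is many-to-one, so an indexed-only (stored=false) field keeps no record of which original word produced a term.

```python
def toy_stem(word: str) -> str:
    """Crude suffix stripping for illustration only (not a real Lucene stemmer)."""
    for suffix in ("ning", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_only(docs: dict) -> dict:
    """Keep only the stemmed terms per doc, as an indexed=true stored=false field would."""
    return {doc_id: {toy_stem(w) for w in text.lower().split()}
            for doc_id, text in docs.items()}


if __name__ == "__main__":
    postings = index_only({1: "running", 2: "runs", 3: "run"})
    # All three documents end up with the identical term set {"run"}, so nothing
    # available at merge time says which surface form each one started with.
    print(postings)
```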

Best,
Erick


Re: Inconsistent results for facet queries

2017-10-12 Thread Chris Ulicny
I thought that decision would come back to bite us somehow. At the time, we
didn't have enough space available to do a fresh reindex alongside the old
collection, so our only option was to index over the
old one, and for the vast majority of its use it has worked as expected.

We're planning on upgrading to version 7 at some point in the near future
and will have enough space to do a full, clean reindex at that time.

bq: This can propagate through all following segment merges IIUC.

It is exceedingly unfortunate that reindexing the data on that shard only
probably won't end up fixing the problem.

Out of curiosity, are there any good write-ups or documentation on how two
(or more) Lucene segments are merged, or is it worth just looking at the
source code to figure that out?

Thanks,
Chris



Re: Inconsistent results for facet queries

2017-10-11 Thread Erick Erickson
bq: ...but the collection wasn't emptied first

This is what I suspect is the problem. Here's the issue: segments
aren't merged identically on all replicas. So at some point you had
this field indexed without docValues, changed that, and re-indexed. But
the segment merging could "read" the first segment it's going to merge
and think it knows about docValues for that field, when in fact that
segment had the old (non-DV) definition.

This would not necessarily be the same on all replicas, even on the _same_ shard.

This can propagate through all following segment merges IIUC.
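A toy model of the hypothesis above, to show how merge order alone could make replicas of the same shard diverge and how the problem could carry forward. This is NOT how Lucene actually merges field metadata; the Segment class and hypothetical_merge function are invented for illustration and simply copy the docValues flag from whichever segment happens to be merged first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    docs: tuple           # (doc_id, value) pairs for the faceted field
    has_doc_values: bool  # whether this segment's metadata claims docValues


def hypothetical_merge(segments: list) -> Segment:
    """Toy merge: combine docs, but take the field metadata from the FIRST input only."""
    docs = tuple(d for s in segments for d in s.docs)
    return Segment(docs, segments[0].has_doc_values)


def dv_facet(segments: list) -> dict:
    """Toy docValues facet: segments not flagged as having docValues contribute nothing."""
    counts = {}
    for s in segments:
        if s.has_doc_values:
            for _, value in s.docs:
                counts[value] = counts.get(value, 0) + 1
    return counts


if __name__ == "__main__":
    old = Segment(((1, "east"),), has_doc_values=False)  # indexed before the schema change
    new = Segment(((2, "east"), (3, "west")), has_doc_values=True)
    clean = Segment(((4, "west"),), has_doc_values=True)

    replica_a = hypothetical_merge([old, new])  # old segment happened to come first
    replica_b = hypothetical_merge([new, old])  # same segments, different order
    print(dv_facet([replica_a]), dv_facet([replica_b]))

    # The bad flag carries forward into the next merge, even for the clean docs.
    print(dv_facet([hypothetical_merge([replica_a, clean])]))
```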

So my bet is that if you index into a new collection, everything will
be fine. You can also just delete everything first, but I usually
prefer a new collection so I'm absolutely and positively sure that the
above can't happen.

Best,
Erick



Inconsistent results for facet queries

2017-10-11 Thread Chris Ulicny
Hi,

We've run into a strange issue with our deployment of SolrCloud 6.3.0.
Essentially, a standard facet query on a string field usually comes back
empty when it shouldn't. However, every now and again the query
returns the correct values. This only affects a single shard in our
setup.

The general pattern is that the query works properly when it
hasn't been run recently, and then returns nothing once the query seems to
have been cached (< 50ms QTime). Wait a while and you get the correct
result again, followed by blanks. It doesn't matter which replica of the
shard is queried; the results are the same.

The general query in question looks like
/select?q=*:*=true=market=0=

The field is defined in the schema as 

There are numerous other fields defined similarly, and they do not exhibit
the same behavior when used as the facet.field value. They consistently
return the right results on the shard in question.

If we add facet.method=enum to the query, we get the correct results every
time (though slower). So our assumption is that something is only
sporadically working when the fc method is chosen by default.
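A minimal sketch of the two requests being compared, in Python. The archive stripped most parameter names from the query above, so `facet=true`, `facet.field=market` and `rows=0` are assumptions, and the base URL and collection name are hypothetical placeholders:

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the real host and collection.
BASE = "http://localhost:8983/solr/mycollection/select"


def facet_url(field, method=None):
    """Build a match-all facet request on `field`, optionally forcing facet.method."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("rows", "0"),  # only the facet counts are needed, not documents
    ]
    if method is not None:
        params.append(("facet.method", method))  # e.g. "enum" to bypass fc
    return BASE + "?" + urlencode(params)


print(facet_url("market"))          # default method selection
print(facet_url("market", "enum"))  # the variant that returned correct counts
```

Issuing both URLs repeatedly against the same replica is one way to confirm that only the default path comes back empty.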

A few other notes about the collection. This collection is not freshly
indexed, but has not had any particularly bad failures beyond follower
replicas going down due to PKIAuthentication timeouts (has been fixed). It
has also had a full reindex after a schema change added docValues to some
fields (including the one above), but the collection wasn't emptied first.
We are using the composite router to co-locate documents.

Currently, our plan is just to reindex all of the documents on the affected
shard to see if that fixes the problem. Any ideas on what might be
happening or ways to troubleshoot this are appreciated.

Thanks,
Chris


Re: Facet queries blow out the filterCache

2015-10-28 Thread Jeff Wartes

FWIW, since it seemed like there was at least one bug here (and possibly
more), I filed
https://issues.apache.org/jira/browse/SOLR-8171



On 10/6/15, 3:58 PM, "Jeff Wartes" <jwar...@whitepages.com> wrote:

>
>I dug far enough yesterday to find the GET_DOCSET, but not far enough to
>find why. Thanks, a little context is really helpful sometimes.
>
>
>So, starting with an empty filterCache...
>
>http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity
>
>New values: lookups: 0, hits: 0, inserts: 1, size: 1
>
>So for the reasons you explained, "inserts" is incremented for this new
>search
>
>http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
>
>New values: lookups: 0, hits: 0, inserts: 2, size: 2
>
>
>Another new search, another new insert. No "lookups" though, so how does
>it know name:boo wasn’t cached?
>
>http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
>New values: lookups: 1, hits: 1, inserts: 2, size: 2
>
>
>But it clearly does know - when I repeat the search, I get both a lookup
>and a hit. (and no insert) So is this just
>a bug in the stats reporting, perhaps?
>
>
>When I first started looking at this, it was in a solrcloud cluster, and
>one interesting thing about that cluster is that it was configured with
>the queryResultCache turned off, so let’s repeat the above experiment
>without the queryResultCache. (I’m just commenting it out in the
>techproducts config for this run.)
>
>
>Starting with an empty filterCache...
>
>http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity
>New values: lookups: 0, hits: 0, inserts: 1, size: 1
>
>Same as before...
>
>http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
>New values: lookups: 0, hits: 0, inserts: 2, size: 2
>
>Same as before...
>
>http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
>New values: lookups: 0, hits: 0, inserts: 3, size: 2
>
>No cache hit! We get an insert instead, but it’s already in there, so the
>size doesn’t change. So disabling the queryResultCache apparently causes
>facet queries to be unable to use the filterCache?
>
>
>
>
>I’m increasingly thinking that different use cases need different
>filterCaches, rather than try to bundle every explicit or unexpected
>use-case under one cache with one size and one regenerator.
>
>
>
>
>
>
>On 10/6/15, 2:45 PM, "Chris Hostetter" <hossman_luc...@fucit.org> wrote:
>
>>: So, no SolrCloud, default example config, about as basic as you get. I
>>: didn’t even bother indexing any docs. Then I issued this query:
>>: 
>>: 
>>: http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity=0=-1
>>
>>: This still causes an insert into the filterCache.
>>
>>the faceting component is a type of operation that indicates in the
>>QueryCommand that it needs to GET_DOCSET for the set of all documents
>>matching the query (independent of pagination) -- the point of this DocSet
>>is so the faceting logic can then compute the intersection of the set of
>>all matching documents with the set of documents matching each facet
>>constraint.  the cached DocSet will be re-used both within the context
>>of the current request, and in future facet requests over the
>>same query+filters.
>>
>>: The only real difference I’m noticing vs my solrcloud collection is that
>>: repeating the query increments cache lookups and hits. It’s still odd
>>: though, because issuing new distinct queries causes a reported insert, but
>>: not a lookup, so the cache hit ratio is always exactly 1.
>>
>>i'm not following what you are saying at all ... can you give some
>>concrete examples (ie: "starting with an empty cache i do this request,
>>then i see these cache stats, then i do this identical/different query and
>>then the cache stats look like this...")
>>
>>
>>
>>-Hoss
>>http://www.lucidworks.com/
>



Re: Facet queries blow out the filterCache

2015-10-20 Thread Mikhail Khludnev
Jeff,
so far the test routine is reasonable, but since we count a facet, we expect
that filtering by one of these values will be used in subsequent requests. I
suppose the next request with fq=popularity:1 or so might show reuse of that
cached filter, but it's just my speculation.

On Tue, Oct 6, 2015 at 3:58 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> I dug far enough yesterday to find the GET_DOCSET, but not far enough to
> find why. Thanks, a little context is really helpful sometimes.
>
>
> So, starting with an empty filterCache...
>
> http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity
>
> New values: lookups: 0, hits: 0, inserts: 1, size: 1
>
> So for the reasons you explained, "inserts" is incremented for this new
> search
>
> http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
>
> New values: lookups: 0, hits: 0, inserts: 2, size: 2
>
>
> Another new search, another new insert. No "lookups" though, so how does
> it know name:boo wasn’t cached?
>
> http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
> New values: lookups: 1, hits: 1, inserts: 2, size: 2
>
>
> But it clearly does know - when I repeat the search, I get both a lookup
> and a hit. (and no insert) So is this just
> a bug in the stats reporting, perhaps?
>
>
> When I first started looking at this, it was in a solrcloud cluster, and
> one interesting thing about that cluster is that it was configured with
> the queryResultCache turned off, so let’s repeat the above experiment
> without the queryResultCache. (I’m just commenting it out in the
> techproducts config for this run.)
>
>
> Starting with an empty filterCache...
>
> http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity
> New values: lookups: 0, hits: 0, inserts: 1, size: 1
>
> Same as before...
>
> http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
> New values: lookups: 0, hits: 0, inserts: 2, size: 2
>
> Same as before...
>
> http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
> New values: lookups: 0, hits: 0, inserts: 3, size: 2
>
> No cache hit! We get an insert instead, but it’s already in there, so the
> size doesn’t change. So disabling the queryResultCache apparently causes
> facet queries to be unable to use the filterCache?
>
>
>
>
> I’m increasingly thinking that different use cases need different
> filterCaches, rather than try to bundle every explicit or unexpected
> use-case under one cache with one size and one regenerator.
>
>
>
>
>
>
> On 10/6/15, 2:45 PM, "Chris Hostetter" <hossman_luc...@fucit.org> wrote:
>
> >: So, no SolrCloud, default example config, about as basic as you get. I
> >: didn’t even bother indexing any docs. Then I issued this query:
> >:
> >:
> >: http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity=0=-1
> >
> >: This still causes an insert into the filterCache.
> >
> >the faceting component is a type of operation that indicates in the
> >QueryCommand that it needs to GET_DOCSET for the set of all documents
> >matching the query (independent of pagination) -- the point of this DocSet
> >is so the faceting logic can then compute the intersection of the set of
> >all matching documents with the set of documents matching each facet
> >constraint.  the cached DocSet will be re-used both within the context
> >of the current request, and in future facet requests over the
> >same query+filters.
> >
> >: The only real difference I’m noticing vs my solrcloud collection is that
> >: repeating the query increments cache lookups and hits. It’s still odd
> >: though, because issuing new distinct queries causes a reported insert, but
> >: not a lookup, so the cache hit ratio is always exactly 1.
> >
> >i'm not following what you are saying at all ... can you give some
> >concrete examples (ie: "starting with an empty cache i do this request,
> >then i see these cache stats, then i do this identical/different query and
> >then the cache stats look like this...")
> >
> >
> >
> >-Hoss
> >http://www.lucidworks.com/
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>


Re: Facet queries blow out the filterCache

2015-10-06 Thread Chris Hostetter
: So, no SolrCloud, default example config, about as basic as you get. I
: didn’t even bother indexing any docs. Then I issued this query:
: 
: http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity=0=-1

: This still causes an insert into the filterCache.

the faceting component is a type of operation that indicates in the 
QueryCommand that it needs to GET_DOCSET for the set of all documents 
matching the query (independent of pagination) -- the point of this DocSet 
is so the faceting logic can then compute the intersection of the set of 
all matching documents with the set of documents matching each facet 
constraint.  the cached DocSet will be re-used both within the context 
of the current request, and in future facet requests over the 
same query+filters.
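To make the description above concrete, here is a deliberately simplified Python toy (not Solr's implementation; `FilterCache`, `get_docset` and the set-based DocSets are illustrative stand-ins). The DocSet for query+filters is fetched from, or inserted into, a cache keyed on query+filters only, and the count for each facet value is the size of its intersection with that DocSet. Unlike the stats reported elsewhere in this thread, the toy counts a lookup on every request:

```python
class FilterCache:
    """Toy stand-in for a filterCache: maps a cache key to a set of doc ids."""

    def __init__(self):
        self.entries = {}
        self.lookups = self.hits = self.inserts = 0

    def get_docset(self, key, compute):
        """Return the cached DocSet for `key`, computing and inserting it on a miss."""
        self.lookups += 1
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        docs = frozenset(compute())
        self.entries[key] = docs
        self.inserts += 1
        return docs


def matches(doc, clauses):
    """True if the doc satisfies every (field, value) clause."""
    return all(doc.get(field) == value for field, value in clauses)


def facet_counts(cache, index, query, filters, facet_field):
    """Count docs per value of `facet_field` among docs matching query+filters."""
    clauses = (query,) + tuple(filters)
    # The key covers only query+filters, not rows/start, so any page can reuse it.
    key = (query, tuple(sorted(filters)))
    base = cache.get_docset(
        key, lambda: (i for i, doc in index.items() if matches(doc, clauses)))
    counts = {}
    values = sorted({doc[facet_field] for doc in index.values() if facet_field in doc})
    for value in values:
        constraint = {i for i, doc in index.items() if doc.get(facet_field) == value}
        n = len(base & constraint)  # intersection size = count for this value
        if n:
            counts[value] = n
    return counts


index = {
    1: {"name": "foo", "popularity": 6},
    2: {"name": "foo", "popularity": 7},
    3: {"name": "boo", "popularity": 6},
}
cache = FilterCache()
print(facet_counts(cache, index, ("name", "foo"), [], "popularity"))  # {6: 1, 7: 1}
print(facet_counts(cache, index, ("name", "foo"), [], "popularity"))  # same, via a cache hit
print(cache.lookups, cache.hits, cache.inserts)  # 2 1 1
```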

: The only real difference I’m noticing vs my solrcloud collection is that
: repeating the query increments cache lookups and hits. It’s still odd
: though, because issuing new distinct queries causes a reported insert, but
: not a lookup, so the cache hit ratio is always exactly 1.

i'm not following what you are saying at all ... can you give some 
concrete examples (ie: "starting with an empty cache i do this request, 
then i see these cache stats, then i do this identical/different query and 
then the cache stats look like this...")



-Hoss
http://www.lucidworks.com/

Re: Facet queries blow out the filterCache

2015-10-06 Thread Jeff Wartes

I dug far enough yesterday to find the GET_DOCSET, but not far enough to
find why. Thanks, a little context is really helpful sometimes.


So, starting with an empty filterCache...

http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity

New values: lookups: 0, hits: 0, inserts: 1, size: 1

So for the reasons you explained, "inserts" is incremented for this new
search

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity

New values: lookups: 0, hits: 0, inserts: 2, size: 2


Another new search, another new insert. No "lookups" though, so how does
it know name:boo wasn’t cached?

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
New values: lookups: 1, hits: 1, inserts: 2, size: 2


But it clearly does know - when I repeat the search, I get both a lookup
and a hit. (and no insert) So is this just
a bug in the stats reporting, perhaps?


When I first started looking at this, it was in a solrcloud cluster, and
one interesting thing about that cluster is that it was configured with
the queryResultCache turned off, so let’s repeat the above experiment
without the queryResultCache. (I’m just commenting it out in the
techproducts config for this run.)


Starting with an empty filterCache...

http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity
New values: lookups: 0, hits: 0, inserts: 1, size: 1

Same as before...

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
New values: lookups: 0, hits: 0, inserts: 2, size: 2

Same as before...

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true=popularity
New values: lookups: 0, hits: 0, inserts: 3, size: 2

No cache hit! We get an insert instead, but it’s already in there, so the
size doesn’t change. So disabling the queryResultCache apparently causes
facet queries to be unable to use the filterCache?
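Tabulating the deltas between successive snapshots makes the difference between the two runs easier to see. A small Python sketch using the numbers reported above, assuming an all-zero starting snapshot for the empty cache; `Stats` and `classify` are hypothetical helpers, not part of Solr:

```python
from typing import NamedTuple


class Stats(NamedTuple):
    """One snapshot of filterCache statistics."""
    lookups: int
    hits: int
    inserts: int
    size: int


def classify(snapshots):
    """Label what each request did to the cache, from successive snapshots."""
    labels = []
    for before, after in zip(snapshots, snapshots[1:]):
        if after.hits > before.hits:
            labels.append("hit")
        elif after.inserts > before.inserts and after.size > before.size:
            labels.append("insert, new entry")
        elif after.inserts > before.inserts:
            # More inserts but the same size: the key was already in the cache.
            labels.append("insert, key already cached")
        else:
            labels.append("no filterCache activity")
    return labels


empty = Stats(0, 0, 0, 0)
# Numbers as reported above: name:foo, name:boo, then name:boo again.
with_qrc = [empty, Stats(0, 0, 1, 1), Stats(0, 0, 2, 2), Stats(1, 1, 2, 2)]
without_qrc = [empty, Stats(0, 0, 1, 1), Stats(0, 0, 2, 2), Stats(0, 0, 3, 2)]

print(classify(with_qrc))
print(classify(without_qrc))
```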




I’m increasingly thinking that different use cases need different
filterCaches, rather than try to bundle every explicit or unexpected
use-case under one cache with one size and one regenerator.






On 10/6/15, 2:45 PM, "Chris Hostetter" <hossman_luc...@fucit.org> wrote:

>: So, no SolrCloud, default example config, about as basic as you get. I
>: didn’t even bother indexing any docs. Then I issued this query:
>: 
>: 
>: http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity=0=-1
>
>: This still causes an insert into the filterCache.
>
>the faceting component is a type of operation that indicates in the
>QueryCommand that it needs to GET_DOCSET for the set of all documents
>matching the query (independent of pagination) -- the point of this DocSet
>is so the faceting logic can then compute the intersection of the set of
>all matching documents with the set of documents matching each facet
>constraint.  the cached DocSet will be re-used both within the context
>of the current request, and in future facet requests over the
>same query+filters.
>
>: The only real difference I’m noticing vs my solrcloud collection is that
>: repeating the query increments cache lookups and hits. It’s still odd
>: though, because issuing new distinct queries causes a reported insert, but
>: not a lookup, so the cache hit ratio is always exactly 1.
>
>i'm not following what you are saying at all ... can you give some
>concrete examples (ie: "starting with an empty cache i do this request,
>then i see these cache stats, then i do this identical/different query and
>then the cache stats look like this...")
>
>
>
>-Hoss
>http://www.lucidworks.com/



Re: Facet queries blow out the filterCache

2015-10-03 Thread Mikhail Khludnev
this insert is caused by
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1505

off-topic thought:
showItems output is useless, because currently it looks like

   - item_name:foo:org.apache.solr.search.SortedIntDocSet@2e1fbd46

   Shouldn't it be improved?


On Fri, Oct 2, 2015 at 11:58 PM, Jeff Wartes  wrote:

>
> I backed up a bit. I took the stock solr download and did this:
>
> solr-5.3.1>$ bin/solr -e techproducts
>
> So, no SolrCloud, default example config, about as basic as you get. I
> didn’t even bother indexing any docs. Then I issued this query:
>
> http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity=0=-1
>
>
> This still causes an insert into the filterCache.
>
> The only real difference I’m noticing vs my solrcloud collection is that
> repeating the query increments cache lookups and hits. It’s still odd
> though, because issuing new distinct queries causes a reported insert, but
> not a lookup, so the cache hit ratio is always exactly 1.
>
>
>
> On 10/2/15, 4:18 AM, "Toke Eskildsen"  wrote:
>
> >On Thu, 2015-10-01 at 22:31 +, Jeff Wartes wrote:
> >> It still inserts if I address the core directly and use distrib=false.
> >
> >It is quite strange that it is triggered with the direct access. If that
> >can be reproduced in test, it looks like a performance optimization to
> >be done.
> >
> >Anyway, operating under the assumption that the single-core facet
> >request for some reason acts as a distributed call, the key to avoid the
> >fine-counting is to ensure that _all_ possibly relevant term counts have
> >been returned in the first facet phase.
> >
> >Try setting both facet.mincount=0 and facet.limit=-1.
> >
> >- Toke Eskildsen, State and University Library, Denmark
> >
> >
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Facet queries blow out the filterCache

2015-10-02 Thread Charlie Hull

On 01/10/2015 23:31, Jeff Wartes wrote:

It still inserts if I address the core directly and use distrib=false.

I’ve got a few collections sharing the same config, so it’s surprisingly
annoying to change solrconfig.xml right now, but it seemed pretty clear the
query is the thing being cached, since the cache size only changes when the
query does.


Hi Jeff,

I think you may be hitting the same issue we found:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/%3ccage-mlj+6y1at+ounk3sgacff6zgtjq_nin9_3shn0kfuqx...@mail.gmail.com%3E

Distributed faceting uses the filter cache, where you wouldn't expect it 
to. The solution was to set facet.limit to -1.


Best

Charlie




On 10/1/15, 3:01 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote:


hm..
This option was useful for introspecting cache content
https://wiki.apache.org/solr/SolrCaching#showItems It might help you to
find out the cause.
I'm still blaming distributed requests, as explained here
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Over-RequestParameters
eg does it happen if you run with distrib=false?

On Fri, Oct 2, 2015 at 12:27 AM, Jeff Wartes <jwar...@whitepages.com>
wrote:



No change, still shows an insert per-request. As does a simplified request
with only the facet params
"=city=true"


by default it's 100
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.limitParameter
and can cause filtering by values, it can be seen in logs, btw.



It’s definitely facet related though, facet=false eliminates the insert.



On 10/1/15, 1:50 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com>
wrote:


what if you set f.city.facet.limit=-1 ?

On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes <jwar...@whitepages.com>
wrote:



I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud
index on fields like this:




q=...=id,score=city=true=1
.c
it
y.facet.limit=50=0=0=fc

(no, NOT facet.method=enum - the usage of the filterCache there is pretty
well documented)

Watching the filterCache stats, it appears that every one of these queries
causes the "inserts" counter to be incremented by one. Distinct "q="
queries also increase the "size", and eviction happens as normal. If I
repeat the same query a few times, "lookups" is not incremented, so these
entries generally appear to be completely wasted. (Although when running a
lot of these queries, it appears as though a very small set also increment
the "lookups" counter, but only a small set, and I haven’t figured out why
some are special.)

So the question is, why does this facet query have anything to do with the
filterCache? This causes a huge amount of filterCache churn with no
apparent benefit.
















--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Facet queries blow out the filterCache

2015-10-02 Thread Toke Eskildsen
On Thu, 2015-10-01 at 22:31 +, Jeff Wartes wrote:
> It still inserts if I address the core directly and use distrib=false.

It is quite strange that it is triggered with the direct access. If that
can be reproduced in test, it looks like a performance optimization to
be done.

Anyway, operating under the assumption that the single-core facet
request for some reason acts as a distributed call, the key to avoid the
fine-counting is to ensure that _all_ possibly relevant term counts have
been returned in the first facet phase. 

Try setting both facet.mincount=0 and facet.limit=-1.
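Toke's point can be illustrated with a simplified Python model of two-phase distributed field faceting. This is not Solr's code: it ignores the extra terms Solr over-requests from each shard and ignores mincount, and the shard data and city names are made up. Each shard first reports only its top `limit` terms; any term that makes the merged top list but was not reported by some shard has to be counted there in a second, per-term refinement request. With `limit=-1` every shard reports every term, so nothing is left to refine:

```python
from collections import Counter


def top_terms(counts, limit):
    """Rank terms by count (ties by name); limit == -1 keeps every term."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if limit == -1 else ranked[:limit]


def needs_refinement(shards, limit):
    """Return {shard index: terms that shard must be asked about in phase 2}."""
    # Phase 1: each shard reports only its own top `limit` terms.
    responses = [dict(top_terms(counts, limit)) for counts in shards]
    merged = Counter()
    for response in responses:
        merged.update(response)
    candidates = [term for term, _ in top_terms(merged, limit)]
    # Phase 2: any candidate a shard did not report needs a per-term count from it.
    refine = {}
    for i, response in enumerate(responses):
        missing = [term for term in candidates if term not in response]
        if missing:
            refine[i] = missing
    return refine


shard_a = {"seattle": 5, "portland": 4, "boise": 1}
shard_b = {"boise": 6, "seattle": 2, "portland": 1}
print(needs_refinement([shard_a, shard_b], 2))   # {0: ['boise']}
print(needs_refinement([shard_a, shard_b], -1))  # {}
```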

- Toke Eskildsen, State and University Library, Denmark




Re: Facet queries blow out the filterCache

2015-10-02 Thread Jeff Wartes

I backed up a bit. I took the stock solr download and did this:

solr-5.3.1>$ bin/solr -e techproducts

So, no SolrCloud, default example config, about as basic as you get. I
didn’t even bother indexing any docs. Then I issued this query:

http://localhost:8983/solr/techproducts/select?q=name:foo=1=true=popularity=0=-1


This still causes an insert into the filterCache.

The only real difference I’m noticing vs my solrcloud collection is that
repeating the query increments cache lookups and hits. It’s still odd
though, because issuing new distinct queries causes a reported insert, but
not a lookup, so the cache hit ratio is always exactly 1.



On 10/2/15, 4:18 AM, "Toke Eskildsen"  wrote:

>On Thu, 2015-10-01 at 22:31 +, Jeff Wartes wrote:
>> It still inserts if I address the core directly and use distrib=false.
>
>It is quite strange that it is triggered with the direct access. If that
>can be reproduced in test, it looks like a performance optimization to
>be done.
>
>Anyway, operating under the assumption that the single-core facet
>request for some reason acts as a distributed call, the key to avoid the
>fine-counting is to ensure that _all_ possibly relevant term counts have
>been returned in the first facet phase.
>
>Try setting both facet.mincount=0 and facet.limit=-1.
>
>- Toke Eskildsen, State and University Library, Denmark
>
>



Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes

I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud
index on fields like this:



Re: Facet queries blow out the filterCache

2015-10-01 Thread Mikhail Khludnev
what if you set f.city.facet.limit=-1 ?
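Per-field parameters use the `f.<field>.<param>` form. A Python sketch of a request carrying the suggested override; `q=name:foo` is a placeholder and the other parameters are assumptions, since the archive mangled the original query string:

```python
from urllib.parse import urlencode


def city_facet_params(q, limit):
    """Facet on `city`, overriding facet.limit for that field only."""
    return [
        ("q", q),
        ("fl", "id,score"),
        ("facet", "true"),
        ("facet.field", "city"),
        # f.<field>.<param> overrides <param> for one field; -1 means no limit.
        ("f.city.facet.limit", str(limit)),
    ]


print(urlencode(city_facet_params("name:foo", -1)))
```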

On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud
> index on fields like this:
>
>  docValues="true”/>
>
> that look something like this:
> q=...=id,score=city=true=1
> y.facet.limit=50=0=0=fc
>
> (no, NOT facet.method=enum - the usage of the filterCache there is pretty
> well documented)
>
> Watching the filterCache stats, it appears that every one of these queries
> causes the "inserts" counter to be incremented by one. Distinct "q="
> queries also increase the "size", and eviction happens as normal. If I
> repeat the same query a few times, "lookups" is not incremented, so these
> entries generally appear to be completely wasted. (Although when running a
> lot of these queries, it appears as though a very small set also increment
> the "lookups" counter, but only a small set, and I haven’t figured out why
> some are special.)
>
> So the question is, why does this facet query have anything to do with the
> filterCache? This causes a huge amount of filterCache churn with no
> apparent benefit.
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>


Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes

No change, still shows an insert per-request. As does a simplified request
with only the facet params
"=city=true"

It’s definitely facet related though, facet=false eliminates the insert.



On 10/1/15, 1:50 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote:

>what if you set f.city.facet.limit=-1 ?
>
>On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes <jwar...@whitepages.com>
>wrote:
>
>>
>> I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud
>> index on fields like this:
>>
>> > docValues="true”/>
>>
>> that look something like this:
>> 
>>q=...=id,score=city=true=1
>>it
>> y.facet.limit=50=0=0=fc
>>
>> (no, NOT facet.method=enum - the usage of the filterCache there is pretty
>> well documented)
>>
>> Watching the filterCache stats, it appears that every one of these queries
>> causes the "inserts" counter to be incremented by one. Distinct "q="
>> queries also increase the "size", and eviction happens as normal. If I
>> repeat the same query a few times, "lookups" is not incremented, so these
>> entries generally appear to be completely wasted. (Although when running a
>> lot of these queries, it appears as though a very small set also increment
>> the "lookups" counter, but only a small set, and I haven’t figured out why
>> some are special.)
>>
>> So the question is, why does this facet query have anything to do with the
>> filterCache? This causes a huge amount of filterCache churn with no
>> apparent benefit.
>>
>>
>
>



Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
It still inserts if I address the core directly and use distrib=false.

I’ve got a few collections sharing the same config, so it’s surprisingly
annoying to change solrconfig.xml right now, but it seemed pretty clear the
query is the thing being cached, since the cache size only changes when the
query does.



On 10/1/15, 3:01 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote:

>hm..
>This option was useful for introspecting cache content
>https://wiki.apache.org/solr/SolrCaching#showItems It might help you to
>find out the cause.
>I'm still blaming distributed requests, as explained here
>https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Over-RequestParameters
>eg does it happen if you run with distrib=false?
>
>On Fri, Oct 2, 2015 at 12:27 AM, Jeff Wartes <jwar...@whitepages.com>
>wrote:
>
>>
>> No change, still shows an insert per-request. As does a simplified request
>> with only the facet params
>> "=city=true"
>>
>by default it's 100
>https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.limitParameter
>and can cause filtering by values, it can be seen in logs, btw.
>
>>
>> It’s definitely facet related though, facet=false eliminates the insert.
>>
>>
>>
>> On 10/1/15, 1:50 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com>
>> wrote:
>>
>> >what if you set f.city.facet.limit=-1 ?
>> >
>> >On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes <jwar...@whitepages.com>
>> >wrote:
>> >
>> >>
>> >> I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud
>> >> index on fields like this:
>> >>
>> >> > >> docValues="true”/>
>> >>
>> >> that look something like this:
>> >>
>> 
>>>>q=...=id,score=city=true=1
>>>>.c
>> >>it
>> >> y.facet.limit=50=0=0=fc
>> >>
>> >> (no, NOT facet.method=enum - the usage of the filterCache there is pretty
>> >> well documented)
>> >>
>> >> Watching the filterCache stats, it appears that every one of these queries
>> >> causes the "inserts" counter to be incremented by one. Distinct "q="
>> >> queries also increase the "size", and eviction happens as normal. If I
>> >> repeat the same query a few times, "lookups" is not incremented, so these
>> >> entries generally appear to be completely wasted. (Although when running a
>> >> lot of these queries, it appears as though a very small set also increment
>> >> the "lookups" counter, but only a small set, and I haven’t figured out why
>> >> some are special.)
>> >>
>> >> So the question is, why does this facet query have anything to do with the
>> >> filterCache? This causes a huge amount of filterCache churn with no
>> >> apparent benefit.
>> >>
>> >>
>> >
>> >
>>
>>
>
>



Re: Facet queries blow out the filterCache

2015-10-01 Thread Mikhail Khludnev
hm..
This option was useful for introspecting cache content
https://wiki.apache.org/solr/SolrCaching#showItems It might help you to
find out the cause.
I'm still blaming distributed requests, as explained here
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Over-RequestParameters
eg does it happen if you run with distrib=false?

On Fri, Oct 2, 2015 at 12:27 AM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> No change, still shows an insert per-request. As does a simplified request
> with only the facet params
> "=city=true"
>
by default it's 100
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.limitParameter
and can cause filtering by values, it can be seen in logs, btw.

>
> It’s definitely facet related though, facet=false eliminates the insert.
>
>
>
> On 10/1/15, 1:50 PM, "Mikhail Khludnev" <mkhlud...@griddynamics.com>
> wrote:
>
> >what if you set f.city.facet.limit=-1 ?
> >
> >On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes <jwar...@whitepages.com>
> >wrote:
> >
> >>
> >> I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud
> >> index on fields like this:
> >>
> >>  >> docValues="true”/>
> >>
> >> that look something like this:
> >>
> >>q=...=id,score=city=true=1
> >>it
> >> y.facet.limit=50=0=0=fc
> >>
> >> (no, NOT facet.method=enum - the usage of the filterCache there is pretty
> >> well documented)
> >>
> >> Watching the filterCache stats, it appears that every one of these queries
> >> causes the "inserts" counter to be incremented by one. Distinct "q="
> >> queries also increase the "size", and eviction happens as normal. If I
> >> repeat the same query a few times, "lookups" is not incremented, so these
> >> entries generally appear to be completely wasted. (Although when running a
> >> lot of these queries, it appears as though a very small set also increment
> >> the "lookups" counter, but only a small set, and I haven’t figured out why
> >> some are special.)
> >>
> >> So the question is, why does this facet query have anything to do with the
> >> filterCache? This causes a huge amount of filterCache churn with no
> >> apparent benefit.
> >>
> >>
> >
> >
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>


Re: Range Facet queries for date ranges with with non-constant gaps

2015-07-18 Thread JoeSmith
Thank you.  That helped

On Tue, Jul 14, 2015 at 5:02 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : Are there any examples/documentation for IntervalFaceting using dates that
 : I could refer to?

 You just specify the interval set start and end as properly formatted date
 values.  This example shows some range faceting and interval faceting on
 the same field of the bin/solr -e techproducts example:


 http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.interval.set=[2006-01-01T00:00:00Z,2007-01-01T00:00:00Z]&facet.interval.set=[2005-01-01T00:00:00Z,2006-01-01T00:00:00Z]&facet.interval.set=[2005-01-01T00:00:00Z,2007-01-01T00:00:00Z]&facet.interval=manufacturedate_dt&facet.range=manufacturedate_dt&facet.range.start=2005-01-01T00:00:00Z&facet.range.end=2007-01-01T00:00:00Z&facet.range.gap=%2B2MONTHS



 -Hoss
 http://www.lucidworks.com/



Re: Range Facet queries for date ranges with with non-constant gaps

2015-07-14 Thread Chris Hostetter

: Are there any examples/documentation for IntervalFaceting using dates that
: I could refer to?

You just specify the interval set start & end as properly formatted date
values.  This example shows some range faceting and interval faceting on
the same field of the bin/solr -e techproducts example:

http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.interval.set=[2006-01-01T00:00:00Z,2007-01-01T00:00:00Z]&facet.interval.set=[2005-01-01T00:00:00Z,2006-01-01T00:00:00Z]&facet.interval.set=[2005-01-01T00:00:00Z,2007-01-01T00:00:00Z]&facet.interval=manufacturedate_dt&facet.range=manufacturedate_dt&facet.range.start=2005-01-01T00:00:00Z&facet.range.end=2007-01-01T00:00:00Z&facet.range.gap=%2B2MONTHS
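The same request can be assembled in client code, which avoids hand-escaping mistakes (for example, the `+` in the gap has to travel as `%2B`). Below is a minimal JDK-only sketch. The class and method names are illustrative, not part of any Solr client API, and it assumes Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class IntervalFacetUrl {

    // Appends key=value with the value URL-encoded, so "+2MONTHS" becomes "%2B2MONTHS"
    // and the brackets, colons and commas in interval sets are escaped.
    static void add(List<String> params, String key, String value) {
        params.add(key + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
    }

    // Rebuilds the techproducts example: three date intervals plus a +2MONTHS
    // range facet, both on manufacturedate_dt.
    static String build(String selectUrl) {
        List<String> p = new ArrayList<>();
        add(p, "q", "*:*");
        add(p, "rows", "0");
        add(p, "facet", "true");
        add(p, "facet.interval", "manufacturedate_dt");
        add(p, "facet.interval.set", "[2006-01-01T00:00:00Z,2007-01-01T00:00:00Z]");
        add(p, "facet.interval.set", "[2005-01-01T00:00:00Z,2006-01-01T00:00:00Z]");
        add(p, "facet.interval.set", "[2005-01-01T00:00:00Z,2007-01-01T00:00:00Z]");
        add(p, "facet.range", "manufacturedate_dt");
        add(p, "facet.range.start", "2005-01-01T00:00:00Z");
        add(p, "facet.range.end", "2007-01-01T00:00:00Z");
        add(p, "facet.range.gap", "+2MONTHS");
        return selectUrl + "?" + String.join("&", p);
    }
}
```

Calling `build("http://localhost:8983/solr/techproducts/select")` gives a fully encoded version of the URL above. Repeated `facet.interval.set` keys are simply emitted as repeated parameters.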



-Hoss
http://www.lucidworks.com/


Re: Range Facet queries for date ranges with with non-constant gaps

2015-07-13 Thread JoeSmith
Are there any examples/documentation for IntervalFaceting using dates that
I could refer to?



Re: Range Facet queries for date ranges with with non-constant gaps

2015-07-13 Thread Chris Hostetter

: Some of the buckets return with a count of ‘0’ in the bucket even though 
: the facet.range.min is set to ‘1’.  That is not the primary issue 

facet.range.min has never been a supported (or documented) param -- you
are most likely trying to use facet.mincount (which can be specified
per field as a top level f.my_field_name.facet.mincount, or as a
localparam, ex: facet.range={!facet.mincount=1}my_field_name)
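The sketch below shows the supported spellings as unencoded parameter strings: the per-field and local-param forms described above, and the global `facet.mincount` also suggested in this thread. `login_event` is the field from the original query. The helper class is hypothetical, and the values still need URL-encoding before they go into a query string.

```java
public class RangeFacetMinCount {

    // Per-field override: only affects facets on the named field.
    static String perField(String field, int minCount) {
        return "f." + field + ".facet.mincount=" + minCount;
    }

    // Local param attached directly to the facet.range declaration.
    static String localParam(String field, int minCount) {
        return "facet.range={!facet.mincount=" + minCount + "}" + field;
    }

    // Global form: applies to every facet in the request.
    static String global(int minCount) {
        return "facet.mincount=" + minCount;
    }
}
```

Any one of these replaces the unsupported `facet.range.min=1`.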

: though. What I would like to get back are buckets of unevenly spaced 
: gaps.  For example, counts for the last 7 days, last 30 days, last 90 
: days.

what you are describing is exactly what the Interval Faceting feature 
provides...

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting
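For "last 7 / 30 / 90 days" buckets, one option is to compute each interval's endpoints on the client as plain ISO dates. This fits the note elsewhere in this thread that interval sets take properly formatted date values, and it avoids assuming date-math support inside interval sets. The minimal sketch below produces one `facet.interval.set` value per look-back window. The class name is hypothetical, and the windows overlap by design because each one ends at "now".

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class LastNDaysIntervals {

    // Returns one "[start,end]" interval per look-back window, with end = now
    // truncated to whole seconds so the values print as yyyy-MM-ddTHH:mm:ssZ.
    static List<String> intervalSets(Instant now, int... days) {
        Instant end = now.truncatedTo(ChronoUnit.SECONDS);
        List<String> sets = new ArrayList<>();
        for (int d : days) {
            Instant start = end.minus(d, ChronoUnit.DAYS);
            sets.add("[" + start + "," + end + "]");
        }
        return sets;
    }
}
```

Each returned string would be sent as a separate `facet.interval.set` parameter alongside `facet.interval=login_event`.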


-Hoss
http://www.lucidworks.com/

Range Facet queries for date ranges with with non-constant gaps

2015-07-13 Thread JoeSmith
I am trying to do a range facet query on date ranges.  The query below
executes and returns results (almost) as desired for 60DAY buckets.



http://localhost:8983/solr/mykeyspace2.user_data/select?wt=json&fq:id=7465033&q=*:*&rows=0&indent=true&facet=on&facet.range=login_event&facet.range.gap=%2B60DAY&facet.range.start=NOW/YEAR&facet.range.end=NOW/MONTH%2B1MONTH&facet.range.min=1



Some of the buckets return with a count of ‘0’ even though
facet.range.min is set to ‘1’.  That is not the primary issue though.
What I would like to get back are buckets with unevenly spaced gaps.  For
example, counts for the last 7 days, last 30 days, and last 90 days.


What would be the best way to accomplish this? And is there something
wrong with my facet.range.min usage?


RE: Range Facet queries for date ranges with with non-constant gaps

2015-07-13 Thread Reitzel, Charles
Try facet.mincount=1.   It will still apply to range facets.


*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*


Sum as a Projection for Facet Queries

2013-07-01 Thread samarth s
Hi,

We need to find the sum of a field for each facet.query. We have
looked at StatsComponent (http://wiki.apache.org/solr/StatsComponent), but
it supports only facet.field. Has anyone written a patch for
StatsComponent that supports facet.query, along with some performance measurements?

Is there any way we can do this using the Function Query
sum (http://wiki.apache.org/solr/FunctionQuery#sum)?

-- 
Regards,
Samarth


Re: separation of indexes to optimize facet queries without fulltext

2012-08-03 Thread Mark Miller
Yes, you can have multiple indexes with solrcloud, same as with stand
alone. We call them collections.





-- 
- Mark

http://www.lucidimagination.com


Re: separation of indexes to optimize facet queries without fulltext

2012-07-26 Thread Chris Hostetter

: My thought was, that I could separate indexes. So for the facet queries
: where I don't need fulltext search (so also no indexed fulltext field) I
: can use a completely new setup of a sharded Solr which doesn't include the
: indexed fulltext, so the index is kept small containing just the few
: fields I have.
:
: And for the fulltext queries I have the current Solr configuration which
: includes as mentioned above all the fields incl. the index fulltext field.
:
: Is this a normal way of handling these requirements. That there are
: different kind of Solr configurations for the different needs? Because
: the huge redundancy

It's definitely doable -- one thing I'm not clear on is why, if your
faceting queries don't care about the full text, you would need to leave
those small fields in your full index ... is your plan to do
faceting and drill down using the smaller index, but then display docs
resulting from those queries by using the same fq params when querying
the full index?

If so, then it should work; if not -- you may not need those fields in that
index.

In general there is nothing wrong with having multiple indexes to solve
multiple use cases -- an index is usually an inverted denormalization of
some structured source data designed for fast queries/retrieval.  If there
are multiple distinct ways you want to query/retrieve data that don't lend
themselves to the same denormalization, there's nothing wrong with
multiple denormalizations.

Something else to consider is an approach I've used many times: having a
single index, but using special purpose replicas.  You can have a master
index that you update at the rate of change, one set of slaves that are
used for one type of query pattern (faceting on X, Y, and Z for example)
and a different set of slaves that are used for a different query pattern
(faceting on A, B, and C), so each set of slaves gets a higher cache hit
rate than if the queries were randomized across all machines.
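The "special purpose replicas" idea comes down to a client-side (or load-balancer) routing rule: pick the slave pool from the query's facet pattern, then balance within that pool. The minimal sketch below follows the X/Y/Z vs. A/B/C example above. The pool names, slave URLs and round-robin policy are hypothetical choices for illustration, not a Solr feature.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class QueryPatternRouter {

    // Hypothetical rule: requests faceting on X, Y or Z go to the "xyz" pool, all others to "abc".
    private static final Set<String> XYZ_FIELDS = Set.of("X", "Y", "Z");

    private final Map<String, List<String>> pools;                 // pool name -> slave base URLs
    private final Map<String, AtomicInteger> next = new ConcurrentHashMap<>();

    QueryPatternRouter(Map<String, List<String>> pools) {
        this.pools = pools;
    }

    // Round-robins within the pool matching the request's facet pattern, so each
    // pool keeps seeing the same kind of query and its caches stay relevant.
    String route(Set<String> facetFields) {
        String pool = facetFields.stream().anyMatch(XYZ_FIELDS::contains) ? "xyz" : "abc";
        List<String> slaves = pools.get(pool);
        int i = next.computeIfAbsent(pool, k -> new AtomicInteger()).getAndIncrement();
        return slaves.get(Math.floorMod(i, slaves.size()));
    }
}
```

Because every pool is replicated from the same master, this partitions cache working sets rather than data.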

-Hoss


Re: separation of indexes to optimize facet queries without fulltext

2012-07-26 Thread Daniel Brügge
Hi Chris,

thanks for the answer.

The plan is that in lots of queries I just need faceted values and
don't even do a fulltext search.
On the other hand, I need the fulltext search for exactly one
task in my application, which is searching documents and returning them.
Here no faceting at all is needed, only filtering with fields,
which I also use for the other queries.
So if 95% of the queries don't use the fulltext, I thought it would
make sense to split them.

Your suggestion to have one main master index and several slave indexes
sounds promising. Is it possible to have this replication in SolrCloud, e.g.
with different kinds of schemas, etc.?

Thanks. Daniel




separation of indexes to optimize facet queries without fulltext

2012-07-25 Thread Daniel Brügge
Hi,

I currently have one big sharded Solr setup storing a couple of million
documents with some 'small' fields and one fulltext field in each doc. The
latter blows up the index.
My thought was that I could separate indexes. So for the facet queries
where I don't need fulltext search (so also no indexed fulltext field) I
can use a completely new setup of a sharded Solr which doesn't include the
indexed fulltext, so the index is kept small, containing just the few
fields I have.

And for the fulltext queries I have the current Solr configuration which,
as mentioned above, includes all the fields incl. the indexed fulltext field.

Is this a normal way of handling these requirements, having different
kinds of Solr configurations for the different needs? The huge redundancy
scares me a bit; I will have the fields twice.

Thanks in advance & greetings

Daniel


Re: UI support for Multi-Select Facet queries?

2011-12-09 Thread Erik Hatcher
No, multiselect is not wired into /browse.  That'd be a nice addition though.

Erik




Re: UI support for Multi-Select Facet queries?

2011-12-09 Thread PJ Shimmer
Thanks Erik.

Is there any available UI that supports multi-select faceting?





Re: UI support for Multi-Select Facet queries?

2011-12-09 Thread Erik Hatcher
Nothing that I'm aware of, open-source-wise.  It'd be a fairly straightforward 
set of changes to the /browse templates though.

Blacklight - http://projectblacklight.org - is a nice UI on top of Solr, and 
I've seen some demos where multi-select faceting was customized into it, but I 
don't think the core of it yet has that.

We have it on our search system (in fact, multiselect was implemented 
specifically for this initially) - http://www.lucidimagination.com/search/

Are you looking for just something to play around with or something to go into 
production with?  No UI technology constraints?

Erik





UI support for Multi-Select Facet queries?

2011-12-08 Thread PJ Shimmer
Greetings,

I see that we can query multiple facets for a search with a syntax like 
fq=grade:A OR grade:B.  However, I only know how to do this by modifying the 
URL parameter.  Is there a UI component that allows you to select multiple 
facet values?  I'm thinking something like a checkbox next to each value.  I'm 
using the default UI @ http://localhost:8983/solr/browse, and didn't see an 
obvious way to do this.


naming facet queries?

2011-11-15 Thread Robert Stewart
Is there any way to give a name to a facet query, so you can pick
facet values from results using some name as a key (rather than
looking for a match via the query itself)?

For example, in my request handler I have:

<str name="facet.query">publish_date:[NOW-7DAY TO NOW]</str>
<str name="facet.query">publish_date:[NOW-1MONTH TO NOW]</str>

I'd like results to have names such as last_week and last_month.
Otherwise client code needs to know to look up values using the actual
query as the key, and that can be subject to change in solrconfig.xml.

I'd like to be able to do something like this in the Solr config:

<str name="facet.query">{!name=last_week}publish_date:[NOW-7DAY TO NOW]</str>
<str name="facet.query">{!name=last_month}publish_date:[NOW-1MONTH TO NOW]</str>

And then get this in results:

<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="last_week">1</int>
    <int name="last_month">15000</int>
  </lst>
</lst>


Thanks
Bob


Re: naming facet queries?

2011-11-15 Thread Erik Hatcher
Yes... use key instead of name in your example below :)


http://wiki.apache.org/solr/SimpleFacetParameters#key_:_Changing_the_output_key
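In practice, the fix is to prefix each facet query with a `{!key=...}` local param, so the label rather than the raw query becomes the entry name under `facet_queries`. The minimal sketch below builds those prefixed strings from a label-to-query map. The helper class is hypothetical, and a `LinkedHashMap` is assumed so the output keeps the configured order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeyedFacetQueries {

    // Prefixes each query with {!key=label} so Solr reports its count under that label.
    static List<String> keyed(Map<String, String> labelToQuery) {
        List<String> out = new ArrayList<>();
        labelToQuery.forEach((label, q) -> out.add("{!key=" + label + "}" + q));
        return out;
    }
}
```

In solrconfig.xml, the equivalent is `<str name="facet.query">{!key=last_week}publish_date:[NOW-7DAY TO NOW]</str>`. Client code can then read `last_week` and `last_month` without knowing the underlying queries.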





Solr 3.4 group.truncate does not work with facet queries

2011-10-28 Thread Ian Grainger
Hi, I'm using Grouping with group.truncate=true. The following simple facet
query:

facet.query=Monitor_id:[38 TO 40]

doesn't give the same number as the nGroups result (with
group.ngroups=true) for the equivalent filter query:

fq=Monitor_id:[38 TO 40]

I thought they should be the same. From the Wiki page: 'group.truncate: If
true, facet counts are based on the most relevant document of each group
matching the query.'

What am I doing wrong?

If I turn off group.truncate then the counts are the same, as I'd expect,
but unfortunately I'm only interested in the grouped results.

- I have also asked this question on StackOverflow, here:
http://stackoverflow.com/questions/7905756/solr-3-4-group-truncate-does-not-work-with-facet-queries

Thanks!

-- 
Ian

i...@isfluent.com a...@endissolutions.com
+44 (0)1223 257903


Re: Solr 3.4 group.truncate does not work with facet queries

2011-10-28 Thread Martijn v Groningen
Hi Ian,

I think this is a bug. After looking into the code the facet.query
feature doesn't take into account the group.truncate option.
This needs to be fixed. You can open a new issue in Jira if you want to.

Martijn





-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Solr 3.4 group.truncate does not work with facet queries

2011-10-28 Thread Ian Grainger
Thanks, Martijn. I have logged the bug here:
https://issues.apache.org/jira/browse/SOLR-2863

Is there any chance of a workaround for this issue before the bug is fixed?

If you want to answer the question on StackOverflow:
http://stackoverflow.com/questions/7905756/solr-3-4-group-truncate-does-not-work-with-facet-queries
I'll accept your answer.






-- 
Ian

i...@isfluent.com a...@endissolutions.com
+44 (0)1223 257903


Memory used by facet queries

2010-11-11 Thread Charlie Gildawie
Hello All.

My first time posting, so be kind. I'm developing a document store with lots and lots
of very small documents (200 million at the moment; the final size will probably
be double this at 400 million documents). This is proof-of-concept development,
so we are seeing what a single core can do for us before we consider sharding.
We'd rather not shard if we don't have to.

I'm using SOLR 4.0 (for the simple facet pivots and groups which work well).

We're into week 4 of our development and have the production servers etc. set
up. Everything was working very well until we started to test queries with production
volumes of data.

I'm running into Java Heap Space exceptions during simple faceting on inverted
fields. The fields we are currently faceting on are names (Country / Continent
/ City names), all stored as a Solr.StringField (there are other fields using
tokenization to provide initial search, but we want to use the simple
StringFields to provide faceted navigation). In total we have 10 fields we'd
ever want to facet on: 8 name fields that are strings, and 2 date-part fields
(year and yearMonth) that are also strings.

This is our first time using SOLR and I didn't realise that we'd need so much 
heap for facets!

Solr is running in a Tomcat container and I've currently set Tomcat to use a max
of:

JAVA_OPTS=$JAVA_OPTS -server -Xms512m -Xmx3m

I've been reading all I can find online and have seen advice to populate the
facet caches as soon as we've started the Solr service. However, I'd
really like to know if there are ways to reduce the memory footprint. We
currently have 32g of physical RAM. Adding more RAM is an option, but I'm being
asked the (completely reasonable) question: why do you need so much?

Please help!

Charlie.


-Original Message-
From: Robert Gründler [mailto:rob...@dubture.com]
Sent: 11 November 2010 18:14
To: solr-user@lucene.apache.org
Subject: Re: Concatenate multiple tokens into one

I've posted a ConcatFilter in my previous mail which does concatenate tokens.
This works fine, but I realized that what I wanted to achieve is more easily
implemented another way (by using 2 separate field types).

Have a look at a previous mail I wrote to the list and the reply from Ahmet
Arslan (topic: EdgeNGram relevancy).


best


-robert




See
On Nov 11, 2010, at 5:27 PM, Nick Martin wrote:

 Hi Robert, All,

 I have a similar problem; here is my fieldType:
 http://paste.pocoo.org/show/289910/
 I want to include stopword removal and lowercase the incoming terms, the idea
 being to take "Foo Bar Baz Ltd" and turn it into "foobarbaz" for the
 EdgeNGram filter factory.
 If anyone can tell me a simple way to concatenate tokens into one token
 again, similar to the KeywordTokenizer, that would be super helpful.
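To pin down the desired behaviour independently of Lucene, here is a plain-Java sketch of the transformation: whitespace split, lowercase, stopword removal, then concatenation. It is not a Lucene TokenFilter and uses no Lucene API. It assumes "ltd" is in the stopword set, which the message above implies but does not state.

```java
import java.util.Locale;
import java.util.Set;

public class ConcatSketch {

    // Splits on whitespace, lowercases each token, drops stopwords,
    // and concatenates what is left into a single string.
    static String concat(String input, Set<String> stopwords) {
        StringBuilder sb = new StringBuilder();
        for (String tok : input.trim().split("\\s+")) {
            String t = tok.toLowerCase(Locale.ROOT);
            if (!t.isEmpty() && !stopwords.contains(t)) {
                sb.append(t);
            }
        }
        return sb.toString();
    }
}
```

With `Set.of("ltd")`, `concat("Foo Bar Baz Ltd", ...)` yields `foobarbaz`, the single token that would then be fed to the EdgeNGram filter.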

 Many thanks

 Nick

 On 11 Nov 2010, at 00:23, Robert Gründler wrote:


 On Nov 11, 2010, at 1:12 AM, Jonathan Rochkind wrote:

 Are you sure you really want to throw out stopwords for your use case?  I 
 don't think autocompletion will work how you want if you do.

 In our case I think it makes sense. The content is targeting the
 electronic music / DJ scene, so we have a lot of words like "DJ" or
 "featuring" which make sense to throw out of the query. Also, searches for
 "the beastie boys" and "beastie boys" should return a match in the
 autocompletion.


 And if you don't... then why use the WhitespaceTokenizer and then try to
 jam the tokens back together? Why not just NOT tokenize in the first place?
 Use the KeywordTokenizer, which really should be called the
 NonTokenizingTokenizer, because it doesn't tokenize at all, it just creates
 one token from the entire input string.

 I started out with the KeywordTokenizer, which worked well, except for the
 StopWord problem.

 For now, I've come up with a quick-and-dirty custom ConcatFilter, which
 does what I'm after:

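 import java.io.IOException;

 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.TermAttribute;
 import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

 // Collapses all "word"-typed tokens from the upstream stream into one token.
 public final class ConcatFilter extends TokenFilter {

   private final TermAttribute termAttribute = (TermAttribute) addAttribute(TermAttribute.class);
   private final TypeAttribute typeAttribute = (TypeAttribute) addAttribute(TypeAttribute.class);
   private boolean done = false;

   public ConcatFilter(TokenStream input) {
     super(input);
   }

   @Override
   public boolean incrementToken() throws IOException {
     if (done) {
       return false;
     }
     done = true;

     StringBuilder builder = new StringBuilder();
     boolean incremented = false;

     // Drain the upstream stream, keeping only tokens of type "word".
     while (input.incrementToken()) {
       if ("word".equals(typeAttribute.type())) {
         builder.append(termAttribute.term());
       }
       incremented = true;
     }

     if (!incremented) {
       return false;
     }

     // Emit a single token holding the concatenated text.
     clearAttributes();
     termAttribute.setTermBuffer(builder.toString());
     return true;
   }

   @Override
   public void reset() throws IOException {
     super.reset();
     done = false;
   }
 }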

 I'm not sure if this is a safe way to do this, as i'm not 

OR facet queries?

2010-10-09 Thread Andy
I want to enable users to select multiple facet values for a specific facet
field. For example, if color is a facet field, I'd like to let users
select red OR blue.

Please note, I've set
<solrQueryParser defaultOperator="AND"/>
because I want q=hello+world to mean that hello and world are AND'ed together.

1) What is the syntax for doing that? Can I implement that by putting OR
within the fq clause?
E.g.
facet=on&facet.field=color&facet.field=size
fq=color:(red OR blue)
fq=size:(M OR L)

2) Is there a performance penalty associated with using OR on the facet
values like that? If so, how much of a penalty?

Thanks





  

Re: OR facet queries?

2010-10-09 Thread Ahmet Arslan
 I want to enable users to select
 multiple facet values for a specific facet fields. For
 example, if color is a facet field, I'd like to let users
 to select red OR blue.
 
 Please note, I've set
 solrQueryParser defaultOperator=AND /
 because I want q=hello+world means hello and world
 are AND'ed together.
 
 1) What is the syntax of doing that? Can I implement that
 by putting OR within the fq clause?
 E.g.
 facet=onfacet.field=colorfacet.field=size
 fq=color:(red OR blue)
 fq=size:(M OR L)

Yes, you can do that with filter queries.

You may find this interesting. 
http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
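The multi-select pattern described on that wiki page pairs a tagged filter query with a facet field that excludes that tag. The user's OR'ed selection then narrows the results, while the facet counts for that same field still show every option. Below is a minimal sketch that builds both strings. The `<field>Tag` naming convention is a hypothetical choice, and values are assumed to be single terms that need no quoting or escaping.

```java
import java.util.List;

public class MultiSelectFacets {

    // Filter for the user's selections, OR'ed within one field and tagged for exclusion.
    static String taggedFilter(String field, List<String> values) {
        return "{!tag=" + field + "Tag}" + field + ":(" + String.join(" OR ", values) + ")";
    }

    // Facet on the same field while ignoring that field's own filter when counting.
    static String excludedFacetField(String field) {
        return "{!ex=" + field + "Tag}" + field;
    }
}
```

For color, this gives `fq={!tag=colorTag}color:(red OR blue)` together with `facet.field={!ex=colorTag}color`. The explicit OR inside the parentheses is unaffected by `defaultOperator="AND"`.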





  


How to OR facet queries

2010-08-11 Thread Frank A
Hi,  I have 3 facet fields (A,B,C) the values of each facet field will
be shown as check boxes to users:

Field A
[x]  Val1a
[x]  Val2a
[]  Val3a

Field B
[x] Val1b
[] Val2b
[] Val3b

Within a field if the user selects two items I want the queries to be
an OR query.  Currently I'm generating something like:

fq=FieldA%3AVal1a&fq=FieldA%3AVal2a&fq=FieldB%3AVal1b

This is not working as the first two filter queries are 'and'ing.
What is the proper syntax to accomplish what I'm trying to do?

Thanks.


Re: How to OR facet queries

2010-08-11 Thread Geek Gamer
On Thu, Aug 12, 2010 at 7:12 AM, Frank A fsa...@gmail.com wrote:

 Hi,  I have 3 facet fields (A,B,C) the values of each facet field will
 be shown as check boxes to users:

 Field A
 [x]  Val1a
 [x]  Val2a
 []  Val3a

 Field B
 [x] Val1b
 [] Val2b
 [] Val3b

 Within a field if the user selects two items I want the queries to be
 an OR query.  Currently I'm generating something like:

 fq=FieldA%3AVal1a&fq=FieldA%3AVal2a&fq=FieldB%3AVal1b

fq=FieldA%3AVal1a%20OR%20FieldA%3AVal2a&fq=FieldB%3AVal1b


 This is not working as the first two filter queries are 'and'ing.
 What is the proper syntax to accomplish what I'm trying to do?

 Thanks.
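The rule behind that answer: Solr intersects separate `fq` parameters, so checked values within one field must share a single `fq` joined with OR, while different fields get separate `fq` parameters. Below is a minimal sketch that turns checkbox state into such a query string. The class name is hypothetical. Note that `URLEncoder` encodes spaces as `+` rather than `%20`, which Solr decodes the same way in a query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CheckboxFilters {

    // One fq per field: values checked within a field are OR'ed together;
    // separate fq params are AND'ed by Solr. Fields with nothing checked are skipped.
    static String toQueryString(Map<String, List<String>> selected) {
        List<String> fqs = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : selected.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue;
            }
            List<String> clauses = new ArrayList<>();
            for (String v : e.getValue()) {
                clauses.add(e.getKey() + ":" + v);
            }
            String fq = String.join(" OR ", clauses);
            fqs.add("fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8));
        }
        return String.join("&", fqs);
    }
}
```

For the checkboxes shown (Val1a and Val2a under Field A, Val1b under Field B), this produces `fq=FieldA%3AVal1a+OR+FieldA%3AVal2a&fq=FieldB%3AVal1b`.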



Tagging Facet Queries -- Urgent Help Required

2010-05-28 Thread Ninad Raut
Hi All,
I have a use case where I have to tag facet queries.

Here is the code snippet for what I tried:
query.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
query.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
query.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
query.addFacetQuery("{!tag=P key=Review}med:Review AND slev:neutral");

The result was {BLOG=0, Review=0}

but when I run separate queries :

query1.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
query1.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
 and
query2.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
query2.addFacetQuery("{!tag=P key=Forum}med:Review AND slev:neutral");

I get correct results.
{BLOG=98} and {Forum=830} respectively.

I want to do this in a single query (with multiple facets). Is there some
other way of tagging facet queries?

Can anyone help me with this?

Regards,
Ninad R


Re: Tagging Facet Queries -- Urgent Help Required

2010-05-28 Thread Erik Hatcher
You've tagged facet queries, but it looks like you might want to use the
exclude capability on your filter queries also.  Filter queries are
additive, each one constraining the results further, and by
default faceting is based on the search results.  Use the ex local param
to have facets count outside the actual constrained search results.


Erik
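A sketch of what that could look like for Ninad's code, assuming the goal is for each facet.query to ignore both tagged filters: the filters keep tag, while the facet queries use ex (which accepts a comma-separated list of tags) instead of tag. The strings are built with plain Java here; with SolrJ they would be passed to addFilterQuery and addFacetQuery as in the original snippet. The helper class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: tagged filter queries plus facet queries that exclude
// those tags, so each facet count is computed as if the filters were absent.
final class ExcludedFacetQueries {
    private ExcludedFacetQueries() {}

    /** e.g. {!tag=NE}med:Blog AND slev:neutral */
    static String taggedFilter(String tag, String query) {
        return "{!tag=" + tag + "}" + query;
    }

    /** e.g. {!ex=NE,P key=BLOG}med:Blog AND slev:neutral */
    static String excludingFacet(List<String> excludedTags, String key, String query) {
        return "{!ex=" + String.join(",", excludedTags) + " key=" + key + "}" + query;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("NE", "P");
        List<String[]> params = new ArrayList<>();
        params.add(new String[] {"fq", taggedFilter("NE", "med:Blog AND slev:neutral")});
        params.add(new String[] {"fq", taggedFilter("P", "med:Review AND slev:neutral")});
        params.add(new String[] {"facet.query", excludingFacet(tags, "BLOG", "med:Blog AND slev:neutral")});
        params.add(new String[] {"facet.query", excludingFacet(tags, "Review", "med:Review AND slev:neutral")});
        for (String[] kv : params) System.out.println(kv[0] + "=" + kv[1]);
    }
}
```

Note that both filters still apply to the main result list, which may well be empty; only the facet counts are computed with the tagged filters removed.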

On May 28, 2010, at 4:17 AM, Ninad Raut wrote:


Hi All,
I have a use case where I have to tag facet queries.

Here is the code snippet for what I tried:
query.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
query.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
query.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
query.addFacetQuery("{!tag=P key=Review}med:Review AND slev:neutral");

The result was {BLOG=0, Review=0}

but when I run separate queries :

query1.addFilterQuery("{!tag=NE}med:Blog AND slev:neutral");
query1.addFacetQuery("{!tag=NE key=BLOG}med:Blog AND slev:neutral");
and
query2.addFilterQuery("{!tag=P}med:Review AND slev:neutral");
query2.addFacetQuery("{!tag=P key=Forum}med:Review AND slev:neutral");

I get correct results.
{BLOG=98} and {Forum=830} respectively.

I want to do this in a single query (with multiple facets). Is there some
other way of tagging facet queries?

Can anyone help me with this?

Regards,
Ninad R




Re: Tagging Facet Queries -- Urgent Help Required

2010-05-28 Thread Ninad Raut
Thanks Erik,


On Fri, May 28, 2010 at 2:17 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 You've tagged facet queries, but it looks like you might want to use the
 exclude capability on your filter queries also.  Filter queries are
 additive, each one constraining the results further, and by default
 faceting is based on the search results.  Use the ex local param to have
 facets count outside the actual constrained search results.

Erik







Facet Queries

2010-05-14 Thread Rakhi Khatwani
Hi,
When I use facet queries, what's the default size of the results
returned? How do we configure it if we want all the results shown?

Regards
Raakhi


Re: Facet Queries

2010-05-14 Thread Leonardo Menezes
Hey,
there's plenty of documentation about that...
http://wiki.apache.org/solr/SimpleFacetParameters#Field_Value_Faceting_Parameters

On Fri, May 14, 2010 at 10:38 AM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
 When I use facet queries, what's the default size of the results
 returned? How do we configure it if we want all the results shown?

 Regards
 Raakhi
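Assuming the question is about field faceting: facet.limit defaults to 100 values per facet.field, and a negative value removes the limit. It can be set for every field or, with the f.<field>. prefix, for a single field. A minimal sketch of the params; the helper class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: params asking Solr for every value of one facet field
// rather than the default top 100 (facet.limit defaults to 100).
final class UnlimitedFacetParams {
    private UnlimitedFacetParams() {}

    static List<String> params(String field, boolean onlyThisField) {
        List<String> p = new ArrayList<>();
        p.add("facet=true");
        p.add("facet.field=" + field);
        // A negative facet.limit means "no limit". The f.<field>. prefix scopes
        // the override to one field instead of every facet.field in the request.
        p.add((onlyThisField ? "f." + field + "." : "") + "facet.limit=-1");
        // facet.mincount defaults to 0, so an unlimited list can include
        // zero-count values; mincount=1 drops them.
        p.add("facet.mincount=1");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(String.join("&", params("color", true)));
    }
}
```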



Re: Facet Queries

2010-05-14 Thread Rakhi Khatwani
Hi,
   Thanks a lot... I had a look at that, and it solved my problem.

Thanks once again
Regards
Raakhi

On Fri, May 14, 2010 at 2:13 PM, Leonardo Menezes 
leonardo.menez...@googlemail.com wrote:

 Hey,
there's plenty of documentation about that...

 http://wiki.apache.org/solr/SimpleFacetParameters#Field_Value_Faceting_Parameters




Re: How does one sort facet queries?

2010-02-22 Thread Chris Hostetter
: All sorting of facets works great at the field level (count/index)...all good
: there...but how is sorting accomplished with range queries? The solrj
: response doesn't seem to maintain the order the queries are sent in, and the

The facet_queries section of the facet_counts is in the order that the 
facet.query params were provided to the server -- if you're seeing them
come back from SolrJ in a different order, then that may be a bug in SolrJ
(but that seems unlikely)

Honestly though: I'm not sure why the order should really matter that
much: in almost any client app you have to know how to make sense of the
query string (possibly using the key feature mentioned by gwk) in order
to do anything with it, so the client app could just slurp them all in and
re-order them however it's most convenient.


-Hoss
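A sketch of the client-side reordering Hoss describes, assuming the facet query counts have already been read into a Map<String, Integer> keyed by each query's {!key=...} name (SolrJ's QueryResponse.getFacetQuery() is one possible source). The helper class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper: put facet query counts (keyed with
// {!key=...}) into the order the UI wants, whatever order they arrived in.
final class FacetQueryOrder {
    private FacetQueryOrder() {}

    static Map<String, Integer> reorder(Map<String, Integer> counts, List<String> order) {
        Map<String, Integer> out = new LinkedHashMap<>(); // preserves insertion order
        for (String key : order) {
            Integer count = counts.get(key);
            if (count != null) out.put(key, count); // skip keys missing from the response
        }
        // Append any keys the caller did not list, so nothing is silently dropped.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.putIfAbsent(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> fromSolr = Map.of("price_2", 5, "price_0", 12, "price_1", 7);
        System.out.println(reorder(fromSolr, List.of("price_0", "price_1", "price_2")));
    }
}
```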



Re: How does one sort facet queries?

2010-02-19 Thread gwk

On 2/19/2010 2:15 AM, Kelly Taylor wrote:

All sorting of facets works great at the field level (count/index)...all good
there...but how is sorting accomplished with range queries? The solrj
response doesn't seem to maintain the order the queries are sent in, and the
order is not in index or count order. What's the trick?

http://localhost:8983/solr/select?q=someterm
   rows=0
   facet=true
   facet.limit=-1
   facet.query=price:[* TO 100]
   facet.query=price:[100 TO 200]
   facet.query=price:[200 TO 300]
   facet.query=price:[300 TO 400]
   facet.query=price:[400 TO 500]
   facet.query=price:[500 TO 600]
   facet.query=price:[600 TO 700]
   facet.query=price:[700 TO *]
   facet.mincount=1
   collapse.field=dedupe_hash
   collapse.threshold=1
   collapse.type=normal
   collapse.facet=before

   
The trick I use is to use LocalParams to give each facet query a well-defined
name. Afterwards you can loop through the names in whatever
order you want.

so basically facet.query={!key=price_0}price:[* TO 100] etc.

N.B. the facet queries in your example will cause some documents to be
counted twice (e.g. when the price is exactly 100, 200, 300, and so on).


Regards,

gwk
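A sketch combining both points: give every range a key and make the ranges non-overlapping by excluding each upper bound. It assumes a Solr version whose standard query parser accepts mixed bounds such as [100 TO 200}, which I believe means 4.0 or later, so not the releases current when this thread was written. The helper class and the field_N key scheme are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keyed, non-overlapping range facet queries. Each range
// excludes its upper bound ('}'), so a boundary value is counted exactly once.
final class PriceRanges {
    private PriceRanges() {}

    /** bounds {100, 200} gives [* TO 100}, [100 TO 200}, [200 TO *] keyed field_0..field_2 */
    static List<String> facetQueries(String field, int[] bounds) {
        List<String> queries = new ArrayList<>();
        String lower = "*";
        for (int i = 0; i <= bounds.length; i++) {
            String key = "{!key=" + field + "_" + i + "}";
            if (i < bounds.length) {
                queries.add(key + field + ":[" + lower + " TO " + bounds[i] + "}");
                lower = Integer.toString(bounds[i]);
            } else {
                queries.add(key + field + ":[" + lower + " TO *]"); // open-ended last bucket
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        int[] bounds = {100, 200, 300, 400, 500, 600, 700};
        for (String q : facetQueries("price", bounds)) System.out.println("facet.query=" + q);
    }
}
```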


Re: Default Query Type For Facet Queries

2009-09-20 Thread Chris Hostetter

: You are right, SimpleFacets#getFacetQueryCounts has the following comment:
: 
: /* Ignore SolrParams.DF - could have init param facet.query assuming
:  * the schema default with query param DF intented to only affect Q.
:  * If user doesn't want schema default for facet.query, they should be
:  * explicit.
:  */

that comment is a red herring...
 1) DF is the df param, which changes the default field
 2) that comment refers to a line of code that is commented out because it
predates the QParser stuff

However, Stephen's comments that changing defType didn't affect 
facet.query jogged my memory about something...

  https://issues.apache.org/jira/browse/SOLR-1025

...based on yonik's comments there, it seems intentional that defType isn't
inherited by any parser other than the main one (either in other top-level
params, or in nested parsers)

The sub-parser trick could be useful here: hardcode...
  facet.query={!custom v=$custom.facet.query}
...as an invariant in solrconfig and then have your users pass the
custom.facet.query param instead of facet.query ... but that won't
work for multivalued params.

For facet.query we could add a new param to the FacetComponent to 
default/invariant the parser, in the more general case of all params that 
might want parsing perhaps we could think of a new option that would let 
people declare a prefix on all params of a given name -- or a new option 
on the sub-parser syntax that would force the main param to be cloned for 
each instance of the sub param, so that something like...
  a={!foo s=t v=$$x}&x=1&x=2&b={!bar v=$y}&y=8&y=9
...would be equivalent to...
  a={!foo s=t}1&a={!foo s=t}2&b={!bar}8

(note the $$ in a and the $ in b)

...something like that would be pretty cool, but a pain in the ass to
implement (because right now the multi-param code is completely isolated
from the query parser code) ... it's pie in the sky, but maybe it will
help people think of simpler/better alternatives.



-Hoss
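To make the proposed expansion concrete, here is a toy Java model of it. The $$ syntax is only Hoss's proposal and does not exist in Solr; the class is hypothetical and only recognizes a v=$name or v=$$name reference preceded by a space inside {!...}. A $$ reference clones the param once per value of the referenced param, a single $ takes just the first value, and the referenced params themselves drop out, reproducing the equivalence in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of a *proposed* syntax: Solr does not implement "$$". Params are
// (name, value) pairs in request order; multivalued params repeat the name.
final class DollarDollarProposal {
    private DollarDollarProposal() {}

    // " v=$name" or " v=$$name" inside a {!...} prefix.
    private static final Pattern REF = Pattern.compile(" v=(\\$\\$?)([\\w.]+)");

    static List<String[]> expand(List<String[]> params) {
        Map<String, List<String>> values = new LinkedHashMap<>();
        for (String[] p : params) {
            values.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }

        // Pass 1: params referenced via v=$... are consumed and vanish from the output.
        Set<String> referenced = new HashSet<>();
        for (String[] p : params) {
            Matcher m = REF.matcher(p[1]);
            if (p[1].startsWith("{!") && m.find()) referenced.add(m.group(2));
        }

        // Pass 2: rewrite each referencing param; copy everything else through.
        List<String[]> out = new ArrayList<>();
        for (String[] p : params) {
            if (referenced.contains(p[0])) continue;
            Matcher m = REF.matcher(p[1]);
            if (!p[1].startsWith("{!") || !m.find()) {
                out.add(p);
                continue;
            }
            // Drop the " v=..." token, keeping the rest of the local params.
            String prefix = p[1].substring(0, m.start()) + p[1].substring(m.end());
            List<String> vs = values.getOrDefault(m.group(2), List.of());
            // "$$" clones the param once per value; "$" uses only the first value.
            List<String> used = m.group(1).equals("$$") ? vs : vs.subList(0, Math.min(1, vs.size()));
            for (String v : used) out.add(new String[] {p[0], prefix + v});
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] request = {
            {"a", "{!foo s=t v=$$x}"}, {"x", "1"}, {"x", "2"},
            {"b", "{!bar v=$y}"}, {"y", "8"}, {"y", "9"},
        };
        StringBuilder sb = new StringBuilder();
        for (String[] p : expand(Arrays.asList(request))) {
            if (sb.length() > 0) sb.append('&');
            sb.append(p[0]).append('=').append(p[1]);
        }
        System.out.println(sb);
    }
}
```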



Re: Default Query Type For Facet Queries

2009-09-17 Thread Lance Norskog
There are also filter queries. Also, in the future we will also add
other query types for other features. Do we want them all to change?
Whatever we do, it should be consistent across all of the query types.

On Fri, Sep 11, 2009 at 9:33 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:

 You are right, SimpleFacets#getFacetQueryCounts has the following comment:

 /* Ignore SolrParams.DF - could have init param facet.query assuming
     * the schema default with query param DF intented to only affect Q.
     * If user doesn't want schema default for facet.query, they should be
     * explicit.
     */

 I'm not sure if this should be changed. Hoss, what do you think?

 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Lance Norskog
goks...@gmail.com


Re: Default Query Type For Facet Queries

2009-09-11 Thread Stephen Duncan Jr
I haven't experienced any such problems; it's just a query-parser plugin
that adds some behavior on top of the normal query parsing.  In any case,
even if I use a custom request handler with my custom parser, can I get
facet-queries to use this custom parser by default as well?

-Stephen

On Thu, Sep 10, 2009 at 11:30 PM, Lance Norskog goks...@gmail.com wrote:

 Changing basic defaults like this makes it very confusing to work with
 successive solr releases, to read the wiki, etc.

 You can make custom search request handlers - an example:

  <requestHandler name="/custom" class="solr.SearchHandler">
    <lst name="invariants">
      <str name="defType">customparser</str>
    </lst>
  </requestHandler>

 http://localhost:8983/solr/custom?q=string_in_my_custom_language




 --
 Lance Norskog
 goks...@gmail.com



Re: Default Query Type For Facet Queries

2009-09-11 Thread Chris Hostetter

: I haven't experienced any such problems; it's just a query-parser plugin
: that adds some behavior on top of the normal query parsing.  In any case,
: even if I use a custom request handler with my custom parser, can I get
: facet-queries to use this custom parser by default as well?

if you change the default parser for the entire handler, it should be used
for all query parsing that doesn't use the {!foo} syntax ... but to answer
your original question there is no way to set the default for facet.query
independently of the main default -- that would require a patch to the
FacetComponent to look at init params (where it could find some
default/invariant params that would override the main ones)


:Is there a way to register it as the default for FacetComponent in
:solrconfig.xml?



-Hoss



Re: Default Query Type For Facet Queries

2009-09-11 Thread Stephen Duncan Jr
On Fri, Sep 11, 2009 at 2:36 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : I haven't experienced any such problems; it's just a query-parser plugin
 : that adds some behavior on top of the normal query parsing.  In any case,
 : even if I use a custom request handler with my custom parser, can I get
 : facet-queries to use this custom parser by default as well?

 if you change the default parser for the entire handler, it should be used
 for all query parsing that doesn't use the {!foo} syntax ... but to answer
 your original question there is no way to set the default for facet.query
 independently of the main default -- that would require a patch to the
 FacetComponent to look at init params (where it could find some
 default/invariant params that would override the main ones)


 :Is there a way to register it as the default for FacetComponent in
 :solrconfig.xml?



 -Hoss


My experience (which is on a trunk build from a few weeks back of Solr 2.4),
is that changing the default parser for the handler does NOT change it for
facet.query.  I had expected it would, but was disappointed.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: Default Query Type For Facet Queries

2009-09-11 Thread Shalin Shekhar Mangar
On Sat, Sep 12, 2009 at 12:18 AM, Stephen Duncan Jr 
stephen.dun...@gmail.com wrote:

 
 My experience (which is on a trunk build from a few weeks back of Solr
 2.4),
 is that changing the default parser for the handler does NOT change it for
 facet.query.  I had expected it would, but was disappointed.


You are right, SimpleFacets#getFacetQueryCounts has the following comment:

/* Ignore SolrParams.DF - could have init param facet.query assuming
 * the schema default with query param DF intented to only affect Q.
 * If user doesn't want schema default for facet.query, they should be
 * explicit.
 */

I'm not sure if this should be changed. Hoss, what do you think?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Default Query Type For Facet Queries

2009-09-10 Thread Stephen Duncan Jr
If using {!type=customparser} is the only way now, should I file an issue to
make the default configurable?

-- 
Stephen Duncan Jr
www.stephenduncanjr.com

On Thu, Sep 3, 2009 at 11:23 AM, Stephen Duncan Jr stephen.dun...@gmail.com
 wrote:

 We have a custom query parser plugin registered as the default for
 searches, and we'd like to have the same parser used for facet.query.

 Is there a way to register it as the default for FacetComponent in
 solrconfig.xml?

 I know I can add {!type=customparser} to each query as a workaround, but
 I'd rather register it in the config than make my code send that and strip
 it off on every facet query.

 --
 Stephen Duncan Jr
 www.stephenduncanjr.com



Re: Default Query Type For Facet Queries

2009-09-10 Thread Lance Norskog
Changing basic defaults like this makes it very confusing to work with
successive solr releases, to read the wiki, etc.

You can make custom search request handlers - an example:

<requestHandler name="/custom" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">customparser</str>
  </lst>
</requestHandler>

http://localhost:8983/solr/custom?q=string_in_my_custom_language

On 9/10/09, Stephen Duncan Jr stephen.dun...@gmail.com wrote:
 If using {!type=customparser} is the only way now, should I file an issue to
 make the default configurable?

 --
 Stephen Duncan Jr
 www.stephenduncanjr.com





-- 
Lance Norskog
goks...@gmail.com


Default Query Type For Facet Queries

2009-09-03 Thread Stephen Duncan Jr
We have a custom query parser plugin registered as the default for searches,
and we'd like to have the same parser used for facet.query.

Is there a way to register it as the default for FacetComponent in
solrconfig.xml?

I know I can add {!type=customparser} to each query as a workaround, but I'd
rather register it in the config than make my code send that and strip it
off on every facet query.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com
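A sketch of the {!type=customparser} workaround done once on the client, since Stephen reports that changing the handler's default parser did not change it for facet.query. The helper class is hypothetical, and customparser stands for whatever name the plugin is registered under in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper for the {!type=...} workaround:
// prefix every facet.query with the custom parser.
final class FacetQueryParserPrefix {
    private FacetQueryParserPrefix() {}

    static List<String> withParser(String parser, List<String> facetQueries) {
        List<String> out = new ArrayList<>();
        for (String q : facetQueries) {
            // Queries that already carry local params ({!...}) are left untouched,
            // so an explicit parser choice or key is never overwritten.
            out.add(q.startsWith("{!") ? q : "{!type=" + parser + "}" + q);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withParser("customparser", List.of("string_in_my_custom_language")));
    }
}
```

Queries that already start with local params pass through unchanged; merging type= into existing local params would be a possible refinement.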