RE: match count per shard and across shards

2011-01-31 Thread Chris Hostetter

: Interesting idea. I must investigate if this is a possibility - e.g. how often
: will a document be reindexed from one shard to another - this is actually a
: possibility as a consequence of the way we configure our shards :-/
: 
: Thanks for the input! I was still hoping for a way to get that info from
: Solr. The idea is the same: facet the Solr-shard position of each
: document... 

you could configure this field with a 'default' attribute in the 
schema.xml which is different per shard and then never worry about it -- 
whatever machine a document is indexed on, it will get that value.

managing the different schema.xml's might be a pain (does system property 
substitution work on schema.xml? i can't remember) but the same thing 
could be done with a simple little UpdateProcessor.
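
A minimal sketch of the same idea done on the indexing client instead of in
schema.xml or an UpdateProcessor (so this is a swapped-in analogue, not the
server-side approach described above). The field name `shard_id` and the label
format are assumptions; the field would have to exist in the schema as an
indexed field for faceting to work.

```python
def stamp_shard(docs, shard_label, field="shard_id"):
    """Return copies of the documents with the shard label set in `field`.

    `shard_id` is a hypothetical field name, not something Solr provides.
    The input documents are left unmodified.
    """
    return [{**doc, field: shard_label} for doc in docs]


# Hypothetical documents about to be sent to the shard labelled "shard-2011-01".
docs = [{"id": "a1", "title": "first"}, {"id": "a2", "title": "second"}]
stamped = stamp_shard(docs, "shard-2011-01")
print(stamped[0])
```

Every document sent to a given shard then carries that shard's label, which is
the property the per-shard 'default' value would give you.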


-Hoss


Re: match count per shard and across shards

2011-01-30 Thread csj

Hi,

FYI:
I figured out a solution myself. I wanted a smart way to get the shard
count for a query (how many documents were found in each shard). The "smart"
part consisted of getting all these counts in a single query using faceting.
I was asking if Solr could help with this, e.g. whether it had some shard
info that I could facet on out of the box. But apparently it does not.

But in my situation I can use my knowledge of how the shards are organised.
They are organised chronologically, and I happen to know the date
boundaries. 

My solution is simply to facet on those boundaries. In this way I can query
once, include all known shards, and get each shard's count for the search.
This may have a performance penalty, but it is at least for now a simple way.
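
As a rough sketch of what faceting on the boundaries could look like, the
following builds one facet.query per shard date range. The field name
`timestamp`, the query and the boundary dates are made up for illustration.
Note that `[a TO b]` ranges are inclusive at both ends, so the boundaries
should be chosen so that they do not overlap.

```python
from urllib.parse import urlencode


def date_facet_params(query, boundaries, date_field="timestamp"):
    """Build Solr request parameters with one facet.query per shard range.

    boundaries: list of (start, end) pairs in Solr date syntax.
    `timestamp` is an assumed field name.
    """
    params = [("q", query), ("facet", "true")]
    for start, end in boundaries:
        params.append(("facet.query", f"{date_field}:[{start} TO {end}]"))
    return params


# Hypothetical monthly shard boundaries.
boundaries = [
    ("2010-12-01T00:00:00Z", "2010-12-31T23:59:59Z"),
    ("2011-01-01T00:00:00Z", "2011-01-31T23:59:59Z"),
]
params = date_facet_params("title:solr", boundaries)
query_string = urlencode(params)
print(query_string)
```

The counts come back under facet_queries in the response, keyed by the range
expression, so each key maps back to one shard.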

Christian Sonne Jensen
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2385061.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: match count per shard and across shards

2011-01-30 Thread Upayavira
Brilliant. So obvious.

Upayavira

On Sat, 29 Jan 2011 18:53 -0700, "Bob Sandiford"
 wrote:
> Or - you could add a standard field to each shard, populate with a
> distinct value for each shard, and facet on that field.  Then look at the
> facet counts of the value that corresponds to a shard, and, hey-presto,
> you're done...
> 
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: match count per shard and across shards

2011-01-29 Thread csj

I'm not sure I understand this. What is the difference between multiple
indexes and multiple shards?


RE: match count per shard and across shards

2011-01-29 Thread csj

Interesting idea. I must investigate if this is a possibility - e.g. how often
will a document be reindexed from one shard to another - this is actually a
possibility as a consequence of the way we configure our shards :-/

Thanks for the input! I was still hoping for a way to get that info from
Solr. The idea is the same: facet the Solr-shard position of each
document... 


Re: match count per shard and across shards

2011-01-29 Thread csj

Indeed the distribution across shards should be transparent. In fact, as a
client I should not need to know anything about any shard. But as the
current state of Solr (1.4) dictates an interface where you - as a client -
must provide a list of shards, the responsibility has been shifted over
to the client.

Since we get so much data that we must add a new shard per month, we have to
be shard-aware on the client side. My understanding of Solr is that the
final response to a query is only finished when every shard in the query's
shard list has been consulted. This means that the slowest ship defines the
speed, so to speak. Or worse - if any shard in the list fails, then the
response fails!

What I hope to achieve is a way of cutting shards off the list for a query.
If I more or less know how many hits a given query has in a shard, then I
could control paging myself, and only include shards I know will have the
documents in the shard list for the query. Otherwise I'm worried about
performance when we get to have dozens of shards.

So to summarise: We are developing a system where a given search will be
performed again and again over time on an ever-increasing document base. The
first time a search is done, it will be distributed across every shard in
order to get a total from the beginning of time until the timestamp of
the query's debut. This total is cached and thereafter maintained by querying
the most recent shards from the last date until now.
Mostly the documents come in chronological order, but occasionally they
arrive out of order. The shards are organised by date intervals, and this
means that every shard from time to time will be the target of more
documents. This will induce a slight discrepancy between the cached total
and the actual total. But this is a discrepancy that we can live with.
But I would also like to know how many hits there are in each individual
shard. If I know this, then I can tailor-make a precise shard list for the
query: Because I know the offset and page size of the query, and I know how
many documents are in each shard, I can calculate which shards to
include. This is a lot of client-side administration - I know, but I guess -
I hope - it will perform quite well...
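
The shard-list calculation described above could be sketched roughly as
follows. It assumes that results are sorted so that all hits from one shard
come before all hits from the next (for example a date sort over
chronological shards) and that the per-shard counts are accurate; out-of-order
documents would make the result approximate. The function name and shard
labels are hypothetical.

```python
def shards_for_page(shard_hits, offset, page_size):
    """Pick the shards that overlap the page [offset, offset + page_size).

    shard_hits: list of (shard, hit_count) pairs in result order.
    Returns (selected_shards, adjusted_offset), where adjusted_offset is the
    start value to use once the earlier, skipped shards are left out.
    """
    selected = []
    skipped = 0  # hits in shards that lie entirely before the page
    seen = 0     # hits in all shards visited so far
    for shard, hits in shard_hits:
        start, end = seen, seen + hits
        seen = end
        if hits == 0:
            continue
        if end <= offset:
            skipped += hits
        elif start < offset + page_size:
            selected.append(shard)
    return selected, offset - skipped


# Hypothetical per-shard hit counts, newest shard first.
hits = [("2011-01", 15), ("2010-12", 40), ("2010-11", 25)]
print(shards_for_page(hits, offset=20, page_size=10))
```

For the sample counts, a page starting at offset 20 falls entirely inside the
second shard, so only that shard is queried, with the offset reduced by the
15 hits of the skipped first shard.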

Is this idea crazy or what?


RE: match count per shard and across shards

2011-01-29 Thread Bob Sandiford
Or - you could add a standard field to each shard, populate with a distinct 
value for each shard, and facet on that field.  Then look at the facet counts 
of the value that corresponds to a shard, and, hey-presto, you're done...
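
A small sketch of reading such facet counts on the client, assuming the JSON
response writer with its default flat facet layout (value, count, value,
count, ...). The field name `shard_id` and the sample response are invented
for illustration.

```python
def shard_counts(response, field="shard_id"):
    """Extract the total and the per-shard hit counts from a Solr JSON response.

    Assumes the flat facet layout [value, count, value, count, ...] and a
    hypothetical `shard_id` field holding a distinct value per shard.
    """
    total = response["response"]["numFound"]
    flat = response["facet_counts"]["facet_fields"][field]
    per_shard = dict(zip(flat[0::2], flat[1::2]))
    return total, per_shard


# Made-up response, trimmed to the parts that are read above.
sample = {
    "response": {"numFound": 42, "docs": []},
    "facet_counts": {
        "facet_fields": {"shard_id": ["shard-2010-12", 30, "shard-2011-01", 12]}
    },
}
total, per_shard = shard_counts(sample)
print(total, per_shard)
```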

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 






Re: match count per shard and across shards

2011-01-29 Thread Dennis Gearon
Sounds like the interface level to achieve this is multiple indexes.

 Dennis Gearon







Re: match count per shard and across shards

2011-01-29 Thread Upayavira
To my knowledge, the distributed search functionality is intended to be
transparent, thus no details deriving from it are exposed (e.g. what
docs come from which shard), so, no, I don't believe it to be possible.

The only way I know right now that you could achieve it is by two (sets
of) queries. One would be a distributed search across all shards, and
the other would be a single hit to every shard. To fake such a facet,
this second set of queries would only need to ask for totals, so it
could use a rows=0.
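
The two sets of queries might be built along these lines. This is only a
sketch: the host names are placeholders, and sending the distributed request
to the first shard is an arbitrary choice. It only constructs the URLs and
does not contact any server.

```python
from urllib.parse import urlencode


def count_queries(query, shards, rows=10):
    """Build one distributed query URL plus one rows=0 URL per shard.

    shards: list of "host:port/path" strings, as used in the shards parameter.
    """
    distributed = "http://" + shards[0] + "/select?" + urlencode(
        {"q": query, "shards": ",".join(shards), "rows": rows, "wt": "json"}
    )
    per_shard = {
        shard: "http://" + shard + "/select?"
        + urlencode({"q": query, "rows": 0, "wt": "json"})
        for shard in shards
    }
    return distributed, per_shard


# Placeholder shard addresses.
shards = ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"]
distributed, per_shard = count_queries("title:solr", shards)
print(distributed)
for url in per_shard.values():
    print(url)
```

The numFound value of each rows=0 response gives that shard's count, while
the distributed response gives the overall total and the actual documents.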

Otherwise you'd have to enhance the distributed indexing code to expose
some of this information in its response.

Upayavira

On Sat, 29 Jan 2011 03:48 -0800, "csj" 
wrote:
> 
> Hi,
> 
> Is it possible to construct a Solr query that will return the total
> number of hits across all shards, and at the same time get the number
> of hits per shard?
> 
> I was thinking along the lines of a faceted search, but I'm not deep
> enough into Solr capabilities and query parameters to figure it out.
> 
> Regards,
> 
> Christian Sonne Jensen
> 
> -- 
> View this message in context:
> http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2369627.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source