RE: match count per shard and across shards
: Interesting idea. I must investigate if this is a possibility - e.g. how often
: will a document be reindexed from one shard to another - this is actually a
: possibility as a consequence of the way we configure our shards :-/
:
: Thanks for the input! I was still hoping for a way to get that info from
: Solr. The idea is the same: facet the Solr-shard position of each
: document...

You could configure this field with a 'default' attribute in the schema.xml which is different per shard and then never worry about it: whatever machine a document is indexed on, it will get that value. Managing the different schema.xml files might be a pain (does system property substitution work in schema.xml? I can't remember), but the same thing could be done with a simple little UpdateProcessor.

-Hoss
Re: match count per shard and across shards
Hi,

FYI: I figured out a solution myself. I wanted a smart way to get the shard count for a query (how many documents were found in each shard). The "smart" part was getting all these counts in just one query using faceting. I was asking whether Solr could help with this, e.g. whether it exposes some shard information I could facet on out of the box. Apparently it does not.

But in my situation I can use my knowledge of how the shards are organised. They are organised chronologically, and I happen to know the date boundaries. My solution is simply to facet on those boundaries. That way I can query once, include all known shards, and get each shard's count for the search. This may carry a performance penalty, but for now at least it is a simple approach.

Christian Sonne Jensen

--
View this message in context: http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2385061.html
Sent from the Solr - User mailing list archive at Nabble.com.
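Christian doesn't post his actual query. The following is only a sketch of what "facet on those boundaries" could look like: one facet.query per shard's date range, all sent in a single distributed request with rows=0. The field name timestamp, the monthly boundaries and the host addresses are assumptions. The parameters q, rows, shards, facet, facet.query and wt are standard Solr request parameters. The sketch assumes the wt=json response, in which facet_counts.facet_queries maps each query string to its count. Each range runs to the next boundary minus one millisecond. This keeps both brackets inclusive, because older (1.4-era) range syntax, as far as I know, does not allow mixing inclusive and exclusive bounds.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def solr_date(dt):
    """Format a naive UTC datetime as a Solr date string (ISO-8601, trailing Z)."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (dt.microsecond // 1000)


def shard_range_queries(field, boundaries):
    """One inclusive range query per shard, covering [start, next_start - 1 ms]."""
    queries = []
    for start, end in zip(boundaries, boundaries[1:]):
        last = end - timedelta(milliseconds=1)
        queries.append("%s:[%s TO %s]" % (field, solr_date(start), solr_date(last)))
    return queries


def date_facet_params(user_query, shard_hosts, boundaries, field="timestamp"):
    """Parameters for ONE distributed request that also yields per-shard counts."""
    params = [
        ("q", user_query),
        ("rows", "0"),                      # counts only, no documents
        ("shards", ",".join(shard_hosts)),  # host:port/path entries, comma-separated
        ("facet", "true"),
        ("wt", "json"),
    ]
    params += [("facet.query", q) for q in shard_range_queries(field, boundaries)]
    return urlencode(params)


def counts_per_shard(response, boundaries, field="timestamp"):
    """Map each shard's start date to its hit count, read from facet_queries."""
    facet_queries = response["facet_counts"]["facet_queries"]
    queries = shard_range_queries(field, boundaries)
    return {solr_date(start): facet_queries.get(q, 0)
            for start, q in zip(boundaries, queries)}


# Hypothetical monthly shards: Nov 2010, Dec 2010, Jan 2011 (last entry is the end bound).
boundaries = [datetime(2010, 11, 1), datetime(2010, 12, 1),
              datetime(2011, 1, 1), datetime(2011, 2, 1)]
print(shard_range_queries("timestamp", boundaries)[0])
# timestamp:[2010-11-01T00:00:00.000Z TO 2010-11-30T23:59:59.999Z]
```

The overall total comes back as numFound in the same response. Each facet query's count is the number of hits in that shard's date range. These counts only equal the per-shard counts as long as every document really lives in the shard its date says it should.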
RE: match count per shard and across shards
Brilliant. So obvious.

Upayavira

On Sat, 29 Jan 2011 18:53 -0700, "Bob Sandiford" wrote:
> Or - you could add a standard field to each shard, populate with a
> distinct value for each shard, and facet on that field. Then look at the
> facet counts of the value that corresponds to a shard, and, hey-presto,
> you're done...

---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
Re: match count per shard and across shards
I'm not sure I understand this. What is the difference between multiple indexes and multiple shards?

--
View this message in context: http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2382499.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: match count per shard and across shards
Interesting idea. I must investigate whether this is workable, e.g. how often a document will be reindexed from one shard to another. A document moving between shards is actually possible, as a consequence of the way we configure our shards :-/

Thanks for the input! I was still hoping for a way to get that info from Solr. The idea is the same: facet on the Solr shard position of each document...

--
View this message in context: http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2382495.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: match count per shard and across shards
Indeed, the distribution across shards should be transparent. In fact, as a client I should not need to know anything about any shard. But the current state of Solr (1.4) dictates an interface where you, as a client, must provide a list of shards, so the responsibility has been shifted over to the client. We get so much data that we must add a new shard every month, so we have to be shard-aware on the client side.

My understanding of Solr is that the final response to a query is only complete when every shard in the query's shard list has been consulted. This means that the slowest ship defines the speed, so to speak. Or worse: if any shard in the list fails, the whole response fails!

What I hope to achieve is a way of cutting shards off the list for a query. If I know roughly how many hits a given query has in each shard, I can control paging myself. I would then include in the query's shard list only the shards I know hold the documents for the requested page. Otherwise I'm afraid of the performance once we have dozens of shards.

So to summarise: we are developing a system where a given search will be performed again and again over time on an ever-increasing document base. The first time a search is done, it will be distributed across every shard. This gives a total from the beginning of time up to the timestamp of the query's debut. This total is cached and from then on maintained by querying only the most recent shards, from the last date until now.

Mostly the documents arrive in chronological order, but occasionally they arrive out of order. The shards are organised by date intervals, which means that every shard will from time to time receive more documents. This will introduce a slight discrepancy between the cached total and the actual total, but it is a discrepancy we can live with.

But I would also like to know how many hits there are in each individual shard.
If I know this, then I can tailor-make a precise shard list for the query. I know the offset and page size of the query, and I know how many matching documents are in each shard, so I can calculate which shards to include. This is a lot of client-side administration, I know, but I guess (I hope) it will perform quite well...

Is this idea crazy or what?

--
View this message in context: http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2382411.html
Sent from the Solr - User mailing list archive at Nabble.com.
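The shard-list calculation described here can be sketched as follows. The sketch rests on a key assumption that the message implies but doesn't state: results are sorted on the same date the shards are partitioned by. Under that assumption, the global result list is just the shards' results concatenated in sort order. The function name select_shards and the shard labels are hypothetical. When only the selected shards are sent in a distributed query, the start offset must be reduced by the hits held in the leading shards that were dropped.

```python
def select_shards(shard_counts, offset, page_size):
    """Choose the shards that hold global results [offset, offset + page_size).

    shard_counts: list of (shard, hits) in the SAME order as the query's sort.
    Assumes results are sorted on the date the shards are partitioned by, so
    the global result list is simply the shards' results concatenated.

    Returns (shards_to_query, start), where start is the offset to send with a
    distributed query over just those shards.
    """
    selected = []
    start = None
    seen = 0                          # hits in all earlier shards
    end = offset + page_size          # exclusive end of the requested page
    for shard, hits in shard_counts:
        first, last = seen, seen + hits   # this shard covers [first, last)
        if hits > 0 and last > offset and first < end:
            if start is None:
                start = offset - first    # skip hits of dropped leading shards
            selected.append(shard)
        seen = last
        if seen >= end:
            break                         # later shards cannot contribute
    return selected, (start if start is not None else 0)


# Hypothetical counts, newest shard first (results sorted by date descending).
counts = [("2011-01", 10), ("2010-12", 25), ("2010-11", 8)]
print(select_shards(counts, offset=20, page_size=10))  # (['2010-12'], 10)
```

If the query is sorted by relevance instead of date, hits from different shards interleave. The calculation above then no longer holds, and every shard with hits has to stay in the list.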
RE: match count per shard and across shards
Or - you could add a standard field to each shard, populate with a distinct value for each shard, and facet on that field. Then look at the facet counts of the value that corresponds to a shard, and, hey-presto, you're done...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

> -Original Message-
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: Saturday, January 29, 2011 6:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: match count per shard and across shards
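Here is a minimal sketch of this suggestion, assuming a hypothetical field named shard_id that holds a distinct value on each shard. As Hoss notes elsewhere in the thread, that value could be set with a per-shard schema default or an UpdateProcessor. A single distributed request then returns the overall total as numFound and the per-shard counts as facet values. The sketch assumes the wt=json response with the default flat layout for facet_fields, i.e. [value, count, value, count, ...]. The sample response is hand-written, not real output.

```python
from urllib.parse import urlencode

SHARD_FIELD = "shard_id"  # hypothetical field holding a distinct value per shard


def field_facet_params(user_query, shard_hosts):
    """One distributed request: numFound is the total, the facet gives per-shard counts."""
    return urlencode([
        ("q", user_query),
        ("rows", "0"),
        ("shards", ",".join(shard_hosts)),
        ("facet", "true"),
        ("facet.field", SHARD_FIELD),
        ("facet.limit", "-1"),  # all values; the default limit is 100
        ("wt", "json"),
    ])


def counts_per_shard(response):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][SHARD_FIELD]
    return dict(zip(flat[0::2], flat[1::2]))


def total_hits(response):
    return response["response"]["numFound"]


# A hand-written response shaped like Solr's wt=json output (not real data).
sample = {
    "response": {"numFound": 15},
    "facet_counts": {"facet_fields": {SHARD_FIELD: ["2011-01", 10, "2010-12", 5]}},
}
print(total_hits(sample), counts_per_shard(sample))
```

As long as every document carries exactly one shard value, the facet counts add up to numFound.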
Re: match count per shard and across shards
Sounds like the interface level to achieve this is multiple indexes.

Dennis Gearon

- Original Message
From: Upayavira
To: solr-user@lucene.apache.org
Sent: Sat, January 29, 2011 3:51:45 PM
Subject: Re: match count per shard and across shards
Re: match count per shard and across shards
To my knowledge, the distributed search functionality is intended to be transparent, thus no details deriving from it are exposed (e.g. what docs come from which shard), so, no, I don't believe it to be possible.

The only way I know right now that you could achieve it is by two (sets of) queries. One would be a distributed search across all shards, and the other would be a single hit to every shard. To fake such a facet, this second set of queries would only need to ask for totals, so it could use a rows=0.

Otherwise you'd have to enhance the distributed indexing code to expose some of this information in its response.

Upayavira

On Sat, 29 Jan 2011 03:48 -0800, "csj" wrote:
>
> Hi,
>
> Is it possible to construct a Solr query that will return the total number
> of hits there across all shards, and at the same time getting the number
> of hits per shard?
>
> I was thinking along the lines of a faceted search, but I'm not deep enough
> into Solr capabilities and query parameters to figure it out.
>
> Regards,
>
> Christian Sonne Jensen
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2369627.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
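The two sets of queries described here can be sketched as request URLs. The first is one normal distributed search. The second is one non-distributed rows=0 request per shard, whose numFound is that shard's count. The host addresses are hypothetical, and the /select path assumes the default request handler. fetch_num_found only works against a live Solr with the wt=json response writer; it is shown for completeness and not exercised here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def distributed_url(shard_hosts, user_query, rows=10):
    """Query 1: a normal distributed search, sent to any one shard."""
    qs = urlencode([("q", user_query), ("rows", str(rows)),
                    ("shards", ",".join(shard_hosts)), ("wt", "json")])
    return "http://%s/select?%s" % (shard_hosts[0], qs)


def per_shard_count_urls(shard_hosts, user_query):
    """Query set 2: one non-distributed rows=0 request per shard (totals only)."""
    qs = urlencode([("q", user_query), ("rows", "0"), ("wt", "json")])
    return {host: "http://%s/select?%s" % (host, qs) for host in shard_hosts}


def fetch_num_found(url):
    """Send one request and return response.numFound (needs a running Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)["response"]["numFound"]


hosts = ["solr1:8983/solr", "solr2:8983/solr"]  # hypothetical shard addresses
print(distributed_url(hosts, "foo"))
for host, url in per_shard_count_urls(hosts, "foo").items():
    print(host, url)  # pass each url to fetch_num_found() against a live cluster
```

This costs one extra request per shard on every search. That per-search cost is what the single-request faceting approaches proposed later in the thread avoid.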