Re: schema design question
(albums:query OR tracks:query) AND NOT(tracks:query -> albums:query)

Is this it? That last clause does sound like a join.

How do you shard? Is it possible to put all associated albums and tracks in one shard? You can then do a join query against each shard and merge the output yourself.

On Fri, Apr 6, 2012 at 9:59 AM, Neal Tucker wrote:
> Thanks, but I don't want to exclude all tracks that are associated
> with albums; I want to exclude tracks that are associated with albums
> *which match the query* (tracks and their associated albums may have
> different tags). I don't think your suggestion covers that.

--
Lance Norskog
goks...@gmail.com
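Lance's suggestion (co-locate each album with its tracks, run the join on every shard, then merge) leaves the final merge to the client. A minimal sketch of that merge step in Python, assuming each shard hands back (score, doc_id) pairs already sorted best-first; the function name and sample data are hypothetical, and real Solr responses would need parsing first:

```python
import heapq
from itertools import islice
from typing import List, Sequence, Tuple

Hit = Tuple[float, str]  # (score, doc_id); hypothetical shape of a per-shard hit


def merge_shard_results(shard_hits: Sequence[List[Hit]], rows: int) -> List[Hit]:
    """Merge per-shard hit lists (each sorted by descending score) into a global top-N."""
    # heapq.merge lazily interleaves the already-sorted inputs; reverse=True
    # because every input list is ordered highest score first.
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))


if __name__ == "__main__":
    shard1 = [(9.1, "album-1"), (4.2, "track-7")]
    shard2 = [(7.5, "album-3"), (6.0, "track-9"), (1.3, "track-2")]
    print(merge_shard_results([shard1, shard2], rows=3))
    # prints [(9.1, 'album-1'), (7.5, 'album-3'), (6.0, 'track-9')]
```

Scores from different shards are only roughly comparable, since term statistics are computed per shard, so a merge like this is an approximation of a single-index ranking.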
Re: schema design question
Thanks, but I don't want to exclude all tracks that are associated with albums; I want to exclude tracks that are associated with albums *which match the query* (tracks and their associated albums may have different tags). I don't think your suggestion covers that.

On Fri, Apr 6, 2012 at 9:35 AM, Erick Erickson wrote:
> I'd consider a field like "associated_with_album", and a
> field that identifies the kind of record this is ("track" or "album").
>
> Then you can form a query like -associated_with_album:true
> (where '-' is the Lucene NOT).
>
> And then group by kind to get separate groups of albums and
> tracks.
>
> Hope this helps
> Erick
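The distinction Neal draws is that suppression depends on the query, not on a static flag. A small in-memory sketch in Python, with hypothetical Doc fields (kind, tags, album_id) standing in for the index: a track is dropped only when its own album is also in the match set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doc:
    doc_id: str
    kind: str                                   # "album" or "track" (hypothetical field)
    tags: List[str] = field(default_factory=list)
    album_id: Optional[str] = None              # set on tracks only


def match_tag(docs: List[Doc], tag: str) -> List[Doc]:
    """Stand-in for the real query: every album or track carrying the tag."""
    return [d for d in docs if tag in d.tags]


def suppress_covered_tracks(matches: List[Doc]) -> List[Doc]:
    """Drop tracks whose album is itself in the result set (query-dependent)."""
    matched_albums = {d.doc_id for d in matches if d.kind == "album"}
    return [d for d in matches
            if not (d.kind == "track" and d.album_id in matched_albums)]


if __name__ == "__main__":
    docs = [
        Doc("a1", "album", ["rock"]),
        Doc("t1", "track", ["rock"], album_id="a1"),
        Doc("a2", "album", ["jazz"]),
        Doc("t2", "track", ["rock"], album_id="a2"),
    ]
    hits = suppress_covered_tracks(match_tag(docs, "rock"))
    print([d.doc_id for d in hits])  # prints ['a1', 't2']
```

A static associated_with_album:true exclusion would also drop t2, even though its album a2 does not match "rock"; that is exactly the case Neal says the flag cannot handle.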
Re: schema design question
I'd consider a field like "associated_with_album", and a field that identifies the kind of record this is ("track" or "album").

Then you can form a query like -associated_with_album:true (where '-' is the Lucene NOT).

And then group by kind to get separate groups of albums and tracks.

Hope this helps
Erick

On Thu, Apr 5, 2012 at 9:00 PM, N. Tucker wrote:
> Apologies if this is a very straightforward schema design problem that
> should be fairly obvious, but I'm not seeing a good way to do it.
> Let's say I have an index that wants to model Albums and Tracks, and
> they all have arbitrary tags attached to them (represented by
> multivalued string type fields). Tracks also have an album id field
> which can be used to associate them with an album. I'd like to
> perform a query which shows both Track and Album results, but
> suppresses Tracks that are associated with Albums in the result set.
>
> I am tempted to use a "join" here, but I have reservations because it
> is my understanding that joins cannot work across shards, and I'm not
> sure it's a good idea to limit myself in that way if possible. Any
> suggestions? Is there a standard solution to this type of problem
> where you've got hierarchical items and you don't want children shown
> in the same result as the parent?
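Erick's suggestion translates into ordinary request parameters. A sketch in Python that only builds the query string; the field names tags and kind are assumptions (Erick names associated_with_album but leaves the record-type field unnamed), and group / group.field are Solr's result-grouping parameters:

```python
from typing import Dict
from urllib.parse import urlencode


def grouped_params(user_query: str) -> Dict[str, str]:
    """Request parameters for Erick's approach (field names are hypothetical)."""
    return {
        # Append the exclusion clause; '-' marks a prohibited (NOT) clause.
        "q": f"{user_query} -associated_with_album:true",
        # Result grouping: one group per record kind, i.e. albums and tracks.
        "group": "true",
        "group.field": "kind",
    }


def select_url(params: Dict[str, str]) -> str:
    """Relative /select URL with URL-encoded parameters."""
    return "/select?" + urlencode(params)


if __name__ == "__main__":
    print(select_url(grouped_params("tags:rock")))
    # prints /select?q=tags%3Arock+-associated_with_album%3Atrue&group=true&group.field=kind
```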
Re: schema design question
I admit I just glanced at your problem statement, but three things come to mind...

1> Have you looked at the "limited join" patch, and would that work?

2> Try searching the list for "hierarchical"; very similar questions have been discussed before, although I don't quite remember the answers.

Best
Erick

On Sun, Aug 28, 2011 at 5:52 PM, Adeel Qureshi wrote:
> Hi there
>
> I have a question regarding how to set up a schema for some data. This data is
> basically parent-child data for different types of records, so:
>
> a bunch of records representing projects and subprojects, where each
> subproject has a parent project and a project has many child subprojects;
> another bunch of records representing data for projects and linked projects,
> with the same parent-child relationship here;
> another bunch representing projects and linked people.
>
> So there are two ways I was thinking this kind of data can be indexed:
>
> 1. Create a single store called, let's say, CollectionData. Use dynamic fields
> to post all this different data, but use a type field to identify the type of
> record, e.g. to post two docs, one representing project->linkedproject and
> another project->linkedpeople info:
>
> 123
> LinkedProjects
> child project name
> child project status
> ...
> parent info
> ...
>
> 123
> LinkedPeople
> child person name
> ...
> parent info
> ...
>
> Now from the same store I can run queries to get the different data while
> restricting the result set to one type of record using the fq param.
>
> 2. The other approach would be to create multiple stores, one for each type of
> record, with pretty much the same schema, but now we don't need the type
> field because linkedProjects are in a linkedProjects store and linkedPeople
> are in a linkedPeople store. The only drawback, I guess, is that you could
> have a few stores.
>
> My question to you guys is which approach makes more sense. I would
> appreciate any comments.
>
> Thanks
> Adeel
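Adeel's option 1 relies on a type field plus the fq parameter. A Python sketch of the request parameters, assuming the field is literally named type and takes the values LinkedProjects and LinkedPeople seen in his examples (the real field names were lost from the archived message):

```python
from typing import Dict
from urllib.parse import urlencode

# Record types taken from the example docs; the field name "type" is assumed.
RECORD_TYPES = {"LinkedProjects", "LinkedPeople"}


def typed_params(user_query: str, record_type: str) -> Dict[str, str]:
    """Restrict a query on the shared store to one record type via fq."""
    if record_type not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {record_type}")
    # fq filters without affecting relevance scores, and Solr caches each
    # distinct filter, so one fq per record type is reused across queries.
    return {"q": user_query, "fq": f"type:{record_type}"}


if __name__ == "__main__":
    print("/select?" + urlencode(typed_params("*:*", "LinkedPeople")))
```

Because each distinct filter lands in Solr's filterCache, restricting by type in a single store tends to be cheap once the cache is warm; that is the main practical argument for option 1 over one store per record type.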
RE: Schema Design Question
OK, thanks for the responses. My option #2 will be easier to implement than having the new docs with combinations, so I will give it a try. But that has opened my eyes to different possibilities!

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, May 15, 2011 8:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Of your first two options, I'd go with a multi-valued field for each book (1). But kenf_nc's suggestion is a good one too.
Re: Schema Design Question
Of your first two options, I'd go with a multi-valued field for each book (1). But kenf_nc's suggestion is a good one too.

On Sun, May 15, 2011 at 3:54 AM, kenf_nc wrote:
> Create a separate document for each book-bookshelf combination:
> doc 1 = book 1, shelf 1
> doc 2 = book 1, shelf 3
> doc 3 = book 2, shelf 1
> etc.
>
> Then your queries are q=book_id to get all bookshelves a given book is on,
> or q=shelf_id to get all books on a given bookshelf.
>
> The biggest problem people face with Solr schema design is thinking in
> either object-oriented or RDBMS terms. You need to think differently.
> Solr/Lucene find text, and they find it very fast over huge amounts of data.
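Option 1, which Erick prefers, stores the shelf memberships on each book. A Python sketch, assuming a hypothetical multi-valued string field named shelf_ids (the schema.xml line in the comment is illustrative); the in-memory filter stands in for what the query would match:

```python
from typing import Dict, Iterable, List

# Illustrative schema.xml declaration for the hypothetical field:
#   <field name="shelf_ids" type="string" indexed="true" stored="false" multiValued="true"/>


def book_doc(book_id: str, shelf_ids: Iterable[str]) -> Dict[str, object]:
    """One document per book; all of its shelves go into a single multi-valued field."""
    return {"id": book_id, "shelf_ids": list(shelf_ids)}


def shelf_query(shelf_id: str) -> str:
    """The whole query for 'all books on this shelf' is a single term."""
    return f"shelf_ids:{shelf_id}"


def books_on_shelf(docs: List[Dict[str, object]], shelf_id: str) -> List[str]:
    """In-memory stand-in for what shelf_query() would match in the index."""
    return [d["id"] for d in docs if shelf_id in d["shelf_ids"]]


if __name__ == "__main__":
    docs = [book_doc("b1", ["s1", "s3"]), book_doc("b2", ["s1"]), book_doc("b3", ["s2"])]
    print(shelf_query("s1"), books_on_shelf(docs, "s1"))  # prints shelf_ids:s1 ['b1', 'b2']
```

The query stays one term no matter how full the shelf is; the cost moves to indexing, since changing a shelf's contents means re-indexing every affected book document.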
RE: Schema Design Question
Create a separate document for each book-bookshelf combination:
doc 1 = book 1, shelf 1
doc 2 = book 1, shelf 3
doc 3 = book 2, shelf 1
etc.

Then your queries are q=book_id to get all bookshelves a given book is on, or q=shelf_id to get all books on a given bookshelf.

The biggest problem people face with Solr schema design is thinking in either object-oriented or RDBMS terms. You need to think differently. Solr/Lucene find text, and they find it very fast over huge amounts of data.

--
View this message in context: http://lucene.472066.n3.nabble.com/Schema-Design-Question-tp2939045p2942809.html
Sent from the Solr - User mailing list archive at Nabble.com.
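kenf_nc's one-document-per-pair model can be sketched in Python as follows; the composite id format and the field names book_id and shelf_id are assumptions, and the helper functions stand in for the q=book_id and q=shelf_id queries:

```python
from typing import Dict, Iterable, List, Tuple


def membership_docs(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """One document per (book, shelf) pair; the composite id is hypothetical."""
    return [{"id": f"{book}_{shelf}", "book_id": book, "shelf_id": shelf}
            for book, shelf in pairs]


def shelves_for_book(docs: List[Dict[str, str]], book_id: str) -> List[str]:
    """What q=book_id:<id> would return, reduced to the shelf ids."""
    return [d["shelf_id"] for d in docs if d["book_id"] == book_id]


def books_on_shelf(docs: List[Dict[str, str]], shelf_id: str) -> List[str]:
    """What q=shelf_id:<id> would return, reduced to the book ids."""
    return [d["book_id"] for d in docs if d["shelf_id"] == shelf_id]


if __name__ == "__main__":
    # kenf_nc's three example documents.
    docs = membership_docs([("1", "1"), ("1", "3"), ("2", "1")])
    print(shelves_for_book(docs, "1"))  # prints ['1', '3']
    print(books_on_shelf(docs, "1"))    # prints ['1', '2']
```

The document count equals the number of book-shelf pairs, so with many books on thousands of shelves the index can grow large, but each query remains a single term lookup.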
RE: Schema Design Question
Thanks, that looks interesting. I don't think it helps my situation, though, as I would have to index all the bookshelves and would still end up having to put thousands of book ID values in a multi-valued field.

I guess the question I have is: is it more appropriate to load a multi-valued field with a large number of values, or to pass a large number of values in as Boolean clauses?

Zac

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, May 13, 2011 10:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Hi Zac,

Solr 4.0 (trunk) has support for relationships/JOIN. Have a look: http://search-lucene.com/?q=solr+join

Otis
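Zac's option 2 builds one Boolean clause per book ID, which is why maxBooleanClauses comes into play (Lucene's default limit is 1024 clauses, raised via solrconfig.xml). A Python sketch of building that query with a guard; the function name is hypothetical, and the '+' signs in Zac's example are just URL-encoded spaces:

```python
from typing import Sequence

# Lucene's default BooleanQuery clause limit; Solr overrides it with
# <maxBooleanClauses> in solrconfig.xml.
DEFAULT_MAX_BOOLEAN_CLAUSES = 1024


def book_id_query(book_ids: Sequence[int],
                  max_clauses: int = DEFAULT_MAX_BOOLEAN_CLAUSES) -> str:
    """Build book_id:(1 OR 2 OR ...), refusing to exceed the clause limit."""
    if not book_ids:
        raise ValueError("need at least one book id")
    if len(book_ids) > max_clauses:
        raise ValueError(
            f"{len(book_ids)} clauses exceeds maxBooleanClauses={max_clauses}")
    return "book_id:(" + " OR ".join(str(b) for b in book_ids) + ")"


if __name__ == "__main__":
    print(book_id_query([1, 2, 3]))  # prints book_id:(1 OR 2 OR 3)
```

The contrast with option 1 is visible here: this query grows with the size of the shelf and is rebuilt and sent on every request, while the multi-valued approach keeps the query to a single term.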
Re: Schema Design Question
Hi Zac,

Solr 4.0 (trunk) has support for relationships/JOIN. Have a look: http://search-lucene.com/?q=solr+join

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Zac Smith
> To: "solr-user@lucene.apache.org"
> Sent: Fri, May 13, 2011 12:28:35 PM
> Subject: Schema Design Question
>
> Let's say I have a data model that involves books and bookshelves. I have tens
> of thousands of books and thousands of bookshelves. There is a many-to-many
> relationship between books and bookshelves. All of the books are indexed by
> Solr.
>
> I need to be able to query Solr and get all the books for a given bookshelf. I
> see two schema design options here:
>
> 1) Each book has a multi-valued field that contains a list of all the
> bookshelf IDs. Many books will have thousands of bookshelf IDs. In this case
> the query is simple; I just send Solr the bookshelf ID.
>
> 2) I send Solr a query with each book on the bookshelf, e.g.
> q=book_id:(1+OR+2+OR+3 ). Many bookshelves will have thousands of book IDs,
> so the query can get rather large.
>
> Right now I am using option 2 and it seems to be working fine. I have had to
> crank 'maxBooleanClauses' right up, but it does seem to be pretty fast.
>
> Anyone have an opinion?
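The join support Otis mentions uses Solr's {!join from=... to=...} local-params syntax. A Python sketch, assuming kenf_nc-style membership documents (book_id, shelf_id) live in the same index as book documents keyed by id; all field names are hypothetical. The inner query matches membership documents for the shelf, and the join returns the documents whose id equals one of their book_id values:

```python
from urllib.parse import urlencode


def shelf_join_query(shelf_id: str) -> str:
    """Join from membership docs (book_id) to book docs (id); field names are hypothetical."""
    # Doubled braces in the f-string produce the literal {!join ...} prefix.
    return f"{{!join from=book_id to=id}}shelf_id:{shelf_id}"


def join_url(shelf_id: str) -> str:
    """Relative /select URL carrying the join query."""
    return "/select?" + urlencode({"q": shelf_join_query(shelf_id)})


if __name__ == "__main__":
    print(shelf_join_query("42"))  # prints {!join from=book_id to=id}shelf_id:42
```

This join runs within a single index, so it carries the same sharding caveat raised in the albums-and-tracks thread.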