Re: schema design question

2012-04-06 Thread Lance Norskog
(albums:query OR tracks:query) AND NOT(tracks:query -> albums:query)

Is this it? That last clause does sound like a join.

How do you shard? Is it possible to put all associated albums and
tracks in one shard? You can then do a join query against each shard
and merge the output yourself.
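
A minimal sketch of the "merge the output yourself" step, assuming each shard has already answered its own join query and returned hits sorted by descending score. The function name and the (score, doc_id) tuple shape are made up for illustration and are not part of Solr. Scores from separate shards are only roughly comparable, so treat the merged order as approximate.

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, rows=10):
    """Merge per-shard hit lists (each sorted by descending score) into one top-N list."""
    # heapq.merge expects inputs that are ascending by key, so negate the
    # score to make the highest-scoring hits come out first.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, rows))

# Hypothetical per-shard output: (score, doc_id), best hit first.
shard_a = [(9.1, "album-1"), (4.2, "track-7")]
shard_b = [(7.5, "album-3"), (1.0, "track-9")]
top = merge_shard_results([shard_a, shard_b], rows=3)
```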

On Fri, Apr 6, 2012 at 9:59 AM, Neal Tucker  wrote:
> Thanks, but I don't want to exclude all tracks that are associated
> with albums, I want to exclude tracks that are associated with albums
> *which match the query* (tracks and their associated albums may have
> different tags).  I don't think your suggestion covers that.
>
> On Fri, Apr 6, 2012 at 9:35 AM, Erick Erickson  
> wrote:
>> I'd consider a field like "associated_with_album", and a
>> field that identifies the kind of record this is "track or album".
>>
>> Then you can form a query like -associated_with_album:true
>> (where '-' is the Lucene NOT operator).
>>
>> And then group by kind to get separate groups of albums and
>> tracks.
>>
>> Hope this helps
>> Erick
>>
>> On Thu, Apr 5, 2012 at 9:00 PM, N. Tucker
>>  wrote:
>>> Apologies if this is a very straightforward schema design problem that
>>> should be fairly obvious, but I'm not seeing a good way to do it.
>>> Let's say I have an index that wants to model Albums and Tracks, and
>>> they all have arbitrary tags attached to them (represented by
>>> multivalue string type fields).  Tracks also have an album id field
>>> which can be used to associate them with an album.  I'd like to
>>> perform a query which shows both Track and Album results, but
>>> suppresses Tracks that are associated with Albums in the result set.
>>>
>>> I am tempted to use a "join" here, but I have reservations because it
>>> is my understanding that joins cannot work across shards, and I'm not
>>> sure it's a good idea to limit myself in that way if possible.  Any
>>> suggestions?  Is there a standard solution to this type of problem
>>> where you've got hierarchical items and you don't want children shown
>>> in the same result as the parent?



-- 
Lance Norskog
goks...@gmail.com


Re: schema design question

2012-04-06 Thread Neal Tucker
Thanks, but I don't want to exclude all tracks that are associated
with albums, I want to exclude tracks that are associated with albums
*which match the query* (tracks and their associated albums may have
different tags).  I don't think your suggestion covers that.
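
To make the requirement concrete, here is a rough client-side sketch of the filtering I have in mind. It assumes every hit carries a "kind" field ("album" or "track") and that tracks carry an "album_id" field; those names are placeholders, not my real schema. It also only works if the whole result set is in hand, so it does not solve paging.

```python
def suppress_child_tracks(hits):
    """Drop tracks whose parent album is itself among the hits."""
    # Collect the ids of albums that matched the query.
    matching_albums = {h["id"] for h in hits if h["kind"] == "album"}
    # Keep albums, plus tracks whose album did not match.
    return [
        h for h in hits
        if not (h["kind"] == "track" and h.get("album_id") in matching_albums)
    ]

hits = [
    {"id": "a1", "kind": "album"},
    {"id": "t1", "kind": "track", "album_id": "a1"},  # album a1 matched: suppressed
    {"id": "t2", "kind": "track", "album_id": "a2"},  # album a2 did not match: kept
]
visible = suppress_child_tracks(hits)
```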

On Fri, Apr 6, 2012 at 9:35 AM, Erick Erickson  wrote:
> I'd consider a field like "associated_with_album", and a
> field that identifies the kind of record this is "track or album".
>
> Then you can form a query like -associated_with_album:true
> (where '-' is the Lucene NOT operator).
>
> And then group by kind to get separate groups of albums and
> tracks.
>
> Hope this helps
> Erick
>
> On Thu, Apr 5, 2012 at 9:00 PM, N. Tucker
>  wrote:
>> Apologies if this is a very straightforward schema design problem that
>> should be fairly obvious, but I'm not seeing a good way to do it.
>> Let's say I have an index that wants to model Albums and Tracks, and
>> they all have arbitrary tags attached to them (represented by
>> multivalue string type fields).  Tracks also have an album id field
>> which can be used to associate them with an album.  I'd like to
>> perform a query which shows both Track and Album results, but
>> suppresses Tracks that are associated with Albums in the result set.
>>
>> I am tempted to use a "join" here, but I have reservations because it
>> is my understanding that joins cannot work across shards, and I'm not
>> sure it's a good idea to limit myself in that way if possible.  Any
>> suggestions?  Is there a standard solution to this type of problem
>> where you've got hierarchical items and you don't want children shown
>> in the same result as the parent?


Re: schema design question

2012-04-06 Thread Erick Erickson
I'd consider a field like "associated_with_album", and a
field that identifies the kind of record this is "track or album".

Then you can form a query like -associated_with_album:true
(where '-' is the Lucene NOT operator).

And then group by kind to get separate groups of albums and
tracks.
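
A small sketch of what such a request might look like, assuming the record-type field is named "kind" (a placeholder) and that result grouping is available on your Solr version. The handler path and the sample query are also assumptions.

```python
from urllib.parse import urlencode

def build_grouped_query(user_query):
    """Build a /select URL that hides album-associated tracks and groups by kind."""
    params = {
        "q": user_query,
        # Pure negative filter: drop every doc flagged as belonging to an album.
        "fq": "-associated_with_album:true",
        # Result grouping: one group per record kind (album / track).
        "group": "true",
        "group.field": "kind",
    }
    return "/select?" + urlencode(params)

url = build_grouped_query("tags:jazz")
```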

Hope this helps
Erick

On Thu, Apr 5, 2012 at 9:00 PM, N. Tucker
 wrote:
> Apologies if this is a very straightforward schema design problem that
> should be fairly obvious, but I'm not seeing a good way to do it.
> Let's say I have an index that wants to model Albums and Tracks, and
> they all have arbitrary tags attached to them (represented by
> multivalue string type fields).  Tracks also have an album id field
> which can be used to associate them with an album.  I'd like to
> perform a query which shows both Track and Album results, but
> suppresses Tracks that are associated with Albums in the result set.
>
> I am tempted to use a "join" here, but I have reservations because it
> is my understanding that joins cannot work across shards, and I'm not
> sure it's a good idea to limit myself in that way if possible.  Any
> suggestions?  Is there a standard solution to this type of problem
> where you've got hierarchical items and you don't want children shown
> in the same result as the parent?


Re: schema design question

2011-08-29 Thread Erick Erickson
I admit I just glanced at your problem statement, but
two things come to mind...

1> have you looked at the "limited join" patch and would
that work?

2> try searching the list for "hierarchical", very similar
questions have been discussed before, although I
don't quite remember the answers

Best
Erick

On Sun, Aug 28, 2011 at 5:52 PM, Adeel Qureshi  wrote:
> Hi there
>
> I have a question regarding how to setup schema for some data. This data is
> basically parent-child data for different types of records .. so
>
> a bunch of records representing projects and subprojects where each
> subproject has a parent project .. and a project has many child sub projects
> another bunch of records representing data for projects and linked projects
> .. same parent child relationship here
> another bunch representing project and linked people ..
>
> so there are two ways I was thinking this kind of data can be indexed
>
> 1. create a single store called lets say CollectionData. use dynamic fields
> to post all this different data but use a type field to identify the type of
> records . e.g. to post two docs one representing project->linkedproject and
> another project->linkedpeople info
>
> 
> 123
> LinkedProjects
> child project name
> child project status
> ...
> parent info
> ...
> 
>
> 
> 123
> LinkedPeople
> child person name
> ...
> parent info
> ...
> 
>
> now from the same store I can run queries to get the different data while
> restricting the resultset on one type of records using the fq param ..
>
> 2. approach would be to create multiple stores for each different type of
> records .. with pretty much the same schema but now we dont need the type
> field because linkedProjects are in a linkedProjects store and linkedPeople
> are in linkedPeople store .. only drawback i guess is that you could have a
> few stores
>
> my question to you guys is which approach makes more sense. I would
> appreciate any comments.
>
> Thanks
> Adeel
>


RE: Schema Design Question

2011-05-15 Thread Zac Smith
Ok thanks for the responses. My option #2 will be easier to implement than 
having the new doc with combinations so will give it a try. But that has opened 
my eyes to different possibilities!

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, May 15, 2011 8:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Of your first two options, I'd go with a multi-valued field for each book (1).

But kenf_nc's suggestion is a good one too.

On Sun, May 15, 2011 at 3:54 AM, kenf_nc  wrote:
> create a separate document for each book-bookshelf combination.
> doc 1 = book 1,shelf 1
> doc 2 = book 1,shelf 3
> doc 3 = book 2,shelf 1
> etc.
>
> Then your queries are q=book_id:<id> to get all bookshelves a given book
> is on, or q=shelf_id:<id> to get all books on a given bookshelf.
>
> The biggest problem people face with Solr schema design is thinking in either
> object-oriented or RDBMS terms. You need to think differently.
> Solr/Lucene find text, and they find it very fast over huge amounts of data.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Schema-Design-Question-tp2939045p2942809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Schema Design Question

2011-05-15 Thread Erick Erickson
Of your first two options, I'd go with a multi-valued field for each book (1).

But kenf_nc's suggestion is a good one too.

On Sun, May 15, 2011 at 3:54 AM, kenf_nc  wrote:
> create a separate document for each book-bookshelf combination.
> doc 1 = book 1,shelf 1
> doc 2 = book 1,shelf 3
> doc 3 = book 2,shelf 1
> etc.
>
> Then your queries are q=book_id:<id> to get all bookshelves a given book
> is on, or q=shelf_id:<id> to get all books on a given bookshelf.
>
> The biggest problem people face with Solr schema design is thinking in either
> object-oriented or RDBMS terms. You need to think differently.
> Solr/Lucene find text, and they find it very fast over huge amounts of data.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Schema-Design-Question-tp2939045p2942809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Schema Design Question

2011-05-15 Thread kenf_nc
create a separate document for each book-bookshelf combination.
doc 1 = book 1,shelf 1
doc 2 = book 1,shelf 3
doc 3 = book 2,shelf 1
etc.

Then your queries are q=book_id:<id> to get all bookshelves a given book is on,
or q=shelf_id:<id> to get all books on a given bookshelf.
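
A quick sketch of generating those combination documents, assuming the many-to-many data is available as a mapping from book id to shelf ids. The field names (id, book_id, shelf_id) and the "book_shelf" unique-key format are only illustrative.

```python
def combination_docs(shelves_by_book):
    """Flatten a book -> shelves mapping into one document per (book, shelf) pair."""
    docs = []
    for book_id, shelf_ids in sorted(shelves_by_book.items()):
        for shelf_id in shelf_ids:
            docs.append({
                # Unique key built from both ids so each pair is its own doc.
                "id": f"{book_id}_{shelf_id}",
                "book_id": book_id,
                "shelf_id": shelf_id,
            })
    return docs

# Same data as the example above: book 1 on shelves 1 and 3, book 2 on shelf 1.
docs = combination_docs({1: [1, 3], 2: [1]})
```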

The biggest problem people face with Solr schema design is thinking in either
object-oriented or RDBMS terms. You need to think differently.
Solr/Lucene find text, and they find it very fast over huge amounts of data.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Design-Question-tp2939045p2942809.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Schema Design Question

2011-05-13 Thread Zac Smith
Thanks that looks interesting. Don't think it helps my situation though as I 
would have to index all the bookshelves and will still end up having to put 
thousands of Book ID values in a multi-value field.

I guess the question I have is: Is it more appropriate to load a multi-value 
field with a large number of values or should you pass a large number of values 
in as a Boolean clause?
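
For reference, a rough sketch of how the Boolean-clause approach (option 2 in my original message, quoted below) can be kept under the clause limit by splitting the ID list into several queries. The 1024 default is an assumption about a stock solrconfig.xml, and the caller would still have to combine the results of the separate queries.

```python
def book_id_queries(book_ids, max_clauses=1024):
    """Yield book_id:(... OR ...) query strings with at most max_clauses terms each."""
    ids = list(book_ids)
    for start in range(0, len(ids), max_clauses):
        chunk = ids[start:start + max_clauses]
        yield "book_id:(" + " OR ".join(str(i) for i in chunk) + ")"

# Tiny limit used here only to show the chunking.
queries = list(book_id_queries(range(1, 6), max_clauses=2))
```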

Zac

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, May 13, 2011 10:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Hi Zac,

Solr 4.0 (trunk) has support for relationships/JOIN.  Have a look: 
http://search-lucene.com/?q=solr+join

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem 
search :: http://search-lucene.com/



----- Original Message -----
> From: Zac Smith 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, May 13, 2011 12:28:35 PM
> Subject: Schema Design Question
> 
> Let's say I have a data model that involves books and bookshelves. I
> have tens of thousands of books and thousands of bookshelves. There is
> a many-many relationship between books & bookshelves. All of the books are
> indexed by SOLR.
>
> I need to be able to query SOLR and get all the books for a given
> bookshelf. I see two schema design options here:
>
> 1) Each book has a multi-value field that contains a list of all the
> bookshelf ID's. Many books will have thousands of bookshelf ID's. In
> this case the query is simple, I just send solr the bookshelf ID.
>
> 2) I send solr a query with each book on the bookshelf e.g.
> q=book_id:(1+OR+2+OR+3). Many bookshelves will have thousands of
> book ID's so the query can get rather large.
>
> Right now I am using option 2 and it seems to be working fine. I have
> had to crank 'maxBooleanClauses' right up but it does seem to be pretty fast.
>
> Anyone have an opinion?
> 
> 


Re: Schema Design Question

2011-05-13 Thread Otis Gospodnetic
Hi Zac,

Solr 4.0 (trunk) has support for relationships/JOIN.  Have a look: 
http://search-lucene.com/?q=solr+join
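
As a rough illustration only: the join query parser takes the form {!join from=... to=...}. The sketch below assumes each bookshelf is indexed as its own document with a multi-valued book_ids field and a shelf_id field, and that book documents are keyed by id. All of those field names are hypothetical.

```python
def shelf_join_query(shelf_id):
    """Build a join query returning the book docs listed on one bookshelf doc."""
    # Find the bookshelf doc by shelf_id, then map its book_ids values
    # onto the id field of the book documents.
    return "{!join from=book_ids to=id}shelf_id:" + str(shelf_id)

q = shelf_join_query(42)
```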

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Zac Smith 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, May 13, 2011 12:28:35 PM
> Subject: Schema Design Question
> 
> Let's say I have a data model that involves books and bookshelves. I have
> tens of thousands of books and thousands of bookshelves. There is a many-many
> relationship between books & bookshelves. All of the books are indexed by
> SOLR.
>
> I need to be able to query SOLR and get all the books for a given bookshelf.
> I see two schema design options here:
>
> 1) Each book has a multi-value field that contains a list of all the
> bookshelf ID's. Many books will have thousands of bookshelf ID's. In this case
> the query is simple, I just send solr the bookshelf ID.
>
> 2) I send solr a query with each book on the bookshelf e.g.
> q=book_id:(1+OR+2+OR+3). Many bookshelves will have thousands of book
> ID's so the query can get rather large.
>
> Right now I am using option 2 and it seems to be working fine. I have had to
> crank 'maxBooleanClauses' right up but it does seem to be pretty fast.
>
> Anyone have an opinion?
> 
>