Re: Searching across multivalued fields

2009-06-19 Thread Michael Ludwig

MilkDud wrote:

>> Michael Ludwig-4 wrote:
>>
>> What do you expect the user to enter?
>>
>> * "dream theater innocence faded" - certainly wrong
>> * dream theater "innocence faded" - much better
>
> Most likely they would just enter dream theater innocence faded, with
> no quotes around any of the fields, which is a large cause of the
> problem.  Now if I index on the track level, then all those words
> would have to show up in just one track (including the album, artist,
> and track name), which is expected.  If I index on the album level,
> however, those words just need to show up anywhere throughout the
> entire album.

Give the user separate form fields; in this case, don't use DisMax, and
route each form field value to the appropriate field.

Or go with DisMax; it has the "mm" option to fine-tune how multiple
terms in the query should influence matching.
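
To make the "mm" (minimum should match) option a bit more concrete, here is a small Python sketch of how the simple forms of an mm value translate into a required number of matching terms. It is a simplified model written for this thread, not Solr's code: the function name is made up, and conditional specs such as "2<75%" are deliberately left out.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Return how many optional clauses must match for a simple mm spec.

    Supports plain integers ("2", "-1") and percentages ("75%", "-25%").
    Negative values mean "this many (or this share) may be missing".
    """
    spec = spec.strip()
    if spec.endswith("%"):
        raw = optional_clauses * int(spec[:-1]) / 100
        # int() truncates toward zero, so percentages are rounded down.
        required = optional_clauses + int(raw) if raw < 0 else int(raw)
    else:
        value = int(spec)
        required = optional_clauses + value if value < 0 else value
    # Never require fewer than zero or more than the available clauses.
    return max(0, min(optional_clauses, required))


terms = "dream theater innocence faded".split()
for spec in ("100%", "75%", "-25%", "2", "-1"):
    print(spec, "->", min_should_match(spec, len(terms)))
```

With four query terms, an mm of 100% requires all four to match, while 75% or -1 would let one term go unmatched.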


> So, while it will match dream theater - innocence faded, it will also
> match an album that has all the words dream theater innocence faded
> mentioned across all tracks, which for small queries can be very
> common.
>
> Basically, I'm looking for a way to say: match all the words in the
> search query across the artist, album, and track name, but only
> looking at one track (a multivalued field) at a time, given a query
> without any quotes. Does that make sense at all?

If that's your use case (which I may have been unable to see up to now),
then your approach of splitting up albums into tiny track documents makes
sense.
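
The difference between the two index layouts can be shown with a toy model in plain Python. This is only a sketch: whitespace tokenization stands in for a real analyzer, "all terms must match" stands in for mm=100%, and the album data is invented for illustration.

```python
def tokens(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return set(text.lower().split())


def matches_all_terms(query, field_values):
    """True if every query term occurs somewhere in the given field values."""
    indexed = set()
    for value in field_values:
        indexed |= tokens(value)
    return tokens(query) <= indexed


# Hypothetical album, invented for illustration only.
album = {
    "artist": "Example Band",
    "album": "Example Album",
    "tracks": ["Love Is Blind", "Call Me"],
}

query = "example band love me"

# Album-level document: all tracks live in one multivalued field.
album_level_hit = matches_all_terms(
    query, [album["artist"], album["album"]] + album["tracks"]
)

# Track-level documents: artist and album are repeated on every track.
track_level_hit = any(
    matches_all_terms(query, [album["artist"], album["album"], track])
    for track in album["tracks"]
)

print(album_level_hit, track_level_hit)
```

The album-level document matches because "love" and "me" are found in two different tracks, while no single track-level document contains both words.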


> That is why I was leaning towards the track level index, such as: id,
> artist, album, track (all single valued)


Yes, that makes sense. Good luck! (Off for a week now.)
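
For reference, a track-level layout like the one discussed (id, artist, album, track, all single valued) could be derived from album records roughly as follows. This is a minimal sketch; the input record shape, the sample data, and the "albumId-trackNumber" id scheme are assumptions, not something taken from the original index.

```python
def album_to_track_docs(album):
    """Flatten one album record into one document per track."""
    docs = []
    for number, track in enumerate(album["tracks"], start=1):
        docs.append({
            # Hypothetical id scheme: album id plus track number.
            "id": f'{album["id"]}-{number}',
            "artist": album["artist"],   # repeated on every track
            "album": album["title"],     # repeated on every track
            "track": track,
        })
    return docs


# Invented sample data.
example_album = {
    "id": "a42",
    "artist": "Example Band",
    "title": "Example Album",
    "tracks": ["Love Is Blind", "Call Me"],
}

for doc in album_to_track_docs(example_album):
    print(doc)
```

The artist and album values are duplicated on purpose; that duplication is the price of being able to match all query terms within a single track document.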

Michael Ludwig


Re: Searching across multivalued fields

2009-06-18 Thread MilkDud


Michael Ludwig-4 wrote:
> 
> What do you expect the user to enter?
> 
> * "dream theater innocence faded" - certainly wrong
> * dream theater "innocence faded" - much better
> 

Most likely they would just enter dream theater innocence faded, with no
quotes around any of the fields, which is a large cause of the problem.
Now if I index on the track level, then all those words would have to show
up in just one track (including the album, artist, and track name), which
is expected.  If I index on the album level, however, those words just
need to show up anywhere throughout the entire album.

So, while it will match dream theater - innocence faded, it will also
match an album that has all the words dream theater innocence faded
mentioned across all tracks, which for small queries can be very common.

Basically, I'm looking for a way to say: match all the words in the search
query across the artist, album, and track name, but only looking at one
track (a multivalued field) at a time, given a query without any quotes.
Does that make sense at all?

That is why I was leaning towards the track level index, such as:
id, artist, album, track (all single valued)

as it does solve that problem, but then I have to deal with duplicate data
being put in the artist/album fields (and a bunch of other fields).  Also,
indexing on the album level poses further complications, given that I also
store the location of a track preview clip next to each track, and keeping
track of sets of data like that in Solr is not really feasible.


-- 
View this message in context: 
http://www.nabble.com/Searching-across-multivalued-fields-tp24056297p24099668.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig

Hi Vicky,

Vicky_Dev wrote:

> We are also facing the same problem mentioned in the post (we are using
> DisMaxRequestHandler):
>
> When we search for q=prdTitle_s:"ladybird"&qt=dismax, we get
> 2 results: unique key ID = 1000 and unique key ID = 1001.

(1) Append debugQuery=true to your query and see how the DisMax query
parser rewrites your query, interpreting what you think is a field name
as just another query term.

(2) Proceed immediately to read the whole Wiki page explaining DisMax:

http://wiki.apache.org/solr/DisMaxRequestHandler

> Is it possible to get just the exact match, which is unique key =
> 1001?

Yes, it is:  q=id:1001

(1) Don't use DisMax here; it will not interpret field names.
(2) Replace "id" with whatever name you gave to your unique key field.
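
As an illustration, a fielded query like this can be sent to the standard (non-DisMax) handler with a URL built as below. This is only a sketch in Python: the host, port, and /solr/select path are assumptions about a default local installation, and "id" stands for whatever your unique key field is called.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed location of the select handler; adjust to your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def standard_query_url(field, value, debug=False):
    """Build a select URL for a fielded query such as id:1001."""
    params = {"q": f"{field}:{value}"}
    if debug:
        # Ask Solr to include the parsed query in the response.
        params["debugQuery"] = "true"
    return SOLR_SELECT + "?" + urlencode(params)


url = standard_query_url("id", "1001", debug=True)
print(url)
# Decode the query string again to check what will be sent.
print(parse_qs(urlsplit(url).query))
```

No qt=dismax parameter is added, so the field name in q is interpreted as a field name rather than as a search term.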

Michael Ludwig


Re: Searching across multivalued fields

2009-06-18 Thread Vicky_Dev

Hi Michael,

We are also facing the same problem mentioned in the post (we are using
DisMaxRequestHandler):

Example: there is a product title field with the following values:
1) In the document with unique key ID = 1000,
   the prdTitle_s field contains the value "ladybird classic".

2) In the document with unique key ID = 1001,
   the prdTitle_s field contains the value "ladybird".

When we search for q=prdTitle_s:"ladybird"&qt=dismax, we get
2 results: unique key ID = 1000 and unique key ID = 1001.

Is it possible to get just the exact match, which is unique key = 1001?

Note: by default, the mm value is 100% per the Solr documentation.

~Vikrant







Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig

MilkDud wrote:

> Ok, so let's suppose I did index across just the album.  Using that
> index, how would I be able to handle searches of the form "artist name
> track name"?

What does the user interface look like? Do you have separate fields for
artists and tracks? Or just one field?

> If I do the search using a phrase query, this won't match anything
> because the artist and track are not in one field (hence my idea of
> creating a third concatenated field).

What do you expect the user to enter?

* "dream theater innocence faded" - certainly wrong
* dream theater "innocence faded" - much better

Use the DisMax query parser to read the query, as I suggested in my
first reply. You need to become more familiar with the various search
facilities; that will probably steer your ideas in more promising
directions. Read up about DisMax.

> If I make it a non-phrase query, it'll return albums that have those
> words across all the tracks, which is not ideal.  I.e. if you search
> for a track titled "love me" you will get back albums with the words
> love and me in different tracks.

That doesn't make sense to me. Did you inspect your query using
debugQuery=true as I suggested? What did it boil down to?

> Basically, I'd like it to look at each track individually

That tells me you're thinking database and table scan.

> and if the artist + just one track match all the search terms, then
> that counts as a match.  Does that make sense?  If I index on the
> track level, that should work, but then I have to store album/artist
> info on each track.

I think the following makes much more sense:

>> An album should be a document and have the following fields (and
>> maybe more, if you have more data attached to it):
>>
>> id - unique, an identifier
>> title - album title
>> interpret - the musician, possibly multi-valued
>> track - every song or whatever, definitely multi-valued

Read up about multi-valued fields (sample schema.xml, for example, or
Google) if you're unsure what this is; your posting subject, however,
suggests you aren't.

Regards,

Michael Ludwig


Re: Searching across multivalued fields

2009-06-17 Thread MilkDud

Ok, so let's suppose I did index across just the album.  Using that index, how
would I be able to handle searches of the form "artist name track name"?  If
I do the search using a phrase query, this won't match anything because the
artist and track are not in one field (hence my idea of creating a third
concatenated field).  If I make it a non-phrase query, it'll return albums
that have those words across all the tracks, which is not ideal.  I.e. if
you search for a track titled "love me" you will get back albums with the
words love and me in different tracks.  Basically, I'd like it to look at
each track individually, and if the artist + just one track match all the
search terms, then that counts as a match.  Does that make sense?  If I
index on the track level, that should work, but then I have to store
album/artist info on each track.





Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig

MilkDud wrote:

> Basically, what I am trying to do is index a collection of music for
> an online music store.  This contains information on the track, album,
> and artist levels.  These are all different object types in the same
> schema and it does contain a lot of redundant information.

What's a document in your case? If I were you, I'd probably organize
the data so that each album is one document, because that's what you'd
expect (shopping experience).

> For example, a track will have its own listing, but will show up again
> in the album listing and the artist listing for the objects that own
> that track.

Sounds a bit bizarre to me, but then I don't know much about your
requirements.

> There are reasons it is done this way as we search/display across the
> three differently.

Hmm.

> That said, I have thought of ways of just indexing tracks and
> maintaining all the relevant information, but that seems to introduce
> its own issues.


An album should be a document and have the following fields (and maybe
more, if you have more data attached to it):

id - unique, an identifier
title - album title
interpret - the musician, possibly multi-valued
track - every song or whatever, definitely multi-valued
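
To illustrate the album-as-document layout, here is a sketch in Python that renders one album as a Solr XML update message, repeating the field element once per value for the multi-valued fields. The sample data is invented, and the sketch assumes that "interpret" and "track" are declared with multiValued="true" in schema.xml.

```python
import xml.etree.ElementTree as ET


def album_to_add_xml(album):
    """Render one album as an <add><doc>...</doc></add> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name, value):
        element = ET.SubElement(doc, "field", {"name": name})
        element.text = value

    field("id", album["id"])
    field("title", album["title"])
    # Multi-valued fields: one <field> element per value.
    for musician in album["interpret"]:
        field("interpret", musician)
    for track in album["track"]:
        field("track", track)
    return ET.tostring(add, encoding="unicode")


# Invented sample data.
example = {
    "id": "a42",
    "title": "Example Album",
    "interpret": ["Example Band"],
    "track": ["Love Is Blind", "Call Me"],
}

xml_text = album_to_add_xml(example)
print(xml_text)
```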

Michael Ludwig


Re: Searching across multivalued fields

2009-06-17 Thread MilkDud

Sure.  To be clear, I am actually revamping an existing index, that I've
found numerous problems with so far.  Basically, what I am trying to do is
index a collection of music for an online music store.  This contains
information on the track, album, and artist levels.  These are all different
object types in the same schema and it does contain a lot of redundant
information.  For example, a track will have its own listing, but will show
up again in the album listing and the artist listing for the objects that
own that track.  There are reasons it is done this way as we search/display
across the three differently.  That said, I have thought of ways of just
indexing tracks and maintaining all the relevant information, but that seems
to introduce its own issues.

Thanks,
Jason





Re: Searching across multivalued fields

2009-06-17 Thread Erick Erickson

Hmm. Could you expand a bit more on the problem you're trying
to solve? The index organization you're hinting at seems close enough
to a set of database tables to make me wonder if you're using an
inappropriate index structure given the problem you want to solve.

Not that I know enough about your problem/solution to have a valid
opinion, but there's at least a chance that this is an XY problem.

Best
Erick



Re: Searching across multivalued fields

2009-06-17 Thread MilkDud

Yeah, not using stopwords at all.  I do have tracks specified in the pf param
along with a few other fields.  That said, with a phrase query I lose the
ability to search for an artist and track combined.  Two solutions I've
thought of include indexing at the track level only (right now I have
separate documents at the track, artist, and album level) or having a field
that contains both the artist and track name concatenated, allowing for
phrase queries containing both artist and track names.





Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig

MilkDud wrote:

> That part I understand and is what I have now.  It's the fact that
> since tracks is multivalued, and I search for a track "love me", I
> will also get back artists that have the words love and me in separate
> tracks.

Jason,

are you sure "me" isn't in a stopword list used to analyze your query?
Append debugQuery=true to find out whether by any chance it is removed
from your query phrase. In that case, your phrase won't survive parsing,
and all you'll be left with is "love" :-)

But I guess there are quite a lot of "love" titles :-)
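
The stopword effect described above can be pictured with a few lines of Python. This is a toy analyzer only: the stopword list is hypothetical, and a real analysis chain does more (and may keep position gaps where words were removed).

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"a", "me", "the"})


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [token for token in text.lower().split() if token not in stopwords]


print(analyze("love me"))                         # "me" is filtered out
print(analyze("love me", stopwords=frozenset()))  # nothing is filtered
```

If the query phrase shrinks to a single term like this, it no longer behaves as a phrase at all, which is what debugQuery=true would reveal.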


> Now with a phrase query with a small ps and a large posIncGap that
> could work.  But then I lose the ability to search for artist and
> track name together.


Another thing, are you sure you have enabled "pf" for "track"?
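
For readers unfamiliar with the interplay of phrase slop (ps) and positionIncrementGap mentioned in the quote, here is a toy Python model. It is a simplification under stated assumptions: positions are assigned by whitespace tokenization, the gap is simply added between consecutive values, and only in-order two-term phrases are checked, which is less general than real sloppy phrase matching.

```python
def positions(values, gap):
    """Assign token positions across the values of a multivalued field."""
    index = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # extra distance between consecutive values
        for token in value.lower().split():
            index.setdefault(token, []).append(pos)
            pos += 1
    return index


def sloppy_phrase_match(index, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in index.get(first, [])
        for p2 in index.get(second, [])
    )


tracks = ["Love Is Blind", "Call Me"]  # invented track titles

# Without a gap, a sloppy phrase can bridge two different tracks.
print(sloppy_phrase_match(positions(tracks, gap=0), "love", "me", slop=3))
# With a large gap, the same phrase no longer matches across tracks.
print(sloppy_phrase_match(positions(tracks, gap=100), "love", "me", slop=3))
```

As long as the gap is larger than the slop, a phrase can only match inside a single value of the field.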

Michael Ludwig


Re: Searching across multivalued fields

2009-06-17 Thread MilkDud

Michael,

That part I understand and is what I have now.  It's the fact that since
tracks is multivalued, and I search for a track "love me", I will also get
back artists that have the words love and me in separate tracks.  Now with a
phrase query with a small ps and a large posIncGap that could work.  But
then I lose the ability to search for artist and track name together.

-Jason





Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig

MilkDud wrote:

> To be more specific, I'm indexing a collection of music albums that
> have multiple tracks and an album artist.  So, some searches will
> contain both the artist name and the track name.  I can't make this a
> single phrase query as it is indexed across two separate fields.

Use the DisMaxRequestHandler and specify all fields you want to use in
your query in the qf parameter:

   qf=artist^3 album^2 track^1

http://wiki.apache.org/solr/DisMaxRequestHandler
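
To give a rough feel for what searching several boosted fields means, here is a toy model in Python. It is not Solr's scoring: each query term simply contributes the highest boost among the fields that contain it, every term is required to match somewhere, and the document (including its album title) is invented for illustration.

```python
# Field boosts as in the qf value above.
QF = {"artist": 3.0, "album": 2.0, "track": 1.0}


def toy_dismax_score(query, doc, qf=QF):
    """Sum, over query terms, the best boost among fields containing the term.

    Returns None if some term matches no field at all (all terms required).
    """
    total = 0.0
    for term in query.lower().split():
        boosts = [
            boost
            for field, boost in qf.items()
            if term in doc.get(field, "").lower().split()
        ]
        if not boosts:
            return None
        total += max(boosts)
    return total


# Invented document; the album title is a placeholder.
doc = {"artist": "Dream Theater", "album": "Example Album", "track": "Innocence Faded"}

print(toy_dismax_score("dream theater innocence faded", doc))
print(toy_dismax_score("dream theater missing", doc))
```

The point of the sketch is that the unquoted query matches even though its terms are spread over the artist and track fields, which a single phrase query cannot do.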

Michael Ludwig