RE: Query on multivalue field

2011-03-01 Thread Steven A Rowe
Hi Scott,

Querying against a multi-valued field just works - no special incantation 
required.

Steve

 -----Original Message-----
 From: Scott Yeadon [mailto:scott.yea...@anu.edu.au]
 Sent: Monday, February 28, 2011 11:50 PM
 To: solr-user@lucene.apache.org
 Subject: Query on multivalue field
 
 Hi,
 
 I have a variable number of text-based fields associated with each
 primary record which I wanted to apply a search across. I wanted to
 avoid the use of dynamic fields if possible or having to create a
 different document type in the index (as the app is based around the
 primary record and different views mean a lot of work to revamp
 pagination, etc.).
 
 So, is there a way to apply a query to each value of a multivalued field
 or is it always treated as a single field from a query perspective?
 
 Thanks.
 
 Scott.


Re: Query on multivalue field

2011-03-01 Thread Scott Yeadon

Thanks, but just to confirm how multiValued fields work:

In a multiValued field, call it field1, suppose I index two values,
value 1 = "some text...termA...more text" and value 2 =
"some text...termB...more text", and run a search such as
field1:(termA termB) (with <solrQueryParser defaultOperator="AND"/>).
I get a hit, even though neither value on its own contains both terms.
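To make the behaviour above concrete, here is a toy Python model. It is an illustration under simplifying assumptions (whitespace tokenizing, lowercasing), not Solr's actual code, and the function names are hypothetical: the values of a multi-valued field are pooled into one set of terms, so a boolean AND only checks that each term occurs somewhere in the field, not within one value.

```python
def field_terms(values: list[str]) -> set[str]:
    """Pool the terms of every value into one set (toy whitespace tokenizer)."""
    terms: set[str] = set()
    for value in values:
        terms.update(value.lower().split())
    return terms


def boolean_and_matches(values: list[str], query_terms: list[str]) -> bool:
    """What field1:(termA termB) with AND asks: is every term somewhere in the field?"""
    terms = field_terms(values)
    return all(t.lower() in terms for t in query_terms)


def per_value_and_matches(values: list[str], query_terms: list[str]) -> bool:
    """The behaviour Scott is after: all terms must occur within a single value."""
    return any(
        all(t.lower() in set(v.lower().split()) for t in query_terms)
        for v in values
    )


field1 = ["some text termA more text", "some text termB more text"]
print(boolean_and_matches(field1, ["termA", "termB"]))    # True
print(per_value_and_matches(field1, ["termA", "termB"]))  # False
```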


What I'm wondering is whether there is a way of applying the query against
each value of the field rather than against the field in its entirety.
The reason is that the number of values I want to store is variable, and
I'd like to avoid using dynamic fields or restructuring the index
if possible.


Scott.





Re: Query on multivalue field

2011-03-01 Thread Ahmet Arslan

Your best bet may be to use positionIncrementGap and issue a phrase query
(implicit AND) with an appropriate slop value.

If you have positionIncrementGap=100, you can simulate this with
q=field1:"termA termB"~100
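A toy Python model of this suggestion, under stated assumptions: whitespace tokenizing, Lucene-style positions (first token at 0, +1 per token, `gap` extra positions before each later value), and an approximation of Lucene's sloppy phrase matching for two distinct terms, where "first second"~slop matches if some pair of positions satisfies |(p2 - 1) - p1| <= slop. The function names are hypothetical. One detail it surfaces: when termA is the last token of one value and termB the first token of the next, the required slop works out to exactly the gap, so a slop equal to the gap still matches across the boundary, while a slop strictly below it (e.g. ~99 with a gap of 100) does not.

```python
def positions(values: list[str], gap: int) -> dict[str, list[int]]:
    """Lowercased term -> positions: first token at 0, +1 per token,
    and `gap` extra positions skipped before each later value."""
    pos_map: dict[str, list[int]] = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            pos_map.setdefault(token, []).append(pos)
    return pos_map


def sloppy_pair_matches(pos_map: dict[str, list[int]], first: str,
                        second: str, slop: int) -> bool:
    """Toy check for the phrase "first second"~slop (two distinct terms).

    Moves needed for positions p1 (first) and p2 (second): |(p2 - 1) - p1|.
    Adjacent and in order costs 0, one word in between costs 1,
    and swapped adjacent terms cost 2.
    """
    for p1 in pos_map.get(first.lower(), []):
        for p2 in pos_map.get(second.lower(), []):
            if abs((p2 - 1) - p1) <= slop:
                return True
    return False


# Scott's example: termA and termB sit in different values.
scott = positions(["some text termA more text", "some text termB more text"], gap=100)
print(sloppy_pair_matches(scott, "termA", "termB", 100))  # False

# Edge case: termA ends one value and termB starts the next.
edge = positions(["x termA", "termB y"], gap=100)
print(sloppy_pair_matches(edge, "termA", "termB", 100))  # True (needed slop == gap)
print(sloppy_pair_matches(edge, "termA", "termB", 99))   # False
```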

http://search-lucene.com/m/Hbdvz1og7D71/


  


Re: Query on multivalue field

2011-03-01 Thread Scott Yeadon
The only trick with this is ensuring the searches return the right
results and don't go across value boundaries. If I set the gap to the
largest text size we expect (approx. 5000 chars), what impact does such a
large value have? (i.e. does Solr physically separate these fragments in
the index, or does it just apply the figure as part of any query?)


Scott.









Re: Query on multivalue field

2011-03-01 Thread Jonathan Rochkind
Each token has a position set on it. So if you index the value "alpha
beta gamma", it winds up stored in Solr as (sort of, for the way we want
to look at it):


document1:
alpha: position 1
beta: position 2
gamma: position 3

If you set the position increment gap large, then after one value in a
multi-valued field ends, the position increment gap is added to the
positions for the next value. Solr doesn't really have much of an
internal notion of a multi-valued field: ALL a multi-valued indexed
field is, is a position increment gap separating tokens from different
'values'.


So if you index, in a multi-valued field with a position increment gap
of 10000, the values ["alpha beta gamma", "aleph bet"], you get something like:


document1:
alpha: 1
beta: 2
gamma: 3
aleph: 10004
bet: 10005
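The table can be reproduced with a few lines of Python. This is a simplified sketch: whitespace tokenizing, a gap of 10000 (which is what the 10004 in the table implies), positions numbered from 1 to match the table (Lucene itself starts counting at 0, which only shifts every number by one), and a hypothetical `assign_positions` helper.

```python
def assign_positions(values: list[str], gap: int,
                     first_position: int = 1) -> list[tuple[str, int]]:
    """Return (token, position) pairs for a multi-valued field.

    Tokens inside one value get consecutive positions; before each later
    value, `gap` extra positions are skipped.
    """
    result: list[tuple[str, int]] = []
    pos = first_position - 1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the position increment gap between values
        for token in value.split():
            pos += 1
            result.append((token, pos))
    return result


for token, pos in assign_positions(["alpha beta gamma", "aleph bet"], gap=10000):
    print(f"{token}: {pos}")
# alpha: 1
# beta: 2
# gamma: 3
# aleph: 10004
# bet: 10005
```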

A large position increment gap, as far as I know and can tell (please,
someone correct me if I'm wrong; I am not a Solr developer), has no
effect on the size or efficiency of your index on disk.
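A rough way to check the size question, assuming positions are stored the way classic Lucene postings store them: for each term, its positions within a document are delta-encoded, and each delta is written as a variable-length integer (7 data bits per byte). Payloads and the block-based formats of later Lucene versions are ignored, and the helper names are hypothetical. Under this model a large gap costs only a few extra bytes per affected position, nothing proportional to the gap itself.

```python
def vint_size(n: int) -> int:
    """Bytes needed to write a non-negative int as a VInt (7 data bits per byte)."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def position_bytes(values: list[str], gap: int) -> int:
    """Approximate bytes of position data for one document's multi-valued field."""
    per_term: dict[str, list[int]] = {}
    pos = -1  # the first token lands on position 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # skip `gap` positions before every later value
        for token in value.split():
            pos += 1
            per_term.setdefault(token, []).append(pos)
    total = 0
    for plist in per_term.values():
        prev = 0
        for p in plist:  # one term's positions, in increasing order
            total += vint_size(p - prev)  # store the delta, not the absolute position
            prev = p
    return total


values = ["alpha beta gamma", "aleph bet"]
for gap in (0, 100, 10000, 1000000):
    print(gap, position_bytes(values, gap))
# 0 5
# 100 5
# 10000 7
# 1000000 9
```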


I am not sure why positionIncrementGap doesn't just default to a very
large number, to provide behavior that better matches what people expect
from a multi-valued field. So maybe there is some flaw in my
understanding that justifies it not being this way?


But I set my positionIncrementGap very large, and haven't seen any issues.










Re: Query on multivalue field

2011-03-01 Thread Scott Yeadon
Tested it out, and it seems to work well as long as I set the gap to a
value much larger than the text; 1 appears to work fine for our current
data. Thanks heaps for all the help, guys!


Scott.
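The rule of thumb in this thread (gap larger than the text) can be turned into a small sizing helper. This is a sketch based on the toy two-term sloppy-phrase model (an assumption, not Solr code), and `suggest_slop_and_gap` is hypothetical. Under that model, gap and slop count token positions, not characters; inside one value of n tokens, "termA termB" needs a slop of n to match in either order (n - 2 if only in order); and the closest cross-boundary pair needs a slop of exactly the gap, so the gap must be strictly larger than the slop.

```python
def suggest_slop_and_gap(values: list[str]) -> tuple[int, int]:
    """Suggest (slop, positionIncrementGap) so "termA termB"~slop can match
    anywhere inside one value but never across two values (toy model)."""
    # Count tokens, not characters: positions advance once per token.
    max_tokens = max((len(v.split()) for v in values), default=0)
    # Swapped terms at opposite ends of an n-token value need slop n.
    slop = max_tokens
    # The closest cross-boundary pair needs slop == gap, so keep gap > slop.
    gap = slop + 1
    return slop, gap


print(suggest_slop_and_gap(["alpha beta gamma", "aleph bet"]))  # (3, 4)
```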
