Re: Phrase Query Problem?

2010-11-02 Thread Tod

On 11/1/2010 11:14 PM, Ken Stanley wrote:

On Mon, Nov 1, 2010 at 10:26 PM, Tod listac...@gmail.com wrote:


I have a number of fields I need to do an exact match on.  I've defined
them as 'string' in my schema.xml.  I've noticed that I get back query
results that don't have all of the words I'm using to search with.

For example:


q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json

Should, with an exact match, return only one entry, but it returns five, some
of which don't have any of the fields I've specified.  I've tried this both
with and without quotes.

What could I be doing wrong?


Thanks - Tod




Tod,

Without knowing your exact field definition, my first guess would be your
first boolean query; because it is not quoted, what SOLR typically does is
to transform that type of query into something like (assuming your uniqueKey
is id): (mykeywords:Compliance id:With id:Conduct id:Standards). If you do
(mykeywords:"Compliance+With+Conduct+Standards") you might see different
(better?) results. Otherwise, append &debugQuery=on to your URL and you can
see exactly how SOLR is parsing your query. If none of that helps, what is
your field definition in your schema.xml?

- Ken



The field definition is:

<field name="mykeywords" type="string" indexed="true" stored="true"
multiValued="true"/>


The request:

select?q=(((mykeywords:"Compliance+With+Attorney+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL)))&fl=mykeywords&start=0&indent=true&wt=json&debugQuery=on

The response looks like this:

 "responseHeader":{
  "status":0,
  "QTime":8,
  "params":{
   "wt":"json",
   "q":"(((mykeywords:\"Compliance With Attorney Conduct Standards\")OR(mykeywords:All)OR(mykeywords:ALL)))",
   "start":"0",
   "indent":"true",
   "fl":"mykeywords",
   "debugQuery":"on"}},
 "response":{"numFound":6,"start":0,"docs":[
   {
    "mykeywords":["Compliance With Attorney Conduct Standards"]},
   {
    "mykeywords":["Anti-Bribery","Bribes"]},
   {
    "mykeywords":["Marketing Guidelines","Marketing"]},
   {},
   {
    "mykeywords":["Anti-Bribery","Due Diligence"]},
   {
    "mykeywords":["Anti-Bribery","AntiBribery"]}]
 },
 "debug":{
  "rawquerystring":"(((mykeywords:\"Compliance With Attorney Conduct Standards\")OR(mykeywords:All)OR(mykeywords:ALL)))",
  "querystring":"(((mykeywords:\"Compliance With Attorney Conduct Standards\")OR(mykeywords:All)OR(mykeywords:ALL)))",
  "parsedquery":"(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL",
  "parsedquery_toString":"(mykeywords:Compliance text:attorney text:conduct text:standard) mykeywords:All mykeywords:ALL",
  "explain":{
...

As you mentioned, looking at the parsed query, it's breaking the request
up on word boundaries rather than treating it as one phrase.  The goal is to
return only the very first entry.  Any ideas?



Thanks - Tod


Re: Phrase Query Problem?

2010-11-02 Thread Erick Erickson
That's not the response I get when I try your query, so I suspect
something's not quite right with your test...

But you could also try putting parentheses around the words, like
mykeywords:(Compliance+With+Conduct+Standards)

Best
Erick


Re: Phrase Query Problem?

2010-11-02 Thread Ken Stanley
On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson erickerick...@gmail.com wrote:

 That's not the response I get when I try your query, so I suspect
 something's not quite right with your test...

 But you could also try putting parentheses around the words, like
 mykeywords:(Compliance+With+Conduct+Standards)

 Best
 Erick


I agree with Erick: your query string showed quotes, but your parsed query
did not. Using quotes, or parentheses, would pretty much leave your query
alone. There is one exception that I've found: if you use a stop word
analyzer, any stop words are converted to ? in the parsed query. So if
you absolutely need every single word to match, regardless, you cannot use a
field type that uses the stop word analyzer.

For example, I have two dynamic field definitions: df_text_* that does the
default text transformations (including stop words), and df_text_exact_*
that does nothing (field type is string). When I run the
query df_text_exact_company_name:"Bank of America" OR
df_text_company_name:"Bank of America", the following is shown as my
query/parsed query when debugQuery is on:

<str name="rawquerystring">
df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank
of America"
</str>
<str name="querystring">
df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank
of America"
</str>
<str name="parsedquery">
df_text_exact_company_name:Bank of America
PhraseQuery(df_text_company_name:"bank ? america")
</str>
<str name="parsedquery_toString">
df_text_exact_company_name:Bank of America df_text_company_name:"bank ?
america"
</str>

The difference is subtle, but important. If I were to do
df_text_company_name:"Bank and America", I would still match "Bank of
America". These are things that you should keep in mind when you are
creating fields for your indices.
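
The stop word behaviour described above can be mimicked with a toy model. This is only a sketch and not how Solr implements analysis: the stop word list, the analyze function and the phrase_matches function are all made up for illustration, and the comparison is against the whole field value rather than a true phrase search.

```python
# Toy model of a stop word filter that leaves a position gap ("?") behind.
# STOP_WORDS, analyze and phrase_matches are hypothetical names for this sketch.
STOP_WORDS = {"of", "and", "the", "a"}  # assumed list, for illustration only


def analyze(text):
    """Lowercase, split on whitespace, and replace stop words with None (a gap)."""
    return [None if tok in STOP_WORDS else tok for tok in text.lower().split()]


def phrase_matches(query, indexed_value):
    """True if the analyzed query equals the analyzed value position by position."""
    return analyze(query) == analyze(indexed_value)


print(analyze("Bank of America"))                               # ['bank', None, 'america']
print(phrase_matches("Bank and America", "Bank of America"))    # True
print(phrase_matches("Bank America", "Bank of America"))        # False
```

In this model both "of" and "and" collapse to the same gap, which is why the two phrases become indistinguishable on the analyzed field, while an untouched string field would keep them apart.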

A useful tool for seeing what SOLR does to your query terms is the Analysis
tool found in the admin panel. You can do an analysis on either a specific
field, or by a field type, and you will see a breakdown by Analyzer for
either the index, query, or both of any query that you put in. This would
definitely be useful when trying to determine why SOLR might return what it
does.

- Ken


Re: Phrase Query Problem?

2010-11-02 Thread Tod

What it turned out to be was escaping the spaces.

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

became

q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

If I tried

q=(((mykeywords:"Compliance+With+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL)))

... it didn't work.  Once I removed the quotes and escaped the spaces it
worked as expected.  This seems odd since I would have expected the
quotes to have triggered a phrase query.


Thanks for your help.

- Tod
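
One plausible reading of why the \+ form behaves differently: in a URL query string a + decodes to a space, so \+ would reach the query parser as a backslash followed by a space, and the Lucene query syntax treats a backslash-escaped space as part of the term. The sketch below only demonstrates the URL decoding step with Python's standard library; whether the servlet container in this setup decodes the string the same way is an assumption.

```python
from urllib.parse import unquote_plus

# The q value as it appeared in the URL, with "\+" between the words.
raw = r"mykeywords:Compliance\+With\+Conduct\+Standards"

# unquote_plus turns each "+" into a space and leaves the backslashes alone.
decoded = unquote_plus(raw)
print(decoded)  # mykeywords:Compliance\ With\ Conduct\ Standards
```

Under that reading, the four words arrive as a single term against the string field instead of being split on whitespace.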


Re: Phrase Query Problem?

2010-11-02 Thread Jonathan Rochkind
Indeed, something doesn't seem right about that. Quotes are for phrases,
you are right, and I get confused even thinking about what happens when
you try to escape spaces like that.


I think there's something odd going on with your URI-escaping in
general. Here's what the string should actually look like for
mykeywords:"Compliance With Conduct Standards", when put into a URI:


mykeywords%3A%22Compliance+With+Conduct+Standards%22

You really ought to escape the colon and the double quotes too, to
follow the URI spec. If you weren't escaping the double quotes, that could
explain your issue.  And I seriously don't understand what putting a
backslash in the URI accomplishes in this case; it confuses me trying to
understand what's going on there, and personally I never like it when I
just try random things until something I don't understand works.
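
For anyone building these URLs from code, the escaped string above can be produced rather than typed by hand. A minimal sketch using Python's standard library (the choice of Python is an assumption; any URL-encoding helper should give an equivalent result):

```python
from urllib.parse import quote_plus

# The query exactly as you would type it into the query parser.
query = 'mykeywords:"Compliance With Conduct Standards"'

# quote_plus escapes the colon and the double quotes and turns spaces into "+".
encoded = quote_plus(query)
print(encoded)  # mykeywords%3A%22Compliance+With+Conduct+Standards%22
```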




Phrase Query Problem?

2010-11-01 Thread Tod
I have a number of fields I need to do an exact match on.  I've defined 
them as 'string' in my schema.xml.  I've noticed that I get back query 
results that don't have all of the words I'm using to search with.


For example:

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json

Should, with an exact match, return only one entry, but it returns five,
some of which don't have any of the fields I've specified.  I've tried
this both with and without quotes.


What could I be doing wrong?


Thanks - Tod



Re: Phrase Query Problem?

2010-11-01 Thread Ken Stanley
On Mon, Nov 1, 2010 at 10:26 PM, Tod listac...@gmail.com wrote:

 I have a number of fields I need to do an exact match on.  I've defined
 them as 'string' in my schema.xml.  I've noticed that I get back query
 results that don't have all of the words I'm using to search with.

 For example:


 q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json

 Should, with an exact match, return only one entry, but it returns five, some
 of which don't have any of the fields I've specified.  I've tried this both
 with and without quotes.

 What could I be doing wrong?


 Thanks - Tod



Tod,

Without knowing your exact field definition, my first guess would be your
first boolean query; because it is not quoted, what SOLR typically does is
to transform that type of query into something like (assuming your uniqueKey
is id): (mykeywords:Compliance id:With id:Conduct id:Standards). If you do
(mykeywords:"Compliance+With+Conduct+Standards") you might see different
(better?) results. Otherwise, append &debugQuery=on to your URL and you can
see exactly how SOLR is parsing your query. If none of that helps, what is
your field definition in your schema.xml?

- Ken
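
A request like the one discussed here, with debugQuery switched on, can be assembled programmatically so that every parameter is separated by & and escaped consistently. This is a rough sketch: BASE_URL and the build_select_url helper are hypothetical, and the host, port and path depend on the installation.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust for your own installation.
BASE_URL = "http://localhost:8983/solr/select"


def build_select_url(query, **extra):
    """Return a select URL for `query` with debugQuery=on and JSON output."""
    params = {"q": query, "start": 0, "indent": "true", "wt": "json", "debugQuery": "on"}
    params.update(extra)  # e.g. fl="mykeywords"
    # urlencode escapes each value and joins the pairs with "&".
    return BASE_URL + "?" + urlencode(params)


url = build_select_url('mykeywords:"Compliance With Conduct Standards"', fl="mykeywords")
print(url)
```

The parsedquery entry in the debug section of the response then shows which field each term was actually searched in.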


phrase query problem .. how to?

2007-02-04 Thread rubdabadub

Hi

Suppose you have a field name with data - "Sony CLT2134 handheld
camera". When doing a phrase search like "Sony Camera" or "sony
handheld" -- Solr returns 0 results. Often our searchers don't
know the model number but still perform a phrase search. How do I solve
this issue?

Regards


Re: phrase query problem .. how to?

2007-02-04 Thread Yonik Seeley

On 2/4/07, rubdabadub [EMAIL PROTECTED] wrote:

Suppose you have a field name with data - "Sony CLT2134 handheld
camera". When doing a phrase search like "Sony Camera" or "sony
handheld" -- Solr returns 0 results. Often our searchers don't
know the model number but still perform a phrase search. How do I solve
this issue?


If you are controlling the query structure you could
- use a sloppy phrase query... "sony handheld"~10
- use the dismax handler to create a different query structure
- don't use a phrase query at all... change the default operator to
  AND (q.op=AND) to require both terms.

-Yonik
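
Two of the options above, the sloppy phrase and q.op=AND, can be sketched as request parameters. The field name "name" is an assumption taken loosely from the question, and the dismax option is left out because its parameters depend on the handler configuration.

```python
from urllib.parse import parse_qs, urlencode

# Option 1: sloppy phrase, both words within 10 positions of each other.
# "name" is an assumed field name, for illustration only.
sloppy = urlencode({"q": 'name:"sony handheld"~10'})

# Option 3: no phrase at all, but require every term via q.op=AND.
all_terms = urlencode({"q": "sony handheld", "q.op": "AND"})

# Decoding the strings again shows what the server would receive.
print(parse_qs(sloppy))     # {'q': ['name:"sony handheld"~10']}
print(parse_qs(all_terms))  # {'q': ['sony handheld'], 'q.op': ['AND']}
```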