Re: Partial Match with DF

2017-03-16 Thread Mark Johnson

Thank you for the heads up! I think in some cases we will want to strip out
punctuation but in others we might need it (for example, "liquid courage."
should tokenize to "liquid" and "courage", while "1.5 oz liquid courage"
should tokenize to "1.5", "oz", "liquid" and "courage").

I'll have to do some experimenting to see which one will work best for us.
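
To make the two cases concrete, here is a small Python sketch of the behavior I'm after. It is only an illustration built on a regular expression, not any of Solr's actual tokenizers, and the name `tokenize` is made up for this example.

```python
import re

# Hypothetical helper, not a Solr API: keep decimal numbers such as "1.5"
# intact, and otherwise keep only runs of word characters, so a trailing
# period is dropped.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|\w+")


def tokenize(text):
    """Return the tokens found in text, in order."""
    return TOKEN_RE.findall(text)


print(tokenize("liquid courage."))        # ['liquid', 'courage']
print(tokenize("1.5 oz liquid courage"))  # ['1.5', 'oz', 'liquid', 'courage']
```

Whether a real analysis chain can be configured to behave this way is exactly what the experimenting would need to show.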

Re: Partial Match with DF

2017-03-16 Thread Mark Johnson

Wow, that's really powerful! Thank you!

Re: Partial Match with DF

2017-03-16 Thread Charlie Hull

Hi Mark,

OpenSource Connections' excellent www.splainer.io might also be useful to
help you break down exactly what your query is doing.

Cheers

Charlie

P.S. planning a blog soon listing 'useful Solr tools'

Re: Partial Match with DF

2017-03-16 Thread Erick Erickson

Yeah, they've saved me on numerous occasions, glad to see they helped.

One caution, BTW: when you start changing fieldTypes you have to
watch punctuation. StandardTokenizerFactory won't pass through most
punctuation.

WordDelimiterFilterFactory breaks on non-alphanumeric characters, including
punctuation, effectively throwing it out.

But WhitespaceTokenizer splits only on whitespace and spits out punctuation as
part of tokens, e.g.
"my words." (note the period) is broken up as "my" "words." and wouldn't
match a search on "word".
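
The WhitespaceTokenizer case can be mimicked in a few lines of Python. This is a plain `str.split()` stand-in rather than Solr's analysis code, and `matches` is a hypothetical helper that treats a search as an exact token lookup with no stemming or other filters.

```python
def whitespace_tokens(text):
    # Split on whitespace only, so punctuation stays attached to the token.
    return text.split()


def matches(term, tokens):
    # Hypothetical exact-token lookup; no stemming or other filters applied.
    return term in tokens


tokens = whitespace_tokens("my words.")
print(tokens)                     # ['my', 'words.']
print(matches("words", tokens))   # False: the indexed token is 'words.'
print(matches("words.", tokens))  # True
```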

One other note: there's a tokenizer/filter for a zillion different
cases, so you can go wild. Here's a partial list:
https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters
See the "Tokenizer", "Filters" and "CharFilters" links. There are 12
tokenizers listed and 40 or so filters... and the list is not
guaranteed to be complete.

Re: Partial Match with DF

2017-03-16 Thread Mark Johnson

You're right! The fields I'm searching are all "string" type. I switched to
"text_en" and now it's working exactly as I need it to! I'll do some
research to see if "text_en" or another "text" type field is best for our
needs.

Also, those debug options are amazing! They'll help tremendously in the
future.

Thank you much!

Re: Partial Match with DF

2017-03-16 Thread Erick Erickson

My guess: your analysis chain for the fields is different, i.e. they
have a different fieldType. In particular, watch out for the "string"
type; people are often confused about it. It does _not_ break input
into tokens. For that you need a text-based field type; text_en is one
example that is usually in the configs by default.
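
The difference can be illustrated without Solr at all. The sketch below is a simplified model, not Solr's implementation: a "string" field is treated as a single indexed term, while a text field is approximated by lowercasing and splitting on whitespace. Both function names are invented for the example.

```python
def string_field_terms(value):
    # A "string" field is indexed as a single term: the whole value.
    return {value}


def text_field_terms(value):
    # Very rough stand-in for a text analysis chain: lowercase + whitespace split.
    return set(value.lower().split())


value = "aaa bbb"
print("aaa" in string_field_terms(value))      # False: only the full value is a term
print("aaa bbb" in string_field_terms(value))  # True
print("aaa" in text_field_terms(value))        # True: each word is its own term
```

This mirrors the symptom in the original question, where only the full field value produced a hit.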

Two tools that'll help you enormously:

Admin UI >> select core (or collection) from the drop-down >> Analysis.
That shows you exactly how Solr/Lucene break up text at query and index time.

Add &debug=query to the URL. That'll show you how the query was parsed.
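
As a rough sketch, a request with the query-debug flag could be assembled like this, using Solr's `debug=query` parameter. The host, port, and core name are placeholders, and `debug_query_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Placeholder values: adjust host, port and core name for your setup.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"


def debug_query_url(q, df):
    """Build a /select URL that asks Solr to include query-parsing debug output."""
    params = {"q": q, "df": df, "debug": "query"}
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


print(debug_query_url("aaa ccc", "field1"))
```

The response should then carry a debug section showing the parsed query, which makes it easy to see which field each term was actually searched against.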

Best,
Erick

Re: Partial Match with DF

2017-03-16 Thread Mark Johnson

Oh, great! Thank you!

So if I switch over to eDisMax I'd specify the fields to query via the "qf"
parameter, right? That seems to have the same result (only matches when I
specify the exact phrase in the field, not just certain words from it).

-- 

Best Regards,

*Mark Johnson* | .NET Software Engineer

Office: 603-392-7017

Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101

  

*Supporting The Practice Of Healthy Living*

-- 
*This message is intended only for the use of the individual or entity to 
which it is addressed and may contain information that is privileged, 
confidential and exempt from disclosure under applicable law. If you have 
received this message in error, you are hereby notified that any use, 
dissemination, distribution or copying of this message is prohibited. If 
you have received this communication in error, please notify the sender 
immediately and destroy the transmitted information.*

Re: Partial Match with DF

2017-03-16 Thread Alexandre Rafalovitch

df is the default field - you can only give one. To search over multiple
fields, you switch to the eDisMax query parser and its qf parameter.
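
For illustration, the request parameters for such a search might look like the sketch below. eDisMax reads the list of fields to search from `qf`; `fl` only controls which fields come back in the response. The helper name `edismax_params` is hypothetical, and the field names are taken from the example document in the question.

```python
from urllib.parse import urlencode


def edismax_params(q, fields):
    """Return request parameters for an eDisMax search over several fields."""
    return {
        "defType": "edismax",    # select the eDisMax query parser
        "q": q,
        "qf": " ".join(fields),  # fields to search, space-separated
    }


params = edismax_params("aaa ccc", ["field1", "field2"])
print(urlencode(params))  # defType=edismax&q=aaa+ccc&qf=field1+field2
```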

Then, the question will be what type definition your fields have. When you
search the _text_ field, you are using its definition, because copyField
copies the other fields' content into it. Your original fields may be strings.

Remember to reload the core and reindex when you change definitions.

Regards,
   Alex


On 16 Mar 2017 9:15 AM, "Mark Johnson" wrote:

> Forgive me if I'm missing something obvious -- I'm new to Solr, but I can't
> seem to find an explanation for the behavior I'm seeing.
>
> If I have a document that looks like this:
> {
> field1: "aaa bbb",
> field2: "ccc ddd",
> field3: "eee fff"
> }
>
> And I do a search where "q" is "aaa ccc", I get the document in the
> results. This is because (please correct me if I'm wrong) the default "df"
> is set to the "_text_" field, which contains the text values from all
> fields.
>
> However, if I do a search where "df" is "field1" and "field2" and "q" is
> "aaa ccc" (words from field1 and field2) I get no results.
>
> In a simpler example, if I do a search where "df" is "field1" and "q" is
> "aaa" (a word from field1) I still get no results.
>
> If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full
> value of field1) then I get the document in the results.
>
> So I'm concluding that when using "df" to specify which fields to search
> then only an exact match on the full field value will return a document.
>
> Is that a correct conclusion? Is there another way to specify which fields
> to search without requiring an exact match? The results I'd like to achieve
> are:
>
> Would Match:
> q=aaa
> q=aaa bbb
> q=aaa ccc
> q=aaa fff
>
> Would Not Match:
> q=eee
> q=fff
> q=eee fff
