Thank you for the heads up! I think in some cases we will want to strip out
punctuation but in others we might need it (for example, "liquid courage."
should tokenize to "liquid" and "courage", while "1.5 oz liquid courage"
should tokenize to "1.5", "oz", "liquid" and "courage").

I'll have to do some experimenting to see which one will work best for us.
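
The two target cases above can be sketched in Python as a quick check of the desired behavior. This is only an illustration: the regex and the `tokenize` helper are hypothetical names made up for this note, and they do not reproduce how any Solr tokenizer is implemented.

```python
import re

# Try a decimal number first (so "1.5" stays whole), then fall back to a
# plain run of word characters. Illustrative pattern only; it is an
# assumption for this sketch, not taken from any Solr tokenizer.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+")


def tokenize(text):
    """Return lowercase tokens, dropping punctuation except a decimal
    point that sits between digits."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


print(tokenize("liquid courage."))
print(tokenize("1.5 oz liquid courage"))
```

Whichever Solr field type ends up being used, the analysis screen in the admin UI is the place to confirm that the real chain treats these two inputs the same way.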

On Thu, Mar 16, 2017 at 11:09 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Yeah, they've saved me on numerous occasions, glad to see they helped.
>
> One caution BTW when you start changing fieldTypes is you have to
> watch punctuation. StandardTokenizerFactory won't pass through most
> punctuation.
>
> WordDelimiterFilterFactory splits on non-alphanumeric characters,
> including punctuation, effectively throwing it out.
>
> But WhitespaceTokenizer splits only on whitespace and keeps punctuation
> as part of tokens, e.g. "my words." (note the period) is broken up as
> "my" and "words." and wouldn't match a search on "words".
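
The whitespace-only behavior described above can be imitated in a few lines of Python. This is a toy model under stated assumptions: `whitespace_tokens` and `stripped_tokens` are hypothetical helpers for illustration and are not Solr or Lucene code.

```python
import string


def whitespace_tokens(text):
    """Split on whitespace only, so punctuation stays attached to tokens."""
    return text.split()


def stripped_tokens(text):
    """Split on whitespace, then trim leading/trailing punctuation and
    drop any token that ends up empty."""
    trimmed = (token.strip(string.punctuation) for token in text.split())
    return [token for token in trimmed if token]


indexed = whitespace_tokens("my words.")
print(indexed)
# An exact term lookup for "words" misses, because the indexed term is "words."
print("words" in indexed)
# With punctuation trimmed, the same lookup succeeds.
print("words" in stripped_tokens("my words."))
```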
>
> One other note: there's a tokenizer/filter for a zillion different
> cases, so you can go wild. Here's a partial list:
> https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters
> See the "Tokenizer", "Filters" and "CharFilters" links. There are 12
> tokenizers listed and 40 or so filters... and the list is not
> guaranteed to be complete.
>
> On Thu, Mar 16, 2017 at 7:39 AM, Mark Johnson
> <mjohn...@emersonecologics.com> wrote:
> > You're right! The fields I'm searching are all "string" type. I switched to
> > "text_en" and now it's working exactly as I need it to! I'll do some
> > research to see if "text_en" or another "text" type field is best for our
> > needs.
> >
> > Also, those debug options are amazing! They'll help tremendously in the
> > future.
> >
> > Thank you much!
> >
> > On Thu, Mar 16, 2017 at 10:02 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> My guess: your analysis chain for the fields is different, i.e. they
> >> have a different fieldType. In particular, watch out for the "string"
> >> type; people are often confused about it. It does _not_ break input
> >> into tokens. You need a text-based field type; text_en is one example
> >> that is usually in the configs by default.
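
The difference between the two kinds of field can be pictured with a toy model in Python. The helpers `index_terms` and `matches` are hypothetical, and the "text" branch only lowercases and splits on whitespace; a real text_en analysis chain does more than that.

```python
def index_terms(value, field_type):
    """Toy model of what ends up in the index for one field value."""
    if field_type == "string":
        # A string field keeps the whole value as a single term.
        return {value}
    # Simplified stand-in for a text field: lowercase, split on whitespace.
    return set(value.lower().split())


def matches(query_term, value, field_type):
    """True if the query term is exactly one of the indexed terms."""
    return query_term in index_terms(value, field_type)


print(matches("aaa", "aaa bbb", "string"))      # single word vs. whole-value term
print(matches("aaa bbb", "aaa bbb", "string"))  # full value vs. whole-value term
print(matches("aaa", "aaa bbb", "text"))        # single word vs. tokenized terms
```

In this model only the full value matches a string field, which lines up with the behavior reported at the start of the thread.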
> >>
> >> Two tools that'll help you enormously:
> >>
> >> admin UI>>select core (or collection) from the drop-down>>analysis
> >> That shows you exactly how Solr/Lucene break up text at query and index
> >> time
> >>
> >> add &debug=query to the URL. That'll show you how the query was parsed.
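
For the second tool, a request URL with the debug flag can be assembled like this in Python. The host, port and core name in `SOLR_SELECT_URL` are placeholders assumed for the example, and `build_debug_url` is a hypothetical helper; `q`, `df` and `debug` are the request parameters discussed in this thread.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; substitute your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_url(query, default_field):
    """Build a select URL that asks Solr to report how the query was parsed."""
    params = {"q": query, "df": default_field, "debug": "query"}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(build_debug_url("aaa", "field1"))
```

The parsed query is then reported in the debug section of the response.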
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Mar 16, 2017 at 6:52 AM, Mark Johnson
> >> <mjohn...@emersonecologics.com> wrote:
> >> > Oh, great! Thank you!
> >> >
> >> > So if I switch over to eDisMax I'd specify the fields to query via the "qf"
> >> > parameter, right? That seems to have the same result (only matches when I
> >> > specify the exact phrase in the field, not just certain words from it).
> >> >
> >> > On Thu, Mar 16, 2017 at 9:33 AM, Alexandre Rafalovitch <arafa...@gmail.com>
> >> > wrote:
> >> >
> >> >> df is the default field; you can only give one. To search over multiple
> >> >> fields, you switch to the eDisMax query parser and its qf parameter.
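
With eDisMax, the fields to search go in the `qf` (query fields) parameter as a space-separated list. A minimal Python sketch of the request parameters follows; `edismax_params` is a hypothetical helper, and the field names are the ones from the example document in this thread.

```python
from urllib.parse import urlencode


def edismax_params(query, fields):
    """Request parameters for an eDisMax search restricted to the given fields."""
    return {
        "defType": "edismax",     # select the eDisMax query parser
        "q": query,
        "qf": " ".join(fields),   # space-separated list of fields to search
    }


query_string = urlencode(edismax_params("aaa ccc", ["field1", "field2"]))
print(query_string)
```

Whether individual words match still depends on the field types of field1 and field2, as the rest of the thread explains.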
> >> >>
> >> >> Then the question will be what type definition your fields have. When you
> >> >> search the _text_ field, you are using its definition because of copyField.
> >> >> Your original fields may be strings.
> >> >>
> >> >> Remember to reload the core and reindex when you change definitions.
> >> >>
> >> >> Regards,
> >> >>    Alex
> >> >>
> >> >>
> >> >> On 16 Mar 2017 9:15 AM, "Mark Johnson" <mjohn...@emersonecologics.com>
> >> >> wrote:
> >> >>
> >> >> > Forgive me if I'm missing something obvious -- I'm new to Solr, but I can't
> >> >> > seem to find an explanation for the behavior I'm seeing.
> >> >> >
> >> >> > If I have a document that looks like this:
> >> >> > {
> >> >> >     field1: "aaa bbb",
> >> >> >     field2: "ccc ddd",
> >> >> >     field3: "eee fff"
> >> >> > }
> >> >> >
> >> >> > And I do a search where "q" is "aaa ccc", I get the document in the
> >> >> > results. This is because (please correct me if I'm wrong) the default "df"
> >> >> > is set to the "_text_" field, which contains the text values from all
> >> >> > fields.
> >> >> >
> >> >> > However, if I do a search where "df" is "field1" and "field2" and "q" is
> >> >> > "aaa ccc" (words from field1 and field2) I get no results.
> >> >> >
> >> >> > In a simpler example, if I do a search where "df" is "field1" and "q" is
> >> >> > "aaa" (a word from field1) I still get no results.
> >> >> >
> >> >> > If I do a search where "df" is "field1" and "q" is "aaa bbb" (the full
> >> >> > value of field1) then I get the document in the results.
> >> >> >
> >> >> > So I'm concluding that when using "df" to specify which fields to search
> >> >> > then only an exact match on the full field value will return a document.
> >> >> >
> >> >> > Is that a correct conclusion? Is there another way to specify which fields
> >> >> > to search without requiring an exact match? The results I'd like to achieve
> >> >> > are:
> >> >> >
> >> >> > Would Match:
> >> >> > q=aaa
> >> >> > q=aaa bbb
> >> >> > q=aaa ccc
> >> >> > q=aaa fff
> >> >> >
> >> >> > Would Not Match:
> >> >> > q=eee
> >> >> > q=fff
> >> >> > q=eee fff
> >> >> >
> >> >> > --
> >> >> > *This message is intended only for the use of the individual or entity to
> >> >> > which it is addressed and may contain information that is privileged,
> >> >> > confidential and exempt from disclosure under applicable law. If you have
> >> >> > received this message in error, you are hereby notified that any use,
> >> >> > dissemination, distribution or copying of this message is prohibited. If
> >> >> > you have received this communication in error, please notify the sender
> >> >> > immediately and destroy the transmitted information.*
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Best Regards,
> >> >
> >> > *Mark Johnson* | .NET Software Engineer
> >> >
> >> > Office: 603-392-7017
> >> >
> >> > Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101
> >> >
> >> > <http://www.emersonecologics.com/>  <https://wellevate.me/#/>
> >> >
> >> > *Supporting The Practice Of Healthy Living*
> >> >
> >> > <http://blog.emersonecologics.com/>
> >> > <https://www.linkedin.com/company/emerson-ecologics>
> >> > <https://www.facebook.com/emersonecologics/>
> >> > <https://twitter.com/EmersonEcologic>
> >> > <https://www.instagram.com/emerson_ecologics/>
> >> > <https://www.pinterest.com/emersonecologic/>
> >> > <https://www.glassdoor.com/Overview/Working-at-Emerson-Ecologics-EI_IE388367.11,28.htm>
> >> >
> >>
> >
> >
> >
>



-- 

Best Regards,

*Mark Johnson* | .NET Software Engineer

Office: 603-392-7017

Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101

