Re: Highlighting words with special characters
Hi Shawn,

Yes, I can confirm: it works without any errors with multiple tokenizers. This is my analysis chain:

StandardTokenizerFactory (only in index)
StopFilterFactory
LowerCaseFilterFactory
ASCIIFoldingFilterFactory
EnglishPossessiveFilterFactory
StemmerOverrideFilterFactory (only in query)
NGramTokenizerFactory (only in index)

I'll look further into what you said about having a single tokenizer in the analysis chain.

Regards,
Lasitha

Lasitha Wattaladeniya
Software Engineer
Mobile : +6593896893
Blog : techreadme.blogspot.com

On Thu, Jul 20, 2017 at 7:12 PM, Shawn Heisey wrote:
> On 7/19/2017 8:31 PM, Lasitha Wattaladeniya wrote:
> > But I have NGramTokenizerFactory at the end of the indexing analyzer chain,
> > so it should still tokenize the email address. How does this affect
> > the highlighting? That's what I'm confused about.
>
> You can only have one tokenizer in an analysis chain. I have no idea
> what happens if you have more than one. I personally would expect that
> to result in an initialization error, but maybe what it does is ignore
> the additional tokenizers. Your experience seems to indicate that it
> does NOT result in an error. Can you confirm?
>
> The analysis is done in this order:
>
> CharFilters
> Tokenizer
> Filters
>
> Thanks,
> Shawn
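Shawn's ordering (char filters, then exactly one tokenizer, then token filters) can be pictured with a toy Python model. This is not Solr's code: `AnalysisChain`, the stage functions and the gram sizes are hypothetical. The model rejects a second tokenizer outright, which matches Shawn's expectation rather than the no-error behaviour Lasitha reports on Solr 4.10.4. Under this model, n-gramming that runs after the other filters has to be a token filter stage; in Solr, the filter-stage counterpart is NGramFilterFactory rather than a second tokenizer.

```python
from typing import Callable, List, Sequence

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


class AnalysisChain:
    """Toy model of an analysis chain: char filters -> one tokenizer -> token filters."""

    def __init__(self, char_filters: Sequence[CharFilter],
                 tokenizers: Sequence[Tokenizer],
                 token_filters: Sequence[TokenFilter]):
        # This sketch treats anything other than exactly one tokenizer as a config error.
        if len(tokenizers) != 1:
            raise ValueError(f"expected exactly one tokenizer, got {len(tokenizers)}")
        self.char_filters = list(char_filters)
        self.tokenizer = tokenizers[0]
        self.token_filters = list(token_filters)

    def analyze(self, text: str) -> List[str]:
        for cf in self.char_filters:   # 1. char filters see the raw text
            text = cf(text)
        tokens = self.tokenizer(text)  # 2. the single tokenizer produces the token stream
        for tf in self.token_filters:  # 3. token filters transform that stream in order
            tokens = tf(tokens)
        return tokens


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def ngram_filter(min_gram: int, max_gram: int) -> TokenFilter:
    """Filter-stage n-gramming: each incoming token is replaced by its n-grams."""
    def apply(tokens: List[str]) -> List[str]:
        out: List[str] = []
        for t in tokens:
            for n in range(min_gram, max_gram + 1):
                out.extend(t[i:i + n] for i in range(len(t) - n + 1))
        return out
    return apply


chain = AnalysisChain([], [whitespace_tokenizer], [lowercase_filter, ngram_filter(2, 3)])
print(chain.analyze("Ran COM"))  # ['ra', 'an', 'ran', 'co', 'om', 'com']

try:
    AnalysisChain([], [whitespace_tokenizer, whitespace_tokenizer], [])
except ValueError as e:
    print(e)  # expected exactly one tokenizer, got 2
```

Putting the n-gram step in the filter list keeps it after lowercasing, which is the order the chain above seems to intend.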
Re: Highlighting words with special characters
Hi Ahmet,

But I have NGramTokenizerFactory at the end of the indexing analyzer chain, so it should still tokenize the email address. How does this affect the highlighting? That's what I'm confused about.

Solr version: 4.10.4

Regards,
Lasitha

On 20 Jul 2017 08:28, "Ahmet Arslan" wrote:
> Hi,
>
> Maybe the name of UAX29URLEmailTokenizer is deceiving you? It does *not*
> tokenize URLs and emails. Actually, it recognises them and emits each one
> as a single token.
>
> Ahmet
> [...]
Re: Highlighting words with special characters
Hi,

Maybe the name of UAX29URLEmailTokenizer is deceiving you? It does *not* tokenize URLs and emails. Actually, it recognises them and emits each one as a single token.

Ahmet

On Wednesday, July 19, 2017, 12:00:05 PM GMT+3, Lasitha Wattaladeniya <watt...@gmail.com> wrote:
> Update: I changed UAX29URLEmailTokenizerFactory to StandardTokenizerFactory,
> and now it shows highlighted text fragments in the indexed email text.
> But I don't understand this behavior. Can someone shed some light, please?
> [...]
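To make Ahmet's point concrete, here is a rough Python approximation. It is not Lucene's UAX#29 grammar: `word_tokens` is a crude regex stand-in for a word tokenizer that breaks the address apart at '@', `email_aware_tokens` stands in for a tokenizer that recognises the address and keeps it whole, and the email pattern and sample address are hypothetical. Both return (term, start, end) character offsets, since offsets are what a highlighter uses to locate a match in the stored text.

```python
import re
from typing import List, Tuple

Token = Tuple[str, int, int]  # (term, start offset, end offset) into the original text

# Crude word pattern: letters/digits, allowing an internal '.' or apostrophe
# between them, so an address is broken apart at '@'.
WORD = r"[A-Za-z0-9]+(?:[.'][A-Za-z0-9]+)*"
# Deliberately simplistic email pattern (illustrative only).
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def tokenize(text: str, pattern: str) -> List[Token]:
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]


def word_tokens(text: str) -> List[Token]:
    return tokenize(text, WORD)


def email_aware_tokens(text: str) -> List[Token]:
    # The email alternative is tried first at each position,
    # so a whole address comes out as one token.
    return tokenize(text, f"{EMAIL}|{WORD}")


text = "contact john.doe@example.com today"
print(word_tokens(text))
# [('contact', 0, 7), ('john.doe', 8, 16), ('example.com', 17, 28), ('today', 29, 34)]
print(email_aware_tokens(text))
# [('contact', 0, 7), ('john.doe@example.com', 8, 28), ('today', 29, 34)]
```

The email-aware stand-in produces a single token spanning offsets 8 to 28: recognising an address is the opposite of splitting it, so any later stage only ever sees the whole address as one unit.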
Re: Highlighting words with special characters
Update: I changed UAX29URLEmailTokenizerFactory to StandardTokenizerFactory, and now it shows highlighted text fragments in the indexed email text. But I don't understand this behavior. Can someone shed some light, please?

On 18 Jul 2017 14:18, "Lasitha Wattaladeniya" wrote:
> Furthermore, the ngram field has the following tokenizer/filter chain in
> index and query:
> [...]
Re: Highlighting words with special characters
Furthermore, the ngram field has the following tokenizer/filter chain in index and query:

UAX29URLEmailTokenizerFactory (only in index)
StopFilterFactory
LowerCaseFilterFactory
ASCIIFoldingFilterFactory
EnglishPossessiveFilterFactory
StemmerOverrideFilterFactory (only in query)
NGramTokenizerFactory (only in index)

Regards,
Lasitha

On 18 Jul 2017 14:11, "Lasitha Wattaladeniya" wrote:
> Hi devs,
>
> I have set up Solr highlighting with the default setup (I only changed
> fragsize to 0 to match any field length). It worked fine, but recently I
> discovered it doesn't highlight words with special characters in the
> middle.
>
> For example, let's say I have indexed the email address test.f...@ran.com
> into an ngram field. When I search for the partial text fsdg, I get the
> results, but they're not highlighted. It works as expected in all other
> scenarios.
>
> The ngram field has termVectors, termPositions and termOffsets set to true.
>
> Can somebody please suggest what may be wrong here?
>
> (Sorry for the unstructured text. Typed using a mobile phone.)
>
> Regards
> Lasitha
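For the original question, it may help to picture how an offset-based highlighter uses termOffsets: each indexed term carries start/end offsets into the stored text, and a match is marked by wrapping the text between the matched term's offsets. The Python sketch below is a toy model under that assumption, not Solr's highlighter; `char_ngrams`, `highlight`, the gram sizes and the sample address are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (term, start offset, end offset) into the stored text


def char_ngrams(text: str, min_gram: int, max_gram: int) -> List[Token]:
    """Tokenizer-style n-grams: every gram records its own offsets in the input."""
    grams: List[Token] = []
    for start in range(len(text)):
        for n in range(min_gram, max_gram + 1):
            end = start + n
            if end <= len(text):
                grams.append((text[start:end].lower(), start, end))
    return grams


def highlight(stored: str, tokens: List[Token], query: str,
              pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap the first indexed term equal to the query, using that term's offsets."""
    for term, start, end in tokens:
        if term == query.lower():
            return stored[:start] + pre + stored[start:end] + post + stored[end:]
    return stored  # no matching term: the text comes back unhighlighted


stored = "john.doe@example.com"  # hypothetical address
tokens = char_ngrams(stored, 3, 4)
print(highlight(stored, tokens, "doe"))       # john.<em>doe</em>@example.com
print(highlight(stored, tokens, "examples"))  # john.doe@example.com
```

In this toy model the gram "doe" carries offsets 5 to 8, so exactly that slice is wrapped; the highlighter relies entirely on the offsets attached to the matched term, so the offsets the analysis chain produces decide what, if anything, gets marked.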