Re: PatternReplaceFilterFactory problem
Thanks for the help - changing the field type of the destination for the copy fields to "text_en" solved the problem. I'd foolishly assumed that the analysis of the source fields was applied then the resulting tokens passed to the copy field, which doesn't really make sense now that I think about it!

So the indexing process is:

    +-----------+    +----------------+    +-----------+
    |companyName|    |  companyName   |    |companyName|
    |input data |--->|text_en analysis|--->|index      |
    +-----------+    +----------------+    +-----------+
          |
          |          +----------------+    +-----+
          +--------->|      text      |--->|text |
                     |text_en analysis|    |index|
                     +----------------+    +-----+

Rather than:

    +-----------+    +----------------+      +-----------+
    |companyName|    |  companyName   |      |companyName|
    |input data |--->|text_en analysis|----->|index      |
    +-----------+    +----------------+  |   +-----------+
                                         |
                                         |   +---------------------+    +-----+
                                         +-->|        text         |--->|text |
                                             |text_general analysis|    |index|
                                             +---------------------+    +-----+

On 28/01/2019 12:37, Scott Stults wrote:
> Hi Chris,
>
> You've included the field definition of type text_en, but in your queries
> you're searching the field "text", which is of type text_general. That may
> be the source of your problem, but if looking into that doesn't help send
> the definition of text_general as well.
>
> Hope that helps!
>
> -Scott
>
> On Mon, Jan 28, 2019 at 6:02 AM Chris Wareham <
> chris.ware...@graduate-jobs.com> wrote:
>
>> I'm trying to index some data which often includes domain names. I'd
>> like to remove the .com TLD, so I have modified the text_en field type
>> by adding a PatternReplaceFilterFactory filter. However, it doesn't
>> appear to be working as a search for "text:(mydomain.com)" matches
>> records but "text:(mydomain)" does not.
>>
>> The actual field definitions are as follows:
Re: PatternReplaceFilterFactory problem
In the Admin UI, there is an Analysis screen. You can enter your text and your query there and see what happens to it at every step of the processing pipeline. This should tell you whether the problem is in indexing, query, or somewhere else entirely (e.g. you are querying a different field as Scott suggests).

Regards,
Alex.

P.S. (Semi-)random tip of the day: if you copyField the content, the copy is indexed and searched by the rules of the _target_ field. The source field's own chain is invoked only when you search on that field directly.

On Mon, 28 Jan 2019 at 06:02, Chris Wareham wrote:
>
> I'm trying to index some data which often includes domain names. I'd
> like to remove the .com TLD, so I have modified the text_en field type
> by adding a PatternReplaceFilterFactory filter. However, it doesn't
> appear to be working as a search for "text:(mydomain.com)" matches
> records but "text:(mydomain)" does not.
>
> positionIncrementGap="100">
> ignoreCase="true" synonyms="synonyms.txt"/>
> ignoreCase="true"/>
> pattern="([-a-z])\.com" replacement="$1"/>
> protected="protwords.txt"/>
>
> ignoreCase="true" synonyms="synonyms.txt"/>
> ignoreCase="true"/>
> pattern="([-a-z])\.com" replacement="$1"/>
> protected="protwords.txt"/>
>
> The actual field definitions are as follows:
>
> stored="true" required="true" />
> stored="true" required="true" />
> stored="false" />
Re: PatternReplaceFilterFactory problem
Hi Chris,

You've included the field definition of type text_en, but in your queries you're searching the field "text", which is of type text_general. That may be the source of your problem, but if looking into that doesn't help send the definition of text_general as well.

Hope that helps!

-Scott

On Mon, Jan 28, 2019 at 6:02 AM Chris Wareham <
chris.ware...@graduate-jobs.com> wrote:

> I'm trying to index some data which often includes domain names. I'd
> like to remove the .com TLD, so I have modified the text_en field type
> by adding a PatternReplaceFilterFactory filter. However, it doesn't
> appear to be working as a search for "text:(mydomain.com)" matches
> records but "text:(mydomain)" does not.
>
> positionIncrementGap="100">
> ignoreCase="true" synonyms="synonyms.txt"/>
> ignoreCase="true"/>
> pattern="([-a-z])\.com" replacement="$1"/>
> protected="protwords.txt"/>
>
> ignoreCase="true" synonyms="synonyms.txt"/>
> ignoreCase="true"/>
> pattern="([-a-z])\.com" replacement="$1"/>
> protected="protwords.txt"/>
>
> The actual field definitions are as follows:
>
> stored="true" required="true" />
> stored="true" required="true" />
> stored="false" />

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com
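
For anyone reading this thread later, the behaviour it settles on can be sketched in a few lines of Python. This is only a rough simulation, not Solr code: the two analyzer functions are hypothetical stand-ins that lowercase and split on whitespace (the real text_en and text_general chains do considerably more), and it assumes, as the thread implies, that only text_en carries the pattern replace filter. The regex is the one quoted above, with Java's `$1` written as `\1` for Python.

```python
import re

# Pattern and replacement from the quoted PatternReplaceFilterFactory settings:
# pattern="([-a-z])\.com" replacement="$1" (Java's $1 is \1 in Python).
DOT_COM = re.compile(r"([-a-z])\.com")


def text_en_like(value):
    """Hypothetical stand-in for text_en: lowercase, whitespace split, strip .com."""
    return [DOT_COM.sub(r"\1", token) for token in value.lower().split()]


def text_general_like(value):
    """Hypothetical stand-in for text_general: same, but without the pattern filter."""
    return value.lower().split()


def index_document(raw_company_name, text_analyzer):
    """Simulate indexing with a copyField from companyName to text.

    The copy receives the raw input value, not companyName's tokens,
    and is then analyzed by the destination field's own chain.
    """
    return {
        "companyName": text_en_like(raw_company_name),
        "text": text_analyzer(raw_company_name),
    }


if __name__ == "__main__":
    # Destination typed as text_general: ".com" survives in the text index.
    print(index_document("MyDomain.com Ltd", text_general_like)["text"])
    # Destination typed as text_en: ".com" is stripped, so "mydomain" matches.
    print(index_document("MyDomain.com Ltd", text_en_like)["text"])
```

With the destination analyzed as text_general the token stays `mydomain.com`, which is why "text:(mydomain)" found nothing; switching the destination to text_en produces `mydomain` in both fields.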