[GitHub] lucene-solr issue #328: SOLR-12034
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/328 Thank you, @romseygeek , for thinking of this PR. I'm closing it because I don't want to wreck the API of CustomAnalyzer.Builder(). --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #328: SOLR-12034
Github user romseygeek commented on the issue: https://github.com/apache/lucene-solr/pull/328 I think we can proceed with LUCENE-8497 anyway by adding a new method to TokenizerChain that only uses multi-term aware filters. Feel free to close this one @tballison . --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #328: SOLR-12034
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/328 Custom Analyzer was just added to have an easy-to-use Builder-Like API for Analyzers. It was not meant to replace SOlr's (although it would be nice, but it's impossible as I figured out at that time, too). Solr is based on modifiable classes and XML, not builders... --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #328: SOLR-12034
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/328 If you can't make Solr work with the current public CustomAnalyzer API, please keep it as is and use your own TokenizerChain in Solr. Please don't make CustomAnalyzer unmodifiable or add access to internal fields! In fact, it's a minimum amount of code behind the 3 lists of factories that build up the Analyzer that does not justify cluttering Lucene's API (like the horrible MultiTermAwareComponent added by Solr). --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #328: SOLR-12034
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/328 Wow...long time since I've visited this code. Now I think I recall...the ugliness that I don't like imposing on the CustomAnalyzer's API is that it holds its own ResourceLoader and applies it when the user calls, e.g. `withTokenizer(class/classname, params)`, `add(Token|Char)Filter(class/classname, params)`. In Solr, the charfilter, tokenizer, tokenfilter factories are fully built with resources loaded by `FieldTypePluginLoader`'s `loader` a (`SolrResourceLoader`) in `readAnalyzer(Node node)` one by one...I think (???), and _then_ they are added to the `CustomAnalyzer`. I also see in `ManagedIndexSchema`, that there's `postReadInform()` which calls `informResourceLoaderAwareObjectsInChain`, which then loads the resources. So, when I break the API in CustomAnalyzer and make public, e.g. `addTokenFilter(TokenFilterFactory factory)`, there's an unused private variable `ResourceLoader loader`, which feels ugly...a user could both specify a resource loader in `Builder`'s initializer and then pass in fully loaded components that would bypass that resource loader. This smells bad to me... Any recommendations? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [GitHub] lucene-solr issue #328: SOLR-12034
See LUCENE-8497 for more details. Mayya would like to replace the marker interface with type-safe methods on CharFilterFactory and TokenFilterFactory > On 1 Oct 2018, at 16:15, Erick Erickson wrote: > > Just skimming here and on my way to the airport. The original purpose > of MultiTermAwareComponent was to be able to, for instance, search > wildcards without having to, say, lowercase the term on the client side > (remember all the questions like "Why does a search for 'powerful' succeed > but not 'Power*' ?"). I'm assuming this change won't alter that behavior > > > On Mon, Oct 1, 2018 at 1:06 AM romseygeek wrote: >> >> Github user romseygeek commented on the issue: >> >> https://github.com/apache/lucene-solr/pull/328 >> >> Hi @tballison , are you still interested in pushing this one? I can help >> out, as I'd like to get this committed so that we can move forward with >> LUCENE-8497 and remove the `MultiTermAwareComponent` interface. >> >> >> --- >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #328: SOLR-12034
Github user romseygeek commented on the issue: https://github.com/apache/lucene-solr/pull/328 The lucene-level changes are there so that you can take an existing CustomAnalyzer and tweak it, if I'm reading correctly. Can you instead add a constructor to CustomAnalyzer.Builder that takes a CustomAnalyzer? I think that would simplify things. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #328: SOLR-12034
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/328 @romseygeek , y, happy to fix/update this. I'll take a look later today. Part of the reason I gave up on this is that I didn't like the changes I had to make at the Lucene level. It felt like I was screwing up the elegant Lucene-level API. Any recommendations? Also, @dsmiley recommended I move the Lucene-level modifications to another issue. Are you ok if these go in as one, or should I open up a separate PR for the Lucene-level mods? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [GitHub] lucene-solr issue #328: SOLR-12034
Just skimming here and on my way to the airport. The original purpose of MultiTermAwareComponent was to be able to, for instance, search wildcards without having to, say, lowercase the term on the client side (remember all the questions like "Why does a search for 'powerful' succeed but not 'Power*' ?"). I'm assuming this change won't alter that behavior On Mon, Oct 1, 2018 at 1:06 AM romseygeek wrote: > > Github user romseygeek commented on the issue: > > https://github.com/apache/lucene-solr/pull/328 > > Hi @tballison , are you still interested in pushing this one? I can help > out, as I'd like to get this committed so that we can move forward with > LUCENE-8497 and remove the `MultiTermAwareComponent` interface. > > > --- > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #328: SOLR-12034
Github user romseygeek commented on the issue: https://github.com/apache/lucene-solr/pull/328 Hi @tballison , are you still interested in pushing this one? I can help out, as I'd like to get this committed so that we can move forward with LUCENE-8497 and remove the `MultiTermAwareComponent` interface. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org