[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-02 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
Thank you, @romseygeek , for thinking of this PR.  I'm closing it because I 
don't want to wreck the API of CustomAnalyzer.Builder().


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-02 Thread romseygeek
Github user romseygeek commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
I think we can proceed with LUCENE-8497 anyway by adding a new method to 
TokenizerChain that only uses multi-term aware filters.  Feel free to close 
this one @tballison .


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
Custom Analyzer was just added to have an easy-to-use Builder-Like API for 
Analyzers. It was not meant to replace SOlr's (although it would be nice, but 
it's impossible as I figured out at that time, too). Solr is based on 
modifiable classes and XML, not builders...


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
If you can't make Solr work with the current public CustomAnalyzer API, 
please keep it as is and use your own TokenizerChain in Solr. Please don't make 
CustomAnalyzer unmodifiable or add access to internal fields! In fact, it's a 
minimum amount of code behind the 3 lists of factories that build up the 
Analyzer that does not justify cluttering Lucene's API (like the horrible 
MultiTermAwareComponent added by Solr).


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
Wow...long time since I've visited this code.  Now I think I recall...the 
ugliness that I don't like imposing on the CustomAnalyzer's API is that it 
holds its own ResourceLoader and applies it when the user calls, e.g. 
`withTokenizer(class/classname, params)`, 
`add(Token|Char)Filter(class/classname, params)`.  

In Solr, the charfilter, tokenizer, tokenfilter factories are fully built 
with resources loaded by `FieldTypePluginLoader`'s  `loader` a 
(`SolrResourceLoader`) in `readAnalyzer(Node node)` one by one...I think (???), 
and _then_ they are added to the `CustomAnalyzer`.

I also see in `ManagedIndexSchema`, that there's `postReadInform()` which 
calls `informResourceLoaderAwareObjectsInChain`, which then loads the resources.

So, when I break the API in CustomAnalyzer and make public, e.g. 
`addTokenFilter(TokenFilterFactory factory)`, there's an unused private 
variable `ResourceLoader loader`, which feels ugly...a user could both specify 
a resource loader in `Builder`'s initializer and then pass in fully loaded 
components that would bypass that resource loader.  This smells bad to me...

Any recommendations?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread Alan Woodward
See LUCENE-8497 for more details.  Mayya would like to replace the marker 
interface with type-safe methods on CharFilterFactory and TokenFilterFactory

> On 1 Oct 2018, at 16:15, Erick Erickson  wrote:
> 
> Just skimming here and on my way to the airport. The original purpose
> of MultiTermAwareComponent was to be able to, for instance, search
> wildcards without having to, say, lowercase the term on the client side
> (remember all the questions like "Why does a search for 'powerful' succeed
> but not 'Power*' ?"). I'm assuming this change won't alter that behavior
> 
> 
> On Mon, Oct 1, 2018 at 1:06 AM romseygeek  wrote:
>> 
>> Github user romseygeek commented on the issue:
>> 
>>   https://github.com/apache/lucene-solr/pull/328
>> 
>>   Hi @tballison , are you still interested in pushing this one?  I can help 
>> out, as I'd like to get this committed so that we can move forward with 
>> LUCENE-8497 and remove the `MultiTermAwareComponent` interface.
>> 
>> 
>> ---
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread romseygeek
Github user romseygeek commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
The lucene-level changes are there so that you can take an existing 
CustomAnalyzer and tweak it, if I'm reading correctly.  Can you instead add a 
constructor to CustomAnalyzer.Builder that takes a CustomAnalyzer?  I think 
that would simplify things.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
@romseygeek , y, happy to fix/update this.  I'll take a look later today.

Part of the reason I gave up on this is that I didn't like the changes I 
had to make at the Lucene level.  It felt like I was screwing up the elegant 
Lucene-level API.  Any recommendations?

Also, @dsmiley recommended I move the Lucene-level modifications to another 
issue.  Are you ok if these go in as one, or should I open up a separate PR for 
the Lucene-level mods?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread Erick Erickson
Just skimming here and on my way to the airport. The original purpose
of MultiTermAwareComponent was to be able to, for instance, search
wildcards without having to, say, lowercase the term on the client side
(remember all the questions like "Why does a search for 'powerful' succeed
but not 'Power*' ?"). I'm assuming this change won't alter that behavior


On Mon, Oct 1, 2018 at 1:06 AM romseygeek  wrote:
>
> Github user romseygeek commented on the issue:
>
> https://github.com/apache/lucene-solr/pull/328
>
> Hi @tballison , are you still interested in pushing this one?  I can help 
> out, as I'd like to get this committed so that we can move forward with 
> LUCENE-8497 and remove the `MultiTermAwareComponent` interface.
>
>
> ---
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #328: SOLR-12034

2018-10-01 Thread romseygeek
Github user romseygeek commented on the issue:

https://github.com/apache/lucene-solr/pull/328
  
Hi @tballison , are you still interested in pushing this one?  I can help 
out, as I'd like to get this committed so that we can move forward with 
LUCENE-8497 and remove the `MultiTermAwareComponent` interface.


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org