Re: Filter Factory question

2017-09-29 Thread Emir Arnautović
It is still on master: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.java

Re: Filter Factory question

2017-09-28 Thread Erick Erickson
PatternCaptureGroupTokenFilter has been around since 2013 (at least, that's the earliest revision in Git). I found it even in 5.x, so it should be there in ...lucene/analysis/common/src/java/org/apache/lucene/analysis/pattern Best, Erick On Thu, Sep 28, 2017 at 7:45 AM, Webster Homer

Re: Filter Factory question

2017-09-28 Thread Webster Homer
It's still buggy, so it's not ready to share. I keep a copy of the Solr source, which I use for this type of development. I don't see PatternCaptureGroupTokenFilterFactory in the Solr 6.2 code base at all. I was thinking of seeing how it treated the positions, etc. My code now looks reasonable in the

Re: Filter Factory question

2017-09-27 Thread Stefan Matheis
> In any case I figured out my problem. I was overthinking it. Mind sharing? -Stefan On Sep 27, 2017 4:34 PM, "Webster Homer" wrote: > There is a need for a special filter since the input has to be normalized. > That is the main requirement; splitting into pieces is

Re: Filter Factory question

2017-09-27 Thread Webster Homer
There is a need for a special filter since the input has to be normalized. That is the main requirement; splitting into pieces is optional. As far as I know, there is nothing in Solr that knows about molecular formulas. In any case I figured out my problem. I was overthinking it. On Wed, Sep 27,

Re: Filter Factory question

2017-09-27 Thread Emir Arnautović
Hi Homer, There is no need for a special filter; there is one that, for some reason, is not part of the documentation (I will ask why, so follow that thread if you decide to go this way). You can use something like: This will capture all atom counts as separate tokens. HTH, Emir > On 26 Sep 2017, at
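The configuration Emir posted was not preserved in this archive. As a rough illustration of the idea he describes (emitting each regex capture-group match as its own token, optionally alongside the original), here is a stdlib-only Java simulation. It is not Lucene code: the class name, the `captureTokens` helper and the pattern `([A-Z][a-z]?\d*)` are hypothetical, chosen only to show how atom counts could fall out as separate tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupSketch {
    // Hypothetical pattern: an element symbol (uppercase letter plus optional
    // lowercase letter) followed by an optional count. NOT the pattern from the post.
    static final Pattern ATOM = Pattern.compile("([A-Z][a-z]?\\d*)");

    /** Returns the original token (if requested) followed by every group-1 match. */
    static List<String> captureTokens(String token, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        if (preserveOriginal) {
            out.add(token); // keep the whole formula as a token too
        }
        Matcher m = ATOM.matcher(token);
        while (m.find()) {
            out.add(m.group(1)); // one token per atom-count match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(captureTokens("C2H6O1", true)); // [C2H6O1, C2, H6, O1]
    }
}
```

In a real schema this role would be played by the Lucene filter itself; the Analysis screen in the Solr Admin UI is the way to confirm what it actually emits.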

Filter Factory question

2017-09-26 Thread Webster Homer
I am trying to create a filter that normalizes an input token but also splits it into multiple pieces, sort of like what the WordDelimiterFilter does. It's meant to take a molecular formula like C2H6O and normalize it to C2H6O1. That part works. However I was also going to have it put out the
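Webster's own filter code is not shown in the thread. The normalization step he describes (C2H6O to C2H6O1, i.e. making an implicit count of 1 explicit) can be sketched in plain Java as below. This is a hypothetical helper under simple assumptions: element symbols are an uppercase letter plus an optional lowercase letter, and parentheses, hydrates and charges are not handled. Inside a Lucene TokenFilter, logic like this would typically run on the term text of each token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaNormalizer {
    // Element symbol (uppercase + optional lowercase) followed by an optional count.
    private static final Pattern ELEMENT = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    /** Appends an explicit count of 1 to every element that lacks one. */
    static String normalize(String formula) {
        Matcher m = ELEMENT.matcher(formula);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(formula, last, m.start()); // copy any unmatched characters unchanged
            sb.append(m.group(1));               // element symbol
            sb.append(m.group(2).isEmpty() ? "1" : m.group(2)); // explicit count
            last = m.end();
        }
        sb.append(formula.substring(last));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("C2H6O")); // C2H6O1
        System.out.println(normalize("NaCl"));  // Na1Cl1
    }
}
```

Because matching is case-sensitive, "CO" (carbon, oxygen) and "Co" (cobalt) stay distinct, which matters for formulas.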

Re: phonetic filter factory question

2015-08-16 Thread Jamie Johnson
Thanks, I didn't know you could do this; I'll check this out. On Aug 15, 2015 12:54 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: From the teaching to fish category of advice (since I don't know the actual answer). Did you try Analysis screen in the Admin UI? If you check Verbose

Re: phonetic filter factory question

2015-08-15 Thread Alexandre Rafalovitch
From the "teaching to fish" category of advice (since I don't know the actual answer): did you try the Analysis screen in the Admin UI? If you tick the Verbose Output checkbox, you will see all the offsets and can easily confirm the detailed behavior for yourself. Regards, Alex. Solr Analyzers,

phonetic filter factory question

2015-08-15 Thread Jamie Johnson
The JavaDoc says that the PhoneticFilterFactory will inject tokens with an offset of 0 into the stream. I'm assuming this means an offset of 0 from the token that it is analyzing; is that right? I am trying to collapse some of my schema, I currently have a text field that I use for general
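In Lucene's token model, an "offset of 0" in this context most likely refers to a position increment of 0: the injected phonetic token is stacked at the same position as the token it was derived from, much like a synonym. That reading is an assumption worth confirming in the Analysis screen. The stdlib-only sketch below shows how increments translate into absolute positions; `Tok` is a hypothetical stand-in (not a Lucene class), the phonetic codes are placeholders, and the token order shown is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionIncrementSketch {
    /** Hypothetical (term, positionIncrement) pair; not a Lucene class. */
    record Tok(String term, int posInc) {}

    /** Converts position increments into absolute positions, starting at 0. */
    static List<String> withPositions(List<Tok> stream) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : stream) {
            pos += t.posInc(); // increment 0 keeps the token at the previous position
            out.add(t.term() + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder phonetic codes; real values depend on the configured encoder.
        List<Tok> stream = List.of(
                new Tok("SMITH_CODE", 1), new Tok("smith", 0),
                new Tok("JOHN_CODE", 1), new Tok("john", 0));
        System.out.println(withPositions(stream));
        // [SMITH_CODE@0, smith@0, JOHN_CODE@1, john@1]
    }
}
```

Stacked tokens like these do not shift the positions of later terms, so phrase queries on the original words keep working.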