I need a go-to reference for writing the custom tokenizer. Any suggestions?

On Fri, Sep 25, 2015 at 2:36 PM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote:

> For sure.
>
> On Fri, Sep 25, 2015 at 1:13 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
>> I think (I lost the library link) you would need to build a bridge by
>> doing a custom Analyzer or Tokenizer and then using the library under
>> the covers. Would be a nice contribution to open-source if you managed
>> to achieve that.
>>
>> Regards,
>>    Alex.
>> ----
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 25 September 2015 at 12:58, Siddhartha Singh Sandhu
>> <sandhus...@gmail.com> wrote:
>> > Hi,
>> >
>> > I wanted to use the GitHub implementation of the twitter-text library
>> > to filter the tokens (hashtags) in my text. I know I can also use the
>> > Pattern Matching tokenizer, but I would trust Twitter's library more
>> > than my own regex to do the job. I wanted to use it together with
>> > solr.WhitespaceTokenizerFactory to get the tokens.
>> >
>> > I need help understanding how to do that. Do I have to refactor the
>> > Twitter Java library to "extends TokenFilterFactory", or can I use it
>> > the way it is?
>> >
>> > Regards,
>> >
>> > Sid.
>>
>
>

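The bridge Alex describes (whitespace tokens in, hashtag tokens out, with the library doing the matching "under the covers") could be sketched roughly as below. This is only a plain-Java illustration of the filtering logic, not Solr or twitter-text code: the class name HashtagBridgeSketch, the method hashtagTokens, and the SIMPLE_HASHTAG regex are all made up for this example. The regex is a crude stand-in for the library's check, which would be plugged in through the Predicate parameter. In Solr itself, the same test would presumably live in the incrementToken() method of a custom TokenFilter, created by a TokenFilterFactory subclass and placed after solr.WhitespaceTokenizerFactory in the field type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class HashtagBridgeSketch {

    // Hypothetical stand-in for the twitter-text hashtag check.
    // Far simpler than the real library: '#' followed by word
    // characters, at least one of which is a letter or underscore.
    static final Pattern SIMPLE_HASHTAG = Pattern.compile("#\\w*[A-Za-z_]\\w*");

    // Splits on whitespace (what WhitespaceTokenizer would do) and keeps
    // only the tokens accepted by isHashtag (the "filter" step).
    static List<String> hashtagTokens(String text, Predicate<String> isHashtag) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && isHashtag.test(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Convenience overload that uses the stand-in regex.
    static List<String> hashtagTokens(String text) {
        return hashtagTokens(text, t -> SIMPLE_HASHTAG.matcher(t).matches());
    }

    public static void main(String[] args) {
        // prints [#Solr, #lucene]
        System.out.println(hashtagTokens("Indexing tweets with #Solr and #lucene today"));
    }
}
```

Passing the check in as a Predicate keeps the token loop independent of how a hashtag is recognized, so swapping the regex for a call into the Twitter library would not touch the rest. Note that whitespace splitting leaves punctuation attached, so a token such as "#Solr," is rejected by this stand-in; the real library may handle such cases differently.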