I need a go-to resource for writing the custom tokenizer. Any suggestions?

On Fri, Sep 25, 2015 at 2:36 PM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote:
> For sure.
>
> On Fri, Sep 25, 2015 at 1:13 PM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> I think (I lost the library link) you would need to build a bridge by
>> doing a custom Analyzer or Tokenizer and then using the library under
>> the covers. Would be a nice contribution to open-source if you managed
>> to achieve that.
>>
>> Regards,
>>    Alex.
>> ----
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 25 September 2015 at 12:58, Siddhartha Singh Sandhu
>> <sandhus...@gmail.com> wrote:
>> > Hi,
>> >
>> > I wanted to use the GitHub implementation of the twitter-text library
>> > to filter the tokens (hashtags) in my text. I know I can also use the
>> > pattern tokenizer, but I would trust Twitter's library more than my
>> > own regex to do the job. I wanted to use it together with
>> > solr.WhitespaceTokenizerFactory to get the tokens.
>> >
>> > I need help understanding how to do that. Do I have to refactor the
>> > Twitter Java library so that it "extends TokenFilterFactory", or can I
>> > use it the way it is?
>> >
>> > Regards,
>> >
>> > Sid.
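
For reference, the "bridge" idea in the quoted message can be sketched in plain Java, without any Solr or twitter-text dependency: split on whitespace, then keep only the tokens that a hashtag check accepts. Everything named here (HashtagTokenBridge, filterTokens, looksLikeHashtag) is hypothetical and made up for illustration. The check itself is a crude stand-in, not Twitter's rules. In a real plugin, the same decision would presumably sit inside a Lucene token filter created by a class that extends TokenFilterFactory, with the predicate delegating to the twitter-text library instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only: not part of Solr, Lucene, or twitter-text.
class HashtagTokenBridge {

    /** Splits the text on whitespace and keeps only tokens the check accepts. */
    static List<String> filterTokens(String text, Predicate<String> isHashtag) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // split() yields one empty string for blank input, so skip it.
            if (!token.isEmpty() && isHashtag.test(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    /**
     * Crude placeholder check (an assumption, far simpler than Twitter's rules):
     * a '#' followed by one or more letters, digits, or underscores.
     * This is the single spot where a call into the real library would go.
     */
    static boolean looksLikeHashtag(String token) {
        if (token.length() < 2 || token.charAt(0) != '#') {
            return false;
        }
        for (int i = 1; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '_') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "Indexing #solr tweets with #lucene today";
        // prints [#solr, #lucene]
        System.out.println(filterTokens(text, HashtagTokenBridge::looksLikeHashtag));
    }
}
```

Passing the check in as a Predicate keeps the tokenizing loop independent of how hashtags are recognized, so the placeholder can be swapped for a library-backed check without touching the rest.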