Filtered docs and positions enum
Apologies first for posting this both here and to the Solr list; I wasn't sure where it was most appropriately asked, but since there was no response there I figured I'd try here. I have what I believe to be a fairly unusual use case (I have not seen it mentioned before) that I'm looking for some thoughts on. I need to filter terms based on a user's authorizations; the implementation is currently based on https://github.com/jej2003/lucure-core/blob/master/src/main/java/com/lucure/core/codec/AccessFilteredDocsAndPositionsEnum.java

The current implementation wraps a DocsAndPositionsEnum, but there is an unknown that I am not sure is or is not an issue, around freq() and positions for a particular term. Right now freq() is passed through unmodified from the wrapped DocsAndPositionsEnum, but when a caller calls nextPosition() and encounters a position whose authorizations they don't have, we simply skip it by calling nextPosition() on the wrapped enum again. In that scenario we may have reported freq() as 2 while the caller only has access to 1 position. There is currently no equivalent of the NO_MORE_DOCS constant for positions, so when the visible positions run out we return -1 (though we are considering changing that to Integer.MAX_VALUE). We have already seen possible issues with this in the phrase scorer (hence the idea of returning Integer.MAX_VALUE), but the only way I can see to truly remedy this in the current implementation is to get freq() right from the start, and I can't see how to do that without processing all of the positions up front to compute freq() correctly given the user's authorizations.

OK, that was long, so now for the question: is returning a huge number (say Integer.MAX_VALUE) from nextPosition() OK in situations like this? Are there specific places we should be looking at to verify? Ideally we would instead get the frequencies correct given the authorizations, but if there are no negative consequences to the current approach I would prefer to avoid the up-front processing. As always, any feedback is appreciated.
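For reference, the up-front approach described above could look roughly like the sketch below, against the 4.x DocsAndPositionsEnum API the lucure code is built on. This is not the lucure implementation: canSee() is a hypothetical stand-in for the payload-based authorization check, and a complete version would also have to buffer offsets and payloads, since the wrapped enum has already been advanced past each position by the time the caller asks for them.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.util.BytesRef;

public class BufferedAccessFilteredPositionsEnum extends DocsAndPositionsEnum {
  private final DocsAndPositionsEnum in;
  private final List<Integer> visible = new ArrayList<>();
  private int pos = -1;

  public BufferedAccessFilteredPositionsEnum(DocsAndPositionsEnum in) {
    this.in = in;
  }

  /** Hypothetical authorization test, e.g. against the position's payload. */
  private boolean canSee(BytesRef payload) {
    return true; // replace with the real visibility check
  }

  /** Reads every position of the current doc and keeps only the visible ones. */
  private int fillBuffer() throws IOException {
    visible.clear();
    pos = -1;
    for (int i = 0; i < in.freq(); i++) {
      int p = in.nextPosition();
      if (canSee(in.getPayload())) {
        visible.add(p);
      }
    }
    return visible.size();
  }

  @Override
  public int nextDoc() throws IOException {
    int doc;
    // Skip docs where every position was filtered out, so freq() is never 0.
    while ((doc = in.nextDoc()) != NO_MORE_DOCS && fillBuffer() == 0) {
    }
    return doc;
  }

  @Override
  public int advance(int target) throws IOException {
    int doc = in.advance(target);
    if (doc != NO_MORE_DOCS && fillBuffer() == 0) {
      doc = nextDoc();
    }
    return doc;
  }

  @Override
  public int freq() {
    // Matches exactly the number of positions nextPosition() will return.
    return visible.size();
  }

  @Override
  public int nextPosition() {
    return visible.get(++pos);
  }

  @Override
  public int docID() { return in.docID(); }

  @Override
  public int startOffset() throws IOException { return in.startOffset(); }

  @Override
  public int endOffset() throws IOException { return in.endOffset(); }

  @Override
  public BytesRef getPayload() throws IOException { return in.getPayload(); }

  @Override
  public long cost() { return in.cost(); }
}

With this shape, freq() and nextPosition() always agree, so nothing like Integer.MAX_VALUE ever needs to be handed to scorers such as the phrase scorer.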
How to use case-insensitive search
Dear Team,

I am trying to build a search engine for fetching person info based on name or email ID. For this I have StandardAnalyzer and wildcard queries. If I enter a case-sensitive query I get the result, but how do I go about making it case-insensitive? I mean that searching for "rohan" or "Rohan" should give the same result; currently I only get a result when I search exactly as stored in the DB, i.e. "Rohan", and not for "rohan".

I have posted the same query on Stack Overflow: http://stackoverflow.com/questions/30881355/java-lucene-4-5-how-to-search-by-case-insensitive/30926385#30926385

Please help me out; is there any reference where I can look?

--
Thanks & Regards
Vardhaman B.N
9945840928
Re: How to use case-insensitive search
Add LowercaseFilterFactory to your analysis chain for the fieldType, both at query and index time. You'll need to re-index.

The admin UI/analysis page will help you understand the effects of each analysis step defined in your fieldTypes.

Best,
Erick

On Fri, Aug 14, 2015 at 3:44 AM, vardhaman narasagoudar wrote:
> Dear Team,
> I am trying to build a search engine for fetching person info based on name or email ID. [...]
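That is Solr configuration; in plain Lucene the same effect comes from using an analyzer with a LowerCaseFilter in the chain at both index and query time. A minimal sketch, assuming the Lucene 5.x Analyzer API (on 4.x, createComponents also takes a Reader and LowerCaseFilter a Version argument; the class name here is made up for illustration):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Sketch: standard tokenization followed by lower-casing, used at both
// index and query time so "Rohan" and "rohan" end up as the same term.
public final class LowercasingAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new LowerCaseFilter(source);
    return new TokenStreamComponents(source, result);
  }
}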
Re: How to use case-insensitive search
I was assuming this was a Lucene question...

The StandardAnalyzer already includes the lower-case filter, so queries should be case-insensitive by default. See: https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html

If the question was really how to get case-sensitive queries, simply create your own analyzer without the lower-case filter.

-- Jack Krupansky

On Fri, Aug 14, 2015 at 10:07 AM, Erick Erickson wrote:
> Add LowercaseFilterFactory to your analysis chain for the fieldType, both at query and index time. [...]
Re: How to use case-insensitive search
Hi,

Wildcard queries don't use the Analyzer, so they are case-sensitive. Most of Lucene's query parsers can lowercase terms even when they contain a wildcard, but you have to enable this. In most cases it is recommended to use a plain, simple analyzer for fields that are queried with wildcards; if you also have stemming, it will not work correctly with wildcards.

In general, if your queries require wildcards by default, you should review your analysis! A well-configured analysis chain should allow the user to find things without using wildcards at all.

Uwe

On 14 August 2015 at 16:12:46 CEST, Jack Krupansky wrote:
> I was assuming this was a Lucene question... [...]

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
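To make that concrete, here is a sketch (Lucene 5.x signatures assumed; on 4.x the QueryParser and StandardAnalyzer constructors also take a Version argument, and the field name "name" is just an example). The classic query parser can lowercase wildcard terms for you; a directly constructed WildcardQuery bypasses analysis entirely, so there you have to lowercase the input yourself to match the lowercased index terms.

import java.util.Locale;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class WildcardCaseDemo {
  public static void main(String[] args) throws Exception {
    // Option 1: let the classic QueryParser lowercase wildcard/prefix/range terms.
    QueryParser parser = new QueryParser("name", new StandardAnalyzer());
    parser.setLowercaseExpandedTerms(true);
    Query parsed = parser.parse("Roh*"); // parses to name:roh*, matching the indexed token "rohan"
    System.out.println(parsed);

    // Option 2: a directly constructed WildcardQuery runs no analyzer,
    // so lowercase the user input before building the term.
    Query direct = new WildcardQuery(new Term("name", "Roh*".toLowerCase(Locale.ROOT)));
    System.out.println(direct);
  }
}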
Getting full English words when tokenizing with SmartChineseAnalyzer
Hi,

I am new to the Lucene analyzers. I would like to get the full English tokens from SmartChineseAnalyzer, but I'm only getting stems. The following code has the test sentence predefined in "testStr":

String testStr = "女单方面,王适娴second seed和头号种子卫冕冠军西班牙选手马林first seed同处1/4区,3号种子李雪芮和韩国选手Korean player成池铉处在2/4区,不过成池铉先要过日本小将(Japanese player)奥原希望这关。下半区,6号种子王仪涵若想晋级决赛secure position. congratulations.";

The printed tokenized result is:

女 单 方面 王 适 娴 second seed 和 头号 种子 卫冕 冠军 西班牙 选手 马 林 first seed 同 处 1 4 区 3 号 种子 李 雪 芮 和 韩国 选手 korean player 成 池 铉 处在 2 4 区 不过 成 池 铉 先 要 过 日本 小将 japanes player 奥 原 希望 这 关 下 半 区 6 号 种子 王 仪 涵 若 想 晋级 决赛 secur posit congratul

As you can see, some longer English tokens such as "Japanese", "position" and "congratulations" are cut short during tokenization. I hope I didn't use it wrong.

Test code:

// imports needed for this snippet
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

private static void testChineseTokenizer() {
    String testStr = "女单方面,王适娴second seed和头号种子卫冕冠军西班牙选手马林first seed同处1/4区,3号种子李雪芮和韩国选手Korean player成池铉处在2/4区,不过成池铉先要过日本小将(Japanese player)奥原希望这关。下半区,6号种子王仪涵若想晋级决赛secure position. congratulations.";
    Analyzer analyzer = new SmartChineseAnalyzer();
    List<String> result = new ArrayList<>();
    StringReader sr = new StringReader(testStr);

    try {
        TokenStream stream = analyzer.tokenStream(null, sr);
        CharTermAttribute cattr = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            result.add(cattr.toString());
        }
        stream.end();
        stream.close();
        sr.close();
        analyzer.close();
        for (String tok : result) {
            System.out.print(" " + tok);
        }
        System.out.println();
    } catch (IOException e) {
        // not thrown because we're using a string reader
    }
}
Re: Getting full English words when tokenizing with SmartChineseAnalyzer
The easiest thing to do is to create your own analyzer, cut and paste the code from org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer into it, and get rid of the line in createComponents(String fieldName, Reader reader) that says

result = new PorterStemFilter(result);

On Fri, Aug 14, 2015 at 11:20 AM, Wayne Xin wrote:
> I am new to the Lucene analyzers. I would like to get the full English tokens from SmartChineseAnalyzer, but I'm only getting stems. [...]
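For a Lucene 5.x build, the copied-and-trimmed class might look roughly like the sketch below. This is not the verbatim SmartChineseAnalyzer source: it assumes the chain is HMMChineseTokenizer plus the default stop set, and simply leaves the PorterStemFilter out so English tokens keep their full form (on 4.x, createComponents also takes a Reader and the tokenizer constructor differs).

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.util.CharArraySet;

// Sketch: SmartChineseAnalyzer-like chain without the PorterStemFilter.
public final class SmartChineseNoStemAnalyzer extends Analyzer {
  private final CharArraySet stopWords = SmartChineseAnalyzer.getDefaultStopSet();

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tokenizer = new HMMChineseTokenizer();
    // result = new PorterStemFilter(result) is intentionally omitted here
    TokenStream result = new StopFilter(tokenizer, stopWords);
    return new TokenStreamComponents(tokenizer, result);
  }
}

Swapping this in for SmartChineseAnalyzer in the test method above should leave tokens such as "japanese", "position" and "congratulations" unstemmed.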
Re: Getting full English words when tokenizing with SmartChineseAnalyzer
Thanks Michael, that works well. I'm not sure why SmartChineseAnalyzer is final, otherwise we could just override createComponents().

New output:

女 单 方面 王 适 娴 second seed 和 头号 种子 卫冕 冠军 西班牙 选手 马 林 first seed 同 处 1 4 区 3 号 种子 李 雪 芮 和 韩国 选手 korean player 成 池 铉 处在 2 4 区 不过 成 池 铉 先 要 过 日本 小将 japanese player 奥 原 希望 这 关 下 半 区 6 号 种子 王 仪 涵 若 想 晋级 决赛 secure position congratulations

-Wayne

On 8/14/15, 8:48 AM, "Michael Mastroianni" wrote:
> The easiest thing to do is to create your own analyzer, cut and paste the code from org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer into it [...]
RE: Getting full English words when tokenizing with SmartChineseAnalyzer
Hi,

Since Lucene 5.0 it is much easier to create your own analyzers, without defining your own classes: https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html

Using the builder you can create your own analyzer with just a few lines of code. The names and params used are those of the factories known from Apache Solr.

Analyzers are final by design.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Wayne Xin [mailto:wayne_...@hotmail.com]
> Sent: Friday, August 14, 2015 8:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: Getting full English words when tokenizing with SmartChineseAnalyzer
>
> Thanks Michael, that works well. I'm not sure why SmartChineseAnalyzer is final, otherwise we could just override createComponents(). [...]
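As a rough illustration of the builder (a sketch only: it assumes the smartcn analysis module is on the classpath so the factory name "hmmChinese" resolves to HMMChineseTokenizerFactory, and it uses StopFilterFactory with its default English stop words rather than SmartChineseAnalyzer's own stop set):

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class NoStemSmartChinese {
  // Sketch: HMMChinese tokenizer plus a stop filter, with no "porterStem"
  // filter in the chain, so English tokens are left unstemmed.
  public static Analyzer create() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer("hmmChinese")
        .addTokenFilter("stop") // pass "words"/"format" params here for a custom stop list
        .build();
  }
}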
Re: Getting full English words when tokenizing with SmartChineseAnalyzer
Thanks Uwe, this seems to be a handy tool. My problem is that I need a better example (a tutorial, maybe) showing which filters a SmartChineseAnalyzer or JapaneseAnalyzer needs by default. In this case, I guess I need an HMMChineseTokenizer and a stop filter, but not a Porter stem filter. I'll give it a try later, but a tutorial would be nice. Thanks for the suggestion, though.

-Wayne

On 8/14/15, 4:40 PM, "Uwe Schindler" wrote:
> Hi,
>
> Since Lucene 5.0 it is much easier to create your own analyzers, without defining your own classes [...]