Re: Term Extraction

2009-08-13 Thread Grant Ingersoll
I would just throw your doc into a MemoryIndex (lives in contrib/memory, I think; it only holds one doc), get the Vector and do what you need to do. So you would kind of be doing indexing, but not really.
On Aug 13, 2009, at 8:43 AM, joe_coder wrote: Grant, thanks for responding. My i

Re: Term Extraction

2009-08-13 Thread joe_coder
For example, I am able to do:
    Analyzer analyzer = new StandardAnalyzer(); // or any other analyzer
    TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
    Token t = ts.next();
    while (t != null) {
        System.out.println("token: " + t);
        t

Re: Term Extraction

2009-08-13 Thread joe_coder
Grant, thanks for responding. My issue is that I am not planning to use Lucene (as I don't need any search capability, at least not yet). All I have is a text document, and I need to extract keywords and their frequency (which could be a simple split on space and tracking the count). But I realize th
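For reference, the "simple split on space and tracking the count" baseline mentioned above might look roughly like the sketch below. It is plain Java with no Lucene involved; the class name TermFrequency and the method count are made up for illustration, and lowercasing the input is an added assumption. It does no stemming, stopword removal, or synonym handling.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: a naive whitespace-based term counter.
final class TermFrequency {

    /** Counts how often each whitespace-separated token occurs, ignoring case. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input splits into a single empty token
            }
            counts.merge(token, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {goes=1, here=1, some=2, text=2}
        System.out.println(count("some text goes here some text"));
    }
}
```

Punctuation stays attached to tokens here ("here." and "here" count separately), which is one reason an analyzer-based approach may be preferable.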

Re: Term Extraction

2009-08-13 Thread Grant Ingersoll
On Aug 13, 2009, at 7:40 AM, joe_coder wrote:
I was wondering if there is any way to directly use Lucene API to extract terms from a given string. My requirement is that I have a text document for which I need a term frequency vector (after stemming, removing stopwords and synonyms che