TermFrequency for a String

2017-04-28 Thread Manjula Wijewickrema
IndexReader.getTermFreqVectors(2)[0].getTermFrequencies()[5]; In the above example, Lucene gives me the term frequency of the 5th term (e.g. say "planet") in the tfv of the corpus document "2". But I need to get the term frequency for a specified term using its string value. E.g.: term
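
A minimal sketch of the lookup being asked for, assuming the term vector has already been unpacked into the two parallel arrays that `getTerms()` and `getTermFrequencies()` return. The class and method names (`TermFreqLookup`, `frequencyOf`) are hypothetical and not part of Lucene. The 2.9 `TermFreqVector` interface also appears to offer `indexOf(String term)`, which would replace the loop below, but the plain-Java version makes no assumption about the installed version.

```java
final class TermFreqLookup {
    // Returns the frequency stored for `term`, or 0 if the term is absent.
    // `terms` and `freqs` are assumed to be parallel arrays of equal length.
    static int frequencyOf(String[] terms, int[] freqs, String term) {
        for (int i = 0; i < terms.length; i++) {
            if (terms[i].equals(term)) {
                return freqs[i];
            }
        }
        return 0;
    }
}
```

With this, the frequency of "planet" is read by its string value rather than by a hard-coded position such as `[5]`.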

Total of term frequencies

2017-04-16 Thread Manjula Wijewickrema
Hi, Is there any way to get the total count of terms in the Term Frequency Vector (tfv)? I need to calculate the normalized term frequency of each term in my tfv. I know how to obtain the length of the tfv, but it doesn't work since I need to count duplicate occurrences as well. Highly
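
One way to get the total, sketched under the assumption that the frequencies are available as the `int[]` returned by `getTermFrequencies()`: sum the array to count every occurrence (duplicates included), then divide each frequency by that sum. `NormalizedTf` and its methods are hypothetical names used only for illustration.

```java
final class NormalizedTf {
    // Total number of term occurrences, counting duplicates.
    static int totalOccurrences(int[] freqs) {
        int total = 0;
        for (int f : freqs) {
            total += f;
        }
        return total;
    }

    // Each frequency divided by the total; all zeros if the vector is empty.
    static double[] normalize(int[] freqs) {
        int total = totalOccurrences(freqs);
        double[] normalized = new double[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            normalized[i] = total == 0 ? 0.0 : (double) freqs[i] / total;
        }
        return normalized;
    }
}
```

The length of the vector is only the number of distinct terms, which is why it cannot serve as the denominator here.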

Only term frequencies

2017-04-06 Thread Manjula Wijewickrema
Hi, I have a document collection with hundreds of documents. I need to know the term frequency for a given query term in each document. I know that 'hit.score' will give me the Lucene score for each document (and it includes term frequency as well). But I need to call only term frequencies in

Re: hit.score

2017-03-27 Thread Manjula Wijewickrema
Thanks Adrien. On Mon, Mar 27, 2017 at 6:56 PM, Adrien Grand <jpou...@gmail.com> wrote: > You can use IndexSearcher.explain to see how the score was computed. > > On Mon, Mar 27, 2017 at 14:46, Manjula Wijewickrema <manjul...@gmail.com> wrote: > > >

hit.score

2017-03-27 Thread Manjula Wijewickrema
Hi, Can someone help me to understand the value given by 'hit.score' in Lucene? I indexed a single document with five different words with different frequencies and tried to understand this value. However, it doesn't seem to be normalized term frequency or tf-idf. I am using Lucene 2.9.1. Any help

Why hit is 0 for bigrams?

2014-07-07 Thread Manjula Wijewickrema
Hi, I tried to index bigrams from a document and the system gave me the following output with the frequencies of the bigrams (output 1): array size:15 array terms are:{contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1, fine manjula/1, librari manjula/1,

bigram problem

2014-07-02 Thread Manjula Wijewickrema
Hi, Could you please explain to me how to determine the tf-idf score for bigrams? My program is able to index and search bigrams correctly, but it does not calculate the tf-idf for bigrams. If someone can, please help me to resolve this. Regards, Manjula.

Re: bigram problem

2014-07-02 Thread Manjula Wijewickrema
having the bigram. I hope this is fine. Alternatively, use NGramTokenizer (where n=2 in your case) while indexing. In such a case, each bigram can be interpreted as a normal Lucene term. Thanks, Parnab On Wed, Jul 2, 2014 at 8:45 AM, Manjula Wijewickrema manjul...@gmail.com wrote: Hi

Why bigram tf-idf is 0?

2014-06-24 Thread Manjula Wijewickrema
Hi, In my programme, I tried to select the most relevant document based on bigrams. System gives me the following output. {contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1, fine manjula/1, librari manjula/1, librarian sabaragamuwa/1, main librari/2, manjula assist/4,

Re: ShingleAnalyzerWrapper question

2014-06-16 Thread Manjula Wijewickrema
Dear Steve, It works. Thanks. On Wed, Jun 11, 2014 at 6:18 PM, Steve Rowe sar...@gmail.com wrote: You should give sw rather than analyzer in the IndexWriter constructor. Steve www.lucidworks.com On Jun 11, 2014 2:24 AM, Manjula Wijewickrema manjul...@gmail.com wrote: Hi, In my

ShingleAnalyzerWrapper question

2014-06-11 Thread Manjula Wijewickrema
Hi, In my programme, I can index and search a document based on unigrams. I modified the code as follows to obtain the results based on bigrams. However, it did not give me the desired output. public static void createIndex() throws CorruptIndexException,

Re: Is it wrong to create index writer on each query request.

2014-06-05 Thread Manjula Wijewickrema
Hi, What are the other disadvantages (other than the time factor) of creating index for every request? Manjula. On Thu, Jun 5, 2014 at 2:34 PM, Aditya findbestopensou...@gmail.com wrote: Hi Rajendra You should NOT create index writer for every request. Whether it is time consuming to

Re: Phrase indexing and searching

2013-12-23 Thread Manjula Wijewickrema
...@gmail.com wrote: Hi Manjula, Sounds like ShingleFilter will do what you want: http://lucene.apache.org/core/4_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html Steve www.lucidworks.com On Dec 22, 2013 11:25 PM, Manjula Wijewickrema manjul...@gmail.com wrote

Phrase indexing and searching

2013-12-22 Thread Manjula Wijewickrema
Dear All, My Lucene programme is able to index single words and search for the best-matching documents (based on term frequencies) from a corpus for the input document. Now I want to index two-word phrases and search for the matching corpus documents (based on phrase frequencies) for the input

Phrase indexing and searching

2013-12-18 Thread Manjula Wijewickrema
Dear list, My Lucene programme is able to index single words and search for the best-matching documents (based on term frequencies) from a corpus for the input document. Now I want to index two-word phrases and search for the matching corpus documents (based on phrase frequencies) for the input

Re: Editing StopWordList

2010-12-21 Thread manjula wijewickrema
and then add your own words to it. You could then initialize the analyzer using this new stop set instead of the default stop set. Hope that helps. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Dec 21, 2010 at 9:20 AM, manjula wijewickrema manjul...@gmail.com wrote: Hi, 1) In my

Editing StopWordList

2010-12-20 Thread manjula wijewickrema
Hi, 1) In my application, I need to add more words to the stop word list. Therefore, is it possible to add more words to the default Lucene stop word list? 2) If it is possible, then how can I do this? Appreciate any comment from you. Thanks, Manjula.

Re: Analyzer

2010-12-02 Thread manjula wijewickrema
directory to your project and maintaining your own grammar-based tokenizer. Best Erick On Tue, Nov 30, 2010 at 12:06 AM, manjula wijewickrema manjul...@gmail.com wrote: Hi Steve, Thanx a lot for your reply. Yes there are only two classes and it's correct that the way you have realized

Analyzer

2010-11-29 Thread manjula wijewickrema
Hi, In my work, I am using Lucene and two java classes. In the first one, I index a document and in the second one, I try to search the most relevant document for the indexed document in the first one. In the first java class, I use the SnowballAnalyzer in the createIndex method and

Re: Analyzer

2010-11-29 Thread manjula wijewickrema
analysis, rather than StandardAnalyzer. Steve -Original Message- From: manjula wijewickrema [mailto:manjul...@gmail.com] Sent: Monday, November 29, 2010 4:32 AM To: java-user@lucene.apache.org Subject: Analyzer Hi, In my work, I am using Lucene and two java classes

Re: Databases

2010-07-28 Thread manjula wijewickrema
Hi, Thanks a lot for your information. Regards, Manjula. On Fri, Jul 23, 2010 at 12:48 PM, tarun sapra t.sapr...@gmail.com wrote: You can use HibernateSearch to maintain the synchronization between Lucene index and Mysql RDBMS. On Fri, Jul 23, 2010 at 11:16 AM, manjula wijewickrema

Databases

2010-07-22 Thread manjula wijewickrema
Hi, Normally, when I am building my index directory for indexed documents, I used to keep my indexed files simply in a directory called 'filesToIndex'. So in this case, I do not use any standard database management system such as mySql or any other. 1) Will it be possible to use mySql or any

Re: scoring and index size

2010-07-12 Thread manjula wijewickrema
Hi Koji, Thanks for your information Manjula On Fri, Jul 9, 2010 at 5:04 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/07/09 19:30), manjula wijewickrema wrote: Uwe, thanx for your comments. Following is the code I used in this case. Could you pls. let me know where I have to insert

Re: MaxFieldLength

2010-07-12 Thread manjula wijewickrema
with any MaxfieldLength 5,000. HTH Erick On Mon, Jul 12, 2010 at 4:00 AM, manjula wijewickrema manjul...@gmail.com wrote: Hi, I have seen that, once the field length of a document goes over a certain limit ( http://lucene.apache.org/java/2_9_3/api/all/org/apache/lucene/index

Re: Why not normalization?

2010-07-09 Thread manjula wijewickrema
Hi Rebecca, Thanks for your valuable comments. Yes I observed that, once the number of terms of the document goes up, fieldNorm value goes down correspondingly. I think, therefore there won't be any default due to the variation of total number of terms in the document. Am I right? Manjula. On Thu, Jul 8,

scoring and index size

2010-07-09 Thread manjula wijewickrema
Hi, I ran a single programme to see the way of scoring by Lucene for a single indexed document. The explain() method gave me the following results. *** Searching for 'metaphysics' Number of hits: 1 0.030706111 0.030706111 = (MATCH) fieldWeight(contents:metaphys in 0), product
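
For orientation, the `fieldWeight` line in such explain() output is a product of three factors. The sketch below restates them as I understand the default similarity of the Lucene 2.9 era: tf is the square root of the raw frequency, idf is `ln(numDocs / (docFreq + 1)) + 1`, and the length norm is `1 / sqrt(numTerms)`. `ClassicScoreSketch` is a hypothetical class, not Lucene code, and it is only an approximation: Lucene computes in float and stores the norm in a single byte, so the fieldNorm printed by explain() is a rounded value.

```java
final class ClassicScoreSketch {
    // Term-frequency factor: square root of the raw count in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency with the +1 smoothing terms.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Length normalization: longer fields get a smaller norm.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Product of the three factors, before any byte encoding of the norm.
    static double fieldWeight(int freq, int docFreq, int numDocs, int numTerms) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(numTerms);
    }
}
```

In a one-document index, idf for a matching term comes out near 0.3069, so small scores are expected, and the norm falls as the indexed field gets longer.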

Re: scoring and index size

2010-07-09 Thread manjula wijewickrema
removed stop words, so the norm is not what you expect? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: manjula wijewickrema [mailto:manjul...@gmail.com] Sent: Friday, July 09, 2010 9:21 AM To: java

Re: Why not normalization?

2010-07-09 Thread manjula wijewickrema
Thanx On Fri, Jul 9, 2010 at 1:10 PM, Uwe Schindler u...@thetaphi.de wrote: Thanks for your valuable comments. Yes I observed that, once the number of terms of the document goes up, fieldNorm value goes down correspondingly. I think, therefore there won't be any default due to the variation of total

Re: Lucene Scoring

2010-07-07 Thread manjula wijewickrema
like System.out.println(indexSearcher.explain(query, 0)); See the javadocs for details. -- Ian. On Tue, Jul 6, 2010 at 7:39 AM, manjula wijewickrema manjul...@gmail.com wrote: Dear Grant, Thanks a lot for your guidance. As you have mentioned, I tried to use explain() method to get

Re: Lucene Scoring

2010-07-06 Thread manjula wijewickrema
(); Document document = hit.getDocument(); String path = document.get(FIELD_PATH); System.out.println("Hit: " + path); } } } On Mon, Jul 5, 2010 at 7:46 PM, Grant Ingersoll gsing...@apache.org wrote: On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote: Hi, In my application, I

Lucene Scoring

2010-07-05 Thread manjula wijewickrema
Hi, In my application, I input only a single-term query (at one time) and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have referred to http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html and some

Re: How to get file names instead of paths?

2010-06-15 Thread manjula wijewickrema
. On Fri, Jun 11, 2010 at 11:20 AM, manjula wijewickrema manjul...@gmail.com wrote: Hi, Using the following programme I was able to get the entire file path of indexed files which matched with the given queries. But my intention is to get only the file names even without .txt extension

How to get file names instead of paths?

2010-06-11 Thread manjula wijewickrema
Hi, Using the following programme I was able to get the entire file path of indexed files which matched with the given queries. But my intention is to get only the file names even without .txt extension as I need to send these file names as labels to another application. So, pls. let me know how
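
Since the stored path is an ordinary string, this can be handled outside Lucene. A minimal sketch, assuming the path has already been read from the stored field: `java.io.File.getName()` drops the directory part and a suffix check drops the extension. `FileLabels.label` and the example path are hypothetical.

```java
import java.io.File;

final class FileLabels {
    // Strips the directory part and a trailing ".txt" from a stored path.
    static String label(String path) {
        String name = new File(path).getName();
        if (name.endsWith(".txt")) {
            return name.substring(0, name.length() - ".txt".length());
        }
        return name;
    }
}
```

For example, `FileLabels.label("filesToIndex/doc1.txt")` yields `doc1`, which can then be passed on as a label.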

Re: Arrange terms[i]

2010-05-25 Thread manjula wijewickrema
Dear Grant, Thanks for your reply. Manjula On Mon, May 24, 2010 at 4:37 PM, Grant Ingersoll gsing...@apache.org wrote: On May 20, 2010, at 5:15 AM, manjula wijewickrema wrote: Hi, I wrote a program to get the frequencies and terms of an indexed document. The output comes as follows

Re: Problem of getTermFrequencies()

2010-05-20 Thread manjula wijewickrema
and freqs are arrays. Try terms[i] and freqs[i]. -- Ian. On Mon, May 17, 2010 at 12:23 PM, manjula wijewickrema manjul...@gmail.com wrote: Hi, I wrote some code to display the indexed terms and get their term frequencies for a single document. Although it displays

Arrange terms[i]

2010-05-20 Thread manjula wijewickrema
Hi, I wrote a program to get the frequencies and terms of an indexed document. The output comes as follows; If I print : +tfv[0] Output: array terms are:{title: capabl/1, code/2, frequenc/1, lucen/4, over/1, sampl/1, term/4, test/1} In the same way I can print terms[i] and freqs[i], but the

Re: How to call high fre. terms using HighFreTerms class

2010-05-17 Thread manjula wijewickrema
the instructions here for getting the source: http://wiki.apache.org/lucene-java/HowToContribute HTH Erick On Sat, May 15, 2010 at 1:49 AM, manjula wijewickrema manjul...@gmail.com wrote: Hi, I am struggling with using HighFreTerms class for the purpose of finding high fre. terms in my index

Problem of getTermFrequencies()

2010-05-17 Thread manjula wijewickrema
Hi, I wrote some code to display the indexed terms and get their term frequencies for a single document. Although it displays those terms in the index, it does not give the term frequencies. Instead it displays ' frequencies are:[...@80fa6f '. What's the reason for this? The code I have
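
The symptom is general Java behaviour rather than anything specific to Lucene: concatenating an array into a string prints its type tag and hash code, not its elements. A sketch of two ways around it, using hypothetical names (`PrintFreqs`, `describe`) and assuming the parallel `terms`/`freqs` arrays from the term vector: index into the arrays in a loop, or use `java.util.Arrays.toString`.

```java
import java.util.Arrays;

final class PrintFreqs {
    // Builds "term/freq" pairs by indexing both arrays with the same i.
    static String describe(String[] terms, int[] freqs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(terms[i]).append('/').append(freqs[i]);
        }
        return sb.toString();
    }

    // Renders the whole frequency array at once, e.g. "[2, 4]".
    static String allFreqs(int[] freqs) {
        return Arrays.toString(freqs);
    }
}
```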

Re: Problem of getTermFrequencies()

2010-05-17 Thread manjula wijewickrema
Dear Ian, I changed it as you said and now it is working nicely. Thanks a lot for your kind help. Manjula On Mon, May 17, 2010 at 6:46 PM, Ian Lea ian@gmail.com wrote: terms and freqs are arrays. Try terms[i] and freqs[i]. -- Ian. On Mon, May 17, 2010 at 12:23 PM, manjula

Re: Error of the code

2010-05-14 Thread manjula wijewickrema
() return? You don't appear to be doing anything with the String term in for ( String term : vector.getTerms() ) - presumably you intend to. -- Ian. On Thu, May 13, 2010 at 1:16 PM, manjula wijewickrema manjul...@gmail.com wrote: Dear Ian, Thanks a lot for your immediate reply. As you

Access indexed terms

2010-05-14 Thread manjula wijewickrema
Hi, Is it possible to put the indexed terms into an array in Lucene? For example, imagine I have indexed a single document in Lucene and now I want to access those terms in the index. Is it possible to retrieve (call) those terms as array elements? If it is possible, then how? Thanks, Manjula

Re: Access indexed terms

2010-05-14 Thread manjula wijewickrema
, Andrzej Bialecki a...@getopt.org wrote: On 2010-05-14 11:35, manjula wijewickrema wrote: Hi, Is it possible to put the indexed terms into an array in Lucene? For example, imagine I have indexed a single document in Lucene and now I want to access those terms in the index. Is it possible

Re: Access indexed terms

2010-05-14 Thread manjula wijewickrema
class in my code. But I was unable to find any guidance on how to do it. If you can, pls. be kind enough to tell me how I can use this class in my code. Thanx Manjula On Fri, May 14, 2010 at 6:16 PM, Andrzej Bialecki a...@getopt.org wrote: On 2010-05-14 14:24, manjula wijewickrema wrote: Hi

How to call high fre. terms using HighFreTerms class

2010-05-14 Thread manjula wijewickrema
Hi, I am struggling with using HighFreTerms class for the purpose of finding high fre. terms in my index. My target is to get the high frequency terms in an indexed document (single document). To do that I have added org.apache.lucene.misc package into my project. I think up to that point I am

Re: Class_for_HighFrequencyTerms

2010-05-13 Thread manjula wijewickrema
Orange -Original Message- From: manjula wijewickrema manjul...@gmail.com Date: Tue, 11 May 2010 15:13:12 To: java-user@lucene.apache.org Subject: Re: Class_for_HighFrequencyTerms Dear Erick, I looked for it and even added IndexReader.java and TermFreqVector.java from http

Error of the code

2010-05-13 Thread manjula wijewickrema
Dear All, I am trying to get the term frequencies (through TermFreqVector) of a document (using Lucene 2.9.1). In order to do that I have used the following code. But there is a compile time error in the code and I can't figure it out. Could somebody tell me what's wrong with it? Compile

Re: Error of the code

2010-05-13 Thread manjula wijewickrema
); with IndexReader ir = whatever(...); TermFreqVector vector = ir.getTermFreqVector(0, fieldname ); And you'll need to move it to after the writer.close() call if you want it to see the doc you've just added. -- Ian. On Thu, May 13, 2010 at 11:07 AM, manjula wijewickrema manjul

Re: Class_for_HighFrequencyTerms

2010-05-11 Thread manjula wijewickrema
at TermFreqVector? Best Erick On Mon, May 10, 2010 at 8:10 AM, manjula wijewickrema manjul...@gmail.com wrote: Hi, If I index a document (single document) in Lucene, then how can I get the term frequencies (even the first and second highest occurring terms) of that document? Is there any

Re: Trace only exactly matching terms!

2010-05-10 Thread manjula wijewickrema
On Fri, May 7, 2010 at 2:22 PM, manjula wijewickrema manjul...@gmail.com wrote: Hi, I am using Lucene 2.9.1. I have downloaded and run the 'HelloLucene.java' class by modifying the input document and user query in various ways. Once I put the document sentences as 'Lucene

Class_for_HighFrequencyTerms

2010-05-10 Thread manjula wijewickrema
Hi, If I index a document (single document) in Lucene, then how can I get the term frequencies (even the first and second highest occurring terms) of that document? Is there any class/method to do that? If anybody knows, pls. help me. Thanks Manjula
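
Once the terms and frequencies of the document are in hand as parallel arrays (for instance from a TermFreqVector, as suggested in the replies), picking the highest-occurring terms is plain Java. The sketch below sorts the array positions by descending frequency and returns the first k terms. `TopTerms.top` is a hypothetical helper, and ties keep their original order because the object sort is stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TopTerms {
    // Returns up to k terms with the highest frequencies, highest first.
    static List<String> top(String[] terms, int[] freqs, int k) {
        Integer[] order = new Integer[terms.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort positions by descending frequency; stable, so ties keep input order.
        Arrays.sort(order, (a, b) -> Integer.compare(freqs[b], freqs[a]));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.length); i++) {
            result.add(terms[order[i]]);
        }
        return result;
    }
}
```

Calling it with k = 2 gives the first and second highest-occurring terms of the document.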

Trace only exactly matching terms!

2010-05-07 Thread manjula wijewickrema
Hi, I am using Lucene 2.9.1. I have downloaded and run the 'HelloLucene.java' class by modifying the input document and user query in various ways. Once I put the document sentences as 'Lucene in actions' instead of 'Lucene in action', and I gave the query as 'action' and ran the programme. But it

Term/Phrase frequencies

2010-05-06 Thread manjula wijewickrema
Hi, I am new to Lucene. If I want to know the term or phrase frequency of an input document, will it be possible through Lucene? Thanks, Manjula