Re: Query regarding Lucene

2016-03-10 Thread Jack Krupansky
Are you calling the IndexSearcher#explain method to get the details of the
score calculation?

How exactly are your results not what you expect?

What Similarity are you using? Scores will be the product of the underlying
calculated scores and your term boost values.
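To make the boost arithmetic concrete, here is a stdlib-only Java sketch of a simplified scoring model. Each matching term contributes boost * raw term score, and a BooleanQuery sums the contributions. The model deliberately ignores Lucene's coord and queryNorm factors, and the raw scores are made-up numbers, so IndexSearcher#explain is still the way to see the real factors. What it illustrates: multiplying every boost by the same constant (probabilities summing to one versus the same weights times 10) rescales all scores equally and leaves the ranking unchanged. In this model, only the relative weights matter.

```java
import java.util.Arrays;

/**
 * Simplified model of a boosted BooleanQuery score: each term contributes
 * boost * rawTermScore and the contributions are summed. Lucene's coord and
 * queryNorm factors are deliberately left out of this model.
 */
public class BoostModel {

    /** Simplified score(d) = sum over terms t of boosts[t] * rawTermScores[t]. */
    static double score(double[] boosts, double[] rawTermScores) {
        double total = 0.0;
        for (int t = 0; t < boosts.length; t++) {
            total += boosts[t] * rawTermScores[t];
        }
        return total;
    }

    /** Returns document indices ordered by descending simplified score. */
    static Integer[] rank(double[] boosts, double[][] rawByDoc) {
        double[] scores = new double[rawByDoc.length];
        Integer[] order = new Integer[rawByDoc.length];
        for (int d = 0; d < rawByDoc.length; d++) {
            scores[d] = score(boosts, rawByDoc[d]);
            order[d] = d;
        }
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical raw per-term scores: one row per document, one column per term.
        double[][] raw = {
            {1.2, 0.0, 0.5},
            {0.3, 2.0, 0.1},
            {0.9, 0.4, 0.8}
        };
        double[] probabilities = {0.5, 0.3, 0.2}; // weights summing to one
        double[] scaledBy10 = {5.0, 3.0, 2.0};    // same relative weights, times 10

        System.out.println(Arrays.toString(rank(probabilities, raw)));
        System.out.println(Arrays.toString(rank(scaledBy10, raw)));
    }
}
```

Both lines print [1, 2, 0]: the scores are 0.7, 0.77 and 0.73 in the first case and exactly ten times that in the second.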

-- Jack Krupansky

On Thu, Mar 10, 2016 at 12:54 AM, Dwaipayan Roy wrote:

> Hello everyone,
>
> I am Dwaipayan, a research scholar at the Indian Statistical Institute,
> Kolkata, working in the field of Information Retrieval.
> For my research, I use Lucene (4.10.4).
>
> Recently, I have run into a question about how to boost query terms in
> Lucene at search time. Specifically, I am implementing a paper on query
> expansion (Relevance-Based Language Models, Victor Lavrenko and Bruce
> Croft, SIGIR 2001). In the paper, the expanded query is formed from terms
> taken from the initially retrieved documents. The expansion terms are
> selected and weighted according to a probability, so their weights are
> probability values normalized to sum to one. This makes each term weight
> a small fraction: in most cases it is somewhere near 0.1 when 10
> expansion terms are added, and it keeps decreasing as more expansion
> terms are considered.
> When I use these fractional values as the expansion term weights in a
> Lucene BooleanQuery, I do not get the expected results. I think the
> problem is with the weight applied via setBoost() in the Lucene
> BooleanQuery. Following the paper exactly, I set these weights to the
> normalized probability values.
>
> Could any of you please help me with this problem?
>
> Thanks,
> Dwaipayan Roy.
> Research Scholar
> Indian Statistical Institute
> Kolkata, India
>
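The weighting step Dwaipayan describes can be sketched in plain Java: keep the k highest-weighted candidate terms and rescale their weights so they sum to one. The candidate terms and their raw weights below are hypothetical; how they are estimated from the feedback documents is outside this sketch. Because the k weights sum to one, their mean is exactly 1/k, which is why the values sit near 0.1 for 10 terms. In Lucene 4.10 each resulting weight would then typically be passed to setBoost(float) on a TermQuery that is added to the BooleanQuery with BooleanClause.Occur.SHOULD.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExpansionWeights {

    /**
     * Keeps the k highest-weighted candidate terms and rescales their
     * weights so that they sum to one. Weights are assumed non-negative.
     */
    static Map<String, Double> topKNormalized(Map<String, Double> candidates, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(candidates.entrySet());
        // Sort candidates by descending weight.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Map.Entry<String, Double>> top = entries.subList(0, Math.min(k, entries.size()));

        double sum = 0.0;
        for (Map.Entry<String, Double> e : top) {
            sum += e.getValue();
        }
        if (sum <= 0.0) {
            throw new IllegalArgumentException("selected weights must have a positive sum");
        }

        // LinkedHashMap preserves the descending-weight order.
        Map<String, Double> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : top) {
            normalized.put(e.getKey(), e.getValue() / sum);
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Hypothetical candidate terms with unnormalized weights.
        Map<String, Double> candidates = new LinkedHashMap<>();
        candidates.put("index", 0.08);
        candidates.put("search", 0.06);
        candidates.put("query", 0.04);
        candidates.put("score", 0.02);
        candidates.put("term", 0.01);

        Map<String, Double> weights = topKNormalized(candidates, 4);
        weights.forEach((term, w) -> System.out.printf("%s -> %.3f%n", term, w));
    }
}
```

With k = 4 this prints index -> 0.400, search -> 0.300, query -> 0.200 and score -> 0.100, whose mean is 0.25 = 1/4.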


Re: query regarding Lucene Indexing and searching

2014-03-02 Thread Jack Krupansky
Please elaborate on what you expect will be in this payload. Is it 
information derived from the indexing process itself or is it external 
information to be added to the indexed terms?


-- Jack Krupansky

-----Original Message-----
From: Mrugendra
Sent: Sunday, March 2, 2014 5:15 AM
To: java-user@lucene.apache.org
Subject: query regarding Lucene Indexing and searching

Sir, I am a PG student. My research topic is optimizing the index file
[reducing index file size, RAM usage, and CPU utilization, and creating an
index with payloads to improve search speed].

My current working scope is a desktop search engine.

1. I am using Lucene to index PDF files [indexing the file name and
content]. With StandardAnalyzer, the Lucene index is 11 MB for 1.77 GB of
PDF files, whereas the Windows 8 windows.edb file is 42 MB for the same
1.77 GB [tested in a Windows desktop environment]. So the space-complexity
part is done.
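As a quick sanity check on those figures, this small Java helper expresses each index size as a percentage of the collection size. It assumes 1 GB = 1024 MB. Note that 1.77 GB is the size of the PDF files, which may include fonts and images, not the size of the extracted text that actually gets indexed.

```java
public class IndexSizeRatio {

    /** Index size as a percentage of the collection size (both in MB). */
    static double percentOfCollection(double indexMb, double collectionMb) {
        if (collectionMb <= 0) {
            throw new IllegalArgumentException("collection size must be positive");
        }
        return 100.0 * indexMb / collectionMb;
    }

    public static void main(String[] args) {
        double collectionMb = 1.77 * 1024; // assumption: 1 GB = 1024 MB
        System.out.printf("Lucene index: %.2f%%%n", percentOfCollection(11, collectionMb));
        System.out.printf("windows.edb:  %.2f%%%n", percentOfCollection(42, collectionMb));
    }
}
```

With these inputs it prints roughly 0.61% for the Lucene index and 2.32% for windows.edb.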

How should I approach the time complexity?
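One common way to approach the time side empirically is to measure indexing time and query latency directly. The Java sketch below is a minimal timing harness under that assumption. It runs a task a few times untimed so the JIT can warm up, then records per-run durations with System.nanoTime() and reports the median. The workload is a placeholder; wiring it to a real query (for example, a call to IndexSearcher.search(query, 10)) depends on your setup and is not shown.

```java
import java.util.Arrays;

public class LatencyHarness {

    /** Runs task `warmup` times untimed, then returns `runs` timings in nanoseconds. */
    static long[] time(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // warm-up: gives the JIT a chance to compile the hot path
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median of the samples; less sensitive to occasional GC pauses than the mean. */
    static double median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder workload; a real benchmark would run one query here.
        Runnable placeholderSearch = () -> Math.sqrt(Math.random());
        long[] samples = time(placeholderSearch, 100, 1000);
        System.out.printf("median: %.1f microseconds%n", median(samples) / 1000.0);
    }
}
```

Reporting the median (and ideally a high percentile) over many runs tends to be more stable than a single wall-clock measurement.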

2. How can I apply lemmatization with StandardAnalyzer to reduce the index
file size, and add payloads during indexing?
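On the payload part: Lucene's analysis module ships DelimitedPayloadTokenFilter, which reads tokens written as term|payload (the default delimiter is '|'). The stdlib-only sketch below illustrates that idea without Lucene on the classpath. It splits term|weight tokens and encodes each weight as a 4-byte big-endian float. The Token class and parse method are hypothetical names for illustration; in a real analyzer chain the payload would be attached to each token by a TokenFilter rather than collected in a list.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DelimitedPayloads {

    /** A term plus optional payload bytes (null when the token had no payload). */
    static final class Token {
        final String term;
        final byte[] payload;

        Token(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Encodes a float as 4 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    static float decodeFloat(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    /** Splits whitespace-separated tokens; "term|2.5" yields term "term" with a float payload. */
    static List<Token> parse(String text, char delimiter) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // only happens for blank input
            }
            int idx = raw.indexOf(delimiter);
            if (idx < 0) {
                tokens.add(new Token(raw, null));
            } else {
                float weight = Float.parseFloat(raw.substring(idx + 1));
                tokens.add(new Token(raw.substring(0, idx), encodeFloat(weight)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : parse("lucene|2.5 index search|0.5", '|')) {
            String p = t.payload == null ? "none" : Float.toString(decodeFloat(t.payload));
            System.out.println(t.term + " payload=" + p);
        }
    }
}
```

This prints lucene payload=2.5, index payload=none and search payload=0.5, one per line.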

3. Where can I find a test benchmark?

--
Regards

Rahevar Mrugendrasinh 


