Scoring issue

2008-11-26 Thread AlexElba

Hello,
I have two documents in my Lucene index:

Document
stored/uncompressed
stored/uncompressed,indexed,tokenized stored/uncompressed>

Document
stored/uncompressed
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized
stored/uncompressed,indexed,tokenized>

and I am searching for +tagKey:hot +tagKey:dog

which is an exact match for the 2nd document, but I am getting a score of 1.0
for the first document and 0.7 for the second one.

I have a custom Similarity where lengthNorm is (1.0 / tokenCount); the others
are constants.

Why is my first document getting the higher score?
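
For illustration, a minimal sketch of how a lengthNorm of (1.0 / tokenCount) behaves, written in plain Java without Lucene. The class name and the token counts are hypothetical; the real counts are not given in the post. It assumes the token count covers every value indexed under the matching field, so a document with many more tokens in that field would get a smaller norm, which could pull its score below that of a shorter document.

```java
// Sketch only: mirrors the lengthNorm formula quoted in the post,
// it is not Lucene's Similarity class.
final class LengthNormSketch {

    // lengthNorm as described in the post: 1.0 / tokenCount.
    static float lengthNorm(int tokenCount) {
        return 1.0f / tokenCount;
    }

    public static void main(String[] args) {
        // Hypothetical token counts for the matching field of each document.
        int shortFieldTokens = 2;
        int longFieldTokens = 17;

        System.out.println("short field norm: " + lengthNorm(shortFieldTokens));
        System.out.println("long field norm:  " + lengthNorm(longFieldTokens));
    }
}
```

If this is the cause, the output of Searcher.explain(query, docId) for both documents should show the difference in the fieldNorm factor.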
-- 
View this message in context: 
http://www.nabble.com/Scoring-issue-tp20707410p20707410.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: how to search for starts with multiple words in lucene

2008-11-26 Thread AlexElba

Hi,
I think you can achieve your goal by using StandardAnalyzer for both indexing
and searching, and a WildcardQuery for the query. I think it will work!


naveen.a wrote:
> 
> Hi,
> 
> Below is a document in lucene
> -
> Field   Value
> -
> ID:1
> 110_a:library and information
> -
> I need to search for starts with logic, below are the search cases for the
> above document
> 
> --
> Query Result
> --
> 110_a:l*   ID - 1
> 110_a:library*   ID - 1
> 110_a:library *  No Results
> 110_a:library a*No Results
> 110_a:"library a*"  No Results
> --
> here, if i apply single word for starts with search, it is found,
> but if i add any space after the first word, it is not found
> 
> so, how to apply the query to search for starts with multiple words
> 

-- 
View this message in context: 
http://www.nabble.com/how-to-search-for-starts-with-multiple-words-in-lucene-tp20697741p20707534.html



Lucene 2.3-2.4 switch: Scoring change

2009-01-29 Thread AlexElba

Hello,
I have a project which I am trying to switch from Lucene 2.3.2 to 2.4, and I
am getting some strange scores.

Before, my code was:

Hits hits = searcher.search(query);
float score = hits.score(1);

and the scores from Hits ranged from 0 to 1; 1 was a 100% match.

I changed the code to use a hit collector:

TopDocCollector collector = new TopDocCollector(99);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
int docId = hits[1].doc;
Document document = searcher.doc(docId);
float score = hits[1].score;

The scores from this class range from 2 to 12.5 for the same query.

How can I change my scores back to the old way?
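
A likely reason for the difference, as far as the 2.x source of Hits shows: Hits scaled every score by the top score whenever that top score was above 1.0, while TopDocCollector returns the raw scores. The sketch below reproduces that scaling in plain Java without Lucene; the class and method names are hypothetical, and the float array stands in for the score values of the ScoreDoc array.

```java
import java.util.Arrays;

// Sketch only: reproduces the score scaling that the deprecated Hits class
// is assumed to have applied. Not part of Lucene.
final class HitsStyleScores {

    // If the top score is above 1.0, divide every score by it;
    // otherwise return the scores unchanged.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float scale = max > 1.0f ? 1.0f / max : 1.0f;

        float[] scaled = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            scaled[i] = rawScores[i] * scale;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Hypothetical raw scores in the 2 to 12.5 range mentioned in the post.
        float[] raw = {12.5f, 6.25f, 2.0f};
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```

With real results, the top score is also available from collector.topDocs().getMaxScore(), so the same division can be applied to each hits[i].score.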

  



-- 
View this message in context: 
http://www.nabble.com/Lunene-2.3-2.4-switch%3A-Scoring-change-tp21739867p21739867.html



TopDocCollector vs Hits: TopDocCollector slowing....

2009-02-03 Thread AlexElba

Hello,

I was using Lucene 2.3.2 with Hits and switched to Lucene 2.4.0, and now I am
using TopDocCollector.

I have two queries which run against the same index.
One query returns 80 bytes of information; the other one returns 2000 bytes.

With the old Hits, the query which returned the smaller data was faster and
the one with the bigger data was slower.
After I changed to TopDocCollector, both the big and the small ones take the
same time.

The searcher is exactly the same and the queries are the same; the only
difference is that in one place I was using Hits and in the other
TopDocCollector.

Does anyone have any idea why, and how can I fix this?
-- 
View this message in context: 
http://www.nabble.com/TopDocCollector-vs-Hits%3A-TopDocCollector-slowing-tp21822877p21822877.html



Re: Lucene 2.3-2.4 switch: Scoring change

2009-02-18 Thread AlexElba



AlexElba wrote:
> 
> Hello,
> I have project which I am trying to switch from lucene 2.3.2 to 2.4 I am
> getting some strange scores
> 
> Before my code was:
> 
> Hits hits= searcher.search(query);
> Float score = hits.score(1)
> 
> and scores from hist was from 0-1; 1 was 100% match
> 
> I change code to use hit collector
> 
>   TopDocCollector collector = new TopDocCollector(99);
>  searcher.search(query, collector);
>  ScoreDocs[] hits= collector.topDocs().scoreDocs;
>  int docId = hits[1].doc;
>  Document document = searcher.doc(docId);
> Float score = hits[1].score
> 
> The scores from this class are from 2-12.5 for the same query.
> 
> How to change my scores to old way?
> 
>   
> 
> 
> 
> 


I fixed the problem. The problem was their queue and its pushing and popping.
After some optimization of the TopDocCollector it got faster.
-- 
View this message in context: 
http://www.nabble.com/Lunene-2.3-2.4-switch%3A-Scoring-change-tp21739867p22092512.html



Re: TopDocCollector vs Hits: TopDocCollector slowing....

2009-02-18 Thread AlexElba



Grant Ingersoll-6 wrote:
> 
> I presume they are both now slower, right?  Otherwise you wouldn't
> mind the speedup on the bigger one.  Hits did caching and prefetched
> things, which has its tradeoffs.  Can you describe how you were
> measuring the queries?  How many results were you getting?
> 
> 
> 
> -Grant
> 
> On Feb 3, 2009, at 8:37 PM, AlexElba wrote:
> 
>>
>> Hello,
>>
>> I was using lucene 2.3.2 with hits and switch to lucene 2.4.0 and  
>> now I am
>> using TopDocCollector.
>>
>> I have two queries which are running against the same index.
>> One query is returning 80bytes information other one is returning  
>> 2000bytes
>>
>> With old Hits the query which was returning smaller data was faster  
>> which
>> has bigger data was slower.
>> After I change to TopDocCollector both big and small once returning  
>> same
>> time.
>>
>> Searcher is exactly the same and queries are the same only  
>> difference is in
>> one place I was using Hits in other TopDocCollector
>>
>> Who has any idea why, and how can I fix this?
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> 


I fixed the problem. The problem was their queue and the pushing and popping
they had. After some optimization of the TopDocCollector it got much faster.
-- 
View this message in context: 
http://www.nabble.com/TopDocCollector-vs-Hits%3A-TopDocCollector-slowing-tp21822877p22092548.html



Lucene SnowBall unexpected behavior for some terms

2009-04-10 Thread AlexElba

Hello,
I was working with Lucene Snowball 2.3.2 and I switched to 2.4.0.
After the switch I came across a case where Lucene doesn't do lemmatization
correctly. So far I have found only one case: spa - spas. "spas" is not
getting lemmatized at all...
BTW, I saw the same behavior on Solr 1.3.


Does anybody have any idea why?
-- 
View this message in context: 
http://www.nabble.com/Lucene-SnowBall-unexpected-behavior-for-some-terms-tp22991689p22991689.html



Re: Lucene SnowBall unexpected behavior for some terms

2009-04-16 Thread AlexElba

I looked through the source code for Snowball. I think this bug exists in the
previous version as well. I asked on their mailing list; no response so far.
This is their demo page; it has the same issue:
http://snowball.tartarus.org/demo.php

I was trying to find their pattern for words which will not get lemmatized.
So far no success.
-- 
View this message in context: 
http://www.nabble.com/Lucene-SnowBall-unexpected-behavior-for-some-terms-tp22991689p23088274.html



Re: Best way for paging with TopDocs class?

2009-04-16 Thread AlexElba

Why don't you extend HitCollector and put all the logic you need into it?
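
Whichever collector is used, the paging arithmetic from the quoted question stays the same: ask for page * pageSize top hits, then keep only the last slice, so that only the stored documents of the requested page need to be loaded. A minimal sketch in plain Java without Lucene; the class and method names are hypothetical, and the int array stands in for the doc ids of an already ranked ScoreDoc array.

```java
import java.util.Arrays;

// Sketch only: paging arithmetic over an already ranked result list.
// Not part of Lucene.
final class TopDocsPaging {

    // Number of top hits to request so that the given 1-based page is covered.
    static int hitsNeeded(int page, int pageSize) {
        return page * pageSize;
    }

    // Returns the doc ids that belong to the given 1-based page.
    // An empty array is returned when the page lies past the end of the results.
    static int[] pageOf(int[] rankedDocIds, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        int from = Math.min((page - 1) * pageSize, rankedDocIds.length);
        int to = Math.min(from + pageSize, rankedDocIds.length);
        return Arrays.copyOfRange(rankedDocIds, from, to);
    }

    public static void main(String[] args) {
        // Hypothetical ranked doc ids, best hit first.
        int[] ranked = {7, 3, 9, 1, 4};
        System.out.println(Arrays.toString(pageOf(ranked, 2, 2)));
    }
}
```

Only the ids returned by pageOf would then be passed to searcher.doc(docId); the ids of earlier pages are collected again, but their documents are not reloaded.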




Ivan Vasilev-2 wrote:
> 
> Hi All,
> 
> As Hits class was deprecated in current Lucene and is expected to be 
> excluded from Lucene 3.0 we decided to change our code so that to use 
> TopDocs class.
> Our app provides paging and now we are wondering what is the best way to 
> do it with the TopDocs. I can see only this possibility:
> 1. User opens page 1 - we load by searcher.search(..., docNum, ... ) 
> method as many docs as for page 1;
> 2. User opens page 2 - we load as many results as the amount for page 1 
> and page 2 (note that docs for page 1 are loaded again);
> ...
> N. User opens page n - we load as many docs as the amount of all pages 
> from #1 to #N (note that page 1 docs were loaded N-1 times, page 2 docs 
> N-2 times etc).
> 
> With Hits class this loading of documents of previous pages was avoided 
> - they were loaded once and when needed docs for the next page Hits just 
> loaded the next portion of docs without reloading the previous pages.
> 
> So my question is:
> Is there better way for paging with the class TopDocs than the one that 
> I describe here?
> 
> Thanks in Advance,
> Ivan
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Best-way-for-paging-with-TopDocs-class--tp23079735p23088509.html



Re: Appropriate analyzer

2009-04-21 Thread AlexElba


Try using RegexQuery.
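
For comparison, the step-by-step trimming described in the quoted message can be sketched in plain Java. This is not RegexQuery and does not use Lucene; the class and method names are hypothetical, and the set of strings stands in for the indexed terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only: longest-prefix lookup by trimming the last character
// on each step. Not part of Lucene.
final class LongestPrefixLookup {

    // Returns the longest prefix of the query that is present in terms,
    // or null when no prefix matches.
    static String longestIndexedPrefix(Set<String> terms, String query) {
        for (int end = query.length(); end > 0; end--) {
            String candidate = query.substring(0, end);
            if (terms.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The three terms from the quoted example.
        Set<String> terms = new HashSet<>(Arrays.asList("abc", "abcdef", "xyz"));
        System.out.println(longestIndexedPrefix(terms, "abcdefgh"));
        System.out.println(longestIndexedPrefix(terms, "abcde"));
    }
}
```

Against a real index, each candidate would be checked with a TermQuery (or a term lookup on the IndexReader) instead of Set.contains.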



Artyom Sokolov wrote:
> 
> Hello.
> 
> Currently I'm trying to find something like an analyzer to solve the
> problem.
> 
> Actually, what I need is next: search on a query string step-by-step,
> trimming last char on each step. Small example:
> 
> In index we've: abc, abcdef, xyz
> When search on abcdefgh the most relevant result should be abcdef, while
> searching on abcde the best one is abc.
> 
> Thanks.
> 
> Sincerely,
> Artyom Sokolov
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Appropriate-analyzer-tp23164855p23166323.html



Lucene Judge

2009-06-25 Thread AlexElba

Hello,
I was looking at the Judge interface with its TrecJudge implementation, and I
am not clear on how to use it.
What data do I need to pass into the constructor?

Does anybody have any experience with this class?



Thanks,
Alex
-- 
View this message in context: 
http://www.nabble.com/Lucene-Judge-tp24209288p24209288.html



RangeFilter

2010-01-13 Thread AlexElba

Hello,

I am currently using Lucene 2.4 and have documents with 3 fields:

id
name
rank

I have a query and a filter. When I try to use a range filter on rank, I am
not getting any results back:

RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true, true);

I have documents which are in this interval.


Any suggestions as to what I am doing wrong?
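
One thing worth checking: RangeFilter in 2.4 compares terms as strings, not as numbers, and in string order "3" sorts after "10", so the range from "3" to "10" is empty. The sketch below shows the effect in plain Java without Lucene and one common workaround, zero-padding to a fixed width. The class name, method names, and the width of 4 are hypothetical.

```java
import java.util.Locale;

// Sketch only: shows why a string range from "3" to "10" matches nothing,
// and how fixed-width padding restores numeric order. Not part of Lucene.
final class StringRangeSketch {

    // True if value lies in [lower, upper] under String.compareTo ordering.
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Left-pads a non-negative number with zeros so that string order
    // matches numeric order for all values of the same width.
    static String pad(long n, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", n);
    }

    public static void main(String[] args) {
        // "5" is above "3" but also above "10" as a string, so it is rejected.
        System.out.println(inRange("5", "3", "10"));                      // false
        // With padding, "0005" lies between "0003" and "0010".
        System.out.println(inRange(pad(5, 4), pad(3, 4), pad(10, 4)));    // true
    }
}
```

The same encoding has to be used both when indexing the rank field and when building the filter bounds, and the field must be indexed as a single untokenized term.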

Regards




-- 
View this message in context: 
http://old.nabble.com/RangeFilter-tp27148785p27148785.html



Re: RangeFilter

2010-01-13 Thread AlexElba

Thanks Steve.

Mike, for now I cannot upgrade...
-- 
View this message in context: 
http://old.nabble.com/RangeFilter-tp27148785p27151315.html



Re: RangeFilter

2010-01-13 Thread AlexElba

Hello,

I changed the filter as follows:

  RangeFilter rangeFilter = new RangeFilter(
      "rank", NumberTools.longToString(rating),
      NumberTools.longToString(10), true, true);

and changed the index to store rank the same way... But I am still not seeing
any results :(


AlexElba wrote:
> 
> Hello,
> 
> I am currently using lucene 2.4 and have document with 3 fields
> 
> id  
> name 
> rank
> 
> and have query and filter when I am trying to use rang filter on rank I am
> not getting any result back
> 
> RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true, true);
> 
> I have documents which are in this interval 
> 
> 
> Any suggestion what am I doing wrong?
> 
> Regards
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/RangeFilter-tp27148785p27155102.html



RE: RangeFilter

2010-01-14 Thread AlexElba

> Did you completely re-index?

Yes, I did.

Here is the method which creates the index:



public void write(List<Object[]> data, Directory directory, Analyzer analyzer) {
    IndexWriter indexWriter = new IndexWriter(directory, analyzer,
            MaxFieldLength.LIMITED);

    try {
        for (Object[] obj : data) {
            try {
                Document document = new Document();
                Field field = new Field("id", obj[0]
                document.add(field);
                Field rank = new Field("rank",
                        NumberTools.longToString(Long.valueOf(obj[3])),
                        Store.NO, Index.ANALYZED_NO_NORMS);
                document.add(rank);
                indexWriter.addDocument(document);
            } catch (CorruptIndexException e) {

            } catch (IOException e) {

            }
        }
    } finally {
        try {
            indexWriter.commit();
        } catch (CorruptIndexException e) {

        } catch (IOException e) {

        }
    }
}


Yep, I am using Luke, but this app uses a RAM-based index...



Steven A Rowe wrote:
> 
> Hi AlexElba,
> 
> Did you completely re-index?
> 
> If you did, then there is some other problem - can you share (more of)
> your code?
> 
> Do you know about Luke?  It's an essential tool for Lucene index
> debugging:
> 
>http://www.getopt.org/luke/
> 
> Steve
> 
> On 01/13/2010 at 8:34 PM, AlexElba wrote:
>> 
>> Hello,
>> 
>> I change filter to follow
>>   RangeFilter rangeFilter = new RangeFilter(
>>"rank", NumberTools
>> .longToString(rating), NumberTools
>> .longToString(10), true, true);
>> 
>> and change index to store rank the same way... But still not seeing :(
>> any results
>> 
>> 
>> AlexElba wrote:
>> > 
>> > Hello,
>> > 
>> > I am currently using lucene 2.4 and have document with 3 fields
>> > 
>> > id
>> > name
>> > rank
>> > 
>> > and have query and filter when I am trying to use rang filter on rank I
>> > am not getting any result back
>> > 
>> > RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true,
>> > true);
>> > 
>> > I have documents which are in this interval
>> > 
>> > 
>> > Any suggestion what am I doing wrong?
>> > 
>> > Regards
>> > 
>> > 
>> > 
>> > 
>> > 
>> 
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/RangeFilter-tp27148785p27166330.html



Lucene search for OR

2008-08-14 Thread AlexElba

Hello, I am trying to search for "or" (Oregon). Even when it is not
capitalized, it is not returning any results.

How can I search for 'or'?
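
One possible cause, assuming the field is analyzed with an analyzer that drops English stop words: StandardAnalyzer does this by default, and "or" is on its default stop list, so the term never reaches the index or the query. (Uppercase OR is, in addition, a QueryParser operator.) The sketch below imitates that filtering in plain Java without Lucene; the class name and the short stop list are hypothetical and only illustrate the effect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch only: imitates lower-casing plus stop-word removal.
// Not part of Lucene; the stop list is a small illustrative subset.
final class StopWordSketch {

    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "of", "or", "the"));

    // Splits on whitespace, lower-cases, and drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "OR" is removed, so only "portland" survives analysis.
        System.out.println(analyze("Portland OR"));
        // A query consisting only of "or" ends up with no terms at all.
        System.out.println(analyze("or"));
    }
}
```

If that is what is happening, indexing and searching the state field with an analyzer that has no stop list (or with an empty stop set) should make "or" searchable.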
-- 
View this message in context: 
http://www.nabble.com/Lucene-search-for-OR-tp18990623p18990623.html