Re: Problems with sandbox - can't find org.apache.lucene.store.IndexInput

2006-01-02 Thread Erik Hatcher
I haven't checked the specifics, but many of the contrib projects (the
"sandbox" is the old name for it) have upgraded their latest code to
build against the trunk of Lucene, which is destined to be Lucene 1.9.
You'll need to either grab a previous JAR built before the codebase
changed, or upgrade to the trunk of Lucene's Subversion repository all
the way around.
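One quick way to see which side of that API change a particular Lucene JAR is on is to probe the classpath for the class names involved. This is a sketch, not an official tool: the class name LuceneApiCheck is illustrative, and it assumes (check your own JAR) that 1.4.x provides org.apache.lucene.store.InputStream, which the trunk code replaced with org.apache.lucene.store.IndexInput.

```java
// Sketch: report which Lucene store classes are visible on the classpath.
// Assumption: Lucene 1.4.x ships org.apache.lucene.store.InputStream, while
// the trunk code (future 1.9) uses org.apache.lucene.store.IndexInput.
public class LuceneApiCheck {

    /** Returns true if the named class can be loaded, without initializing it. */
    public static boolean hasClass(String className) {
        try {
            Class.forName(className, false, LuceneApiCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.lucene.store.InputStream", // 1.4.x-era name (assumed)
            "org.apache.lucene.store.IndexInput"   // name the trunk contrib code imports
        };
        for (String name : names) {
            System.out.println(name + ": " + (hasClass(name) ? "present" : "missing"));
        }
    }
}
```

Run it with the JAR in question on the classpath; if IndexInput is reported missing, the contrib code will not compile against that JAR.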


Erik


On Dec 31, 2005, at 10:21 AM, Colin Young wrote:

I'm attempting to compile Lucene with some sandbox code -- specifically
the Berkeley DB index storage -- and I'm running into an issue where the
code is attempting to import IndexInput (apparently located in
org.apache.lucene.store.IndexInput), but I can't find it in the source
anywhere. I'm not sure whether the sandbox code is using a more recent
version of the Lucene code or whether I'm missing something obvious. My
personal guess is that it's the latter.

I'm using Lucene 1.4.3 source and the db directory from the source
repository at the apache site.

Thanks for any tips.

Colin


Notice: This email message is for the sole use of the intended  
recipient(s) and may contain confidential and privileged  
information. Any unauthorized review, use, disclosure or  
distribution is prohibited. If you are not the intended recipient,  
please contact the sender by reply email and destroy all copies of  
the original message.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



My first question in 2006 :D

2006-01-02 Thread Daniel Cortes

Hello everybody and happy new year!
My first question about Lucene in 2006 is the following:

What should I do about the message "No tvx file"? Every night I run a
complete indexing process over a phpBB forum.
For example, when indexing 93 documents (posts in the phpBB forum), I
see 4 "No tvx file" messages in my logs.

The call that produces the message is this (contents is a String):
   Field CONTENTS = Field.UnStored("CONTENTS",contents,true);
   CONTENTS.setBoost((float) 0.2);
   lucene_doc.add(CONTENTS);

The problem is that when I'm working with an active forum I can get
nearly 200 "No tvx file" messages.

My index uses the compound file format (setUseCompoundFile(true)).

What am I doing wrong? Or what can I do to avoid these messages?

Thanks for any reply.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: My first question in 2006 :D

2006-01-02 Thread Bernhard Messer

Daniel,

you can simply ignore this message. It only says that you have term
vectors enabled and have added one or more "empty" documents without a
body. If you don't need term vectors for any special operations on
index terms, switch this feature off.
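If term vectors are not needed at all, passing false as the third argument of Lucene 1.4's Field.UnStored (its storeTermVector flag) is the simplest way to switch them off. If they are needed for real posts, one untested option is to request them only when a post actually has a body, since empty documents are the case this message is about. The helper below is hypothetical, and its whitespace check is only a heuristic: the analyzer may still produce no terms for, say, punctuation-only text.

```java
// Hypothetical helper: choose the storeTermVector flag per document.
// Lucene 1.4's Field.UnStored(String name, String value, boolean storeTermVector)
// takes this value as its third argument.
public class TermVectorFlag {

    /**
     * True only if term vectors are wanted AND the text has some
     * non-whitespace content. Heuristic: an analyzer may still emit no
     * terms for non-empty text (e.g. punctuation only).
     */
    public static boolean storeTermVector(String contents, boolean termVectorsWanted) {
        return termVectorsWanted && contents != null && contents.trim().length() > 0;
    }

    public static void main(String[] args) {
        System.out.println(storeTermVector("Hello forum", true));  // true
        System.out.println(storeTermVector("   ", true));          // false
        System.out.println(storeTermVector("Hello forum", false)); // false
    }
}
```

Usage would look like Field.UnStored("CONTENTS", contents, TermVectorFlag.storeTermVector(contents, true)); whether this silences every "No tvx file" line after segment merges is worth checking on a test index first.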


Bernhard




RE: Problems with sandbox - can't find org.apache.lucene.store.IndexInput

2006-01-02 Thread Colin Young
That would probably explain things. Is 1.9 close, or are we still
talking months away? Unfortunately, what I'm trying to do is use the
code for Berkeley DB Java Edition which, as best I can tell, was only
ported to the 1.9 code, so it looks like my choices are to do the
port myself, or check out 1.9 to see what the current issues are and
how stable it is for my purposes.

Thanks

Colin Young

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: 2 January, 2006 05:12
To: java-user@lucene.apache.org
Subject: Re: Problems with sandbox - can't find
org.apache.lucene.store.IndexInput

I haven't checked the specifics, but many of the contrib (the "sandbox"
is the old name for it) projects have upgraded their latest code to be
against the trunk of Lucene, which is destined to be Lucene 1.9.  You'll
need to either grab a previous JAR built before the codebase changed, or
upgrade yourself to the trunk of Lucene's subversion repository all the
way around.

Erik





Re: Problems with sandbox - can't find org.apache.lucene.store.IndexInput

2006-01-02 Thread Erik Hatcher

Trunk of Lucene is very stable, more so than 1.4.3 I've heard.

Is 1.9 release close?  Hard to even say.  It could be.  No  
substantial changes to the trunk before 1.9 is officially released  
are planned that I know of.


Erik


On Jan 2, 2006, at 3:51 PM, Colin Young wrote:


That would probably explain things. Is 1.9 close, or are we still
talking months away? Unfortunately, what I'm trying to do is use the
code for Berkeley DB Java Edition which, as best I can tell, was only
ported to the 1.9 code, so it looks like my choices are to do the
port myself, or check out 1.9 to see what the current issues are and
how stable it is for my purposes.

Thanks

Colin Young




RE: Problems with sandbox - can't find org.apache.lucene.store.IndexInput

2006-01-02 Thread Colin Young
That's good enough for me. At this point, going with a reasonably stable
branch rather than using my code appears to be the more conservative
option considering our release timeframe (which allows for extensive
testing).

Thanks for the help (and the excellent book).

Colin

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: 2 January, 2006 21:03
To: java-user@lucene.apache.org
Subject: Re: Problems with sandbox - can't find
org.apache.lucene.store.IndexInput

Trunk of Lucene is very stable, more so than 1.4.3 I've heard.

Is 1.9 release close?  Hard to even say.  It could be.  No substantial
changes to the trunk before 1.9 is officially released are planned that
I know of.

Erik





Re: Query Scoring

2006-01-02 Thread Harini Raghavan
Thank you Chris. That seems like a good suggestion. I will try to pass a
different Query object to the Highlighter API than the one used for
searching.
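One way to try that is to build the highlight-only query as a string in QueryParser syntax, pairing each company name with each keyword as a sloppy phrase in the style Chris suggests ("companyname10 cost savings"~20). This is a sketch: the class and method names are hypothetical, the company names are made up, and the resulting string would still need to be parsed with QueryParser and handed to the Highlighter (for example through a QueryScorer) instead of the search query.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: build a highlight-only query string in QueryParser syntax.
// Every (company name, keyword) pair becomes a sloppy phrase "name keyword"~slop.
// Class name, company names and slop value are illustrative assumptions.
public class HighlightQueryBuilder {

    public static String build(List<String> companyNames, List<String> keywords, int slop) {
        StringBuilder sb = new StringBuilder();
        for (String company : companyNames) {
            for (String keyword : keywords) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                // Strip embedded double quotes so they cannot break the phrase syntax.
                String phrase = (company + " " + keyword).replace("\"", "");
                sb.append('"').append(phrase).append("\"~").append(slop);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> companies = Arrays.asList("Acme Corp", "Globex");
        List<String> keywords = Arrays.asList("cost savings", "outsource");
        // prints: "Acme Corp cost savings"~20 "Acme Corp outsource"~20 "Globex cost savings"~20 "Globex outsource"~20
        System.out.println(build(companies, keywords, 20));
    }
}
```

The cross product keeps the sketch simple; pairing each company only with the keywords that matched its own document would give a tighter query.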


I plan to break the HTML document down and store the title, subtitle,
and content in separate fields of the index. If I then create a new
query that matches the company name and keywords against the title and
content fields, I assume the Highlighter API will rank fragments where
both query terms match higher than fragments where only one term (in
either the title or the content) matches. I am also assuming that the
API will take care of this ranking even if I do not increase the boost
factor of any of the terms.
That is my understanding of the scoring/ranking algorithm. Any comments,
anyone?
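If the default ranking turns out not to be enough, field weights like the "headline = 5 points, first paragraph = 4" scheme from the original question can be written directly into the query with QueryParser's ^ boost syntax. A minimal sketch, assuming hypothetical field names title, subtitle and content and illustrative weights; whether the Highlighter's fragment scoring honors these boosts is worth verifying against the highlighter version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: express per-field weights in QueryParser syntax, field:(terms)^boost.
// Field names and boost values are hypothetical; a boost of 1 is left implicit.
public class BoostedQueryBuilder {

    public static String build(Map<String, Integer> fieldBoosts, String terms) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(entry.getKey()).append(":(").append(terms).append(')');
            int boost = entry.getValue();
            if (boost != 1) {
                sb.append('^').append(boost);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<String, Integer>();
        boosts.put("title", 5);    // headline
        boosts.put("subtitle", 4);
        boosts.put("content", 1);
        // prints: title:(acme outsource)^5 subtitle:(acme outsource)^4 content:(acme outsource)
        System.out.println(build(boosts, "acme outsource"));
    }
}
```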


Thanks,
Harini

Chris Hostetter wrote:


: My requirement is to show the relevant fragments of the news article for
: each company along with the search results. But the highlighter API
: sometimes picks up fragments which are not so relevant to the news
: article/company. I would like to know if there is any way that I can
: modify the scoring/ranking of these fragments so that news items with
: a company name and keywords in the headline get assigned a very strong
: relevancy ranking, closely followed by a company name mention in the
: first paragraph and multiple mentions within the entire story.
: Something like headline = 5 points, first paragraph = 4, etc.

Well, the sample query you mentioned isn't checking any company names, or
doing anything with a "keywords" field. I'm not too familiar with the way
the highlighter package works, but I imagine that with the types of
queries you said you are using, if you are highlighting the "Content"
field, the CompanyId and FilingDate clauses of your query will be
fairly irrelevant (because they are numbers, not because they are
different field names).

An idea I've suggested before (but I don't remember if anyone ever said
whether it is a viable use of the Highlighter or not) is to give the
highlighter a completely different Query object than the one you used to
get your search results.

i.e., if your search query (what you want used to compute the score) is...

 +(CompanyId:10 CompanyId:20) Content:"cost saving" Content:outsource

...but once you've gotten those results, what you really care about is
highlighting the name of the company, and you think the best fragments
are those where the company names appear near the other words, then give
the highlighter a query that looks like...

 "companyname10 cost savings"~20 "companyname20 outsource"~20 ...etc



: >>> Here is the search query(BooleanQuery) I am passing to the
: >>> IndexSearcher
: >>> and QueryScorer:
: >>> +DocumentType:news
: >>> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
: >>> +FilingDate:[20041201 TO 20051201]
: >>> +(Content:"cost saving" Content:"cost savings" Content:outsource
: >>> Content:outsources Content:downsize Content:downsizes
: >>> Content:restructuring Content:restructure)



-Hoss

