Re: problem in running lucene

2009-01-25 Thread garrod
--Original Message--
From: Grant Ingersoll
To: java-user@lucene.apache.org
ReplyTo: java-user@lucene.apache.org
Sent: Jan 24, 2009 4:17 PM
Subject: Re: problem in running lucene

Can you share the steps you have taken?  The actual commands, that is.

-Grant

On Jan 24, 2009, at 2:33 PM, nitin gopi wrote:

> Hello, I have recently downloaded Lucene. This is the first time I
> am using Lucene. My project is to add LSI (Latent Semantic Indexing)
> to the indexing method of Lucene, to improve the indexing of documents.
> I first want to index some webpages and see how search works
> in Lucene. The problem I am facing is that whenever I run the Lucene jar
> file through the command prompt, I get the error "failed to load Main-Class
> manifest attribute from lucene-core-2.4.0.jar". I am using Java 1.6.0_05.
> Please help me with this.
>
> Thanking you,
> Nitin

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: IndexReader.isDeleted

2009-01-25 Thread Michael McCandless


OK, interesting, thanks.  What do you use the deletedDocs iterator for?

Yes, MatchAllDocsQuery should soon be fixed to not use the synchronized IndexReader.isDeleted method internally:


https://issues.apache.org/jira/browse/LUCENE-1316

Mike

John Wang wrote:


Mike:
  "We are considering replacing the current random-access
IndexReader.isDeleted(int docID) method with an iterator & skipTo
(DocIdSet) access that would let you iterate through the deleted
docIDs, instead."

This is exactly what we are doing. We do, however, have to build the internal DocIdSet from isDeleted calls. It would be great if this were provided through the API.

I am also assuming MatchAllDocsQuery will be fixed to avoid the isDeleted call?


-John
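[Editor's note: the workaround John describes -- building a DocIdSet by probing the synchronized isDeleted method -- might be sketched like this against the Lucene 2.4 API. This is a hedged illustration, not code from the thread; it assumes org.apache.lucene.util.OpenBitSet, which extends DocIdSet in 2.4.]

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.util.OpenBitSet;

public class DeletedDocIdSet {
    // Build a DocIdSet of deleted docIDs by probing the (synchronized)
    // IndexReader.isDeleted method once per document.
    public static OpenBitSet build(IndexReader reader) {
        OpenBitSet deleted = new OpenBitSet(reader.maxDoc());
        for (int docID = 0; docID < reader.maxDoc(); docID++) {
            if (reader.isDeleted(docID)) {
                deleted.set(docID);
            }
        }
        // OpenBitSet extends DocIdSet, so iterator()/skipTo are available.
        return deleted;
    }
}
```

The result can then be iterated with DocIdSetIterator (next()/skipTo(int) in 2.4), which is exactly the access pattern the proposed API would provide directly.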

On Fri, Jan 23, 2009 at 12:25 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:


We are considering replacing the current random-access
IndexReader.isDeleted(int docID) method with an iterator & skipTo
(DocIdSet) access that would let you iterate through the deleted
docIDs, instead.

At the same time we would move to a new API to replace IndexReader.document(int docID) that would no longer check whether the document is deleted.

This is being discussed now under several Jira issues and on
java-dev.

Would this be a problem for any Lucene applications out there?

How is isDeleted used today (outside of Lucene)?  Normally an
IndexSearcher would never return a deleted document, and so "in
theory" a deleted docID should never "escape" Lucene's APIs.

So I'm curious what applications in fact rely on isDeleted, and how
that method is being used...

Thanks,

Mike


Re: why would a Field *vanish* from a Document?

2009-01-25 Thread Michael McCandless


rolaren...@earthlink.net wrote:


Hey Mike --

Thanks for prompt & clear reply!


This (the sneaky "difference" between an indexed Document and the newly-created-at-search-time Document) is a frequent point of confusion with Lucene.

The field needs to be marked as stored (Field.Store.YES) in order for
it to appear in the retrieved document at search time.

But, TokenStream fields cannot be stored, since Lucene can't regenerate the original string for that field.

OK, so the way I was trying could never work, I guess? No surprise really that the TokenStream cannot be re-accessed. I just had no clue what else to try ...


Right.


Since you are storing the term vector, you could retrieve that using
IndexReader.getTermFreqVector.

OK, didn't see that coming, but glad it did -- I have tried that, and indeed I can get the TermFreqVector for the Field in which I am interested, and it contains the same sort of data as were once in the TokenStream; all fine.


Now I notice (from googling) that I can also downcast TermFreqVector to TermPositionVector, which contains the offsets (which I will need).


So -- under what conditions would that cast fail?


The cast fails if you indexed the field with Field.TermVector.YES, which stores neither positions nor offsets. If you always index the field with TermVector.WITH_OFFSETS, WITH_POSITIONS or WITH_POSITIONS_OFFSETS, the cast will always succeed.
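[Editor's note: a minimal sketch of that round trip, written against the Lucene 2.4 API and requiring lucene-core-2.4.0.jar on the classpath -- an illustration, not code from the thread.]

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.store.RAMDirectory;

public class TermVectorExample {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // WITH_POSITIONS_OFFSETS guarantees the downcast below succeeds.
        doc.add(new Field("body", "hello term vector world",
                Field.Store.NO, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = IndexReader.open(dir);
        TermFreqVector tfv = reader.getTermFreqVector(0, "body");
        // Safe cast: both positions and offsets were stored at index time.
        TermPositionVector tpv = (TermPositionVector) tfv;
        System.out.println(tpv.getOffsets(0)[0].getStartOffset());
        reader.close();
    }
}
```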


Mike


Re: problem in running lucene

2009-01-25 Thread nitin gopi
Hello Sir, I downloaded Lucene, then went into the directory of the jar file lucene-core-2.4.0.jar. I typed the command java -jar lucene-core-2.4.0.jar to run the jar file from the command prompt. Then the following error came: "failed to load Main-Class manifest attribute from lucene-core-2.4.0.jar". I want to index a web document and see the result after searching.

Regards
Nitin

On Sun, Jan 25, 2009 at 5:47 AM, Grant Ingersoll wrote:

> Can you share the steps you have taken?  The actual commands, that is.
>
> -Grant
>
>
> On Jan 24, 2009, at 2:33 PM, nitin gopi wrote:
>
>> Hello, I have recently downloaded Lucene. This is the first time I
>> am using Lucene. My project is to add LSI (Latent Semantic Indexing) to the
>> indexing method of Lucene, to improve the indexing of documents.
>> I first want to index some webpages and see how search works
>> in Lucene. The problem I am facing is that whenever I run the Lucene jar file
>> through the command prompt, I get the error "failed to load Main-Class manifest
>> attribute from lucene-core-2.4.0.jar". I am using Java 1.6.0_05. Please help
>> me with this.
>>
>> Thanking you,
>> Nitin
>>
>
> --
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ


Re: problem in running lucene

2009-01-25 Thread Raffaella Ventaglio

http://lucene.apache.org/java/docs/
Apache Lucene is a high-performance, full-featured text search engine ***library*** written entirely in Java.


Lucene is a search engine library, not an application. You cannot execute it; you have to write your own code using the Lucene library to index or to search documents.


Have a look at this: 
http://wiki.apache.org/lucene-java/LuceneFAQ#head-fced767dd893d8828529074a26f99e0df7fe12ca
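[Editor's note: a first program using Lucene 2.4 as a library might look roughly like this. A hedged sketch, not part of the original reply; class, field, and query names are illustrative, and lucene-core-2.4.0.jar must be on the classpath.]

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class FirstLucene {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();

        // Index one document.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("content", "Lucene is a search library",
                Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // Search it.
        IndexSearcher searcher = new IndexSearcher(dir);
        Query q = new QueryParser("content", new StandardAnalyzer()).parse("library");
        TopDocs hits = searcher.search(q, 10);
        System.out.println("hits: " + hits.totalHits);
        searcher.close();
    }
}
```

Compile and run with, e.g., javac -cp lucene-core-2.4.0.jar FirstLucene.java followed by java -cp .:lucene-core-2.4.0.jar FirstLucene -- which is why java -jar on the core jar fails: the library jar has no Main-Class in its manifest.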



Regards,
Raf

- Original Message - 
From: "nitin gopi" 

To: 
Sent: Sunday, January 25, 2009 1:57 PM
Subject: Re: problem in running lucene



Hello Sir, I downloaded Lucene, then went into the directory of the jar file lucene-core-2.4.0.jar. I typed the command java -jar lucene-core-2.4.0.jar to run the jar file from the command prompt. Then the following error came: "failed to load Main-Class manifest attribute from lucene-core-2.4.0.jar". I want to index a web document and see the result after searching.

Regards
Nitin

On Sun, Jan 25, 2009 at 5:47 AM, Grant Ingersoll 
wrote:



Can you share the steps you have taken?  The actual commands, that is.

-Grant


On Jan 24, 2009, at 2:33 PM, nitin gopi wrote:

Hello, I have recently downloaded Lucene. This is the first time I am using Lucene. My project is to add LSI (Latent Semantic Indexing) to the indexing method of Lucene, to improve the indexing of documents. I first want to index some webpages and see how search works in Lucene. The problem I am facing is that whenever I run the Lucene jar file through the command prompt, I get the error "failed to load Main-Class manifest attribute from lucene-core-2.4.0.jar". I am using Java 1.6.0_05. Please help me with this.

Thanking you,
Nitin



--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: why would a Field *vanish* from a Document?

2009-01-25 Thread rolarenfan

>> Now I notice (from googling) that I can also downcast TermFreqVector  
>> to TermPositionVector, which contains the offsets (which I will need).
>>
>> So -- under what conditions would that cast fail?
>
>The cast fails if you had indexed the field with Field.TermVector.YES,  
>which does not store positions nor offsets information.  If you always  
>index the field with TermVector.WITH_OFFSET, WITH_POSITIONS or  
>WITH_POSITIONS_OFFSETS, the cast will always succeed.
>
OK, cool. 

I see in the javadocs for TermPositionVector that it "not necessarily contains both positions and offsets, but at least one of these arrays exists"; so does it work like this?

TermVector.WITH_OFFSETS => TermVectorOffsetInfo[] always exists (so far, works for me)
TermVector.WITH_POSITIONS => positions int[] always exists
TermVector.WITH_POSITIONS_OFFSETS => both arrays always exist

Right? And I guess the reason for using TermVector.WITH_POSITIONS => positions int[] is that it has a smaller memory footprint?

thanks,
Paul 


Re: why would a Field *vanish* from a Document?

2009-01-25 Thread Michael McCandless


rolaren...@earthlink.net wrote:




Now I notice (from googling) that I can also downcast TermFreqVector to TermPositionVector, which contains the offsets (which I will need).


So -- under what conditions would that cast fail?


The cast fails if you indexed the field with Field.TermVector.YES, which stores neither positions nor offsets. If you always index the field with TermVector.WITH_OFFSETS, WITH_POSITIONS or WITH_POSITIONS_OFFSETS, the cast will always succeed.


OK, cool.

I see in the javadocs for TermPositionVector that it "not necessarily contains both positions and offsets, but at least one of these arrays exists"; does it work like this?


TermVector.WITH_OFFSETS => TermVectorOffsetInfo[] always exists (so far, works for me)
TermVector.WITH_POSITIONS => positions int[] always exists
TermVector.WITH_POSITIONS_OFFSETS => both arrays always exist


Right.

Right? And I guess the reason for using TermVector.WITH_POSITIONS => positions int[] is that it has a smaller memory footprint?



Well, first: it's storing something different. Position is (by default) the term count, i.e. the first term is position 0, the next is position 1, etc. Whereas start/end offsets are normally the character locations where each term started and ended. These are computed during analysis and stored into the index.


Storing only positions gives a smaller index size than only offsets or positions plus offsets.


The memory difference is typically a non-issue, since an app normally doesn't keep these instances around for long: you pull them from the index, do something interesting, and let them go, all during a single search request.


Mike


RE: Re: Where to download package org.apache.lucene.search.trie

2009-01-25 Thread Uwe Schindler
Hi,

You can use the artifact from Hudson as Mike said, but the JAR file is not compatible with Lucene 2.4 (because of a new SortField constructor for sorting against trie-encoded fields and the new superinterface FieldCache.Parser, which leads to a ClassNotFoundException). If you want to use TrieRangeQuery/Filter, you must also update Lucene to the trunk version (so it is best to download the whole snapshot build).

Keep me informed about how it works for you! How many documents do you plan to index using TrieUtils? The performance impact is immense for large indexes (see my notes).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Posted At: Sunday, January 25, 2009 10:15 PM
> Posted To: Lucene-user
> Conversation: Re: Where to download package org.apache.lucene.search.trie
> Subject: Re: Where to download package org.apache.lucene.search.trie
> 
> TrieRangeQuery/Filter are only available on Lucene's trunk, under
> contrib in contrib/queries/*.  You can either download a recent
> nightly build, from here (click on a specific build, then click on
> "Build Artifacts"):
> 
>http://hudson.zones.apache.org/hudson/job/Lucene-trunk
> 
> Or you can checkout Lucene's full sources and go from there:
> 
>http://wiki.apache.org/lucene-java/SourceRepository
> 
> Mike
> 
> Zhibin Mai wrote:
> 
> > Hi
> >
> > We try to use package org.apache.lucene.search.trie to support
> > spatial index. Does anyone know whether it is ready, even just for
> > trial, and where to download it?
> >
> > Thank you,
> >
> > Zhibin
> >
> >



cross-field AND queries with field boosting

2009-01-25 Thread Muralidharan V
Hi,

We have documents with multiple fields conceptually, and a document is considered a match if each of the terms in the query is in any one of the fields (i.e., a 'cross-field' AND). A simple way to do this would be to dump all of these conceptual fields into one Lucene field and run the query with a default AND_OPERATOR. However, another requirement is that some fields are more important than others and need to be boosted with different weights. One option I can think of is a MultiFieldQuery that essentially looks like (field1:term1 OR field2:term1 OR field3:term1) AND (field1:term2 OR field2:term2 OR field3:term2) etc., with appropriate field boosts. However, I'm concerned about the performance of this query for a large number of terms (we might need to deal with 4-5 fields and 4-5 terms per query). Is there a better solution?

Thanks,
Murali
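[Editor's note: the query shape described above can be built programmatically along these lines. A hedged sketch against the Lucene 2.4 API, not code from the thread; field and term names are illustrative.]

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class CrossFieldAnd {
    // For each term, build a SHOULD-disjunction across all fields,
    // then join the per-term disjunctions with MUST (the "cross-field" AND).
    public static BooleanQuery build(String[] fields, float[] boosts, String[] terms) {
        BooleanQuery outer = new BooleanQuery();
        for (String term : terms) {
            BooleanQuery perTerm = new BooleanQuery();
            for (int i = 0; i < fields.length; i++) {
                TermQuery tq = new TermQuery(new Term(fields[i], term));
                tq.setBoost(boosts[i]); // weight the more important fields higher
                perTerm.add(tq, BooleanClause.Occur.SHOULD);
            }
            outer.add(perTerm, BooleanClause.Occur.MUST);
        }
        return outer;
    }
}
```

With 4-5 fields and 4-5 terms this is only ~20-25 TermQuery clauses, well under BooleanQuery's default clause limit; performance concerns usually only arise with hundreds of clauses.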