Re: problem in running lucene
2176 Chiapa Dr., South Lake Tahoe, CA 96150

--Original Message--
From: Grant Ingersoll
To: java-user@lucene.apache.org
ReplyTo: java-user@lucene.apache.org
Sent: Jan 24, 2009 4:17 PM
Subject: Re: problem in running lucene

Can you share the steps you have taken? The actual commands, that is.

-Grant

On Jan 24, 2009, at 2:33 PM, nitin gopi wrote:

> Hello,
>
> I have recently downloaded Lucene. This is the first time I am using it.
> My project is to add LSI (Latent Semantic Indexing) to Lucene's indexing
> method, to improve the indexing of documents. I first want to index some
> web pages and see how search works in Lucene.
>
> The problem I am facing is that whenever I run the Lucene jar file from
> the command prompt, I get the error "failed to load main-class manifest
> attribute from lucene-core-2.4.0.jar". I am using Java 1.6.0_05. Please
> help me with this.
>
> Thanking You
> Nitin

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: IndexReader.isDeleted
OK, interesting, thanks. What do you use the deletedDocs iterator for?

Yes, MatchAllDocsQuery should soon be fixed to not use the synchronized IndexReader.isDeleted method internally:

https://issues.apache.org/jira/browse/LUCENE-1316

Mike

John Wang wrote:

> Mike:
>
> "We are considering replacing the current random-access
> IndexReader.isDeleted(int docID) method with an iterator & skipTo
> (DocIdSet) access that would let you iterate through the deleted
> docIDs, instead."
>
> This is exactly what we are doing. We do, however, have to build the
> internal DocIdSet from isDeleted calls. It would be great if this were
> provided through the API.
>
> I am also assuming MatchAllDocsQuery is fixed to avoid the isDeleted call?
>
> -John
>
> On Fri, Jan 23, 2009 at 12:25 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> We are considering replacing the current random-access
>> IndexReader.isDeleted(int docID) method with an iterator & skipTo
>> (DocIdSet) access that would let you iterate through the deleted
>> docIDs, instead.
>>
>> At the same time we would move to a new API to replace
>> IndexReader.document(int docID) that would no longer check whether
>> the document is deleted.
>>
>> This is being discussed now under several Jira issues and on java-dev.
>>
>> Would this be a problem for any Lucene applications out there? How is
>> isDeleted used today (outside of Lucene)? Normally an IndexSearcher
>> would never return a deleted document, and so "in theory" a deleted
>> docID should never "escape" Lucene's APIs. So I'm curious what
>> applications in fact rely on isDeleted, and how that method is being
>> used...
>>
>> Thanks,
>> Mike
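For readers following the thread, the difference between random-access isDeleted checks and iterator/skipTo access can be sketched with nothing but java.util.BitSet. This is not Lucene's API and not the proposed DocIdSet interface: the class name DeletedDocsSketch and its methods are hypothetical, and the assumption that deletions are held as a plain bit set is only for illustration.

```java
import java.util.BitSet;

/** Stdlib-only sketch (hypothetical names): probing vs. iterating deleted doc IDs. */
class DeletedDocsSketch {
    private final BitSet deleted = new BitSet();

    /** Marks a document as deleted. */
    void delete(int docId) {
        deleted.set(docId);
    }

    /** Random access, similar in spirit to IndexReader.isDeleted(int docID). */
    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    /** Iterator-style access: first deleted doc ID >= target, or -1 if there is none. */
    int skipTo(int target) {
        return deleted.nextSetBit(target);
    }

    public static void main(String[] args) {
        DeletedDocsSketch docs = new DeletedDocsSketch();
        docs.delete(3);
        docs.delete(7);
        docs.delete(42);
        // Walk only the deleted IDs instead of probing every docID in turn.
        for (int id = docs.skipTo(0); id >= 0; id = docs.skipTo(id + 1)) {
            System.out.println("deleted doc " + id);
        }
    }
}
```

With skipTo-style access a caller such as John's never has to call isDeleted once per document just to build its own set of deleted IDs.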
Re: why would a Field *vanish* from a Document?
rolaren...@earthlink.net wrote:

> Hey Mike --
>
> Thanks for prompt & clear reply!
>
>> This (the sneaky "difference" between an indexed Document and the
>> newly created at-search-time Document) is a frequent confusion with
>> Lucene.
>>
>> The field needs to be marked as stored (Field.Store.YES) in order for
>> it to appear in the retrieved document at search time.
>>
>> But, TokenStream fields cannot be stored since Lucene can't regenerate
>> the original string for that field.
>
> OK, so the way I was trying could never work, I guess? No surprise
> really that the TokenStream cannot be re-accessed. I just had no clue
> what else to try ...

Right.

>> Since you are storing the term vector, you could retrieve that using
>> IndexReader.getTermFreqVector.
>
> OK, didn't see that coming, but glad it did -- I have tried that, and
> indeed I can get the TermFreqVector for the Field in which I am
> interested, and it contains the same sort of data as were once in the
> TokenStream, all fine.
>
> Now I notice (from googling) that I can also downcast TermFreqVector
> to TermPositionVector, which contains the offsets (which I will need).
>
> So -- under what conditions would that cast fail?

The cast fails if you had indexed the field with Field.TermVector.YES, which stores neither position nor offset information. If you always index the field with TermVector.WITH_OFFSETS, WITH_POSITIONS or WITH_POSITIONS_OFFSETS, the cast will always succeed.

Mike
Re: problem in running lucene
Hello Sir,

I downloaded Lucene, then went into the directory containing the jar file lucene-core-2.4.0.jar. I typed the command

java -jar lucene-core-2.4.0.jar

to run the jar file from the command prompt. The following error then appeared:

"failed to load main-class manifest attribute from lucene-core-2.4.0.jar"

I want to index a web document and see the result after searching.

Regards
Nitin

On Sun, Jan 25, 2009 at 5:47 AM, Grant Ingersoll wrote:

> Can you share the steps you have taken? The actual commands, that is.
>
> -Grant
Re: problem in running lucene
http://lucene.apache.org/java/docs/

Apache Lucene is a high-performance, full-featured text search engine ***library*** written entirely in Java.

Lucene is a search engine library, not an application. You cannot execute it; you have to write your own code that uses the Lucene library to index or search documents.

Have a look at this:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-fced767dd893d8828529074a26f99e0df7fe12ca

Regards,
Raf

- Original Message -
From: "nitin gopi"
To:
Sent: Sunday, January 25, 2009 1:57 PM
Subject: Re: problem in running lucene

> Hello Sir,
>
> I downloaded Lucene, then went into the directory containing the jar
> file lucene-core-2.4.0.jar. I typed the command
> java -jar lucene-core-2.4.0.jar to run the jar file from the command
> prompt. The following error then appeared: "failed to load main-class
> manifest attribute from lucene-core-2.4.0.jar". I want to index a web
> document and see the result after searching.
>
> Regards
> Nitin
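The error in this thread is what java -jar reports when a jar's manifest names no Main-Class, which is the normal state of a library jar. A minimal stdlib-only sketch of that check follows; the class ManifestCheck and its helper methods are hypothetical names, and the throwaway jar it writes merely stands in for a library jar such as lucene-core-2.4.0.jar (whose manifest I am assuming, not verifying, has no Main-Class entry).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Sketch (hypothetical names): does a jar declare an entry point for java -jar? */
class ManifestCheck {
    /** Returns the Main-Class manifest attribute of the jar, or null if it has none. */
    static String mainClassOf(File jar) {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a throwaway jar containing only a manifest; mainClass may be null. */
    static File writeJar(String mainClass) {
        try {
            Manifest mf = new Manifest();
            mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (mainClass != null) {
                mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
            }
            File jar = File.createTempFile("sketch", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), mf)) {
                // No class entries are needed; only the manifest matters here.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a jar path as the first argument, or let the sketch create a library-style jar.
        File jar = args.length > 0 ? new File(args[0]) : writeJar(null);
        String main = mainClassOf(jar);
        System.out.println(main == null
                ? "No Main-Class: a library jar, java -jar cannot run it"
                : "Main-Class: " + main);
    }
}
```

A library jar is instead put on the classpath of a program you write yourself, as Raf describes.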
Re: why would a Field *vanish* from a Document?
>> Now I notice (from googling) that I can also downcast TermFreqVector
>> to TermPositionVector, which contains the offsets (which I will need).
>>
>> So -- under what conditions would that cast fail?
>
> The cast fails if you had indexed the field with Field.TermVector.YES,
> which does not store positions nor offsets information. If you always
> index the field with TermVector.WITH_OFFSETS, WITH_POSITIONS or
> WITH_POSITIONS_OFFSETS, the cast will always succeed.

OK, cool. I see in the javadocs for TermPositionVector that it "not necessarily contains both positions and offsets, but at least one of these arrays exists"; I think it works like this:

TermVector.WITH_OFFSETS => TermVectorOffsetInfo[] always exists (so far, works for me)
TermVector.WITH_POSITIONS => positions int[] always exists
TermVector.WITH_POSITIONS_OFFSETS => both arrays always exist

Right?

And I guess the reason for using TermVector.WITH_POSITIONS => positions int[] is that it has a smaller memory footprint?

thanks,
Paul
Re: why would a Field *vanish* from a Document?
rolaren...@earthlink.net wrote:

> OK, cool. I see in the javadocs for TermPositionVector that it "not
> necessarily contains both positions and offsets, but at least one of
> these arrays exists"; does it work like this, I think:
>
> TermVector.WITH_OFFSETS => TermVectorOffsetInfo[] always exists (so far, works for me)
> TermVector.WITH_POSITIONS => positions int[] always exists
> TermVector.WITH_POSITIONS_OFFSETS => both arrays always exist

Right.

> Right? And I guess the reason for using TermVector.WITH_POSITIONS =>
> positions int[] is that it has a smaller memory footprint?

Well, first: it's storing something different. Position is (by default) the term count, i.e. the first term is position 0, the next is position 1, etc. Start/end offsets, by contrast, are normally the character locations where each term started and ended. These are computed during analysis and stored in the index.

Storing only positions gives a smaller index size than storing only offsets or positions plus offsets.

The memory difference is typically a non-issue, since an app normally doesn't keep these instances around for a long time. That is, during a search request you normally pull them from the index, do something interesting, and let them go.

Mike
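Mike's distinction between positions and offsets can be sketched without Lucene at all. The whitespace tokenizer below is an assumption standing in for a real Analyzer, and PositionsVsOffsets and Token are hypothetical names; the end offset is treated as exclusive in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical names): term positions count terms, offsets count characters. */
class PositionsVsOffsets {
    static final class Token {
        final String text;
        final int position;     // 0 for the first term, 1 for the next, ...
        final int startOffset;  // index of the term's first character
        final int endOffset;    // index just past the term's last character

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace, recording a position and character offsets per term. */
    static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int pos = 0;
        int n = s.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;   // skip gaps
            int start = i;
            while (i < n && !Character.isWhitespace(s.charAt(i))) i++;  // consume one term
            if (i > start) {
                out.add(new Token(s.substring(start, i), pos++, start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("quick brown fox")) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

Each term carries one int for its position but two ints for its offsets, which matches the point that positions alone take less index space.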
RE: Re: Where to download package org.apache.lucene.search.trie
Hi,

You can use the artifact from Hudson as Mike said, but the JAR file is not compatible with Lucene 2.4 (because of a new SortField constructor for sorting against trie-encoded fields and the new superinterface FieldCache.Parser, which lead to a ClassNotFoundException). If you want to use TrieRangeQuery/Filter, you must also update Lucene to the trunk version (so it is best to download the whole snapshot build).

Keep me informed how it works for you! How many documents do you plan to index using TrieUtils? The performance impact is immense for large indexes (see my notes).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Posted At: Sunday, January 25, 2009 10:15 PM
> Posted To: Lucene-user
> Conversation: Re: Where to download package org.apache.lucene.search.trie
> Subject: Re: Where to download package org.apache.lucene.search.trie
>
> TrieRangeQuery/Filter are only available on Lucene's trunk, under
> contrib in contrib/queries/*. You can either download a recent
> nightly build, from here (click on a specific build, then click on
> "Build Artifacts"):
>
>    http://hudson.zones.apache.org/hudson/job/Lucene-trunk
>
> Or you can check out Lucene's full sources and go from there:
>
>    http://wiki.apache.org/lucene-java/SourceRepository
>
> Mike
>
> Zhibin Mai wrote:
>
> > Hi
> >
> > We are trying to use the package org.apache.lucene.search.trie to
> > support a spatial index. Does anyone know whether it is ready, even
> > just for trial, and where to download it?
> >
> > Thank you,
> >
> > Zhibin
cross-field AND queries with field boosting
Hi,

We have documents with multiple conceptual fields, and a document is considered a match if each of the terms in the query is in any one of the fields (i.e., a 'cross-field' AND). A simple way to do this would be to dump all of these conceptual fields into one Lucene field and run the query with a default AND_OPERATOR. However, another requirement is that some fields are more important than others and need to be boosted with different weights.

One option that I can think of is a MultiFieldQuery that essentially looks like

(field1:term1 OR field2:term1 OR field3:term1) AND (field1:term2 OR field2:term2 OR field3:term2)

etc., with appropriate field boosts. However, I'm concerned about the performance of this query for a large number of terms (we might need to deal with 4-5 fields and 4-5 terms per query).

Is there a better solution?

Thanks,
Murali
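The query shape Murali describes can be sketched as a plain string builder. This is only an illustration of the expansion, not a Lucene class: CrossFieldQuerySketch, the field names and the boost values are all hypothetical, terms are assumed to need no query-syntax escaping, and the field map is assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch (hypothetical names): expand terms into a boosted cross-field AND query string. */
class CrossFieldQuerySketch {
    /** Builds (f1:t^b1 OR f2:t^b2 ...) AND (...), one parenthesised group per term. */
    static String build(Map<String, Float> fieldBoosts, List<String> terms) {
        StringBuilder q = new StringBuilder();
        for (String term : terms) {
            if (q.length() > 0) {
                q.append(" AND ");          // every term must match somewhere
            }
            q.append('(');
            boolean first = true;
            for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
                if (!first) {
                    q.append(" OR ");       // ...but any one field is enough
                }
                first = false;
                q.append(e.getKey()).append(':').append(term)
                 .append('^').append(e.getValue());
            }
            q.append(')');
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 2.0f);  // assumed boost values, for illustration only
        boosts.put("body", 1.0f);
        System.out.println(build(boosts, Arrays.asList("lucene", "search")));
    }
}
```

For the sizes mentioned in the message, 5 fields and 5 terms expand to 5 groups of 5 clauses, i.e. 25 term clauses in total.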