Re: Apache Lucene v2.3.2
Probably depends on what you mean by supported. If you mean messages on this list, then yes, although be prepared for suggestions that you upgrade. If you mean bug fixes/code changes, I'd guess not. You really should upgrade ...

--
Ian.

On Tue, May 24, 2011 at 5:03 PM, Garry S Ditzler wrote:
> Is Apache Lucene v2.3.2 still supported?

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
how to search multiple fields
Hi,

Quite a few Lucene examples online show how to insert multiple fields into a Document and how to query the index with certain fields and query text. I would like to know:

1. How do I do a cross-field search?
2. How do I specify some key fields as well as some less important fields?
3. How many fields would cause performance issues?

Thanks!
Re: how to search multiple fields
> Quite a few Lucene examples on lines shows how to insert multiple fields
> into a Document and how to query the indexed file with certain fields and
> queried text. I would like to know:
>
> 1. How to do a cross-field search?

http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_search_over_multiple_fields.3F

> 2. How to specify some key fields as well as some less important
> fields?

Boosting. See http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F

> 3. How many fields would cause performance issue?

Impossible to answer since there are too many variables, but in general the fewer fields used in a search, the faster it will be. There are many other factors, some of which are likely to outweigh this. See http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.

--
Ian.
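As a rough illustration of the two FAQ answers above (searching several fields at once, and boosting the important ones), the sketch below expands a single term across a set of fields and emits a string in Lucene's classic query syntax, where `field:term^2.0` boosts a clause. The class name, method name, and the field names `title` and `body` are made up for illustration, and the term is assumed to contain no characters that need escaping. In a real application, Lucene's MultiFieldQueryParser is the usual way to get this behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds a cross-field query string.
class CrossFieldQuerySketch {

    /** Expands one term into "(f1:term^b1 f2:term ...)"; a boost of 1.0 is left implicit. */
    static String expand(String term, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder("(");
        boolean first = true;
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (!first) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term);
            if (e.getValue() != 1.0f) {
                sb.append('^').append(e.getValue());
            }
            first = false;
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 2.0f); // key field (assumed name)
        boosts.put("body", 1.0f);  // less important field (assumed name)
        System.out.println(expand("lucene", boosts));
    }
}
```

The resulting string can be handed to a query parser; the clauses are optional by default, so a document matching in either field is returned, and a match in the boosted field contributes more to the score.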
dynamic frag size - highlighter
Hi,

I'd like to make highlighting work as follows:

length(all snippets) approx. 200 chars
hl.snippets = 2 (2 snippets)

e.g. if there is only 1 snippet available, length <= 200 chars
e.g. if there is >1 snippet, length of each snippet == 100 chars, so I take the first 2 and get 200 chars

Is this possible with a custom fragmenter? Or does anyone know of any contrib fragmenter that might do this?

Many thanks
Dan
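The sizing rule described in this question can be written down independently of the highlighter. The sketch below is only that rule, applied as a post-processing step to fragments that have already been extracted; it does not implement Lucene's Fragmenter interface, and the class name, method name, and constants are hypothetical. It also truncates by character count, so it may cut a word in half.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing step: shares a total character budget between snippets.
class SnippetBudgetSketch {
    static final int TOTAL_BUDGET = 200; // approximate total length of all snippets
    static final int MAX_SNIPPETS = 2;   // at most two snippets are returned

    /** Returns up to MAX_SNIPPETS fragments, each cut to an equal share of TOTAL_BUDGET. */
    static List<String> fit(List<String> candidates) {
        List<String> out = new ArrayList<>();
        int count = Math.min(candidates.size(), MAX_SNIPPETS);
        if (count == 0) {
            return out;
        }
        int perSnippet = TOTAL_BUDGET / count; // 200 for one snippet, 100 each for two
        for (int i = 0; i < count; i++) {
            String s = candidates.get(i);
            out.add(s.length() <= perSnippet ? s : s.substring(0, perSnippet));
        }
        return out;
    }
}
```

One caveat with this approach: the fragments fed in must be at least as long as the largest share (200 characters here), otherwise a single available snippet can never fill the budget.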
Re: how to search multiple fields
Hi Ian, thanks. Still two questions.

In the first link you presented, there is one comment: "Note that terms which occur in short fields have a higher effect on the result ranking."

What does "short fields" mean? What is the difference between the impact of short fields and that of the field boost?

Cheng

On Wed, May 25, 2011 at 6:20 PM, Ian Lea wrote:
> > Quite a few Lucene examples on lines shows how to insert multiple fields
> > into a Document and how to query the indexed file with certain fields and
> > queried text. I would like to know:
> >
> > 1. How to do a cross-field search?
>
> http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_search_over_multiple_fields.3F
>
> > 2. How to specify some key fields as well as some less important
> > fields?
>
> Boosting. See
> http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F
>
> > 3. How many fields would cause performance issue?
>
> Impossible to answer since there are too many variables but in general
> the fewer fields used in a search the faster it will be. There are
> many other factors, some of which are likely to outweigh this. See
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
>
> --
> Ian.
Re: how to search multiple fields
> In the first link you presented, there is one comment that "Note that terms
> which occur in short fields have a higher effect on the result ranking."
>
> What does "short fields" mean?

This is a short sentence.

This is a somewhat longer sentence that may get lower scores when matched by terms in a Lucene query.

> What are the differences between the impact
> of the short fields and that of the field boost?

It all feeds into oal.search.Similarity (that is, org.apache.lucene.search.Similarity). Best to look at that or search for something like "lucene scoring" using your favourite search engine.

--
Ian.
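To make the short-field effect concrete, here is a small sketch that assumes the default Similarity of the Lucene 2.x/3.x line, where the length norm of a field is 1/sqrt(number of terms) and is multiplied by the index-time field boost. The class and method names are illustrative only, and the real implementation also compresses the stored norm into a single byte, so actual values are approximate.

```java
// Illustrative arithmetic only; not Lucene code.
class LengthNormSketch {

    /** Length normalisation as assumed above: shorter fields get a larger factor. */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** The norm stored for a field combines the index-time boost with the length norm. */
    static double fieldNorm(double fieldBoost, int numTerms) {
        return fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println(fieldNorm(1.0, 4));   // short field, no boost
        System.out.println(fieldNorm(1.0, 100)); // long field, no boost
        System.out.println(fieldNorm(5.0, 100)); // long field with a boost of 5
    }
}
```

Under these assumptions a 4-term field gets a factor of 0.5 and a 100-term field gets 0.1, so a match in the short field counts five times as much. A field boost of 5 on the long field brings it back level, which is why the two effects are best thought of as multipliers on the same stored value.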
Re: is OpenBitSet / SortedVIntList compressed bit map index?
First Last wrote:
>
> Are there any other compressed bitmap index implementations which offer
> bit map compression at a decent performance assuming filters are sparse?
>

Have a look at EWAH by Daniel Lemire:

google: http://code.google.com/p/javaewah/
research paper: http://arxiv.org/abs/0901.3751
code: https://github.com/lemire/javaewah/tree/

Gabriel
Is there a limit on the size of the text for a single field?
Hi, I wonder if I can associate a text string of over 5MB with a single field. Thanks.
Re: Is there a limit on the size of the text for a single field?
Sure. See the javadocs for IndexWriter.setMaxFieldLength, or for LimitTokenCountAnalyzer if you are using 3.1.0.

--
Ian.

On Wed, May 25, 2011 at 4:24 PM, Cheng Zhou wrote:
> Hi, I wonder if I can associate a text string of over 5MB with a single
> field.
>
> Thanks.
Re: Is there a limit on the size of the text for a single field?
Thanks, Ian.

On Wed, May 25, 2011 at 11:44 PM, Ian Lea wrote:
> Sure. See the javadocs for IndexWriter.setMaxFieldLength or
> LimitTokenCountAnalyzer if you are using 3.1.0.
>
> --
> Ian.
>
> On Wed, May 25, 2011 at 4:24 PM, Cheng Zhou wrote:
> > Hi, I wonder if I can associate a text string of over 5MB with a single
> > field.
> >
> > Thanks.
JobClient.runJob(job) in Fetcher.java
Hi,

I notice that there are a few run() methods in Fetcher.java and that the following statement in Crawler.java calls the JobClient.runJob(job) in Fetcher.java:

fetcher.fetch(segs[0], threads, org.apache.nutch.fetcher.Fetcher.isParsing(conf));

I would like to know which run() in Fetcher.java is called by the above statement.

Thanks.
Passage retrieval with Lucene-based application
Hello!

I purchased "Lucene in Action", 2nd Ed., and posted the question below at the Manning Forum. Mike McCandless suggested that I send it to you.

Thanks in advance for your attention.

The question I posted:

I would like the search program to return segments of a document ("paragraphs") that contain my search phrase, rather than simply pointers to the whole document. In searching among applications based upon Lucene, I have found only one that seems to have this functionality. It is at http://www.crosswire.org/bibledesktop/ . Can someone point me to some other Lucene-based applications where the search engine returns text segments from within documents? Thanks in advance.

N.B. I know Lucene can be modified to do what I wish. My problem is that my professional obligations do not allow the time for me to build the entire application that I need. Thus I am searching for one that exists already, that I can adapt quickly, and which has all the code with which I must surround Lucene to make a full-blown application.

The Bible application I cite requires preprocessing of the documents into SWORD format. I will try that route if that is all that is available. I thought I would "look around" (with your help) before trying to take on the SWORD-format issue.

Thanks.
Re: Passage retrieval with Lucene-based application
https://issues.apache.org/jira/browse/LUCENE-1522

On Wed, May 25, 2011 at 3:46 PM, Leroy Stone wrote:
> document ("paragraphs") that contain my search phrase, rather than simply
> pointers to the whole document. in searching among applications based upon
> the Lucene, I have found only one that seems to have this functionality. It
> is at http://www.crosswire.org/bibledesktop/ . Can someone point me to some
> other Lucene-based applications where the search engine returns text
> segments from within documents?
> Thanks in advance.
Re: Passage retrieval with Lucene-based application
Hi Leroy,

Would it make sense to index the unit to be searched as a Lucene document? So if you want paragraphs to be shown in search results, you could parse the source document into paragraphs during indexing and index them as separate Lucene documents.

-sujit

On Wed, 2011-05-25 at 15:46 -0400, Leroy Stone wrote:
> Hello!
> I am purchased "Lucene in Action", 2nd Ed., and posted the
> question below at the Manning Forum. Mike MCCandless suggested that I
> send it to you.
>
> Thanks in advance for your attention.
>
> the question I posted ___
> I would like the search program to return with segments of a document
> ("paragraphs") that contain my search phrase, rather than simply
> pointers to the whole document. in searching among applications based
> upon the Lucene, I have found only one that seems to have this
> functionality. It is at http://www.crosswire.org/bibledesktop/ . Can
> someone point me to some other Lucene-based applications where the
> search engine returns text segments from within documents?
> Thanks in advance.
>
> N.B. I know Lucene can be modified to do what I wish. My problem is
> that my professional obligations do not allow the time for me to
> build the entire application that I need. Thus I am searching for
> one that exists already, that I can adapt quickly, and which has all
> the code with which I must surround Lucene to make a full-blown
> application.
>
> The Bible application I cite requires preprocessing of the documents
> into SWORD format. I will try that route if that is all that is
> available. I thought I would "look around" (with your help) before
> trying to take on the SWORD-format issue.
>
> Thanks.
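The paragraph-per-document idea suggested in this reply can be sketched without any Lucene code. The example below splits a source text on blank lines and runs a naive, case-insensitive substring match over the resulting units; the match is only a stand-in for a real index, and all names here are hypothetical. In an actual application each paragraph would presumably become its own Lucene Document, with stored fields identifying the source document and the paragraph number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: paragraphs, not whole documents, are the searchable unit.
class ParagraphUnitsSketch {

    /** Splits text into paragraphs on blank lines, dropping empty pieces. */
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String p : text.split("\\n\\s*\\n")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    /** Returns the positions of paragraphs containing the phrase (stand-in for an index lookup). */
    static List<Integer> matchingParagraphs(List<String> paragraphs, String phrase) {
        String needle = phrase.toLowerCase();
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            if (paragraphs.get(i).toLowerCase().contains(needle)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Because each hit is a paragraph position rather than a whole document, the search result can display the matching passage directly.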