Re: Apache Lucene v2.3.2

2011-05-25 Thread Ian Lea
Probably depends on what you mean by supported.  If you mean messages
on this list, then yes, although be prepared for suggestions that you
upgrade.  If you mean bug fixes/code changes, I'd guess not.

You really should upgrade ...


--
Ian.


On Tue, May 24, 2011 at 5:03 PM, Garry S Ditzler  wrote:
> Is Apache Lucene v2.3.2 still supported?
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



how to search multiple fields

2011-05-25 Thread zhoucheng2008
Hi,

Quite a few Lucene examples online show how to insert multiple fields
into a Document and how to query the indexed file with certain fields and
query text. I would like to know:

1. How to do a cross-field search?

2. How to mark some fields as key fields and others as less important?

3. How many fields would cause a performance issue?

Thanks!



Re: how to search multiple fields

2011-05-25 Thread Ian Lea
> Quite a few Lucene examples online show how to insert multiple fields
> into a Document and how to query the indexed file with certain fields and
> queried text. I would like to know:
>
> 1.       How to do a cross-field search?

http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_search_over_multiple_fields.3F

> 2.       How to specify some key fields as well as some less important
> fields?

Boosting.  See 
http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F

> 3.       How many fields would cause performance issue?

Impossible to answer since there are too many variables but in general
the fewer fields used in a search the faster it will be.  There are
many other factors, some of which are likely to outweigh this.  See
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
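
A minimal sketch of points 1 and 2 together, assuming the classic query
parser syntax (field:(terms)^boost) and assuming the user terms are plain
words that need no escaping. The field names "title" and "body" and the
class BoostedQuerySketch are made up for illustration; the resulting string
would still have to be handed to a query parser. MultiFieldQueryParser can
do much the same job from a list of fields and a map of boosts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: expands one user query over several fields and
// attaches a query-time boost to the fields that matter more.
class BoostedQuerySketch {

    static String build(String userTerms, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');               // clauses are combined with the default operator
            }
            sb.append(e.getKey()).append(":(").append(userTerms).append(')');
            if (e.getValue() != 1.0f) {
                sb.append('^').append(e.getValue());   // boost only when it differs from 1
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 3.0f);   // assumed "key" field
        boosts.put("body", 1.0f);    // assumed less important field
        // prints: title:(apache lucene)^3.0 body:(apache lucene)
        System.out.println(build("apache lucene", boosts));
    }
}
```

Every extra field adds one more clause to evaluate, which is why fewer
fields generally means a faster search.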


--
Ian.




dynamic frag size - highlighter

2011-05-25 Thread dan sutton
Hi,

I'd like to make highlighting work as follows:

length(all snippets) approx. 200 chars
hl.snippets = 2 (2 snippets)

e.g. if there is only 1 snippet available, length <= 200 chars
e.g. if there is >1 snippet, length of each snippet == 100 chars, so I
take the first 2 and get 200 chars

Is this possible with a custom fragmenter?

Or does anyone know of any contrib fragmenter that might do this?
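
To make the rule above concrete, here is a rough sketch of the budget logic
on its own, in plain Java. It is not a Fragmenter implementation: it assumes
the highlighter has already produced candidate snippets of up to 200 chars
each, and only trims them afterwards. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing step: share a total character budget
// between at most maxSnippets snippets.
class SnippetBudgetSketch {

    static List<String> select(List<String> candidates, int maxSnippets, int totalChars) {
        List<String> out = new ArrayList<>();
        if (candidates.isEmpty() || maxSnippets <= 0) {
            return out;
        }
        int n = Math.min(maxSnippets, candidates.size());
        int perSnippet = totalChars / n;      // 200 for one snippet, 100 each for two
        for (int i = 0; i < n; i++) {
            String s = candidates.get(i);
            out.add(s.length() <= perSnippet ? s : s.substring(0, perSnippet));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> picked = select(Arrays.asList("first snippet", "second snippet", "third"), 2, 200);
        System.out.println(picked.size() + " snippets kept");
    }
}
```

A plain substring cut can land in the middle of a word or a highlight tag,
so a real version would need to trim at a token boundary.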

Many thanks
Dan




Re: how to search multiple fields

2011-05-25 Thread Cheng Zhou
Hi Ian, thanks. Still two questions.

In the first link you presented, there is one comment that "Note that terms
which occur in short fields have a higher effect on the result ranking."

What does "short fields" mean? What are the differences between the impact
of the short fields and that of the field boost?

Cheng
On Wed, May 25, 2011 at 6:20 PM, Ian Lea  wrote:

> > Quite a few Lucene examples online show how to insert multiple fields
> > into a Document and how to query the indexed file with certain fields and
> > queried text. I would like to know:
> >
> > 1.   How to do a cross-field search?
>
>
> http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_search_over_multiple_fields.3F
>
> > 2.   How to specify some key fields as well as some less important
> > fields?
>
> Boosting.  See
> http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F
>
> > 3.   How many fields would cause performance issue?
>
> Impossible to answer since there are too many variables but in general
> the fewer fields used in a search the faster it will be.  There are
> many other factors, some of which are likely to outweigh this.  See
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
>
>
> --
> Ian.


Re: how to search multiple fields

2011-05-25 Thread Ian Lea
> In the first link you presented, there is one comment that "Note that terms
> which occur in short fields have a higher effect on the result ranking."
>
> What does "short fields" mean?

This is a short sentence.

This is a somewhat longer sentence that may get lower scores when
matched by terms in a lucene query.

> What are the differences between the impact
> of the short fields and that of the field boost?

It all feeds into org.apache.lucene.search.Similarity.  Best to look at
the javadocs for that class, or search for something like "lucene scoring"
using your favourite search engine.
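
As a rough illustration of how the two interact, here is a plain-Java
sketch. It assumes the default similarity's length normalisation of
1/sqrt(number of terms in the field) multiplied by the index-time field
boost, and it ignores the reduced precision with which norms are stored
and every other scoring factor. The class name is made up, and terms are
counted by splitting on whitespace, which is a simplification of what an
analyzer does.

```java
// Hypothetical sketch of the field norm: boost / sqrt(term count).
class LengthNormSketch {

    static double norm(int numTerms, double fieldBoost) {
        return fieldBoost / Math.sqrt(numTerms);
    }

    static int countTerms(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    public static void main(String[] args) {
        String shortField = "This is a short sentence.";
        String longField = "This is a somewhat longer sentence that may get lower scores "
                + "when matched by terms in a lucene query.";
        int shortLen = countTerms(shortField);   // 5 terms
        int longLen = countTerms(longField);     // 19 terms
        System.out.println("short, boost 1: " + norm(shortLen, 1.0));
        System.out.println("long,  boost 1: " + norm(longLen, 1.0));
        // A boost of 2 on the long field is just enough to outweigh its length here.
        System.out.println("long,  boost 2: " + norm(longLen, 2.0));
    }
}
```

So a short field behaves as if it carried a built-in boost, and an explicit
field boost scales the same per-field value.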


--
Ian.




Re: is OpenBitSet / SortedVIntList compressed bit map index?

2011-05-25 Thread ai114

First Last wrote:
> 
> Are there any other compressed bitmap index implementations which offer
> bit
> map compression at a decent performance assuming filters are sparse?
> 

Have a look at EWAH by Daniel Lemire:
Google Code: http://code.google.com/p/javaewah/
research paper: http://arxiv.org/abs/0901.3751
code: https://github.com/lemire/javaewah/tree/

Gabriel

--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-OpenBitSet-SortedVIntList-compressed-bit-map-index-tp2213863p2983908.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Is there a limit on the size of the text for a single field?

2011-05-25 Thread Cheng Zhou
Hi, I wonder if I can associate a text string of over 5MB with a single
field.

Thanks.


Re: Is there a limit on the size of the text for a single field?

2011-05-25 Thread Ian Lea
Sure.  See the javadocs for IndexWriter.setMaxFieldLength or
LimitTokenCountAnalyzer if you are using 3.1.0.
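
The point of those settings is that the limit is on the number of tokens
indexed per field, not on the size of the string. A plain-Java sketch of the
effect, assuming whitespace tokenisation and a limit of 10,000 (the default
maximum field length in IndexWriter at the time); the class name is made up
and this is not how Lucene implements it:

```java
import java.util.Arrays;

// Hypothetical illustration: only the first maxTokens tokens of a field
// survive; everything after that is silently not indexed.
class TokenLimitSketch {

    static String[] limit(String text, int maxTokens) {
        String t = text.trim();
        if (t.isEmpty()) {
            return new String[0];
        }
        String[] tokens = t.split("\\s+");
        return tokens.length <= maxTokens ? tokens : Arrays.copyOf(tokens, maxTokens);
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 25000; i++) {
            big.append("word").append(i).append(' ');
        }
        String[] kept = limit(big.toString(), 10000);
        System.out.println("tokens kept: " + kept.length + ", last: " + kept[kept.length - 1]);
    }
}
```

With 5MB of text you will almost certainly be over that default, so raise
the limit or terms near the end of the text will not be searchable.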


--
Ian.


On Wed, May 25, 2011 at 4:24 PM, Cheng Zhou  wrote:
> Hi, I wonder if I can associate a text string of over 5MB with a single
> field.
>
> Thanks.
>




Re: Is there a limit on the size of the text for a single field?

2011-05-25 Thread Cheng Zhou
Thanks, Ian.

On Wed, May 25, 2011 at 11:44 PM, Ian Lea  wrote:

> Sure.  See the javadocs for IndexWriter.setMaxFieldLength or
> LimitTokenCountAnalyzer if you are using 3.1.0.
>
>
> --
> Ian.
>
>
> On Wed, May 25, 2011 at 4:24 PM, Cheng Zhou 
> wrote:
> > Hi, I wonder if I can associate a text string of over 5MB with a single
> > field.
> >
> > Thanks.
> >


JobClient.runJob(job) in Fetcher.java

2011-05-25 Thread Cheng
Hi, I notice that there are a few run() methods in Fetcher.java, and that the
following statement in Crawler.java ends up calling JobClient.runJob(job) in
Fetcher.java.

fetcher.fetch(segs[0], threads,
org.apache.nutch.fetcher.Fetcher.isParsing(conf));

I would like to know which run() in Fetcher.java is called as a result of the
above statement.

Thanks.


Passage retrieval with Lucene-based application

2011-05-25 Thread Leroy Stone


Hello!
I purchased "Lucene in Action", 2nd Ed., and posted the 
question below at the Manning Forum. Mike McCandless suggested that I 
send it to you.


Thanks in advance for your attention.

___ the question I posted ___
I would like the search program to return the segments of a document 
("paragraphs") that contain my search phrase, rather than simply 
pointers to the whole document. In searching among applications based 
on Lucene, I have found only one that seems to have this 
functionality. It is at http://www.crosswire.org/bibledesktop/ . Can 
someone point me to some other Lucene-based applications where the 
search engine returns text segments from within documents?

Thanks in advance.


N.B. I know Lucene can be modified to do what I wish.  My problem is 
that my professional obligations do not allow the time for me to 
build the entire application that I need.  Thus I am searching for 
one that exists already, that I can adapt quickly, and which has all 
the code with which I must surround Lucene to make a full-blown 
application.


The Bible application I cite requires preprocessing of the documents 
into SWORD format.  I will try that route if that is all that is 
available. I thought I would "look around" (with your help) before 
trying to take on the SWORD-format issue.



Thanks.




Re: Passage retrieval with Lucene-based application

2011-05-25 Thread Shashi Kant
https://issues.apache.org/jira/browse/LUCENE-1522


On Wed, May 25, 2011 at 3:46 PM, Leroy Stone  wrote:
> document ("paragraphs") that contain my search phrase, rather than simply
> pointers to the whole document. in searching among applications based upon
> the Lucene, I have found only one that seems to have this functionality. It
> is at http://www.crosswire.org/bibledesktop/ . Can someone point me to some
> other Lucene-based applications where the search engine returns text
> segments from within documents?
> Thanks in advance.




Re: Passage retrieval with Lucene-based application

2011-05-25 Thread Sujit Pal
Hi Leroy,

Would it make sense to index as Lucene documents the unit to be
searched? So if you want paragraphs to be shown in search results, you
could parse the source document during indexing into paragraphs and
index them as separate Lucene documents.
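
Something like this for the parsing step, as a rough sketch in plain Java.
It assumes paragraphs are separated by blank lines, and the record class and
its field names are made up; each record would then become one Lucene
document, with the source id stored so you can get back to the original.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-indexing step: one record per paragraph.
class ParagraphSplitSketch {

    static final class ParagraphRecord {
        final String sourceId;   // which source document the paragraph came from
        final int index;         // position of the paragraph within that document
        final String text;       // the text to index and to show in results

        ParagraphRecord(String sourceId, int index, String text) {
            this.sourceId = sourceId;
            this.index = index;
            this.text = text;
        }
    }

    static List<ParagraphRecord> split(String sourceId, String fullText) {
        List<ParagraphRecord> out = new ArrayList<>();
        int index = 0;
        // A newline, optional whitespace, then another newline marks a blank line.
        for (String p : fullText.split("\\n\\s*\\n")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                out.add(new ParagraphRecord(sourceId, index++, trimmed));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "First paragraph.\n\nSecond paragraph,\nstill the second.\n\n\nThird.";
        for (ParagraphRecord r : split("doc1", text)) {
            System.out.println(r.sourceId + "#" + r.index + ": " + r.text);
        }
    }
}
```

The trade-off is that a phrase spanning two paragraphs will no longer match,
and hits from the same source document come back as separate results.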

-sujit

On Wed, 2011-05-25 at 15:46 -0400, Leroy Stone wrote:
> Hello!
> I purchased "Lucene in Action", 2nd Ed., and posted the 
> question below at the Manning Forum. Mike McCandless suggested that I 
> send it to you.
> 
> Thanks in advance for your attention.
> 
> ___ the question I posted ___
> I would like the search program to return the segments of a document 
> ("paragraphs") that contain my search phrase, rather than simply 
> pointers to the whole document. In searching among applications based 
> on Lucene, I have found only one that seems to have this 
> functionality. It is at http://www.crosswire.org/bibledesktop/ . Can 
> someone point me to some other Lucene-based applications where the 
> search engine returns text segments from within documents?
> Thanks in advance.
> 
> 
> N.B. I know Lucene can be modified to do what I wish.  My problem is 
> that my professional obligations do not allow the time for me to 
> build the entire application that I need.  Thus I am searching for 
> one that exists already, that I can adapt quickly, and which has all 
> the code with which I must surround Lucene to make a full-blown 
> application.
> 
> The Bible application I cite requires preprocessing of the documents 
> into SWORD format.  I will try that route if that is all that is 
> available. I thought I would "look around" (with your help) before 
> trying to take on the SWORD-format issue.
> 
> 
> Thanks.
> 

