Hello,
inspired by this thread, I also tried to implement a MoreLikeThis
search, but I have the same problem of a null query.
I did set the field name to a field that is stored in the index,
but like() just returns null.
Here is my code:
Hits hits = this.is.search(new
Does your index use StandardAnalyzer? Are your fields stored (Field.Store.YES)?
MoreLikeThis uses StandardAnalyzer by default to read the stored content of
the example doc, which may produce tokens that do not match those of the indexed
content. Use setAnalyzer() to keep them in sync.
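For reference, a minimal end-to-end sketch of that setup (the index path,
field name, and doc number are placeholders, and this assumes the contrib
MoreLikeThis class):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.similar.MoreLikeThis;

public class MltExample {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        IndexSearcher searcher = new IndexSearcher(reader);

        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setAnalyzer(new StandardAnalyzer());        // keep in sync with the index-time analyzer
        mlt.setFieldNames(new String[] { "contents" }); // fields must be indexed, not only stored

        Query query = mlt.like(42);                     // 42 = doc number of the example document
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " similar docs");
    }
}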
hi mark,
Does your index use StandardAnalyzer? Are your fields stored
(Field.Store.YES)?
Thanks! That was the hint in the right direction: the field was stored
but not indexed:
titleDocument.add(new Field(kurz, title.getKurz(), Field.Store.YES,
Field.Index.NO));
(That was the field for the
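Presumably the fix is to index the field as well, e.g. (Lucene 2.0 constants;
whether TOKENIZED or UN_TOKENIZED is right depends on how the field is searched):

// stored AND indexed; use Field.Index.UN_TOKENIZED to index it as a single term
titleDocument.add(new Field(kurz, title.getKurz(), Field.Store.YES,
                            Field.Index.TOKENIZED));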
Hello all,
I want to implement a drill-down function, aka narrow search, aka refine
search.
I want to have something like:
Refine by Date:
* 1990-2000 (30 Docs)
* 2001-2003 (200 Docs)
* 2004-2006 (10 Docs)
But not only for date ranges; also for other categories.
What I have found in the
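One common approach on the Lucene 2.0 API is to run the main query once as a
filter and intersect its BitSet with one cached BitSet per category; the field
name, ranges, and main query below are assumptions:

import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryFilter;
import org.apache.lucene.search.RangeQuery;

public class DrillDownCounts {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");

        // the user's main query; shown here as a placeholder range
        Query mainQuery = new RangeQuery(new Term("year", "1990"),
                                         new Term("year", "2006"), true);
        BitSet mainBits = new QueryFilter(mainQuery).bits(reader);

        // one query per refinement bucket; cache their BitSets in practice
        Query[] buckets = {
            new RangeQuery(new Term("year", "1990"), new Term("year", "2000"), true),
            new RangeQuery(new Term("year", "2001"), new Term("year", "2003"), true),
            new RangeQuery(new Term("year", "2004"), new Term("year", "2006"), true),
        };
        for (int i = 0; i < buckets.length; i++) {
            BitSet bits = (BitSet) mainBits.clone();
            bits.and(new QueryFilter(buckets[i]).bits(reader));
            System.out.println(buckets[i] + ": " + bits.cardinality() + " docs");
        }
    }
}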
Hello
The Lucene 2.0.0 StandardAnalyzer treats the underscore (_) as a token
separator. Is there a way I can make StandardAnalyzer not tokenize on
_ or any other given character?
I'd like to keep all the features StandardAnalyzer has, but modify it
a bit for my needs. How do I control what
Martin Braun wrote:
I want to implement a drill-down function, aka narrow search, aka refine
search.
I want to have something like:
Refine by Date:
* 1990-2000 (30 Docs)
* 2001-2003 (200 Docs)
* 2004-2006 (10 Docs)
But not only for date ranges; also for other categories.
What I have found in the
Hi
you can't have a BooleanQuery containing only MUST_NOT clauses (which is
what (-(FILE:abstract.htm)) is); it matches no documents, so the mandatory
qualification on it causes the query to fail for all docs.
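A short sketch of both sides (the field and term are taken from the quoted
query; MatchAllDocsQuery has been in Lucene since 1.9):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TermQuery;

public class ProhibitedOnly {
    public static void main(String[] args) {
        // matches nothing: a BooleanQuery with only MUST_NOT clauses
        BooleanQuery bad = new BooleanQuery();
        bad.add(new TermQuery(new Term("FILE", "abstract.htm")),
                BooleanClause.Occur.MUST_NOT);

        // workaround for a pure search query: anchor it on MatchAllDocsQuery
        BooleanQuery good = new BooleanQuery();
        good.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
        good.add(new TermQuery(new Term("FILE", "abstract.htm")),
                 BooleanClause.Occur.MUST_NOT);

        System.out.println("bad:  " + bad);
        System.out.println("good: " + good);
    }
}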
This is true for search queries, but IMHO it makes sense in a query
filter. I
My index contains approximately 5 million documents. During a
search, I need to grab the value of a field for every document in the
result set. I am currently using a HitCollector to search. Below is
my code:
searcher.search(query, new HitCollector(){
public
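A hedged sketch of one way to make that fast: pre-load the field once with
FieldCache (which requires the field to be indexed with a single term per
document) instead of doing a stored-field lookup per hit. The field name and
index path are placeholders:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class FieldPerHit {
    public static void search(Query query) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        IndexSearcher searcher = new IndexSearcher(reader);

        // one array entry per document, loaded once up front
        final String[] values = FieldCache.DEFAULT.getStrings(reader, "myField");

        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                String value = values[doc]; // no stored-field lookup per hit
                // ... use value ...
            }
        });
    }
}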
On Friday, 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote:
The Lucene 2.0.0 StandardAnalyzer treats the underscore (_) as a token
separator. Is there a way I can make StandardAnalyzer not tokenize on
_ or any other given character?
You need to add _ to the #LETTER definition in StandardTokenizer.jj.
I haven't had the chance to use this new feature yet, but have you tried
selective field loading, so that you can load only that one field from your
index and not all of them?
Otis
- Original Message
From: Ryan O'Hara [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Friday,
Ryan O'Hara wrote:
My index contains approximately 5 million documents. During a
search, I need to grab the value of a field for every document in the
result set. I am currently using a HitCollector to search. Below is
my code:
searcher.search(query, new HitCollector(){
Ken Krugler wrote:
Hello all,
I want to implement a drill-down function, aka narrow search, aka refine
search.
I want to have something like:
Refine by Date:
* 1990-2000 (30 Docs)
* 2001-2003 (200 Docs)
* 2004-2006 (10 Docs)
But not only for date ranges; also for other categories.
What I have
Provides a new api, IndexReader.document(int doc, String[] fields). A document
containing only the specified fields is created. The other fields of the
document are not loaded, although unfortunately uncompressed strings still
have to be scanned, because the length information in the index
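Hypothetically, usage of that proposed API would look something like this (the
method comes from a patch, not a released Lucene, at the time of this thread;
the field name is a placeholder):

// load only one field of the hit, not the whole document
Document doc = reader.document(docId, new String[] { "myField" });
String value = doc.get("myField");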
I haven't had the chance to use this new feature yet, but have you
tried selective field loading, so that you can load only that one
field from your index and not all of them?
I have not tried selective field loading, but it sounds like a good
idea. What class is it in? Any more
What is the #LETTER definition in StandardTokenizer.jj?
I saw:
| <#P: ("_"|"-"|"/"|"."|",") >
| <#HAS_DIGIT:                // at least one digit
    (<LETTER>|<DIGIT>)*
    <DIGIT>
    (<LETTER>|<DIGIT>)*
  >
Should I remove _ and recompile the source code?
Sincerely,
Anh Ngo
-Original
I do not believe so. If you look above you will see that #P is only used
when looking for a NUM: a host/IP, a phone number, etc. You would be removing
the ability to recognize a _ while rooting those tokens out. It will
still be parsed when tokenizing an EMAIL as well. I don't think this is the
Hello Mark,
Please show me how to add - to the #LETTER definition.
Thanks,
Anh Ngo
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Friday, July 21, 2006 3:51 PM
To: java-user@lucene.apache.org
Subject: Re: StandardAnalyzer question
I do not believe so. If you look
I take it back. That's probably exactly what you want. Watch out if you're not
compiling all of Lucene... you need to avoid a ParserException using ant if
you try to just extract the Standard Analyzer package (the recommended
approach).
On 7/21/06, Mark Miller [EMAIL PROTECTED] wrote:
I do not
| <#LETTER:                   // unicode letters
    [
     "\u0041"-"\u005a",
     "\u0061"-"\u007a",
     "\u00c0"-"\u00d6",
     "\u00d8"-"\u00f6",
     "\u00f8"-"\u00ff",
     "\u0100"-"\u1fff"
    ]
  >

becomes

| <#LETTER:                   // unicode letters
    [
     "\u0041"-"\u005a",
Have you tried to only collect doc IDs and see if the speed problem is there,
or maybe to fetch only field values? If you have dense results, it can easily
be split() or addSymbolsToHash() that takes the time.
I see three possibilities for what could be slow: getting doc IDs, fetching
field values, or
\u002d would add '-'.
The original request was for '_', which is \u005f.
Mark Miller [EMAIL PROTECTED] wrote on 21/07/2006 13:09:28:
| <#LETTER:                   // unicode letters
    [
     "\u0041"-"\u005a",
     "\u0061"-"\u007a",
     "\u00c0"-"\u00d6",
     "\u00d8"-"\u00f6",
     "\u00f8"-"\u00ff",
I did try it and recompiled the whole package, but it did not work.
My #LETTER is:
| <#LETTER:                   // unicode letters
    [
     "\u0041"-"\u005a",
     "\u005f",
     "\u0061"-"\u007a",
     "\u00c0"-"\u00d6",
     "\u00d8"-"\u00f6",
     "\u00f8"-"\u00ff",
Ngo, Anh (ISS Southfield) wrote:
I did try it and recompiled the whole package, but it did not work.
My #LETTER is:
| <#LETTER:                   // unicode letters
    [
     "\u0041"-"\u005a",
     "\u005f",
     "\u0061"-"\u007a",
     "\u00c0"-"\u00d6",
     "\u00d8"-"\u00f6",
It works now.
Thank you very much.
I forgot to run javacc on StandardTokenizer.jj.
Sincerely,
Anh Ngo
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Friday, July 21, 2006 5:33 PM
To: java-user@lucene.apache.org
Subject: Re: StandardAnalyzer question
Interesting, and thanks for the answer. I guess I won't write code to
control the order in which clauses get added; one less thing to do :-)
-Original Message-
From: Doron Cohen [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 20, 2006 6:47 PM
To: java-user@lucene.apache.org
Subject: Re: