Ulrich Mayring writes:
Daniel Naber wrote:
AND always refers to the terms on both sides, +/- only refers to the term
on the right. So a AND b -> +a +b is correct.
*slap forehead* - you're right. Wasn't there something about operator
precedence way back when ;-)
Yes. January. And
Is there a simple way to iterate through all the documents to
get their TermFreqVectors? Do I need to write a custom
IndexSearcher for this? Or can I just get an enumeration of
the document ID and call IndexReader.getTermFreqVector(int)?
Thanks,
Will
--
William (Will) Lee
Email: [EMAIL
Thanks Doug,
that works... but I have to do this setNorm() for each document that has
been indexed up to now, right? There are around 1 million docs in the
index... I don't think it's a good idea to perform a search and do it
for every doc (and every field of the doc...).
is there any
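For what it's worth, the loop need not go through a search at all: IndexReader exposes maxDoc() and setNorm() directly, so every existing document can be updated in one pass. A rough sketch against the Lucene 1.4-era API (it needs the Lucene jar to compile; the index path, field name "contents", and boost 1.5f are made-up examples, and note that norms are stored with limited byte precision):

```java
import org.apache.lucene.index.IndexReader;

// Sketch: rewrite the norm for one field of every existing document.
IndexReader reader = IndexReader.open("/path/to/index");
for (int doc = 0; doc < reader.maxDoc(); doc++) {
    if (reader.isDeleted(doc))
        continue;                               // skip deleted slots
    reader.setNorm(doc, "contents", 1.5f);      // example field and boost
}
reader.close();
```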
Dear Lucene-Users,
is there a way to use proximity for long queries (10 or more
terms) automatically? I need a kind of ranking feature that would give
higher relevance scores to those documents that contain the query terms (or
some of them) closer together. I
not without changing the contents of the index structure to store word locations.
Herb...
-Original Message-
From: Olena Medelyan [mailto:[EMAIL PROTECTED]
Sent: Friday, September 24, 2004 9:28 AM
To: Lucene Users List
Subject: Using Proximity for Ranking
On Friday 24 September 2004 15:27, Olena Medelyan wrote:
I know that I can
use the slop operator for phrase search ("red fox"~3), but what I need
should work for partial matching as well.
You can use the value of Integer.MAX_VALUE instead of 3 in your example,
something like:
+red +fox +red
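Programmatically, the same idea can be expressed as: require all the terms, and add a very sloppy PhraseQuery on top so that closer occurrences score higher. A rough sketch against the Lucene 1.4-era API (needs the Lucene jar; the field name "contents" is an assumption):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;

// Require both terms, and let a maximally sloppy phrase reward proximity.
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("contents", "red")), true, false);  // required
query.add(new TermQuery(new Term("contents", "fox")), true, false);  // required

PhraseQuery near = new PhraseQuery();
near.add(new Term("contents", "red"));
near.add(new Term("contents", "fox"));
near.setSlop(Integer.MAX_VALUE);   // matches at any distance, scores by closeness
query.add(near, false, false);     // optional clause: only boosts the score
```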
Luceners,
When a field is not tokenized, should I replace every space
with a ??
I'm searching for: my dear
If I test with Luke, it splits the words into 'my' and 'dear', so I can't
find it in my untokenized field.
The same happens for
my dear
In this case I don't know why it
Hi all,
I'm trying to understand what's going on with the query parser
and keyword fields.
I've got a large subset of my documents which are publications.
So as to be able to query these, I've got this in the indexer:
doc.add(Field.Keyword("is_pub", "1"));
However, if I run a query:
is_pub:1
I
Hi,
Thanks very much for the help, I will try that.
Best regards, Lisheng
-Original Message-
From: Magnus Johansson [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 23, 2004 11:15 PM
To: Lucene Users List
Subject: Re: Power Point Processing
I've had some success with the code found at
The StandardAnalyzer removes the 1 as it is a stop word.
There are two ways you can work around this problem:
1. As you mentioned, create a Query object programmatically.
2. Use WhitespaceAnalyzer instead of StandardAnalyzer.
Aviran
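A sketch of the first option, building the query programmatically so no analyzer ever touches the value (Lucene 1.4-era API, needs the Lucene jar; the index path is a made-up example, field and value taken from the question above):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// A TermQuery matches the keyword field's stored token exactly;
// the analyzer is bypassed, so the "1" survives.
IndexSearcher searcher = new IndexSearcher("/path/to/index");
Hits hits = searcher.search(new TermQuery(new Term("is_pub", "1")));
System.out.println(hits.length() + " publications");
searcher.close();
```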
-Original Message-
From: Fred Toth
Thanks for the tip. However, since the index is
constantly updated, I won't have to check whether it
has changed. I'm just puzzled as to why I'm running
out of memory when I'm closing the searcher, setting
it to null, running the garbage collector, then
getting a new searcher.
Ed
--- [EMAIL
Hi,
I was hoping it wouldn't come to this:
I've got unicode in my source HTML. In particular, within meta tags,
and it's getting broken by the indexer. Note that I'm not trying to
query on any of this, just store and retrieve document titles with
unicode characters.
Has anyone else experienced
Sorry, that didn't cure it.
Again, anyone want to point me to the quickest replacement
HTML parser (that's unicode clean)?
Thanks,
Fred
At 03:17 PM 9/24/2004, you wrote:
On Friday 24 September 2004 19:58, Fred Toth wrote:
I've got unicode in my source HTML. In particular, within meta tags,
and
In org.apache.lucene.demo.HTMLDocument you need to change the input stream
to use a different encoding. Replace the fis with this:
fis = new InputStreamReader(new FileInputStream(f), "UTF-16");
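The fix boils down to wrapping the file stream in an InputStreamReader with an explicit charset instead of relying on the platform default. A self-contained round-trip illustration in plain Java (no Lucene involved; the temp file and sample text are made up):

```java
import java.io.*;

public class EncodingDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("enc-demo", ".txt");
        f.deleteOnExit();
        String text = "r\u00e9sum\u00e9";   // sample non-ASCII content

        // Write the file as UTF-16 (BOM plus two bytes per char).
        Writer out = new OutputStreamWriter(new FileOutputStream(f), "UTF-16");
        out.write(text);
        out.close();

        // Read it back with the matching charset -- the key step.
        Reader in = new InputStreamReader(new FileInputStream(f), "UTF-16");
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = in.read()) != -1)
            sb.append((char) c);
        in.close();

        System.out.println(text.equals(sb.toString())); // prints "true"
    }
}
```

Reading the same bytes with a FileReader (platform default charset) would garble the text, which is exactly the breakage described above.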
-Original Message-
From: Fred Toth [mailto:[EMAIL PROTECTED]
Sent: Friday, September 24, 2004
Hi,
Thanks for the tip, but that didn't work in my case. Presumably
with this patch, and the changes in CVS, this makes the parser
work with UTF-16. I can't really tell because the index appears
now to be completely UTF-16 and I can't search for anything.
My input is actually UTF-8 anyway, and if