Hi,
Sorry, I (the aramorph maintainer ;-) was absent from the office...
Daniel Naber wrote:
Analyzers that provide ambiguous terms (i.e. a token with more than one term
at the same position) don't work in Lucene 1.4.
This is the correct answer. I've filed a bug about this:
On Sunday 19 December 2004 23:05, Steve Skillcorn wrote:
Hello All;
I bought the Lucene in Action ebook, which is
excellent and I can strongly recommend. One question
that has arisen from the book though is custom
filters.
I have the situation where the text of my docs is in
Lucene,
How can I find out the percentage of matched terms in the document(s) using
Lucene?
Here is an example of what I am trying to do:
The search query has 5 terms (ibm, risc, tape, drive, manual) and there are 4
matching
documents with the following attributes:
Doc#1: contains terms(ibm,drive)
I have to show my boss whether Lucene is the best option for creating a search
engine for a new portal.
I want to know how many documents you have in your index.
And how big is your DB?
The formats the portal has to support are html, jsp, txt, doc,
pdf and ppt.
another question that I
Paul already replied, but I'll add my thoughts below to the thread
also...
On Dec 19, 2004, at 5:05 PM, Steve Skillcorn wrote:
I bought the Lucene in Action ebook, which is
excellent and I can strongly recommend.
Thank you
Does the IndexReader that is passed to the bits
method of the
On Dec 20, 2004, at 4:08 AM, Daniel Cortes wrote:
I have to show my boss whether Lucene is the best option for creating a
search engine for a new portal.
I want to know how many documents you have in your index.
And how big is your DB?
I highly recommend you use Luke to examine the index. It
I'm still new to Lucene, but wouldn't that be the coord()? My
understanding is that the coord() is the fraction of the boolean query
that matched a given document.
Again, I'm new, so somebody else will have to confirm or deny...
-Mike
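A minimal sketch of the fraction described above, in plain Java with no Lucene classes. It assumes the fourth query term in the earlier example is meant to be "drive"; the class name CoordSketch and the helper overlap() are hypothetical, and only the overlap / maxOverlap formula comes from the thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CoordSketch {
    /** Same formula as the default coord() quoted in this thread. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Hypothetical helper: counts the distinct query terms present in the document. */
    static int overlap(List<String> queryTerms, Set<String> docTerms) {
        int matched = 0;
        for (String term : new HashSet<>(queryTerms)) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Query terms and Doc#1 terms taken from the example in the question.
        List<String> query = Arrays.asList("ibm", "risc", "tape", "drive", "manual");
        Set<String> doc1 = new HashSet<>(Arrays.asList("ibm", "drive"));

        int matched = overlap(query, doc1);
        float fraction = coord(matched, query.size());
        System.out.println(matched + " of " + query.size()
                + " query terms matched, fraction = " + fraction);
    }
}
```

Multiplying the fraction by 100 gives the percentage of matched terms for that document.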
On Mon, 20 Dec 2004 00:33:21 -0800 (PST), Gururaja H
Hi,
But how do I calculate the coord() fraction? I know that by default,
in DefaultSimilarity, the coord() fraction is defined as below:
/** Implemented as <code>overlap / maxOverlap</code>. */
public float coord(int overlap, int maxOverlap) {
return overlap / (float)maxOverlap;
}
How to get the
Hi
I am building an index of texts, each related to a unique id. The unique ids
might contain a number of underscores which will make the StandardAnalyzer
shorten them after it sees the second underscore in a row. Furthermore many
of the texts I am indexing are in Italian so the removal of
Hello all,
I have a question regarding the determination of the set of matching
documents, in particular (I guess) related to the NOT operator.
In my case I have a document containing the terms A and B. When I query
for either A or for B, I get this document back, just as expected. Now
when I
Hello, I want to know whether there is a difference between these queries:
+city(+London Amsterdam) +address(1_street 2_street)
And
+city(+London) +city(Amsterdam) +address(1_street) +address(2_street)
Thanks in advance
Alex Kiselevsky
Speech Technology Tel:972-9-776-43-46
R&D, Amdocs -
Alex Kiselevski writes:
Hello, I want to know whether there is a difference between these queries:
+city(+London Amsterdam) +address(1_street 2_street)
And
+city(+London) +city(Amsterdam) +address(1_street) +address(2_street)
I guess you mean city:(... and so on.
The first query searches
When searching for phrases, what's important is the position of each
token/word extracted by the Analyzer.
WhitespaceAnalyzer/LowerCaseFilter don't do anything with the
positional information. There is nothing else in your Analyzer?
In any case, the following should help you see what your
Thanks Morus
So if I understand right,
if the second query is:
+city(London) +city(Amsterdam) +address(1_street) +address(2_street)
both queries have the same value?
-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Monday, December 20, 2004 6:11 PM
To: Lucene Users
Alex, I think you want this:
+city:London +city:Amsterdam +address:1_street +address:2_street
Otis
--- Alex Kiselevski [EMAIL PROTECTED] wrote:
Thanks Morus
So if I understand right,
if the second query is:
+city(London) +city(Amsterdam) +address(1_street) +address(2_street)
Both
Christian,
Please simplify your situation. Use a plain TermQuery for B and see
what is returned. Then use a simple BooleanQuery for A -B. I
suspect MultiFieldQueryParser is the culprit. What does the toString
of the generated Query return? MFQP is known to be trouble, and an
overhaul to
Hi all,
I am getting a NullPointerException when I sort on a field that has a null
value for some documents. ORDER BY in SQL does work on such fields, and I
think it puts all results with null values at the end of the list. Shouldn't
Lucene also do the same thing instead of throwing null
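A sketch of the ordering the poster is asking for, in plain Java rather than Lucene's sort code. It assumes Java 8 or later for Comparator.nullsLast, which is newer than the Java versions in use when this thread was written; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NullsLastSketch {
    /** Returns a sorted copy in which null field values come after all non-null values. */
    static List<String> sortNullsLast(List<String> fieldValues) {
        List<String> sorted = new ArrayList<>(fieldValues);
        sorted.sort(Comparator.nullsLast(Comparator.<String>naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical field values; null stands for a document without the field.
        List<String> values = Arrays.asList("b", null, "a");
        System.out.println(sortNullsLast(values));
    }
}
```

This only illustrates the "nulls at the end" behaviour described for SQL; it says nothing about how Lucene itself handles missing sort fields.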
The coord() value is not saved anywhere so you would need to recompute
it. You could either call explain() and parse the result string, or
better, look at explain() and implement what it does more efficiently
just for coord(). If your queries are all BooleanQueries of
TermQueries, then this is
Hi again
Thanks for your answer, Otis. My analyzer did nothing other than
use the WhitespaceAnalyzer/LowerCaseFilter.
However, I found out that I ran into problems with characters such as , . : when
searching because of my simple analyzer. (E.g. I would not be able to search
for world in the
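A sketch of why punctuation gets in the way with a whitespace-only analyzer. This is a plain-Java stand-in that splits on whitespace and lowercases, not the Lucene WhitespaceAnalyzer itself; the input sentence and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WhitespaceTokenSketch {
    /** Splits on whitespace and lowercases each token; punctuation is left attached. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Hello, World.");
        System.out.println(tokens);
        // The indexed token keeps its trailing period, so a search for the
        // bare word does not find it.
        System.out.println("contains \"world\": " + tokens.contains("world"));
    }
}
```

Under this scheme the indexed tokens are "hello," and "world.", so a query for the bare word finds nothing unless punctuation is stripped at both index and query time.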
I believe your sole problem is that you need to tone down your
lengthNorm. Because doc4 is 10 times longer than doc2, its lengthNorm
is less than 1/3 of that of doc2 (1/sqrt(10) to be precise). This is a
larger effect than the higher coord factor (1/.8) and the extra matching
term in doc4.
In
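The arithmetic in the message above can be checked with a short sketch. It assumes length normalization of the form 1/sqrt(numTerms), which is what the 1/sqrt(10) figure implies; the document lengths are hypothetical and only their 10:1 ratio comes from the message. The extra matching term in doc4 is left out.

```java
final class LengthNormSketch {
    /** Length normalization of the form 1/sqrt(numTerms), as implied by the message. */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int doc2Terms = 100;            // hypothetical length of doc2
        int doc4Terms = 10 * doc2Terms; // doc4 is 10 times longer

        double normRatio = lengthNorm(doc4Terms) / lengthNorm(doc2Terms);
        double coordGain = 1.0 / 0.8;   // the higher coord factor quoted in the message

        System.out.println("lengthNorm(doc4) / lengthNorm(doc2) = " + normRatio);
        System.out.println("coord gain for doc4 = " + coordGain);
        System.out.println("combined factor = " + normRatio * coordGain);
    }
}
```

The norm ratio is below 1/3 while the coord gain is only 1.25, so their product stays well under 1. That is the sense in which the length penalty outweighs the coord advantage.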
On Monday 20 December 2004 15:09, Gururaja H wrote:
Hi,
But how do I calculate the coord() fraction? I know that by default,
in DefaultSimilarity, the coord() fraction is defined as below:
/** Implemented as <code>overlap / maxOverlap</code>. */
public float coord(int overlap, int maxOverlap) {
On Dec 20, 2004, at 12:43 PM, Peter Posselt Vestergaard wrote:
Therefore I turned back to the standard analyzer and now do some replacing
of the underscores in my ID string to avoid my original problem. This solved
my phrase problem so that I can now search for phrases. However I still have
the
ok, I feel a bit stupid now ;) Turns out this issue has been discussed a
while ago on both mailing lists and I even participated in one of
them... shame on me.
The problem is indeed in how MFQP parses my query: the query A -B becomes:
(text:A -text:B) (title:A -title:B) (path:A -path:B)
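A sketch of why the expanded form can still return a document that contains B. It evaluates each per-field clause separately and ORs the results, in plain Java with no Lucene classes; the field contents are hypothetical and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class PerFieldNotSketch {
    /** True if one field satisfies "required -prohibited" on its own. */
    static boolean clauseMatches(Set<String> fieldTerms, String required, String prohibited) {
        return fieldTerms.contains(required) && !fieldTerms.contains(prohibited);
    }

    /** OR over the per-field clauses, mirroring (text:A -text:B) (title:A -title:B) (path:A -path:B). */
    static boolean expandedQueryMatches(Map<String, Set<String>> doc,
                                        String required, String prohibited) {
        for (Set<String> fieldTerms : doc.values()) {
            if (clauseMatches(fieldTerms, required, prohibited)) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical document: B occurs in the text field only. */
    static Map<String, Set<String>> sampleDoc() {
        Map<String, Set<String>> doc = new LinkedHashMap<>();
        doc.put("text", new HashSet<>(Arrays.asList("a", "b")));
        doc.put("title", new HashSet<>(Collections.singletonList("a")));
        doc.put("path", new HashSet<>());
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> doc = sampleDoc();
        System.out.println("text clause:  " + clauseMatches(doc.get("text"), "a", "b"));
        System.out.println("title clause: " + clauseMatches(doc.get("title"), "a", "b"));
        System.out.println("whole query:  " + expandedQueryMatches(doc, "a", "b"));
    }
}
```

In this sample the text clause fails because the text field contains B, but the title clause succeeds, so the document is returned even though it contains B.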
This is not the official recommendation, but I'd suggest you at least
consider: http://issues.apache.org/bugzilla/show_bug.cgi?id=32674
If you're not using Java 1.5 and you decide you want to use it, you'd
need to take out those dependencies. If you improve it, please share.
Chuck
I'm testing the rebuilding of the index. I add several hundred documents,
optimize and add another few hundred and so on. Right now I have around
7000 files. I observed that once the index gets to a certain size, every time
after optimize there are two files of roughly the same size, like below:
Hi Chuck Williams, Paul Elschot,
Thanks so much for the reply.
By overriding coord() as follows, I was able to get the right order for the
example that I gave in this thread.
public float coord(int overlap, int maxOverlap) {
return (float) Math.pow((overlap / (float)maxOverlap),
Thanks much for the reply.
Paul Elschot [EMAIL PROTECTED] wrote:
On Monday 20 December 2004 15:09, Gururaja H wrote:
Hi,
But how do I calculate the coord() fraction? I know that by default,
in DefaultSimilarity, the coord() fraction is defined as below:
/** Implemented as overlap / maxOverlap.
Thanks much for the reply.
Chuck Williams [EMAIL PROTECTED] wrote:
The coord() value is not saved anywhere so you would need to recompute
it. You could either call explain() and parse the result string, or
better, look at explain() and implement what it does more efficiently
just for coord(). If