Chris Fraschetti writes:
So I decided to move my epoch date to the 20040608 date, which fixed
my boolean query problem in regards to my current data size (approx.
600,000),
but now as soon as I do a query like ... a*
I get the boolean error again. Google obviously can handle this.
Surely some folks out there have used Lucene on a large scale and have
had to compensate for this somehow. Any other solutions? Morus, thank
you very much for your input, and I am looking into your solution,
just putting my feelers out there once more.
The Lucene API is very limited as to its
Hi all,
I try to create different indices using different Analyzer-classes. I
tried standard, german, russian, and cjk. They all produce exactly the
same index file (md5-wise). There are over 280 pages so I expected at
least some differences.
Any ideas anyone?
There are some articles about Lucene. You can find the links on
Lucene's Wiki. Lucene in Action is almost done:
http://www.manning.com/catalog/view.php?book=hatcher2
I don't think you can pre-order it from the publisher, but you can
probably pre-order it from Amazon. I don't know of any other
Daan Hoogland wrote:
Hi all,
I try to create different indices using different Analyzer-classes. I
tried standard, german, russian, and cjk. They all produce exactly the
same index file (md5-wise). There are over 280 pages so I expected at
least some differences.
Take a look in the lucene
Kevin,
You could try setting index-time field length-dependent boosts.
Another possibility may be your own sorting, that takes field length in
consideration, but I'm not sure how well that would work.
Finally, you could use your own Similarity and implement your own
...or just set a lower boost on fields with fewer than $x
characters while indexing.
John
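The suggestion above (a lower boost for fields shorter than some number of characters) might look roughly like this. It is a minimal sketch in plain Java, not Lucene code: the class LengthBoost, the method boostFor, and both parameters are hypothetical names chosen for illustration.

```java
/** Hypothetical helper, not a Lucene API: picks an index-time boost from a field's length. */
final class LengthBoost {
    /**
     * Returns shortFieldBoost when the trimmed text has fewer than minChars
     * characters, and the neutral boost 1.0f otherwise. Both the threshold and
     * the reduced boost are assumptions to be tuned per index.
     */
    static float boostFor(String fieldText, int minChars, float shortFieldBoost) {
        int length = (fieldText == null) ? 0 : fieldText.trim().length();
        return length < minChars ? shortFieldBoost : 1.0f;
    }
}
```

At indexing time the returned value would presumably be handed to Field.setBoost(float) before the document is added, for example with boostFor(text, 10, 0.5f).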
Otis Gospodnetic wrote:
Kevin,
You could try setting index-time field length-dependent boosts.
Another possibility may be your own sorting, that takes field length in
consideration, but I'm not
sergiu gordea writes:
Daan Hoogland wrote:
Hi all,
I try to create different indices using different Analyzer-classes. I
tried standard, german, russian, and cjk. They all produce exactly the
same index file (md5-wise). There are over 280 pages so I expected at
least some differences.
See IndexReader#getTermFreqVector() in the javadocs
[EMAIL PROTECTED] 10/4/2004 10:29:30 AM
Hi all,
I am indexing documents consisting of fields for a database id and
text.
The text field is created as new Field(FULL_TEXT, text, false, true, true, true)
in order to store the term vector
You should not have more than one IndexWriter. (You can have multiple
IndexReaders, but only one IndexWriter).
Aviran
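One way to keep to a single writer when several threads produce documents is to funnel every write through one dedicated thread. The sketch below is plain, current Java with no Lucene dependency: SingleWriterQueue is a hypothetical stand-in that only records strings where a real application would call its one IndexWriter, and the 10-second timeout is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for "one writer, many producers"; not a Lucene class. */
final class SingleWriterQueue {
    // Stands in for the index; a real version would hold the single IndexWriter here.
    private final List<String> written = Collections.synchronizedList(new ArrayList<>());
    // Exactly one thread ever performs writes, in submission order.
    private final ExecutorService writerThread = Executors.newSingleThreadExecutor();

    /** Any thread may call this; the write itself runs on the single writer thread. */
    void submit(String document) {
        writerThread.execute(() -> written.add(document));
    }

    /** Stops accepting work, waits for queued writes, and returns what was written. */
    List<String> closeAndGet() {
        writerThread.shutdown();
        try {
            if (!writerThread.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writer did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting for writer", e);
        }
        return written;
    }
}
```

Producer threads call submit(...) freely; only closeAndGet() blocks, and it is meant to be called once, when indexing is finished.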
-Original Message-
From: Justin Swanhart [mailto:[EMAIL PROTECTED]
Sent: Friday, October 01, 2004 19:14 PM
To: [EMAIL PROTECTED]
Subject: multiple threads
As I
BTW, what's wrong with the DateFilter solution I mentioned earlier?
I've used it before (before lucene-1.4 though) without memory problems,
thus I always assumed that it avoided the allocation problems with prefix
queries.
sv
On Mon, 4 Oct 2004, Chris Fraschetti wrote:
Surely some folks out
The date portion of my code works great now.. no problems there, so
let me thank you now for your date filter solution... but my current
problem is in regards to a stand alone a* query giving me
the too many clauses exception
On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James
Ok, got it, got a small comment though.
For large wildcard queries, please note that Google does not support
wildcards. Search for hell*, and you will get no proper matches for hello.
Is there a reason why you wish to allow such large queries? We might
be able to find alternative ways of helping
Absolutely, limiting the user's query is no problem here. I've
currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the
beginning of a query, etc. But I couldn't think of a way to
prevent the user from doing a* but not
I've used the simple message that the user's request was too vague and
that he should modify it. I haven't had too many complaints about this
especially when I explained why to a client:
If one user of many does a*, the whole system will grind to a halt as that
one request will use up all of the
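The kind of pre-check discussed in this thread (refusing a query such as a* before it ever reaches the searcher) could be sketched as follows. This is a minimal Java sketch, not Lucene code: the class WildcardGuard, the method isAcceptable, and the threshold of three literal characters are all assumptions made for illustration, and it ignores field prefixes such as title: as well as escaped characters.

```java
/** Hypothetical pre-check, not part of Lucene: rejects wildcard terms with too short a literal prefix. */
final class WildcardGuard {
    // Assumed threshold; tune it against how many terms a prefix expands to in your index.
    static final int MIN_PREFIX_LENGTH = 3;

    /** Returns true when the query is non-blank and every term is specific enough. */
    static boolean isAcceptable(String query) {
        if (query == null || query.trim().isEmpty()) {
            return false; // blank queries are rejected outright
        }
        for (String term : query.trim().split("\\s+")) {
            int firstWildcard = firstWildcardIndex(term);
            if (firstWildcard >= 0 && firstWildcard < MIN_PREFIX_LENGTH) {
                return false; // covers a leading * or ? as well as a* and ab*
            }
        }
        return true;
    }

    /** Index of the first * or ? in the term, or -1 when there is none. */
    private static int firstWildcardIndex(String term) {
        int star = term.indexOf('*');
        int mark = term.indexOf('?');
        if (star < 0) return mark;
        if (mark < 0) return star;
        return Math.min(star, mark);
    }
}
```

With these assumptions, a* and *foo are refused while hell* and plain terms pass; a refused query is where the "request too vague" message would be shown.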
Chris Fraschetti wrote:
Absolutely, limiting the user's query is no problem here. I've
currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the
beginning of a query, etc. But I couldn't think of a way to
prevent the user
Thanks Daniel.
Can you tell me two more things?
1. How difficult is it to implement our own Similarity class that can do the
things we want?
2. If more than one field is a percentage match like HP, can
we also specify which field gets preference while searching?
For example, in the
On Monday 04 October 2004 22:22, you wrote:
1. How difficult is it to implement our own Similarity class that can do
the things we want?
It should be very easy. The API is described here:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
I think in your case
Dmitry,
Thanks for the help and pointers thus far. I know (or believe at the least)
that the files are not referenced by opening segments and deletable with a
hex editor.
I've explored the possibility of an exception that is not recorded to a log
file or written out to screen, so have double
There was a broken version of Lucene in there - (I think the 1.4 release?) which was
not cleaning up old files after you did an optimize in certain cases. For me,
upgrading to 1.4.1, and re-optimizing automatically cleaned up the index.
You may have to add and remove a dummy document first,
On Fri, 1 Oct 2004, Robinson Raju wrote:
The analyzer is StandardAnalyzer.
I use MultiFieldQueryParser to parse.
The flow is this:
I have indexed a database view. Now I need to search against a few columns.
I take in the search criteria and search field,
construct a wildcard query and add it
Doug Cutting writes:
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1798116
Yes, the approach there is similar. I attempted to complete the
solution and provide a working replacement for MultiFieldQueryParser.
But, inspired by that message, couldn't MultiFieldQueryParser