Hey
Dev Guys
Apologies
I have a Quick Problem...
The number of hits on a set of documents indexed using 1.3-final is not the same
as on the 1.4-final version.
[ The only modification done to the src is that I have upgraded my
CustomAnalyzer, basing it on the StopAnalyzer available in 1.4 ]
Does doing
Hey
Dev Guys
Apologies
Can somebody explain to me
why, for an input word TA, StopAnalyzer.java returns [ta]
instead of [TA]?
TA == [ta] instead of [TA]
$125.96 == [125.96] instead of [$125.96]
Is there something wrong I have been missing
with
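For what it's worth, the lowercasing asked about above is by design: StopAnalyzer is built on a lowercasing tokenizer, so every token comes out lowercase and leading symbols such as "$" are stripped. A rough plain-Java sketch of the observed effect (illustrative only, not Lucene's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerSketch {
    // Illustrative only: lowercase each whitespace-separated token and
    // strip any leading/trailing symbol characters such as '$'.
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String t = raw.toLowerCase(Locale.ROOT)
                          .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("TA"));       // [ta]
        System.out.println(tokens("$125.96"));  // [125.96]
    }
}
```

So neither result is a bug; if you need case or currency symbols preserved, you have to index with an analyzer that keeps them.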
If I do it by sorting the input before sending it to Lucene, it could become
unmanageable to handle and could also produce unexpected results for the user.
E.g., if I type: winston churchill and world war and germany,
I could split the string by "and" and get the sorted string as (churchill
winston)
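To make the concern above concrete, here is a hypothetical sketch of the term-sorting approach under discussion (the helper name is mine, not from the thread). Splitting on "and" and sorting does discard the order the user typed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class QueryTermSort {
    // Hypothetical helper: split a query on the word "and" and sort the parts.
    public static List<String> sortedClauses(String query) {
        String[] parts = query.toLowerCase().split("\\s+and\\s+");
        List<String> clauses = new ArrayList<>();
        for (String p : parts) clauses.add(p.trim());
        Collections.sort(clauses);
        return clauses;
    }

    public static void main(String[] args) {
        // "winston churchill and world war and germany" becomes
        // [germany, winston churchill, world war]: the typed order is lost.
        System.out.println(sortedClauses("winston churchill and world war and germany"));
    }
}
```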
Niraj Alok wrote:
Hi Guys,
Finally I have sorted the problem of hits score thanks to the great help of
Franck.
I have hit another problem with the boolean operators now.
When I search for winston and churchill I get a set of perfectly
acceptable results.
But when I change the order, churchill and
Hi John,
The source code is available from CVS, make it non-final and do what you need to do.
Of course, you may have a hard time finding help later if you aren't using something
everyone else is and your solution doesn't work... :-)
If I understand correctly what you are trying to do, you
What could actually be done is perhaps sort the search result by document
id. Of course your relevancy will be all shot, but at least you would have
control over the sorting order.
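The document-id sort suggested above can be sketched in plain Java (the Hit class here is made up for illustration; in Lucene 1.4 itself the equivalent would be passing an index-order Sort to Searcher.search):

```java
import java.util.Comparator;
import java.util.List;

public class DocIdSort {
    // Made-up stand-in for a search hit: a document id plus its relevance score.
    public record Hit(int docId, float score) {}

    // Re-sort hits by ascending document id, discarding relevance order.
    public static List<Hit> byDocId(List<Hit> hits) {
        return hits.stream()
                   .sorted(Comparator.comparingInt(Hit::docId))
                   .toList();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(42, 0.9f), new Hit(7, 0.5f), new Hit(19, 0.7f));
        // ids come out 7, 19, 42 regardless of score
        System.out.println(byDocId(hits));
    }
}
```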
At 09:05 AM 07/07/2004, you wrote:
Hi Guys,
Finally I have sorted the problem of hits score thanks to the great
Thanks a lot for your help.
I have one more question:
How would you handle a query consisting of two fields combined with a
Boolean operator, where one field is only indexed and stored (a Keyword)
and another is tokenized, indexed and stored?
Is it possible to have parts of the same query
Hi Grant:
Thanks for the options. How likely is it that the Lucene file formats will change?
Are there really no more options? :(...
Thanks
-John
On Thu, 08 Jul 2004 08:50:44 -0400, Grant Ingersoll [EMAIL PROTECTED] wrote:
Hi John,
The source code is available from CVS, make it non-final
Hi Grant:
I have something that would extract only the important words from
a document along with its importance; furthermore, these important
words may not be physically in the document, as they could be synonyms of
some of the words in the document. So the output of a process for a
document is
You might try merging the existing index into a new index located on a ram
disk. Once it is done, you can move the directory from ram disk back to
your hard disk. I think this will work as long as the old index did not
finish merging. You might do a strings command on the segments file to
make
Hello
I have downloaded Lucene 1.4 to a Windows machine, and it all works
fine, but when I try to move this to a Solaris machine I get the following
error:
/opt/tomcat/common/lib/lucene-1.4-final.jar: cannot execute
If I then try to change the permissions (777) on the above file, I get
Kevin A. Burton wrote:
So is it possible to fix this index now? Can I just delete the most
recent segment that was created? I can find this by ls -alt
Sorry, I forgot to answer your question: this should work fine. I don't
think you should even have to delete that segment.
Also, to elaborate
MATL (Mats Lindberg) wrote:
When I copied the Lucene jar file to the Solaris machine from the
Windows machine I used an FTP program.
FTP probably mangled the file. You need to use FTP's binary mode.
Doug
Peter M Cipollone wrote:
You might try merging the existing index into a new index located on a ram
disk. Once it is done, you can move the directory from ram disk back to
your hard disk. I think this will work as long as the old index did not
finish merging. You might do a strings command on
Doug Cutting wrote:
Kevin A. Burton wrote:
Also... what can I do to speed up this optimize? Ideally it wouldn't
take 6 hours.
Was this the index with the mergeFactor of 5000? If so, that's why
it's so slow: you've delayed all of the work until the end. Indexing
on a ramfs will make things
Doug Cutting wrote:
Kevin A. Burton wrote:
So is it possible to fix this index now? Can I just delete the most
recent segment that was created? I can find this by ls -alt
Sorry, I forgot to answer your question: this should work fine. I
don't think you should even have to delete that segment.
[EMAIL PROTECTED] wrote:
Hi,
a couple of weeks ago we migrated from Lucene 1.2 to 1.4rc3. Everything went
smoothly, but we are experiencing some problems with that new constant limit
maxClauseCount=1024
which leads to exceptions of type
org.apache.lucene.search.BooleanQuery$TooManyClauses
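The limit being hit here is configurable: BooleanQuery.setMaxClauseCount(int) raises it above the 1024 default. The effect of the limit can be sketched in plain Java (illustrative only, not Lucene's code):

```java
public class ClauseLimitSketch {
    // Illustrative stand-in for Lucene's TooManyClauses exception.
    static class TooManyClauses extends RuntimeException {}

    private static int maxClauseCount = 1024;  // Lucene 1.4's default

    static void setMaxClauseCount(int max) { maxClauseCount = max; }

    static boolean exceeds(int clauses) { return clauses > maxClauseCount; }

    // Adding clauses past the limit throws, which is what a wildcard or
    // range query expanding over many terms triggers.
    static void checkClauseCount(int clauses) {
        if (exceeds(clauses)) throw new TooManyClauses();
    }

    public static void main(String[] args) {
        checkClauseCount(1024);   // fine at the default limit
        setMaxClauseCount(4096);  // raise the limit...
        checkClauseCount(2000);   // ...and 2000 clauses is now accepted
    }
}
```

Raising the limit trades memory and query time for the ability to expand very broad wildcard/range queries.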
Thanks Doug. I will do just that.
Just for my education, can you maybe elaborate on the
"implement an IndexReader that delivers a
synthetic index" approach?
Thanks in advance
-John
On Thu, 08 Jul 2004 10:01:59 -0700, Doug Cutting [EMAIL PROTECTED] wrote:
John Wang wrote:
The
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.
Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that
it logs merges? If so, it would be interesting to see that output,
especially the last
Otis Gospodnetic wrote:
Hey Kevin,
Not sure if you're aware of it, but you can specify the lock dir, so in
your example, both JVMs could use the exact same lock dir, as long as
you invoke the VMs with the same params.
Most people won't do this or won't even understand WHY they need to do
this
Doug Cutting wrote:
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.
Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that
it logs merges? If so, it would be interesting to see that output,
Kevin A. Burton wrote:
This is why I think it makes more sense to use our own java.io.tmpdir to
be on the safe side.
I think the bug is that Tomcat changes java.io.tmpdir. I thought that
the point of the system property java.io.tmpdir was to have a portable
name for /tmp on unix,
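The property in question can be read (and overridden at JVM startup with -Djava.io.tmpdir=/some/dir); a minimal check:

```java
public class TmpDirCheck {
    // java.io.tmpdir is the JVM's portable name for the temp directory
    // (/tmp on most Unix systems); containers like Tomcat may reset it.
    public static String tmpDir() {
        return System.getProperty("java.io.tmpdir");
    }

    public static void main(String[] args) {
        System.out.println("java.io.tmpdir = " + tmpDir());
    }
}
```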
John Wang wrote:
Just for my education, can you maybe elaborate on the
"implement an IndexReader that delivers a
synthetic index" approach?
IndexReader is an abstract class. It has few data fields, and few
non-static methods that are not implemented in terms of abstract
methods. So, in
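The point being made about IndexReader, that the concrete methods are implemented in terms of a few abstract ones, is the template-method pattern. A generic sketch (the class and method names here are mine, not Lucene's):

```java
public abstract class SyntheticReaderSketch {
    // The few primitives a subclass must supply...
    protected abstract int numDocs();
    protected abstract String term(int docId);

    // ...and a derived method written only in terms of those primitives,
    // so a "synthetic" subclass gets it for free.
    public final boolean contains(String t) {
        for (int i = 0; i < numDocs(); i++) {
            if (term(i).equals(t)) return true;
        }
        return false;
    }

    // A synthetic reader backed by an in-memory array instead of a real index.
    public static SyntheticReaderSketch demo() {
        return new SyntheticReaderSketch() {
            private final String[] terms = {"winston", "churchill"};
            protected int numDocs() { return terms.length; }
            protected String term(int d) { return terms[d]; }
        };
    }

    public static void main(String[] args) {
        System.out.println(demo().contains("churchill")); // true
        System.out.println(demo().contains("germany"));   // false
    }
}
```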
Doug Cutting wrote:
Kevin A. Burton wrote:
This is why I think it makes more sense to use our own java.io.tmpdir
to be on the safe side.
I think the bug is that Tomcat changes java.io.tmpdir. I thought that
the point of the system property java.io.tmpdir was to have a portable
name for /tmp on
I'm trying to do a search and sort the results using a Sort object.
The 1.4-final API says that Searcher has the following method.
Hits search(Query query, Sort sort)
However, when I try to use it in the code below:
IndexSearcher is = new IndexSearcher(fsDir);
Query query =
Kevin A. Burton wrote:
During an optimize I assume Lucene starts writing to a new segment and
leaves all others in place until everything is done and THEN deletes them?
That's correct.
The only settings I use are:
targetIndex.mergeFactor=10;
targetIndex.minMergeDocs=1000;
the resulting index has
Doug Cutting wrote:
Something sounds very wrong for there to be that many files.
The maximum number of files should be around:
(7 + numIndexedFields) * (mergeFactor-1) *
(log_base_mergeFactor(numDocs/minMergeDocs))
With 14M documents, log_10(14M/1000) is 4, which gives, for you:
(7 +
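The estimate above can be computed directly (a sketch of the rule of thumb as stated; the field count of 10 in the example is an assumption of mine, plug in whatever your index actually has):

```java
public class FileCountEstimate {
    // Rule of thumb from the thread for the maximum number of index files:
    // (7 + numIndexedFields) * (mergeFactor - 1) * log_mergeFactor(numDocs / minMergeDocs)
    public static long maxFiles(int numIndexedFields, int mergeFactor,
                                long numDocs, long minMergeDocs) {
        double levels = Math.log((double) numDocs / minMergeDocs) / Math.log(mergeFactor);
        return (long) ((7 + numIndexedFields) * (mergeFactor - 1) * Math.floor(levels));
    }

    public static void main(String[] args) {
        // 14M docs, mergeFactor 10, minMergeDocs 1000: log_10(14000) rounds down to 4,
        // so with, say, 10 indexed fields the bound is (7+10) * 9 * 4 = 612 files.
        System.out.println(maxFiles(10, 10, 14_000_000L, 1000L));
    }
}
```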
I would like to implement the following functionality:
- Search a specific field (category) and limit the search where the
title field begins with a given letter, and return the results sorted in
alphabetical order by title. Both the category and title fields are
tokenized, indexed and stored in
Hi Don,
After months of struggling with Lucene and finally achieving the complex
relevancy desired, the client would kill me if I now lost all that
relevancy.
I am trying to do it the way Franck suggested, by sorting the words the
user has entered, but otherwise, isn't this a bug of