Erik Hatcher wrote:
I have checked out our current site to the lucene.apache.org area,
and I've also set up a redirect from the jakarta.apache.org/lucene area.
Things are redirecting fine for me. Let me know if you encounter any
issues, but also be patient in case the DNS updates for
Doug Cutting wrote:
Erik Hatcher wrote:
It also might be a good time to think about mailing list names.
There was a request on infrastructure@ to move [EMAIL PROTECTED] to
[EMAIL PROTECTED]. Would it make more sense to move it to [EMAIL PROTECTED]?
NOW you tell me :)
I think until we
Doug Cutting wrote:
Bernhard Messer wrote:
Doug, you placed a copy of the website in the java directory. In
both the original and the java directory, the api directory is
missing. I can't copy it in because of the access rights :-(
Argh. The group protection is 'lucene', as it should
Is the SearchBean code in the Sandbox still useful now that we have
sorting in Lucene 1.4? If so, what does it offer that the core does
not provide now?
I just checked the Lucene repository. The only reference to SearchBean I
found is its own package sandbox/contributions/searchbean. I
If we have the flexibility to add sub-areas where we can store projects
not directly using Lucene, like a part-of-speech tagger, clustering ..., I
vote for lucene.apache.org. It is very well branded and all the existing
ports fit under that umbrella.
Bernhard
I also think that lucene.apache.org
On Jan 11, 2005, at 1:11 PM, Doug Cutting wrote:
Should we upgrade the JVM requirements to 1.4 for Lucene's 1.9/2.0
releases and update the locking code?
+1
I think the time of being backward compatible with JDK 1.3 has gone
Bernhard
Doug Cutting wrote:
Bernhard Messer wrote:
I understand the technical reason for main() there, but logically this
belongs to an external utility class, I think.
Otis, you are right; I already thought about it. It could simply be
moved to a newly created class in the org.apache.lucene.util package
the visibility of CompoundFileReader to public. I have no
problems with a public CompoundFileReader class. Does anybody see a
reason why the visibility of CompoundFileReader should not be changed
to public?
Bernhard
--- Bernhard Messer [EMAIL PROTECTED] wrote:
hi,
I already had a look at Garrett's
hi,
I already had a look at Garrett's implementation. I made some smaller
changes to improve the performance when extracting the files from the
compound file. All tests work fine and the index is usable after extraction.
The new functionality is added as a public static void main() to
Answering the questions found at
http://wiki.apache.org/jakarta/JakartaPMCTopLevelProjectApplication,
Lucene would be a good candidate for becoming a TLP. Bringing Nutch and
all the available Lucene ports under one big umbrella sounds really cool
to me.
I think many other lucene developers,
Otis,
I'm not sure which properties you are talking about. As far as I can
tell, systemproperties.html covers all properties. Please add anything
that you see missing.
for example:
System.getProperty("org.apache.lucene.FSDirectory.class",
FSDirectory.class.getName()); in FSDirectory
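For illustration, the lookup pattern behind such a property can be sketched with a minimal, JDK-only example (the class below is hypothetical, not the actual Lucene code; only the property name comes from the discussion above):

```java
// Sketch of a property-based implementation lookup, as used in FSDirectory:
// a system property names the class to load, with a hard-coded default as
// fallback. Class and method names here are illustrative only.
public class ImplLookup {
    public static String resolve(String propertyName, String defaultClassName) {
        // returns the property value if set, otherwise the default class name
        return System.getProperty(propertyName, defaultClassName);
    }

    public static void main(String[] args) {
        // with the property unset, the default is returned
        System.out.println(resolve("org.apache.lucene.FSDirectory.class",
                                   "org.apache.lucene.store.FSDirectory"));
    }
}
```

The resolved name would then typically be fed to Class.forName() to instantiate the implementation.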
Alexey,
can you please open a new entry in Bugzilla and mark it as a patch by
starting the title with [PATCH]. You can then attach your patched files
to the newly generated entry in Bugzilla. This ensures that your patch
doesn't get lost within the mailing list.
thanks
Bernhard
Hello!
The attached:
systemproperties.html
xdocs/systemproperties.xml
Log:
small typo fix
PR: 32432
Reviewed by: Bernhard Messer
Revision  Changes  Path
1.8       +2 -2    jakarta-lucene/docs/systemproperties.html
Index: systemproperties.html
Bernhard Messer wrote:
Or maybe we could put together an all-encompassing
TestDeprecatedMethods that at least had calls to all the methods
we've deprecated, but doesn't necessarily test return values or
behavior. Then we could fix all the deprecation warnings in the
other test cases
hi,
you could split your big PDF file into small pieces and index the small
pieces instead of processing the big one. But I think this is much
more effort than increasing the memory size of the JVM.
bernhard
Good morning everybody:
We are trying to index a pdf document (610 pages).
check the Lucene users list. There are many threads discussing how to
index PDF documents with Lucene.
Bernhard
PROYECTA.Fernandez Garcia, Ivan wrote:
Good morning everybody,
Has anyone here indexed PDF files?
If yes, could you tell us how you did it?
Thank you
On Tuesday 16 November 2004 22:56, [EMAIL PROTECTED] wrote:
+ throw new RuntimeException("cannot load SegmentReader class: " +
+   e.getMessage());
I think it's better to leave out the call to getMessage(), as the
toString() which is then used automatically is slightly more verbose, it
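Daniel's point about getMessage() versus toString() can be illustrated with a plain-JDK sketch (the exception text is illustrative, not taken from Lucene):

```java
// Shows why "... + e" (which uses toString()) is more verbose than
// "... + e.getMessage()": toString() prepends the exception class name.
public class ExceptionMessages {
    public static void main(String[] args) {
        Exception e = new ClassNotFoundException(
            "org.apache.lucene.index.SegmentReader");
        // getMessage() yields only the detail string
        System.out.println(e.getMessage());
        // toString() includes the class name, which helps debugging
        System.out.println(e.toString());
        // string concatenation calls toString() automatically
        System.out.println("cannot load SegmentReader class: " + e);
    }
}
```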
On Monday 15 November 2004 13:47, Bernhard Messer wrote:
Hi,
since the last changes in Lucene, we are no longer backward compatible
with JDK 1.3. All the poor guys running IBM WebSphere 4.x with IBM JDK
1.3 lost their chance to run Lucene newer than version 1.4.2.
Especially in huge
Hi,
since the last changes in Lucene, we are no longer backward compatible
with JDK 1.3. All the poor guys running IBM WebSphere 4.x with IBM JDK
1.3 lost their chance to run Lucene newer than version 1.4.2.
Especially in huge companies, where it is not so trivial to upgrade to a
new java
Kevin,
your idea sounds reasonable to me. Could you add a new patch entry to
Bugzilla and attach the diff to it? This ensures that the patch doesn't
get lost within the thousands of emails, and when there is some time we
will have a look at it.
thanks in advance
Bernhard
Kevin Oliver wrote:
While
hi,
is there any reason why the following 3 properties are not documented in
systemproperties.html, or is it just a gap in the documentation?
org.apache.lucene.SegmentReader.class (SegmentReader.java)
org.apache.lucene.FSDirectory.class (FSDirectory.java)
line.separator (QueryParser.java)
Brian,
the line of code you pasted is not very useful for seeing where the
problem lies. Could you please provide a small sample of how the index is
generated and how the query is sent to Lucene? If that is not possible,
the stack trace might be helpful as well.
Thanks
Bernhard
Brian Sperryn wrote:
Be sure to also check in the generated HTML when xdocs are changed.
To regenerate the HTML files, check out the jakarta-site2 repository
beside your jakarta-lucene working directory. Then run the Lucene
build target docs.
This is exactly what I already did, without finding a problem.
Bernhard
Hi,
thanks for your work. Usually Bugzilla is the best place to put your
code, so it doesn't get lost. Open a new entry in Bugzilla, prefix
the summary line with [PATCH], and then attach your code.
thanks
Bernhard
Hi all,
I sent a patch regarding wildcard search a couple of days ago (that
hi,
many thanks for proposing and voting for me as a committer. I already got my
userid from ASF and can successfully connect to apache.org. In the mail
from ASF, there is a note that the Project Management Committee
responsible for the project has to grant me access to CVS. Does anybody
know how
Erik Hatcher wrote:
On Nov 9, 2004, at 8:57 AM, Daniel Naber wrote:
On Tuesday 09 November 2004 10:44, Bernhard Messer wrote:
I already got my
userid from ASF and can successfully connect to apache.org. In the mail
from ASF, there is a note, that the Project Management Committee
responsible
Daniel Naber wrote:
On Monday 11 October 2004 10:53, Christoph Goller wrote:
Maybe the default should remain 0 and
folks with big indices should decide by themselves to use a prefix.
I agree that the default should stay 0, even for Lucene 2.0.
Yep, if going for a default of 0, we also
hi Christoph,
first of all, many thanks for reviewing and at last adding the binary
and compression patch to Lucene. What we still have to do before
finalizing the implementation is to update the documentation describing
the field data in fileformats.html. The sentence
Currently only the
Daniel Naber wrote:
On Monday 11 October 2004 18:21, Bernhard Messer wrote:
Currently only the low-order bit of Bits is used. It is one for
tokenized fields, and zero for non-tokenized fields.
is outdated now and should be updated. Any idea how to proceed ?
Is that the only
Doug Cutting wrote:
Does anyone have fuzzy-query benchmarks for, e.g., ~1M document
indexes, where each document contains a few k of text? Ideally with
such indexes, even complex queries should take less than a second,
no? How long does a fuzzy query take? And how much does a prefix of
Dmitry,
Bernhard Messer wrote:
hi,
CompoundFileReader class contains some code where I can't follow the
idea behind it. Maybe somebody else can switch on the light for me,
so I can see the track. There are 2 public methods which definitely
don't work as expected. I know, extending Directory
hi,
CompoundFileReader class contains some code where I can't follow the
idea behind it. Maybe somebody else can switch on the light for me, so I
can see the track. There are 2 public methods which definitely don't work
as expected. I know, extending Directory forces one to implement the
Hi Christoph,
I reviewed your patch. It looks great to me. However, I wonder why we need
isCompressed in FieldInfo? Being compressed or not seems to be a
property of an
individual field rather than of all fields in the index with a given name.
Furthermore, the isCompressed flag in FieldInfo is
hi all,
Daniel did a great job cleaning up the Field class to make it more
readable for the user. Wouldn't this be the best time to clean up the 3
IndexReader methods which are directly related to field names?
Currently there are 3 different methods available to get the field names
from an
Doug Cutting wrote:
Bernhard Messer wrote:
a few months ago, there was a very interesting discussion about field
compression and the possibility to store binary field values within a
Lucene document. On this topic, Drew Farris came up with a
patch to add the necessary functionality. I
will be affected when using compression.
Bernhard
Otis Gospodnetic wrote:
Bernhard,
Sounds good to me.
I would, however, also be interested in the performance impact of
text-field compression. While adapting Drew's patch, it may be nice to
make the compression mechanism pluggable.
Otis
--- Bernhard Messer
hi developers,
a few months ago, there was a very interesting discussion about field
compression and the possibility to store binary field values within a
Lucene document. On this topic, Drew Farris came up with a
patch to add the necessary functionality. I ran all the necessary tests
Hi Daniel,
I just got the latest version from CVS. You modified BooleanClause,
adding an inner class Occur. Occur has to implement
java.io.Serializable. If not, RemoteSearchable will no longer work.
Is it necessary to add a new patch to Bugzilla to fix it? It seems that
you are online
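The failure mode Bernhard describes can be reproduced with a JDK-only sketch (the class below is hypothetical, not the actual BooleanClause code): a pre-1.5 typesafe-enum inner class must implement java.io.Serializable, or serializing the enclosing object, as RMI does for RemoteSearchable, throws NotSerializableException.

```java
import java.io.*;

// Hypothetical sketch of the pattern: a typesafe-enum nested class.
// If Occur did NOT implement Serializable, writing a Clause to an
// ObjectOutputStream would fail with NotSerializableException.
public class Clause implements Serializable {
    public static final class Occur implements Serializable {
        private final String name;
        private Occur(String name) { this.name = name; }
        public static final Occur MUST = new Occur("MUST");
        public static final Occur SHOULD = new Occur("SHOULD");
        public String toString() { return name; }
        // note: a production enum would also add readResolve() so that
        // deserialization preserves singleton identity
    }

    public Occur occur = Occur.MUST;

    public static void main(String[] args) throws Exception {
        // round-trip through serialization, as RMI would do
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new Clause());
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        Clause copy = (Clause) in.readObject();
        System.out.println(copy.occur);
    }
}
```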
Hui,
there is some truth in what you are saying. But at least the change is
reflected in CHANGES.txt, which is available on the internet. Look at
the note from Christoph at
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/CHANGES.txt?view=markup
Fixed inconsistencies with index closing.
Otis,
the English class is in CVS; that's where I found it. It is also used by
other test classes, e.g. TestTermVectors.
The IOException was something I wasn't sure how to handle. I
think you're right, the best idea would be to propagate it up to the caller.
Looking at the original code,
Thanks,
Otis
--- Bernhard Messer [EMAIL PROTECTED] wrote:
Otis,
the English class is in CVS; that's where I found it. It is also used
by
other test classes, e.g. TestTermVectors.
The IOException was something I wasn't sure how to handle. I
think you're right, the best idea would
, or am I missing a
piece of diff? I don't see any synchronized methods/blocks removed
from the code, so I'm confused about this optimization.
Otis
--- Bernhard Messer [EMAIL PROTECTED] wrote:
Sorry, but there was a bug in the patch I provided several minutes
ago.
A NullPointerException can occur
Grant,
I just updated my local files with the latest version from head. All
tests pass fine; not one produces an error. If you are working with
Eclipse, you could make a diff of your local files against the CVS files
from a specified date.
I just compared the current head against the version
Hi Daniel,
I just looked at the changes you made on the whiteboard. You moved the
callback interface idea to Other Changes. I think that such an
implementation would require a change to the current API. Maybe we can
make the new code backward compatible, but at least we would have to add
additional
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.English;
/**
* @author Bernhard Messer
* @version $rcs = ' $Id: Exp $ ' ;
*/
public class TestMultiThreadTermVectors extends TestCase {
private IndexReader reader;
//private RAMDirectory directory
Sorry, but there was a bug in the patch I provided several minutes ago.
A NullPointerException can occur in SegmentReader's doClose() method. The
change in the new diff file checks that the ThreadLocal object was created
and is not null before trying to get the TermVectorsReader from it.
best regards
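The guard described above can be sketched with a JDK-only stand-in for the per-thread reader (names are hypothetical, not the actual SegmentReader code; a StringBuilder plays the role of the TermVectorsReader):

```java
// Sketch of the null guard: a resource cached per thread in a ThreadLocal
// is created lazily, so close() must check for null, because the closing
// thread may never have populated its own slot.
public class PerThreadResource {
    private final ThreadLocal<StringBuilder> local =
        new ThreadLocal<StringBuilder>();

    public StringBuilder get() {
        StringBuilder b = local.get();
        if (b == null) {            // lazily create on first use in this thread
            b = new StringBuilder();
            local.set(b);
        }
        return b;
    }

    public void close() {
        StringBuilder b = local.get();
        if (b != null) {            // the fix: guard against a thread that
            local.remove();         // never touched the resource
        }
    }
}
```

Without the null check, a close() called from a thread that never used the resource would dereference null, which matches the NullPointerException described above.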
Grant,
my idea was to put the TermVectorsReader member of SegmentReader into a
ThreadLocal object and remove synchronization from the TermVectorsReader get
methods. But for some reason, storing a TermVectorsReader object in a
ThreadLocal doesn't work. The ThreadLocal get method always returns
Christoph,
It doesn't matter which solution you choose. The original idea was to try
some optimizations which could be implemented in a simple way and have as
few negative side effects as possible. I think this is achieved with the
first patch. The complexity of the overall system will grow
hi all,
I just made a test case to measure the TermVectorsReader performance
when running one IndexReader in several threads. To do this, I'm adding
1000 documents, with one field and a different term for each, to a
RAMDirectory, then starting up 1 to 10 threads with the same instance of
like a tips-and-tricks section on the
Lucene website?
Bernhard
Dmitry Serebrennikov wrote:
Bernhard Messer wrote:
hi developers,
maybe there is a small but effective possibility to optimize the
SegmentMerger class when the compound file option is enabled, which is the
default since Lucene 1.4
Christoph,
very clever implementation, and bad news for all disk manufacturers ;-).
The patch works as expected and reduces the maximum disk usage just as
announced in the first message introducing this patch.
thanks
Bernhard
Christoph Goller wrote:
Bernhard Messer wrote:
Hi Christoph,
just
hi developers,
maybe there is a small but effective possibility to optimize the
SegmentMerger class when the compound file option is enabled, which is the
default since Lucene 1.4.
The current implementation creates and writes the compound index file
every time the merge() method is called. Due to
hi developers,
I made some measurements of Lucene disk usage during index creation.
It's no surprise that during index creation, within the index
optimization, more disk space is necessary than the final index size
will reach. What I didn't expect is such a big difference in disk space
usage,
Hi developers,
in the attachments you will find two small cleanups for IndexReader and
TermVectorsWriter. In TermVectorsWriter, the visibility of some public
members is changed to protected.
In IndexReader, there is a public method directory(), through which classes
outside Lucene can get the current
out?
Regards
Raju
- Original Message -
From: Bernhard Messer [EMAIL PROTECTED]
To: Lucene Developers List [EMAIL PROTECTED]
Sent: Tuesday, July 27, 2004 7:28 PM
Subject: Re: docfaq of IndexReader is showing the deleted document also
Hi Raju,
read the documentation
Hi Raju,
read the documentation for the IndexReader.delete method and you will
find your way ;-)
/** Deletes the document numbered <code>docNum</code>. Once a document is
deleted it will not appear in TermDocs or TermPositions enumerations.
Attempts to read its field with the [EMAIL PROTECTED]
this sound? Is there
(should there be) an isCurrent() method on the IndexReader that could
encapsulate this process?
Dmitry.
Bernhard Messer wrote:
Hi,
I'm sending a patch which should help to fix a problem with the new
method IndexReader.getCurrentVersion(). As far as I understand the
current
+1 to Christoph's proposal ;-)
Christoph Goller wrote:
Bernhard Messer wrote:
Hi Dmitry,
from the point of view of keeping the interface clean, it would be much
better to have a separate method in IndexReader like isCurrent(), or even
nicer isValid(), which combines the system time of the index
creation
Hi,
I'm sending a patch which should help to fix a problem with the new
method IndexReader.getCurrentVersion(). As far as I understand the
current Lucene documentation, developers should use this new method to
verify whether an index is out of date. The older method
IndexReader.lastModified() is
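The isCurrent() idea under discussion can be sketched with a version stamp rather than a file modification time (a JDK-only sketch under assumed names and mechanics, not the actual IndexReader code):

```java
// Sketch: a reader remembers the index's version stamp at open time, and
// the index bumps its stamp on every commit. isCurrent() compares the two,
// which avoids the pitfalls of comparing file modification times.
public class VersionedIndex {
    private long version = 1;

    public long getCurrentVersion() { return version; }
    public void commit() { version++; }   // a writer bumps the stamp

    public class Reader {
        private final long openedVersion = version; // captured at open time
        public boolean isCurrent() { return openedVersion == version; }
    }
}
```

A reader opened before a commit would then report isCurrent() == false afterwards, which is exactly the staleness check the thread asks for.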
hi all,
just played around with the MultiFieldQueryParser and didn't find a
working way to change the operator value.
The problem is that MultiFieldQueryParser implements only two public
static parse methods. Calling one of those, in the extended
superclass, the static method
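The root of the problem is a Java language rule: static methods are not overridden, only hidden, so a call resolved through the superclass never reaches subclass behavior or instance settings such as a default operator. A JDK-only sketch with hypothetical names (not the actual QueryParser classes):

```java
// Static methods are resolved against the declared type at compile time,
// so a subclass "override" is really a new, unrelated method, and any
// instance configuration (like a default operator) is never consulted.
class BaseParser {
    static String parse(String q) { return "OR:" + q; }
}

class AndParser extends BaseParser {
    // this HIDES BaseParser.parse; it does not override it
    static String parse(String q) { return "AND:" + q; }
}

public class StaticHiding {
    public static void main(String[] args) {
        // each call is bound to the class named at the call site
        System.out.println(BaseParser.parse("foo"));
        System.out.println(AndParser.parse("foo"));
    }
}
```

This is why exposing only static parse entry points makes a parser effectively unconfigurable through subclassing.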
Hi,
the topic you are focusing on is a never-ending story in content
retrieval in general. There is no perfect solution which fits every
environment. Retrieving a document's context based on a single query
term seems to be very difficult as well. In Lucene it isn't very
difficult to