Re: lucene.apache.org

2005-02-14 Thread Bernhard Messer
Erik Hatcher wrote: I have checked out our current site to the lucene.apache.org area, and I've also set up a redirect from the jakarta.apache.org/lucene area. Things are redirecting fine for me. Let me know if you encounter any issues, but also be patient in case the DNS updates for

Re: lucene.apache.org

2005-02-14 Thread Bernhard Messer
Doug Cutting wrote: Erik Hatcher wrote: It also might be a good time to think about mailing list names. There was a request on infrastructure@ to move [EMAIL PROTECTED] to [EMAIL PROTECTED], would it make more sense to move it to [EMAIL PROTECTED] NOW you tell me :) I think until we

Re: lucene.apache.org

2005-02-14 Thread Bernhard Messer
Doug Cutting wrote: Bernhard Messer wrote: Doug, you placed a copy of the website in the java directory. In both the original and the java directory, the api directory is missing. I can't copy it in because of the access rights :-( Argh. The group protection is 'lucene', as it should

Re: SearchBean?

2005-02-07 Thread Bernhard Messer
Is the SearchBean code in the Sandbox still useful now that we have sorting in Lucene 1.4? If so, what does it offer that the core does not provide now? i just checked the lucene repository. The only reference to SearchBean i found is its own package, sandbox/contributions/searchbean. I

Re: [PROPOSAL] Lucene to search.apache.org

2005-01-17 Thread Bernhard Messer
If we have the flexibility to add sub-areas where we can store projects not directly using lucene, like part of speech tagger, clustering ..., i vote for lucene.apache.org. It is very well branded and all the existing ports fit under that umbrella. Bernhard I also think that lucene.apache.org

Re: what if the IndexReader crashes, after delete, before close.

2005-01-12 Thread Bernhard Messer
On Jan 11, 2005, at 1:11 PM, Doug Cutting wrote: Should we upgrade the JVM requirements to 1.4 for Lucene's 1.9/2.0 releases and update the locking code? +1 I think the time for being backward compatible with JDK 1.3 has gone Bernhard

Re: CFS file and file formats

2005-01-03 Thread Bernhard Messer
Doug Cutting wrote: Bernhard Messer wrote: I understand the technical reason for main() there, but logically this belongs to an external utility class, I think. Otis you are right, i already thought about it. It could be simply moved to a newly created class in org.apache.lucene.util package

Re: CFS file and file formats

2004-12-31 Thread Bernhard Messer
the visibility of CompoundFileReader to public. I have no problems with a public CompoundFileReader class. Does anybody see a reason that the visibility of CompoundFileReader should not be changed to public ? Bernhard --- Bernhard Messer [EMAIL PROTECTED] wrote: hi, i already had a look at Garrett's

Re: CFS file and file formats

2004-12-30 Thread Bernhard Messer
hi, i already had a look at Garrett's implementation. I made some smaller changes to improve the performance when extracting the files from the compound. All tests work fine and the index is usable after extraction. The new functionality is added as a public static void main () to

Re: kick-start: Lucene to top-level project

2004-12-11 Thread Bernhard Messer
Answering the questions found at http://wiki.apache.org/jakarta/JakartaPMCTopLevelProjectApplication, lucene would be a good candidate for becoming a TLP. Bringing Nutch and all the available Lucene ports under one big umbrella sounds really cool to me. I think many other lucene developers,

Re: missing values in systemproperties.html

2004-12-02 Thread Bernhard Messer
Otis, I'm not sure which properties you are talking about. As far as I can tell, systemproperties.html covers all properties. Please add anything that you see missing. for example: System.getProperty("org.apache.lucene.FSDirectory.class", FSDirectory.class.getName()); in FSDirectory

Re: Patch to increse visibility of some classes/methods

2004-12-01 Thread Bernhard Messer
Alexey, can you please open a new entry in Bugzilla and mark it as a patch by starting the title with [PATCH]. There you can attach your patched files to the newly generated entry in Bugzilla. This ensures that your patch doesn't get lost within the mailing list. thanks Bernhard Hello ! The attached

Re: cvs commit: jakarta-lucene/xdocs systemproperties.xml

2004-11-29 Thread Bernhard Messer
systemproperties.html xdocssystemproperties.xml Log: small typo fix PR:32432 Reviewed by: Bernhard Messer Revision ChangesPath 1.8 +2 -2 jakarta-lucene/docs/systemproperties.html Index: systemproperties.html

Re: cvs commit: jakarta-lucene/src/test/org/apache/lucene/queryParser TestQueryParser.java

2004-11-28 Thread Bernhard Messer
Bernhard Messer wrote: Or maybe we could put together an all-encompassing TestDeprecatedMethods that at least had calls to all the methods we've deprecated, but doesn't necessarily test return values or behavior. Then we could fix all the deprecation warnings in the other test cases

Re: Indexing Error

2004-11-22 Thread Bernhard Messer
hi, you could split your big pdf file into small pieces and index the small pieces instead of processing the big one. But i think this is much more effort than to increase the memory size within the jvm. bernhard Good morning everybody: We are trying to index a pdf document (610 pages).

Re: Queries Lucene 1.3

2004-11-18 Thread Bernhard Messer
check the lucene users list. There are many threads talking about how to index PDF documents with lucene. Bernhard PROYECTA.Fernandez Garcia, Ivan wrote: Good morning everybody, Has anyone indexed PDF files? If so, could you tell us how you did it? Thank you

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/index SegmentReader.java

2004-11-18 Thread Bernhard Messer
On Tuesday 16 November 2004 22:56, [EMAIL PROTECTED] wrote: + throw new RuntimeException("cannot load SegmentReader class: " + e.getMessage()); I think it's better to leave out the call to getMessage(), as the toString() which is then used automatically is slightly more verbose, it
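
The point about getMessage() versus the implicit toString() can be illustrated with a minimal Java sketch. The class name `ExceptionMessageDemo`, its helper methods, and the class name `com.example.MissingReader` are hypothetical and not part of Lucene; the sketch relies only on the standard behavior of `Throwable.toString()`, which prefixes the message with the exception's class name.

```java
public class ExceptionMessageDemo {
    // Uses getMessage(): only the detail message survives, the exception type is lost.
    static String withGetMessage(Exception e) {
        return "cannot load SegmentReader class: " + e.getMessage();
    }

    // Uses implicit toString(): the exception class name is kept along with the message.
    static String withToString(Exception e) {
        return "cannot load SegmentReader class: " + e;
    }

    public static void main(String[] args) {
        // Hypothetical failure, standing in for a class that could not be loaded.
        Exception e = new ClassNotFoundException("com.example.MissingReader");
        System.out.println(withGetMessage(e));
        System.out.println(withToString(e));
    }
}
```

With the second form, a reader of the log can tell at a glance which kind of exception caused the failure, which is presumably the extra verbosity the reply refers to.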

Re: jdk 1.3 versus jdk 1.4

2004-11-16 Thread Bernhard Messer
On Monday 15 November 2004 13:47, Bernhard Messer wrote: Hi, since the last changes in lucene, we are no longer backward compatible with jdk 1.3. All the pure guys, running IBM WebSphere 4.x with IBM JDK 1.3, lost their chances to run lucene newer than version 1.4.2. Especially in huge

jdk 1.3 versus jdk 1.4

2004-11-15 Thread Bernhard Messer
Hi, since the last changes in lucene, we are no longer backward compatible with jdk 1.3. All the pure guys, running IBM WebSphere 4.x with IBM JDK 1.3, lost their chances to run lucene newer than version 1.4.2. Especially in huge companies, where it is not so trivial to upgrade to a new java

Re: compound files question/patch

2004-11-13 Thread Bernhard Messer
Kevin, your idea sounds reasonable to me. Could you add a new patch to bugzilla and attach the diff to it. This ensures that the patch doesn't get lost within the thousands of emails and if there is some time we will have a look at it. thanks in advance Bernhard Kevin Oliver wrote: While

missing values in systemproperties.html

2004-11-13 Thread Bernhard Messer
hi, is there any reason why the following 3 properties are not documented in systemproperties.html, or is it just a lack in the documentation ? org.apache.lucene.SegmentReader.class (SegmentReader.java) org.apache.lucene.FSDirectory.class (FSDirectory.java) line.separator (QueryParser.java)

Re: problem getting a nullpointer exception when porting to linux

2004-11-10 Thread Bernhard Messer
Brian, the line of code you pasted is not very useful to see where the problem exists. Could you please provide a small sample how the index is generated and the query is sent to lucene. If that is not possible, maybe the stacktrace could be helpful also. Thanks Bernhard Brian Sperryn wrote:

Re: Fwd: cvs commit: jakarta-lucene/xdocs whoweare.xml

2004-11-10 Thread Bernhard Messer
Be sure to also check in the generated HTML when xdocs are changed. To regenerate the HTML files, check out the jakarta-site2 repository beside your jakarta-lucene working directory. Then run the Lucene build target docs. This is exactly what i already did, without finding a problem. Bernhard

Re: [PATCH]multiple wildcards ? at the end of search pattern return incorrect hits

2004-11-10 Thread Bernhard Messer
Hi, thanks for your work. Usually Bugzilla would be the best way to place your code, so it doesn't get lost. Open a new entry in Bugzilla, prefix the summary line with [PATCH], and then attach your code. thanks Bernhard Hi all, I sent a patch regarding wildcard search a couple of days ago(that

Re: Propose Bernhard as committer

2004-11-09 Thread Bernhard Messer
hi, many thanks for proposing and voting me as a committer. I already got my userid from ASF and can successfully connect to apache.org. In the mail from ASF, there is a note, that the Project Management Committee responsible for the project has to grant me access to CVS. Does anybody know how

Re: Propose Bernhard as committer

2004-11-09 Thread Bernhard Messer
Erik Hatcher wrote: On Nov 9, 2004, at 8:57 AM, Daniel Naber wrote: On Tuesday 09 November 2004 10:44, Bernhard Messer wrote: I already got my userid from ASF and can successfully connect to apache.org. In the mail from ASF, there is a note, that the Project Management Committee responsible

Re: FuzzyQuery prefix length

2004-10-11 Thread Bernhard Messer
Daniel Naber wrote: On Monday 11 October 2004 10:53, Christoph Goller wrote: Maybe the default should remain 0 and folks with big indices should decide by themselves to use a prefix. I agree that the default should stay 0, even for Lucene 2.0. Yep, if going for a default of 0, we also

documentation in fileformats.html

2004-10-11 Thread Bernhard Messer
hi Christoph, first of all, many thanks for reviewing and at least adding the binary and compression patch to lucene. What we still have to do, before finalizing the implementation, is to update the documentation describing the field data in fileformats.html. The sentence Currently only the

Re: documentation in fileformats.html

2004-10-11 Thread Bernhard Messer
Daniel Naber wrote: On Monday 11 October 2004 18:21, Bernhard Messer wrote: Currently only the low-order bit is used of Bits is used. It is one for tokenized fields, and zero for non-tokenized fields. is outdated now and should be updated. Any idea how to proceed ? Is that the only

Re: FuzzyQuery prefix length

2004-10-11 Thread Bernhard Messer
Doug Cutting wrote: Does anyone have fuzzy-query benchmarks for, e.g., ~1M document indexes, where each document contains a few k of text? Ideally with such indexes, even complex queries should take less than a second, no? How long does a fuzzy query take? And how much does a prefix of

Re: strange behaviour in CompoundFileReader fileModified and touchFile

2004-10-01 Thread Bernhard Messer
Dmitry, Bernhard Messer wrote: hi, CompoundFileReader class contains some code where i can't follow the idea behind it. Maybe somebody else can switch on the light for me, so i can see the track. There are 2 public methods which definitely don't work as expected. I know, extending Directory

strange behaviour in CompoundFileReader fileModified and touchFile

2004-09-30 Thread Bernhard Messer
hi, CompoundFileReader class contains some code where i can't follow the idea behind it. Maybe somebody else can switch on the light for me, so i can see the track. There are 2 public methods which definitely don't work as expected. I know, extending Directory forces one to implement the

Re: DO NOT REPLY [Bug 31149] - [PATCH] to store binary fields with compression

2004-09-27 Thread Bernhard Messer
Hi Christoph, I reviewed your patch. Looks great to me. However, I wonder why we need isCompressed in FieldInfo? Being compressed or not seems to be a property of an individual field more than of all fields in the index with a given name. Furthermore, the isCompressed flag in FieldInfo is

API cleanup for Field and future cleanup for IndexReader

2004-09-01 Thread Bernhard Messer
hi all, Daniel did a great job when cleaning up the Field class to make it more readable for the user. Wouldn't it be the best time to clean up the 3 IndexReader methods which are directly related to field names ? Currently there are 3 different methods available to get the field names from an

Re: Binary fields and data compression

2004-09-01 Thread Bernhard Messer
Doug Cutting wrote: Bernhard Messer wrote: a few months ago, there was a very interesting discussion about field compression and the possibility to store binary field values within a lucene document. Regarding this topic, Drew Farris came up with a patch to add the necessary functionality. I

Re: Binary fields and data compression

2004-08-31 Thread Bernhard Messer
will be affected when using compression. Bernhard Otis Gospodnetic wrote: Bernhard, Sounds good to me. I would, however, also be interested in the performance impact of text-field compression. While adapting Drew's patch, it may be nice to make the compression mechanism pluggable. Otis --- Bernhard Messer

Binary fields and data compression

2004-08-30 Thread Bernhard Messer
hi developers, a few months ago, there was a very interesting discussion about field compression and the possibility to store binary field values within a lucene document. Regarding this topic, Drew Farris came up with a patch to add the necessary functionality. I ran all the necessary tests

Re: RemoteSearchable will not work anylonger, due to changes in BooleanClause

2004-08-28 Thread Bernhard Messer
Hi Daniel, i just got the latest version from cvs. You modified BooleanClause, adding an inner class Occur. Occur has to implement java.io.Serializable. If not, RemoteSearchable will no longer work. Is it necessary to add a new patch to bugzilla to fix it, it seems that you are online
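
Why a nested type must itself be serializable can be sketched in plain Java. The classes below (`SerializableClauseDemo`, `BrokenClause`, `FixedClause`) are hypothetical stand-ins and are not Lucene's actual BooleanClause; the sketch only assumes standard `java.io` serialization, where writing an object fails as soon as one of its non-transient fields refers to a non-serializable type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableClauseDemo {
    // Hypothetical clause whose nested Occur type does NOT implement Serializable.
    static class BrokenClause implements Serializable {
        static class Occur {
            final String name;
            Occur(String name) { this.name = name; }
        }
        final Occur occur = new Occur("MUST");
    }

    // Same shape, but the nested Occur type implements Serializable.
    static class FixedClause implements Serializable {
        static class Occur implements Serializable {
            final String name;
            Occur(String name) { this.name = name; }
        }
        final Occur occur = new Occur("MUST");
    }

    // Returns true if the object graph can be written with Java serialization.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false; // some reachable field type is not Serializable
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new BrokenClause()));
        System.out.println(canSerialize(new FixedClause()));
    }
}
```

Any mechanism that ships queries between JVMs through Java serialization, as a remote searcher would, hits the same failure for the first variant.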

Re: lucene 1.4 index file closing

2004-08-26 Thread Bernhard Messer
Hui, there is some truth in what you are saying. But at least, the change is reflected in the changes.txt which is available on the internet. Look at the note from Christoph on http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/CHANGES.txt?view=markup Fixed inconsistencies with index closing.

Re: DO NOT REPLY [Bug 30736] - [PATCH] to remove synchronized code from TermVectorsReader

2004-08-19 Thread Bernhard Messer
Otis, the English class is in cvs, that's where i found it. It is also used by other test classes like TestTermVectors e.g. The IOException was something where i wasn't sure how to process. I think you're right, the best idea would be to pop it up to the caller. Looking at the original code,

Re: DO NOT REPLY [Bug 30736] - [PATCH] to remove synchronized code from TermVectorsReader

2004-08-19 Thread Bernhard Messer
. Thanks, Otis --- Bernhard Messer [EMAIL PROTECTED] wrote: Otis, the English class is in cvs, that's where i found it. It is also used by other test classes like TestTermVectors e.g. The IOException was something where i wasn't sure how to process. I think you're right, the best idea would

Re: optimize TermVectorsReader, remove synchronization from code

2004-08-18 Thread Bernhard Messer
, or am I missing a piece of diff? I don't see any synchronized methods/blocks removed from the code, so I'm confused about this optimization. Otis --- Bernhard Messer [EMAIL PROTECTED] wrote: Sorry, but there was a bug in the patch i provided several minutes ago. A NullPointerException can occur

Re: Changes in recent days

2004-08-18 Thread Bernhard Messer
Grant, I just updated my local files with the latest version from head. All tests pass fine, not one is creating an error. If you are working with eclipse, you could make a diff of your local files against cvs files from a specified date. I just compared the current head against the version

Re: [Jakarta Lucene Wiki] Updated: Lucene2Whiteboard

2004-08-16 Thread Bernhard Messer
Hi Daniel, just looked at your changes you made on the whiteboard. You moved the callback interface idea to Other Changes. I think that such an implementation would raise a change in the current api. Maybe we can make the new code backward compatible, but at least, we have to add additional

optimize TermVectorsReader, remove synchronization from code

2004-08-15 Thread Bernhard Messer
org.apache.lucene.store.FSDirectory; import org.apache.lucene.store.RAMDirectory; import org.apache.lucene.util.English; /** * @author Bernhard Messer * @version $rcs = ' $Id: Exp $ ' ; */ public class TestMultiThreadTermVectors extends TestCase { private IndexReader reader; //private RAMDirectory directory

Re: optimize TermVectorsReader, remove synchronization from code

2004-08-15 Thread Bernhard Messer
Sorry, but there was a bug in the patch i provided several minutes ago. A NullPointerException can occur in SegmentReader doClose method. The change in the new diff file checks if the ThreadLocal object was created and is not null before trying to get the TermVectorReader from it. best regards
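
The null check described above can be sketched as follows. This is not the actual patch: `ThreadLocalCloseDemo` and its nested `Reader` class are hypothetical, and the lazily created `ThreadLocal` holder is an assumption made for illustration, so that the "was it created at all" case exists.

```java
public class ThreadLocalCloseDemo {
    // Hypothetical stand-in for a per-thread reader resource.
    static class Reader {
        boolean closed;
        void close() { closed = true; }
    }

    // Assumption: the holder is created lazily, so it may still be null at close time.
    private ThreadLocal<Reader> local;

    // Returns the calling thread's reader, creating holder and reader on first use.
    synchronized Reader get() {
        if (local == null) {
            local = new ThreadLocal<>();
        }
        Reader r = local.get();
        if (r == null) {
            r = new Reader();
            local.set(r);
        }
        return r;
    }

    // Closes the calling thread's reader; returns true only if one was actually closed.
    synchronized boolean doClose() {
        if (local == null) {
            return false; // holder never created: nothing to close, no NullPointerException
        }
        Reader r = local.get();
        if (r == null) {
            return false; // this thread never obtained a reader
        }
        r.close();
        local.remove();
        return true;
    }
}
```

Calling `doClose()` on a fresh instance is then a harmless no-op instead of a NullPointerException.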

Re: TermVectorsReader performance

2004-08-12 Thread Bernhard Messer
Grant, my idea was to put the TermVectorsReader member from SegmentReader in a ThreadLocal object and remove synchronization from TermVectorsReader get methods. But for some reason, storing a TermVectorsReader object in a ThreadLocal doesn't work. The ThreadLocal get method always returns

Re: optimized disk usage when creating a compound index

2004-08-10 Thread Bernhard Messer
Christoph, doesn't matter which solution you choose. The original idea was to try some optimizations which could be implemented in a simple way and have as few negative side effects as possible. I think this is done within the first patch. The complexity of the overall system will grow

TermVectorsReader performance

2004-08-09 Thread Bernhard Messer
hi all, i just made a test case to measure the TermVectorsReader performance when running one IndexReader in several threads. To do this, i'm adding 1000 documents with one field and different term for each in a RAMDirectory. Then starting up 1 to 10 threads with the same instance of

Re: possible SegmentMerger optimization

2004-08-08 Thread Bernhard Messer
like a tips and tricks section on the lucene website ? Bernhard Dmitry Serebrennikov wrote: Bernhard Messer wrote: hi developers, may be there is a small, but effective possibility to optimize the SegmentMerger class when compound file option is enabled, which is default since lucene 1.4

Re: optimized disk usage when creating a compound index

2004-08-08 Thread Bernhard Messer
Christoph, very clever implementation and bad news for all disk manufacturers ;-). The patch works as expected and reduces the max. disk usage the same way announced in the first message introducing this patch. thanks Bernhard Christoph Goller wrote: Bernhard Messer wrote: Hi Christoph, just

possible SegmentMerger optimization

2004-08-07 Thread Bernhard Messer
hi developers, maybe there is a small but effective possibility to optimize the SegmentMerger class when compound file option is enabled, which is default since lucene 1.4. The current implementation creates and writes the compound index file every time the merge() method is called. Due to

optimized disk usage when creating a compound index

2004-08-06 Thread Bernhard Messer
hi developers, i made some measurements on lucene disk usage during index creation. It's no surprise that during index creation, within the index optimization, more disk space is necessary than the final index size will reach. What i didn't expect is such a high difference in disk size usage,

IndexReader and TermVectorsWriter cleanup

2004-08-05 Thread Bernhard Messer
Hi developers, in the attachments you will find two small cleanups for IndexReader and TermVectorsWriter. In TermVectorsWriter, the visibility of some public members is changed to protected. In IndexReader, there is a public method directory(), where classes outside lucene can get the current

Re: docfaq of IndexReader is showing the deleted document also

2004-07-28 Thread Bernhard Messer
out? Regards Raju - Original Message - From: Bernhard Messer [EMAIL PROTECTED] To: Lucene Developers List [EMAIL PROTECTED] Sent: Tuesday, July 27, 2004 7:28 PM Subject: Re: docfaq of IndexReader is showing the deleted document also Hi Raju, read the documentation

Re: docfaq of IndexReader is showing the deleted document also

2004-07-27 Thread Bernhard Messer
Hi Raju, read the documentation for the IndexReader.delete method and you will find your way ;-) /** Deletes the document numbered <code>docNum</code>. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its field with the [EMAIL PROTECTED]

Re: IndexReader.getCurrentVersion() and IndexReader.lastModified()

2004-06-03 Thread Bernhard Messer
this sound? Is there (should there be) an isCurrent() method on the IndexReader that could encapsulate this process? Dmitry. Bernhard Messer wrote: Hi, I'm sending a patch which should help to fix a problem using the new method IndexReader.getCurrentVersion(). As far as i understand the current

Re: IndexReader.getCurrentVersion() and IndexReader.lastModified()

2004-06-03 Thread Bernhard Messer
1+ to Christoph's proposal ;-) Christoph Goller wrote: Bernhard Messer wrote: Hi Dmitry, from the view of keeping the interface clean, it would be much better to have a separate method in IndexReader like isCurrent() or even nicer isValid() which combines the system time of the index creation

IndexReader.getCurrentVersion() and IndexReader.lastModified()

2004-06-02 Thread Bernhard Messer
Hi, I'm sending a patch which should help to fix a problem using the new method IndexReader.getCurrentVersion(). As far as i understand the current lucene documentation, developers should use this new method to verify if an index is out of date. The older method IndexReader.lastModified() is

MultiFieldQueryParser, can't change default search operator

2003-10-29 Thread Bernhard Messer
hi all, just played around with the MultiFieldQueryParser and didn't find a working way to change the operator value. The problem is that MultiFieldQueryParser is implementing two public static methods parse only. Calling one of those, in the extended superclass, the static method

Re: Normalization of Documents

2002-04-13 Thread Bernhard Messer
Hi, the topic you are focusing on is a never ending story in content retrieval in general. There is no perfect solution which fits in every environment. Retrieving a document's context based on a single query term seems to be very difficult also. In Lucene it isn't very difficult to