Re: new facet parameter: facet.exists=true

2010-03-30 Thread Erik Hatcher
One trick to doing this is to index a field that lists the facet field names that each document possesses. Then you can facet on the field of field names (sounds confusing, sorry) and you'll know if there are any documents in a result set that have values in, say, a "category" field. The
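The trick described above can be sketched in plain Java, with no Lucene or Solr calls. This is only an illustration under assumptions: the class `FieldNameFacet` and its methods are hypothetical names, and a real setup would store the list of field names in an extra indexed field and facet on that field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of the "field of field names" trick; not a Lucene or Solr API. */
class FieldNameFacet {

    /** Index time: collect the names of the fields that actually hold a value in this document. */
    static List<String> fieldNames(Map<String, String> doc) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (e.getValue() != null && !e.getValue().isEmpty()) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    /** Query time: facet counts over the stored field-name lists of a result set. */
    static Map<String, Integer> facetCounts(List<List<String>> fieldNameLists) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> names : fieldNameLists) {
            for (String name : names) {
                counts.merge(name, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

A count greater than zero for "category" then says that at least one document in the result set has a value in that field.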

Re: new facet parameter: facet.exists=true

2010-03-30 Thread Erik Hatcher
My aim is to keep the memory footprint low while still being able to facet more than 10^7 documents, a problem I am dealing with right now. Original Message Date: Tue, 30 Mar 2010 08:46:23 -0400 From: Erik Hatcher To: java-dev@lucene.apache.org Subject: Re: new facet paramet

Re: Checking an index

2005-03-03 Thread Erik Hatcher
Please contribute your code to a Bugzilla issue (http://issues.apache.org/bugzilla/ - first create a new issue, then make an attachment). Erik On Mar 3, 2005, at 3:12 PM, Ravi Rao wrote: All, I have written a utility that will check if an index has been corrupted. The implementation is

Re: ANN: MUTIS Alpha 1 Released

2005-03-05 Thread Erik Hatcher
On Mar 5, 2005, at 4:04 PM, Mario Alejandro M. wrote: P.S. 2: How can MUTIS be listed in the Lucene wiki? Create an account, and then edit the page that it belongs on. Erik

Re: svn commit: r156600 - in lucene/java/trunk/src: java/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.java java/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.jj test

2005-03-09 Thread Erik Hatcher
On Mar 9, 2005, at 3:57 AM, Daniel Naber wrote: On Wednesday 09 March 2005 04:21, [EMAIL PROTECTED] wrote: remove pesky static parse method that stymies flexibility That will make it difficult to make this the new default parser (i.e. rename it to QueryParser) as people will get a compile error the

Re: svn commit: r156600 - in lucene/java/trunk/src: java/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.java java/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.jj test

2005-03-09 Thread Erik Hatcher
On Mar 9, 2005, at 9:37 AM, Daniel Naber wrote: On Wednesday 09 March 2005 10:52, Erik Hatcher wrote: It's a nuisance to have that static method when making a subclass of QueryParser - since static methods are not overridable it would be easy to mistakenly call the parent static parse m
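A minimal sketch of the pitfall mentioned above, assuming a parser whose static convenience method always builds the base class. `BaseParser` and `LowercasingParser` are hypothetical stand-ins, not the Lucene classes.

```java
/** Hypothetical base parser with a static convenience entry point. */
class BaseParser {
    /** Static methods are bound at compile time, so this always builds a BaseParser. */
    static String parse(String query) {
        return new BaseParser().parseQuery(query);
    }

    String parseQuery(String query) {
        return "term(" + normalize(query) + ")";
    }

    /** Hook that subclasses are meant to customize. */
    String normalize(String text) {
        return text;
    }
}

/** Subclass that customizes the hook but cannot override the static method. */
class LowercasingParser extends BaseParser {
    @Override
    String normalize(String text) {
        return text.toLowerCase(java.util.Locale.ROOT);
    }
}
```

Calling `LowercasingParser.parse("Foo")` compiles, yet it runs the inherited static method and skips the subclass hook; only `new LowercasingParser().parseQuery("Foo")` applies the lowercasing.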

Re: Precedence parser: NOT/AND, disableCoord

2005-03-14 Thread Erik Hatcher
On Mar 13, 2005, at 2:35 AM, Paul Elschot wrote: I had a short look through the new precedence parser and noticed a possible issue. Adding this in the TestPrecedenceParser testSimple() method: assertQueryEquals("NOT a AND b", null, "-a +b"); // currently parses as -(+a +b) fails the test bec

Re: Interfaces

2005-03-17 Thread Erik Hatcher
I think, though I'm not speaking for anyone here but myself, that the Lucene team is open to API improvements that _do not adversely affect performance_ and that have _a real benefit_. While I'm as IoC and design pattern savvy as the next developer, I'm also highly pragmatic. I've not been con

Re: Precedence parser: NOT/AND, disableCoord

2005-03-17 Thread Erik Hatcher
On Mar 15, 2005, at 3:57 PM, Paul Elschot wrote: On Tuesday 15 March 2005 01:55, Erik Hatcher wrote: I'd welcome others to give it a try, though. I'm still learning how to accomplish things with JavaCC. The basic rule is the deeper the nesting of the grammar construct, the higher t

Re: Interfaces

2005-03-17 Thread Erik Hatcher
On Mar 17, 2005, at 9:15 AM, Maik Schreiber wrote: Pragmatically, have you ever had addDocument fail? If not, then what peace of mind are you getting from such a test? The test I've shown is not supposed to test if addDocument() fails or not, but to test if addDocument() is invoked at all. The t

Re: snowball analyzer uismo issue in spanish stemmer

2005-03-17 Thread Erik Hatcher
On Mar 17, 2005, at 2:12 PM, Doug Cutting wrote: Erik Hatcher wrote: I just tried regenerating, which automatically pulls from CVS, and got this error: /Users/erik/dev/lucene/java/contrib/snowball/snowball/website/p/ generator.c:425: internal compiler error: in extract_insn, at recog.c:2175

Re: Interfaces

2005-03-18 Thread Erik Hatcher
I didn't mean to imply that using interfaces themselves was a potential performance issue; what I meant was all the IoC mechanisms that might be tossed in as factory overhead to construct things indirectly. Erik On Mar 18, 2005, at 10:22 AM, Maik Schreiber wrote: However, the primar

Re: Ok, for a Java newbie, how setup NetBeans 4 + Lucene

2005-03-23 Thread Erik Hatcher
On Mar 22, 2005, at 10:40 PM, Chuck Williams wrote: As a side note, I just ran the tests and got two errors: These are known test failures. I've just made this test an exception and it will not run by default. To run this particular test: ant test -Dtestcase=TestPrecedenceQueryParser E

[VOTE] Two new committers

2005-03-27 Thread Erik Hatcher
I propose two separate votes here. * Andi Vajda for committer-ship on the /contrib/db directory tree for his continued maintenance of DBDirectory, which OSAF is making solid use of within Chandler. * Brian McCallister for committer-ship on a new /ruby directory tree to begin work from scratch

Re: [VOTE] Two new committers

2005-03-27 Thread Erik Hatcher
this through the incubator since you're the most Pythonic Lucene committer?! :) Erik On Mar 27, 2005, at 4:23 PM, Otis Gospodnetic wrote: 2 x +1. I'd also love to see PyLucene move towards Apache Incubator, but that's a separate thing. Otis --- Erik Hatcher <[EMAIL PR

Re: [VOTE] Two new committers

2005-03-28 Thread Erik Hatcher
On Mar 27, 2005, at 11:52 PM, Otis Gospodnetic wrote: I'm spread super-thin currently Welcome to the club! , but I'll try my best. Thanks! Erik I'll start by forwarding some email from Garrett about steps he took to get lucene4c into the Incubator. Otis --- Erik

Re: null pointer exception when try to get contents out of hits

2005-03-28 Thread Erik Hatcher
On Mar 28, 2005, at 5:35 PM, Xiaozheng Ma wrote: Hi guys, I indexed the file by title and contents such as: document.add(Field.Text("title", file.getTitle())); document.add(Field.Text("contents", file.getTextContents())); //the getTextContents() return the Reader type obj so far so go

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Erik Hatcher
I added Paul's SkipFilter and overwrote my FilteredQuery class from the code here. However, your FilteredQuery depends on two additional classes: DocNrSkipper and SortedVIntList that are not provided. Please attach those classes to this issue. I'm not personally fond of the abbreviation o

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Erik Hatcher
Oh, and one other thing: Paul's code relies on JDK 1.4's assert keyword. It seems this is an unnecessary reason to jump to 1.4 dependence. What do folks think about JDK 1.4 as a minimum Lucene requirement? Erik On Apr 3, 2005, at 5:22 PM, [EMAIL PROTECTED] wrote: DO NOT REPLY TO TH

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Erik Hatcher
On Apr 4, 2005, at 8:18 AM, Andrzej Bialecki wrote: Erik Hatcher wrote: Oh, and one other thing: Paul's code relies on JDK 1.4's assert Erhm.. you meant 1.5 (five), right? No, 1.4. Asserts were added in JDK 1.4. I live in a Mac-centric world and 1.5 (errr, 5.0!), and I

Re: build.xml requires asf.site.home

2005-04-05 Thread Erik Hatcher
Dan, I have corrected the issue you discovered with the asf.site.home property and committed the change. On Apr 4, 2005, at 10:21 PM, Dan Climan wrote: I was trying to compile the head version of lucene that I checked out of SVN. When I run an ant command to build it, I get the following error:

Re: BooleanQuery.equals() change

2005-04-12 Thread Erik Hatcher
On Apr 11, 2005, at 5:57 PM, Yonik Seeley wrote: Erik, why was the last change to BooleanQuery made? The comment was "Correct BooleanQuery.equals such that every clause is compared". It looks like Vector.equals() should have worked, and the new code is probably slower as it creates two new arrays.

Re: BooleanQuery.equals() change

2005-04-12 Thread Erik Hatcher
On Apr 12, 2005, at 10:59 AM, Yonik Seeley wrote: On Apr 11, 2005, at 5:57 PM, Yonik Seeley wrote: Erik, why was the last change to BooleanQuery made? The comment was "Correct BooleanQuery.equals such that every clause is compared". It looks like Vector.equals() should have worked, and the new co

Re: About search books?

2005-04-15 Thread Erik Hatcher
On Apr 15, 2005, at 7:33 AM, [EMAIL PROTECTED] wrote: Hello everybody, I am trying to put together a search engine specific for books. Interesting! Could you tell us more about what you're building? Is there anybody that can give me some advice? in particular, i have some questions: - beside hardw

Re: running out of file handles

2005-04-15 Thread Erik Hatcher
There was an issue in an early 1.4.x release that kept files from being deleted properly. I recommend you upgrade to 1.4.3 and see if that fixes things. Erik On Apr 15, 2005, at 12:33 PM, Guillermo Payet wrote: Hi Doug, I'm using: lucene-1.4-final.jar On Fri, Apr 15, 2005 at 08:52:54A

Re: [Performance] Streaming main memory indexing of single strings

2005-04-15 Thread Erik Hatcher
On Apr 14, 2005, at 5:11 PM, Robert Engels wrote: It is really not that involved. Just implement the abstract methods of IndexReader. And many can be no-op'd because they will never be called in a "read only" situation. Methods related to normalization and such can also be no-op'd because you a

Re: [Performance] Streaming main memory indexing of single strings

2005-04-15 Thread Erik Hatcher
On Apr 15, 2005, at 6:15 PM, Wolfgang Hoschek wrote: Cool! For my use case it would need to be able to handle arbitrary queries (previously parsed from a general lucene query string). Something like: float match(String Text, Query query) it's fine with me if it also works for flo

Re: [Performance] Streaming main memory indexing of single strings

2005-04-15 Thread Erik Hatcher
Wolfgang. On Apr 15, 2005, at 5:08 PM, Erik Hatcher wrote: On Apr 15, 2005, at 6:15 PM, Wolfgang Hoschek wrote: Cool! For my use case it would need to be able to handle arbitrary queries (previously parsed from a general lucene query string). Something like: float match(String Text,

Re: [Performance] Streaming main memory indexing of single strings

2005-04-16 Thread Erik Hatcher
On Apr 15, 2005, at 9:50 PM, Wolfgang Hoschek wrote: So, all the text analyzed is in a given field... that means that anything in the Query not associated with that field has no bearing on whether the text matches or not, correct? Right, it has no bearing. A query wouldn't specify any fields, it

Re: [Performance] Streaming main memory indexing of single strings

2005-04-16 Thread Erik Hatcher
On Apr 16, 2005, at 1:17 PM, Wolfgang Hoschek wrote: Note that "fish*~" is not a valid query expression :) Perhaps the Lucene QueryParser should throw an exception then. Currently 1.4.3 accepts the expression as is without grumbling... Several minor QueryParser weirdnesses like this have turned up

Re: lucene 2.0?

2005-04-19 Thread Erik Hatcher
On Apr 19, 2005, at 6:16 PM, Doug Cutting wrote: Bernhard Messer wrote: I'm not a fan of outdated software or historical systems. So i think the best would be to keep lucene still backward compatible with version 1.9 and perform the switch to JDK 1.4 with lucene 2.0. That sounds like a good plan.

Re: lucene 2.0?

2005-04-19 Thread Erik Hatcher
ssues identified with the tests from this? ant test -Dtestcase=TestPrecedenceQueryParser If not, we should probably pull it out so as to not distribute it with an official release. Thoughts? Erik On Apr 19, 2005, at 9:15 PM, Erik Hatcher wrote: On Apr 19, 2005, at 6:16 PM, Doug Cu

Re: lucene 2.0?

2005-04-20 Thread Erik Hatcher
On Apr 19, 2005, at 10:06 PM, Mario Alejandro M. wrote: As you may know, I'm porting Lucene to Delphi. Taking into account that this is in progress and no functional release has been made (however, today I reached ALPHA 2, in which all the indexing stuff is working), what can help me not waste time on it? T

Re: [Performance] Streaming main memory indexing of single strings

2005-04-20 Thread Erik Hatcher
On Apr 20, 2005, at 12:11 PM, Wolfgang Hoschek wrote: By the way, by now I have a version against 1.4.3 that is 10-100 times faster (i.e. 3 - 20 index+query steps/sec) than the simplistic RAMDirectory approach, depending on the nature of the input data and query. From some preliminary te

Re: wiki configuration and commit messages

2005-04-21 Thread Erik Hatcher
On Apr 21, 2005, at 4:12 PM, Daniel Naber wrote: I didn't get a commit message about Otis' and my SVN commits yesterday. Is that a problem only on my side? Commits come to [EMAIL PROTECTED] - subscribe to that using [EMAIL PROTECTED] Also, the Wiki start page is a German page if your browser if

Re: wiki configuration and commit messages

2005-04-21 Thread Erik Hatcher
On Apr 21, 2005, at 4:59 PM, Daniel Naber wrote: On Thursday 21 April 2005 22:34, Erik Hatcher wrote: Commits come to [EMAIL PROTECTED] - subscribe to that using [EMAIL PROTECTED] So is the information on this page, that says *all* commits go to commits@, not correct anymore?: http

Fwd: [jira] Closed: (INFRA-272) 3 new Lucene mailing lists

2005-04-21 Thread Erik Hatcher
e-mail list creation Key: INFRA-259 URL: http://issues.apache.org/jira/browse/INFRA-259 Project: Infrastructure Type: New Feature Components: Mailing Lists Reporter: Erik Hatcher Please create a new [EMAIL PROTECTED] e-mail list with me (e

broken compilation

2005-04-22 Thread Erik Hatcher
I don't normally run this target, but one of our deprecated tests has a compilation issue. I haven't researched when this broke, but could someone fix this please? Here are the details: $ ant compile-test-deprecated -Djavac.deprecation=off Buildfile: build.xml javacc-uptodate-check: javacc-n

Fwd: Lucene and Groovy...

2005-04-22 Thread Erik Hatcher
om: Jeremy Rayner <[EMAIL PROTECTED]> Date: April 22, 2005 3:18:31 PM EDT To: Erik Hatcher <[EMAIL PROTECTED]> Subject: Re: Lucene and Groovy... Reply-To: Jeremy Rayner <[EMAIL PROTECTED]> Hi Erik, hits.each { println(it["filename"]) } OK, I've implemented the

Re: Lucene and Groovy...

2005-04-25 Thread Erik Hatcher
On Apr 25, 2005, at 2:56 PM, Doug Cutting wrote: Erik Hatcher wrote: There are two .java files attached that may not make it through to the list. These are simple wrappers that do exactly what you'd expect. The idea is to make dealing with Lucene Hits more "Java like" with an

HighlighterTest failure

2005-04-25 Thread Erik Hatcher
I get a failure running HighlighterTest from the Subversion trunk. Below are the details. What's the fix? Thanks, Erik [junit] Searching for: multi* [junit] - --- [junit] Testcase: testMultiSearcher(org.apache.lucene.search.highligh

Re: svn commit: r164695 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/search/Hit.java src/java/org/apache/lucene/search/HitIterator.java src/java/org/apache/lucene/search/Hits.java s

2005-04-25 Thread Erik Hatcher
On Apr 25, 2005, at 10:36 PM, Otis Gospodnetic wrote: Would it be better to explicitly check for out of bounds hitNumber instead of catching ArrayIndexOutOfBoundsException? if (hitNumber > hits.length()) { throw new NoSuchElementException(); } Good eye. I've added a test case that identified thi
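The explicit check suggested above might look like the following sketch. `ArrayHitIterator` is a hypothetical iterator over a plain array, not the actual Lucene HitIterator. One assumption is labeled in the code: hit numbers are zero-based here, so the guard uses `>=` rather than the `>` shown in the quoted message.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical iterator over an array of hits; not the Lucene HitIterator. */
class ArrayHitIterator implements Iterator<String> {
    private final String[] hits;
    private int hitNumber = 0;

    ArrayHitIterator(String[] hits) {
        this.hits = hits;
    }

    @Override
    public boolean hasNext() {
        return hitNumber < hits.length;
    }

    @Override
    public String next() {
        // Explicit guard instead of catching ArrayIndexOutOfBoundsException.
        // Assumption: indices are zero-based, so the last valid index is length - 1.
        if (hitNumber >= hits.length) {
            throw new NoSuchElementException();
        }
        return hits[hitNumber++];
    }
}
```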

Re: SortTest failing

2005-04-25 Thread Erik Hatcher
On Apr 25, 2005, at 10:55 PM, Otis Gospodnetic wrote: Hm, Erik is not alone with unit tests failing. My HighlighterTest passes (I didn't do svn update today yet) Let me know what happens after you fully update and build the main Lucene JAR, then run the highlighter build. , but I see SortTest fa

Re: HighlighterTest failure

2005-04-25 Thread Erik Hatcher
On Apr 25, 2005, at 10:02 PM, Chuck Williams wrote: Erik Hatcher wrote: I get a failure running HighlighterTest from the Subversion trunk. Below are the details. What's the fix? I don't have the code here to run, but the problem is that MultiSearcher.rewrite(line 298) is calling Que

Re: a new way to update without delete+add

2005-04-26 Thread Erik Hatcher
On Apr 26, 2005, at 1:38 PM, Nicolas Maisonneuve wrote: One of the problems in Lucene is the updating of documents. With this patch, you can update documents very quickly. (see test case) http://issues.apache.org/bugzilla/show_bug.cgi?id=34563 This is not the link you meant. Here's the one you just s

Re: broken compilation

2005-04-26 Thread Erik Hatcher
On Apr 26, 2005, at 2:02 PM, Daniel Naber wrote: On Friday 22 April 2005 05:29, Erik Hatcher wrote: I don't normally run this target, but one of our deprecated tests has a compilation issue. I haven't researched when this broke, but could someone fix this please? I fixed it. TermIn

Re: svn commit: r164695 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/search/Hit.java src/java/org/apache/lucene/search/HitIterator.java src/java/org/apache/lucene/search/Hits.java s

2005-04-26 Thread Erik Hatcher
On Apr 26, 2005, at 2:38 PM, Daniel Naber wrote: On Tuesday 26 April 2005 02:21, [EMAIL PROTECTED] wrote: + public String toString() { + try { + return getDocument().toString(); + } catch (IOException e) { + return null; + } + } Wouldn't it be better here to re-throw the except
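One way to read the question above: `toString()` cannot declare a checked `IOException`, so the alternative to returning null is to wrap it in an unchecked exception. The sketch below shows that option only. `LazyHit` and its simulated `getDocument()` are hypothetical, not the Lucene Hit class, and nothing here indicates which approach was finally committed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a lazily loaded hit; not the Lucene Hit class. */
class LazyHit {
    private final boolean failOnLoad;

    LazyHit(boolean failOnLoad) {
        this.failOnLoad = failOnLoad;
    }

    /** Simulates a document fetch that may fail with a checked exception. */
    String getDocument() throws IOException {
        if (failOnLoad) {
            throw new IOException("index unavailable");
        }
        return "Document<title:example>";
    }

    @Override
    public String toString() {
        try {
            return getDocument();
        } catch (IOException e) {
            // Wrap the checked exception instead of hiding the failure behind null.
            throw new UncheckedIOException(e);
        }
    }
}
```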

Re: too many classes visible with "ant javadocs"

2005-04-26 Thread Erik Hatcher
On Apr 26, 2005, at 6:10 PM, Daniel Naber wrote: Hi, the java API documentation now seems to contain classes which have no useful documentation and thus probably shouldn't be part of the API docs, e.g. Among, Testapp, SnowballProgram (maybe more). Also, build.xml has a typo: "MorLikeThis" (Mor in

Re: too many classes visible with "ant javadocs"

2005-04-26 Thread Erik Hatcher
On Apr 26, 2005, at 6:10 PM, Daniel Naber wrote: the java API documentation now seems to contain classes which have no useful documentation and thus probably shouldn't be part of the API docs, e.g. Among, Testapp, SnowballProgram (maybe more). This is now fixed. I excluded net.sf.* from being jav

Re: [Performance] Streaming main memory indexing of single strings

2005-04-26 Thread Erik Hatcher
tion. Before turning to a performance patch discussion I'd at this point rather be most interested in folks giving it a spin, comments on the API, or any other issues. Cheers, Wolfgang. On Apr 20, 2005, at 11:26 AM, Wolfgang Hoschek wrote: On Apr 20, 2005, at 9:22 AM, Erik Hatcher wrote: O

Re: Correct of Query.combine() bugs with new MultiSearcher

2005-04-26 Thread Erik Hatcher
I've confirmed Chuck's patch does fix the Highlighter test. I'm set to commit it once it gets the thumbs-up from Doug. Erik On Apr 26, 2005, at 4:58 PM, Chuck Williams wrote: As noted in the patch description I just submitted, it should be a complete, correct, robust (relative to possib

Re: [Performance] Streaming main memory indexing of single strings

2005-04-27 Thread Erik Hatcher
On Apr 27, 2005, at 12:22 PM, Doug Cutting wrote: Erik Hatcher wrote: I'm not quite sure where to put MemoryIndex - maybe it deserves to stand on its own in a new contrib area? That sounds good to me. Ok... once Wolfgang gives me one last round of updates (JUnit tests instead of main(

Re: ParallelReader

2005-04-29 Thread Erik Hatcher
On Apr 28, 2005, at 5:19 PM, Doug Cutting wrote: Please find attached something I wrote today. It has not been yet tested extensively, and the documentation could be improved, but I thought it would be good to get comments sooner rather than later. Would folks find this useful? Should it go int

Re: Jakarta image on lucene.apache.org

2005-04-29 Thread Erik Hatcher
On Apr 28, 2005, at 6:03 PM, Daniel Naber wrote: On Thursday 28 April 2005 00:16, Daniel Naber wrote: This 'site' SVN repo is for Jakarta projects. Is there one for TLPs? Apache httpd has its own "site" directory. We might need that, too. I just had a closer look and got it working on my machine.

Re: too many classes visible with "ant javadocs"

2005-05-01 Thread Erik Hatcher
On May 1, 2005, at 3:06 PM, Daniel Naber wrote: On Wednesday 27 April 2005 02:53, Erik Hatcher wrote: By all means feel free to take over the build process refactorings if you'd like. I think you as the ant expert can do that much better, and my mail wasn't meant as a complaint that th

build process changes

2005-05-01 Thread Erik Hatcher
I have done some extensive rearranging of the build system to facilitate the inclusion of the contrib pieces into future source and binary distributions. I'll describe it in more detail below. First - if you have any problems with the Lucene build process as it currently stands in Subversion,

Re: too many classes visible with "ant javadocs"

2005-05-02 Thread Erik Hatcher
On May 1, 2005, at 9:32 PM, Brian Goetz wrote: junit.jar really ought to be removed from our repository. Due to classloader issues, Ant's junit task doesn't work with junit.jar anywhere but in the classpath that launches Ant. The Ant best practice is to put junit.jar in ANT_HOME/lib anyway. I have adjusted the

Re: [Performance] Streaming main memory indexing of single strings

2005-05-02 Thread Erik Hatcher
On May 1, 2005, at 10:20 PM, Wolfgang Hoschek wrote: I've uploaded code that now runs against the current SVN, plus junit test cases, plus some minor internal updates to the functionality itself. For details see http://issues.apache.org/bugzilla/show_bug.cgi?id=34585 Be prepared for the test

Re: [Performance] Streaming main memory indexing of single strings

2005-05-02 Thread Erik Hatcher
On May 2, 2005, at 5:21 PM, Wolfgang Hoschek wrote: Finally found and fixed the bug! The fix is simply to replace MemoryIndex.MemoryIndexReader skipTo() with the following: public boolean skipTo(int target) { if (DEBUG) System.err.println(".skipTo: " + targe

Re: build process changes

2005-05-02 Thread Erik Hatcher
On May 2, 2005, at 2:52 PM, Doug Cutting wrote: Thanks for doing all this! It looks great! *whew* - thanks. As always, let me know if there is anything further I can do. I'll tidy things up as I go with it. What are your thoughts on what files we should actually distribute? There is merit

Re: [Performance] Streaming main memory indexing of single strings

2005-05-03 Thread Erik Hatcher
Applied!! Erik On May 3, 2005, at 1:31 PM, Wolfgang Hoschek wrote: Here's a performance patch for MemoryIndex.MemoryIndexReader that caches the norms for a given field, avoiding repeated recomputation of the norms. Recall that, depending on the query, norms() can be called over and over a

Re: contrib: keywordTokenStream

2005-05-03 Thread Erik Hatcher
Wolfgang, I've now added this. I'm not seeing how this could be generally useful. I'm curious how you are using it and why it is better suited for what you're doing than any other analyzer. "keyword tokenizer" is a bit overloaded terminology-wise, though - look in the contrib/analyzers/src

Re: build process changes

2005-05-05 Thread Erik Hatcher
On May 5, 2005, at 3:52 PM, Doug Cutting wrote: I'd be happy to change it if that is the desire though. I think that all of the jars we create should be prefixed with 'lucene-' and end with the version. That will make it easier for folks to copy them into lib directories and still know what th

Re: [EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2005-05-08 Thread Erik Hatcher
Jason or other Gump folks - We'd love to get Lucene's build working fine with Gump. From the error message it doesn't appear this is Lucene-related though. Let me know if there is anything I can do to fix it. I have been refactoring the build process so that all of Lucene's contrib compone

Re: [EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2005-05-09 Thread Erik Hatcher
On May 9, 2005, at 3:57 AM, Stefan Bodewig wrote: On Sun, 8 May 2005, Erik Hatcher <[EMAIL PROTECTED]> wrote: Jason or other Gump folks - You realize that the mail has been auto-generated and Jason is only the sender because he added the entry and we never bothered to change it, don

Re: Helping PyLucene and RubyLucene incubate

2005-05-09 Thread Erik Hatcher
On May 9, 2005, at 10:48 PM, Otis Gospodnetic wrote: This may be more appropriate for general@, but I don't know who's on it. There is a [EMAIL PROTECTED] list :) (CC'd) I was supposed to help Andi Vajda and a few more guys get going with PyLucene and RubyLucene incubation. I'm not finding time f

Re: [EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2005-05-10 Thread Erik Hatcher
On May 9, 2005, at 9:34 AM, Stefan Bodewig wrote: On Mon, 9 May 2005, Erik Hatcher <[EMAIL PROTECTED]> wrote: On May 9, 2005, at 3:57 AM, Stefan Bodewig wrote: Feel free to add any "new" dependencies Gump is missing. What's the process for me to add new dependencies? If the

Re: Helping PyLucene and RubyLucene incubate

2005-05-10 Thread Erik Hatcher
I'm cross-posting to java-dev just to let folks know we've moved this conversation over to [EMAIL PROTECTED] On May 10, 2005, at 8:14 AM, Brian McCallister wrote: The ruby part of it is from scratch, but I am using Andi's work extensively, and it relies on that work for all practical purposes.

Re: patch to build.xml

2005-05-11 Thread Erik Hatcher
I'll apply this patch tomorrow if you don't do it before hand. Looks good to me. I thought I had made the accommodation for *Test and Test* - I must have, but then ditched my changes for some reason as I'm certain I had made that change locally at one point. Erik On May 11, 2005, at 2:

Fwd: [EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2005-05-15 Thread Erik Hatcher
Gumpers, What is the issue with Lucene building with Gump? To me it looks as if it's not doing a clean build and thus cannot find the JAR that was successfully built in a past run because it's looking for it by a different dated name. What does it take to get a clean build going? Should that

Re: Lucene vs. Ruby/Odeum

2005-05-17 Thread Erik Hatcher
On May 16, 2005, at 10:41 PM, Otis Gospodnetic wrote: Some interesting stuff... http://www.zedshaw.com/projects/ruby_odeum/performance.html That's nice flamebait for sure. The fact of the matter is that JVM startup speed is a well-known issue and to truly compare indexing/searching speed the s

Re: svn commit: r178059 - /lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanNearQuery.java

2005-05-24 Thread Erik Hatcher
On May 24, 2005, at 3:05 AM, Paul Elschot wrote: + public int hashCode() { +int result; +result = clauses.hashCode(); +result = 29 * result + slop; How about: result += slop * 29; +result = 29 * result + (inOrder ? 1 : 0); result += (inOrder ? 1 : 0); // or some othe
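For reference, the multiply-and-add pattern from the quoted diff can be sketched on a small value class. `NearSpec` is a hypothetical class with the same three fields, not SpanNearQuery itself. Either combining form satisfies the hashCode contract as long as objects that are equal produce equal hashes.

```java
import java.util.List;

/** Hypothetical value class with the same three fields as the quoted diff; not the Lucene class. */
final class NearSpec {
    private final List<String> clauses;
    private final int slop;
    private final boolean inOrder;

    NearSpec(List<String> clauses, int slop, boolean inOrder) {
        this.clauses = clauses;
        this.slop = slop;
        this.inOrder = inOrder;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NearSpec)) return false;
        NearSpec other = (NearSpec) o;
        return slop == other.slop
                && inOrder == other.inOrder
                && clauses.equals(other.clauses);
    }

    @Override
    public int hashCode() {
        // Fold each field into the running result, as in the quoted diff.
        int result = clauses.hashCode();
        result = 29 * result + slop;
        result = 29 * result + (inOrder ? 1 : 0);
        return result;
    }
}
```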

Re: Contributing

2005-05-25 Thread Erik Hatcher
Danilo, Welcome! We'd love to have your Italian analyzer as part of the contrib/analyzers codebase. The easiest way to get your contribution accepted is to follow these steps: * Be sure to use the Apache Software License 2.0 * Fit your code into the structure in Lucene's Subversio

Re: contrib/queryParsers/surround

2005-05-28 Thread Erik Hatcher
On May 28, 2005, at 10:04 AM, Paul Elschot wrote: Dear readers, I've started moving the surround query language http://issues.apache.org/bugzilla/show_bug.cgi?id=34331 into the directory named by the title in my working copy of the lucene trunk. When the tests pass I'll repost it there. In case

Re: contrib/queryParsers/surround

2005-05-28 Thread Erik Hatcher
On May 28, 2005, at 1:07 PM, Paul Elschot wrote: A little bit of deprecation is left in the CharStream (getLine and getColumn) in the parser. Would you have any idea how to deal with that? This is due to Java 1.5, right? I'm seeing the same thing in my project but haven't looked into it y

Re: contrib/queryParsers/surround

2005-05-29 Thread Erik Hatcher
I concur with Daniel on this. For the moment, my preference is to bring in Paul's parser into contrib/surround and let it gain some additional exposure there. I don't believe it's possible or even preferable to attempt to build one query parser to rule them all. While a decent general pur

Re: Lucene vs. Ruby/Odeum

2005-06-01 Thread Erik Hatcher
On Jun 1, 2005, at 6:07 PM, Daniel Naber wrote: On Tuesday 17 May 2005 04:41, Otis Gospodnetic wrote: http://www.zedshaw.com/projects/ruby_odeum/performance.html Here's a follow up: http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html Now the claim is that Lucene is faster

Re: Lucene vs. Ruby/Odeum

2005-06-01 Thread Erik Hatcher
tudes, or flame-bait we may encounter. Erik On Jun 1, 2005, at 7:48 PM, Robert Engels wrote: I think I am going to start a new Blog - "Zed's an Idiot". -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 01, 2005 6:39 PM To: java-d

Re: Lucene vs. Ruby/Odeum

2005-06-02 Thread Erik Hatcher
Zed has updated his second part with more experiments with different JVM's and memory settings: http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html On Jun 2, 2005, at 12:27 AM, Robert Engels wrote: I read all of Zed's posts on the subject and I feel he certainly presents a

Re: Getting patches into the main code

2005-06-02 Thread Erik Hatcher
Reece - your patch has been applied! Thank you for fixing this just in time for my project needing it :) I renamed the packages/classes of the tests and added the ASL header. There are some minor tweaks to the comments in the test that can be changed now that the bug is fixed - I'll adju

Re: contrib/surround

2005-06-05 Thread Erik Hatcher
2005 02:44, Erik Hatcher wrote: I concur with Daniel on this. For the moment, my preference is to bring in Paul's parser into contrib/surround and let it gain some additional exposure there. I don't believe it's possible or even preferable to attempt to build one query parser to rule th

Re: How to index Chinese text?

2005-06-13 Thread Erik Hatcher
On Jun 13, 2005, at 12:01 PM, Zsolt Koppany wrote: Our application works with lucene-1.4.3 stable even for German text but we have problems with Chinese text. Which analyzer should we use to index Chinese text? This question is best posted to java-user, not java-dev, but I'll reply here f

Re: Term.compareTerm and MemoryIndex

2005-06-29 Thread Erik Hatcher
On Jun 29, 2005, at 4:26 PM, markharw00d wrote: Anyone have any objections to committing this addition to Term.java? _http://www.mail-archive.com/java-dev@lucene.apache.org/msg00618.html_ This change looks good to me. I would have committed it earlier if others had ok'd it. Erik -

Re: [VOTE] Wolfgang as committer

2005-07-01 Thread Erik Hatcher
+1 MemoryIndex is a great addition to Lucene and Wolfgang has shown he understands the internals (better than I do!). Erik On Jul 1, 2005, at 1:34 PM, markharw00d wrote: I'd like to propose Wolfgang Hoschek should be given commit rights to maintain his MemoryIndex contribution. Tho

Re: Unexpected: ordered

2005-07-04 Thread Erik Hatcher
On Jul 4, 2005, at 4:51 PM, Dave Kor wrote: *chuckles* It seems I can post to this list without subscribing to it. :) I moderate in messages that are on topic but from unsubscribed addresses quite often. Perhaps this was the case? Erik --

Re: 2nd call - [Vote] Wolfgang Hoschek for committer

2005-07-11 Thread Erik Hatcher
Again a big +1 from me. Wolfgang - if you're tuned in here, please submit a CLA to Apache to get the ball rolling on the administrative side of things. Erik On Jul 11, 2005, at 12:38 PM, mark harwood wrote: Responses were light last time around: I'd like to propose Wolfgang Hoschek s

Re: IndexWriter and system properties

2005-07-12 Thread Erik Hatcher
On Jul 11, 2005, at 5:56 PM, Daniel Naber wrote: there's a bug report (#34359) asking to catch and ignore access exceptions when reading system properties so Lucene can be used in an applet. I wanted to apply that patch, but now I'm not sure anymore: does it make sense for Lucene to read setti

Re: getting Analyzer's stop words

2005-07-15 Thread Erik Hatcher
On Jul 15, 2005, at 7:50 AM, Daniel Naber wrote: I'd like to add the following extension to the abstract analyzer class: public abstract Set getStopwords(); This method returns the stop words in use. Subclasses that don't use stop words at all will have to return an empty HashSet (or nu

Re: AW: Topic Maps/Clustering + Lucene... how?

2005-07-16 Thread Erik Hatcher
On Jul 16, 2005, at 6:51 AM, [EMAIL PROTECTED] wrote: I programmed a hierarchical and a partitioning clustering based on the Lucene API. The Lucene API offers some great methods, which are very useful for clustering. If you want to program your own solution: look for scatter-gather, especially

Re: ant javacc broken?

2005-07-18 Thread Erik Hatcher
Mike, Thanks for letting us know. I have just corrected the build file and committed it. PrecedenceQueryParser moved to contrib/miscellaneous a while ago, but the build file was not adjusted - my fault. Erik On Jul 18, 2005, at 1:10 PM, Mike Hanafey wrote: With the latest svn tr

BooleanScorer2 ArrayIndexOutOfBoundsException

2005-07-20 Thread Erik Hatcher
I've seen this issue appear before, but I don't recall seeing if a solution was posted. Using a sophisticated query with nested SpanQuery's the following exception occurred (using a recent Subversion HEAD version). Anyone have ideas on what caused it and what may fix it? Thanks, Eri

Re: Extending the similarity class

2005-07-22 Thread Erik Hatcher
On Jul 22, 2005, at 9:59 AM, Ahmed El-dawy wrote: Hello, I am using lucene to search plain text, but the order of the search results is not satisfying to my needs. First, I want to know how the similarity works. Then, I need to extend it. Use IndexSearcher.explain() to see how each individu

Re: DO NOT REPLY [Bug 34154] - Further improvements to BooleanScorer2

2005-07-22 Thread Erik Hatcher
Paul, I don't have a test case handy (yet), but we're still seeing the exception even after applying the patch from #35823. Do I need to apply some of the code from 34154 as well? Thanks, Erik On Jul 22, 2005, at 2:31 PM, [EMAIL PROTECTED] wrote: DO NOT REPLY TO THIS EMAIL, BUT PLE

Re: Extending the similarity class

2005-07-23 Thread Erik Hatcher
perhaps SpanOrQuery already does this sort of thing - though I don't think so. Erik On 7/22/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: On Jul 22, 2005, at 9:59 AM, Ahmed El-dawy wrote: Hello, I am using lucene to search plain text, but the order of the search resu

Re: DO NOT REPLY [Bug 34154] - Further improvements to BooleanScorer2

2005-07-25 Thread Erik Hatcher
On Jul 24, 2005, at 11:23 AM, Paul Elschot wrote: On Friday 22 July 2005 21:18, Erik Hatcher wrote: Paul, I don't have a test case handy (yet), but we're still seeing the exception even after applying the patch from #35823. Do I need to I'm sorry to hear that. However,

Re: DO NOT REPLY [Bug 34154] - Further improvements to BooleanScorer2

2005-07-26 Thread Erik Hatcher
On Jul 26, 2005, at 3:04 AM, Paul Elschot wrote: +(spanNear([FULLTEXT:cat, FULLTEXT:dog, FULLTEXT:bird], 1, true) spanNear([FULLTEXT:horse, FULLTEXT:cow, FULLTEXT:pig], 1, true) spanNear([FULLTEXT:snake, FULLTEXT:camel], 0, true)) +(FULLTEXT:zebra FULLTEXT:insect spanNear([FULLTEXT:feline, FULLT

Re: Map-Reduce

2005-08-08 Thread Erik Hatcher
On Aug 4, 2005, at 1:27 PM, Otis Gospodnetic wrote: [1] http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf Does anyone have any more info from Doug's MapReduce presentation (transcript, notes, audio, video)? I was at Doug's OSCON presentation but did not se

Re: ISOLatin1AccentFilter + KeywordAnalyzer to core?

2005-08-08 Thread Erik Hatcher
On Aug 5, 2005, at 2:54 PM, Daniel Naber wrote: shouldn't only those analyzers be in contrib that are language specific? I don't see this as a necessary distinction. It seems fine to allow all sorts of various analyzers and filters/tokenizers to aggregate in that contrib area, for pieces

Re: NullPointerException: FSDirectory.create(FSDirectory.java:174)

2005-08-08 Thread Erik Hatcher
Mike, I have committed your patch and a slightly modified (reformatted, added Apache Software License) test case. Thanks for the patch AND test case! Erik On Aug 8, 2005, at 11:12 AM, Michael Goddard wrote: Hi All, To follow up that earlier post: here is a JUnit test to illustrate

Re: Hit Score

2005-08-10 Thread Erik Hatcher
On Aug 10, 2005, at 11:51 AM, Rajesh Munavalli wrote: I indexed a single document containing only one word. When I search for the same word I get a hit score of "0.3". Shouldn't I be getting "1.0"? Try out IndexSearcher.explain() to see why. There are numerous factors involved. Eri
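As a rough illustration of why a one-word, one-document index can score about 0.3 rather than 1.0, the arithmetic can be sketched in plain Java. This is only a sketch: it assumes the classic default similarity formulas of that era (tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, lengthNorm = 1 / sqrt(numTerms), queryNorm = 1 / sqrt(sumOfSquaredWeights)), ignores boosts and the lossy byte encoding of norms, and uses a hypothetical class name that is not part of Lucene. The output of IndexSearcher.explain() remains the authoritative breakdown for a real index.

```java
// Hypothetical stand-alone sketch; not Lucene code. The formulas are the
// assumed classic default-similarity ones named in the paragraph above.
final class SingleTermScoreSketch {

    // Term-frequency factor: square root of the raw frequency in the document.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Field length normalization: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Score of a single-term query against one document, with all boosts = 1.
    static double score(double freq, int docFreq, int numDocs, int numTerms) {
        double idf = idf(docFreq, numDocs);
        double queryNorm = 1.0 / Math.sqrt(idf * idf); // sumOfSquaredWeights = idf^2
        double queryWeight = idf * queryNorm;          // works out to about 1.0
        double fieldWeight = tf(freq) * idf * lengthNorm(numTerms);
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // One document, containing one term, which occurs once.
        System.out.println(score(1, 1, 1, 1)); // roughly 0.307
    }
}
```

Under these assumptions the query-side weight normalizes to about 1.0, so the whole score collapses to the idf term, ln(1/2) + 1, which is roughly 0.307. That matches the reported "0.3": with a single document, the "+ 1" in the idf denominator keeps the term from looking maximally rare.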
