Re: Usefulness of Similarity.queryNorm()

2008-02-12 Thread Paul Elschot
On Wednesday 13 February 2008 04:48:31, Marvin Humphrey wrote: ... > > > Heck, I'd love to eliminate ALL the automatic normalization code... if > only I could figure out what all the hidden side effects are. :( > > My goal is to de-voodoofy the Query-Weight-Scorer compilation phase so > th

Re: Usefulness of Similarity.queryNorm()

2008-02-12 Thread Marvin Humphrey
On Feb 12, 2008, at 5:04 PM, Grant Ingersoll wrote: I don't know a lot about it, but my understanding has always been that comparing across queries is difficult at best, so that would argue for removing it, but I haven't done any research into it. I think it has been in Lucene for a good

Build failed in Hudson: Lucene-trunk #375

2008-02-12 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/375/changes Changes: [mikemccand] LUCENE-1176: fix corruption case when adding docs with no term vectors followed by docs with term vectors [doronc] LUCENE-997: Add search timeout (partial) support. [mikemccand] LUCENE-1175: add missin

[ANN] Luke 0.8.1 released

2008-02-12 Thread Andrzej Bialecki
Hi all, I decided to make a quick update to the previous release and to address some issues related to the way you can work with TermVectors and Payloads. As usual, you can get the binaries and sources here: http://www.getopt.org/luke New features and improvements: ---

Re: Usefulness of Similarity.queryNorm()

2008-02-12 Thread Grant Ingersoll
:-) I don't know a lot about it, but my understanding has always been that comparing across queries is difficult at best, so that would argue for removing it, but I haven't done any research into it. I think it has been in Lucene for a good long time, so it may be that the history of why

Re: Usefulness of Similarity.queryNorm()

2008-02-12 Thread Marvin Humphrey
On Feb 12, 2008, at 9:08 AM, Marvin Humphrey wrote: What would the consequences be of eliminating Similarity.queryNorm()? I cargo-culted that method when porting, but now I'm going through and trying to refactor for simplicity's sake. If I can zap it, I'd like to. I infer from the deaf

[jira] Updated: (LUCENE-1176) TermVectors corruption case when autoCommit=false

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1176: --- Attachment: LUCENE-1176.take2.patch Attached patch fixes the corruption case. It ha

[jira] Commented: (LUCENE-1175) occasional MergeException while indexing

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568316#action_12568316 ] Michael McCandless commented on LUCENE-1175: {quote} FYI, I wasn't able to rep

[jira] Resolved: (LUCENE-997) Add search timeout support to Lucene

2008-02-12 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-997. Resolution: Fixed Lucene Fields: [Patch Available] (was: [New, Patch Available]) Committed

[jira] Commented: (LUCENE-1175) occasional MergeException while indexing

2008-02-12 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568302#action_12568302 ] Yonik Seeley commented on LUCENE-1175: -- FYI, I wasn't able to reproduce on the 2.3 br

Re: Index with payloads needed

2008-02-12 Thread Andrzej Bialecki
Grant Ingersoll wrote: The contrib/analyzer module has several TokenFilters that create Payloads using the offset or type information from a Token. See o.a.l.analysis.payloads. That should be sufficient for your testing in that it adds payloads to tokens based on readily available Token info

[jira] Assigned: (LUCENE-997) Add search timeout support to Lucene

2008-02-12 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-997: -- Assignee: Doron Cohen > Add search timeout support to Lucene >

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2008-02-12 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568292#action_12568292 ] Yonik Seeley commented on LUCENE-997: - > My preference would be for core o.a.l.search.

Re: Index with payloads needed

2008-02-12 Thread Grant Ingersoll
The contrib/analyzer module has several TokenFilters that create Payloads using the offset or type information from a Token. See o.a.l.analysis.payloads. That should be sufficient for your testing in that it adds payloads to tokens based on readily available Token information. -Grant On

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2008-02-12 Thread Timo Nentwig (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568286#action_12568286 ] Timo Nentwig commented on LUCENE-997: - I agree, core. > Add search timeout support to

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2008-02-12 Thread Sean Timm (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568280#action_12568280 ] Sean Timm commented on LUCENE-997: -- "If there are no more major concerns I think this is n

Index with payloads needed

2008-02-12 Thread Andrzej Bialecki
Hi all, I'm testing the payloads support in Luke, and I need a small index with payloads - if you happen to have one, please contact me off the list. Thank you! -- Best regards, Andrzej Bialecki

Usefulness of Similarity.queryNorm()

2008-02-12 Thread Marvin Humphrey
Greets, What would the consequences be of eliminating Similarity.queryNorm()? I cargo-culted that method when porting, but now I'm going through and trying to refactor for simplicity's sake. If I can zap it, I'd like to. First, the theoretical angle: According to the Similarity docs, que

Re: Lingustically-enhanced indexing for Lucene

2008-02-12 Thread Grant Ingersoll
On Feb 12, 2008, at 9:47 AM, [EMAIL PROTECTED] wrote: The best way to do this is to create a patch and attach it to a JIRA issue. http://wiki.apache.org/lucene-java/HowToContribute has the details. Ok, I will read it. Thanks Sounds like an interesting project. What are the licensing ter

Re: [jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-12 Thread Michael Busch
Grant Ingersoll wrote: > > > I'd suggest at least a week, as it sounds like we need to put this > through the wringer a bit more. > I agree! Shall we add a news item to the website where we list these known issues and announce that there will be a 2.3.1 release in approx. 1-2 weeks? -Michael -

Re: Lingustically-enhanced indexing for Lucene

2008-02-12 Thread fsanchez
> The best way to do this is to create a patch and attach it to a JIRA > issue. http://wiki.apache.org/lucene-java/HowToContribute has the > details. Ok, I will read it. Thanks > > Sounds like an interesting project. What are the licensing terms for > Apertium? On a side note, you might

[jira] Commented: (LUCENE-1175) occasional MergeException while indexing

2008-02-12 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568145#action_12568145 ] Yonik Seeley commented on LUCENE-1175: -- I'll try on Lucene 2.3 soon. I had assumed th

Re: [jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-12 Thread Grant Ingersoll
On Feb 11, 2008, at 6:49 PM, Michael Busch wrote: Yonik Seeley (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567878#action_12567878 ] Yonik Seeley commented on LUCENE-1173: --

[jira] Updated: (LUCENE-1177) IW.optimize() can do too many merges at the very end

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1177: --- Attachment: LUCENE-1177.patch Attached patch. Will commit shortly to 2.3. > IW.opt

[jira] Created: (LUCENE-1177) IW.optimize() can do too many merges at the very end

2008-02-12 Thread Michael McCandless (JIRA)
IW.optimize() can do too many merges at the very end Key: LUCENE-1177 URL: https://issues.apache.org/jira/browse/LUCENE-1177 Project: Lucene - Java Issue Type: Bug Components: In

[jira] Updated: (LUCENE-1175) occasional MergeException while indexing

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1175: --- Attachment: LUCENE-1175.patch Yonik, are you able to repro this on 2.3? I can't. A

[jira] Updated: (LUCENE-1176) TermVectors corruption case when autoCommit=false

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1176: --- Attachment: LUCENE-1176.patch Attached patch that extends TestStressIndexing2 to als

[jira] Created: (LUCENE-1176) TermVectors corruption case when autoCommit=false

2008-02-12 Thread Michael McCandless (JIRA)
TermVectors corruption case when autoCommit=false - Key: LUCENE-1176 URL: https://issues.apache.org/jira/browse/LUCENE-1176 Project: Lucene - Java Issue Type: Bug Components: Index

Re: Lingustically-enhanced indexing for Lucene

2008-02-12 Thread Grant Ingersoll
The best way to do this is to create a patch and attach it to a JIRA issue. http://wiki.apache.org/lucene-java/HowToContribute has the details. Sounds like an interesting project. What are the licensing terms for Apertium? On a side note, you might be interested in Mahout (http://lucene.

Re: Creating a index scheduler with Java.

2008-02-12 Thread Grant Ingersoll
Hi, While this question is best asked on the java-user mailing list, I would have a look at the OpenSymphony Quartz Java scheduler. Just search for Quartz Java. -Grant On Feb 12, 2008, at 5:47 AM, galford23 wrote: Hi all, I am trying to do some scheduling / cron job for lucene indexing

Lingustically-enhanced indexing for Lucene

2008-02-12 Thread fsanchez
The Transducens Group (http://transducens.dlsi.ua.es) at University of Alicante (http://www.ua.es) has developed a tool that allows the Lucene search engine to use morphological information while indexing and then process smarter queries in which morphological attributes can be used to specify que

[jira] Commented: (LUCENE-1163) CharArraySet.contains(char[] text, int off, int len) does not work

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568056#action_12568056 ] Michael McCandless commented on LUCENE-1163: Backported to 2.3 > CharArraySet

[jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568053#action_12568053 ] Michael McCandless commented on LUCENE-1173: Backported to 2.3. {quote} Patc

[jira] Updated: (LUCENE-1166) A tokenfilter to decompose compound words

2008-02-12 Thread Thomas Peuss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Peuss updated LUCENE-1166: - Attachment: CompoundTokenFilter.patch Changes: * added unittest * minor tweaks for getting the e

[jira] Commented: (LUCENE-1163) CharArraySet.contains(char[] text, int off, int len) does not work

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568050#action_12568050 ] Michael McCandless commented on LUCENE-1163: I'll port this one to 2.3.1 as we

[jira] Commented: (LUCENE-1168) TermVectors index files can become corrupt when autoCommit=false

2008-02-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568048#action_12568048 ] Michael McCandless commented on LUCENE-1168: Backported to 2.3 branch. > Term

Creating a index scheduler with Java.

2008-02-12 Thread galford23
Hi all, I am trying to set up some scheduling / a cron job for Lucene indexing. I am very new to Lucene and Java. Can I get some advice on how I can achieve it, such as books, URL links, or the technology required? I have been searching the web for quite some time but just cannot get the correct result... maybe

Re: [jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-12 Thread Michael McCandless
Michael Busch wrote: OK, I suggest that we wait a couple of days before we cut 2.3.1 in case there are more problems. We should backport the patches and commit them to the 2.3 branch. At the end of this week I'll then create a 2.3.1 tag, build release artifacts and call a vote. Sounds good?