Re: detected corrupted index / performance improvement

2008-02-07 Thread Michael McCandless
DM Smith wrote: On Feb 6, 2008, at 6:42 PM, Mark Miller wrote: Hey DM, Just to recap an earlier thread, you need the sync and you need hardware that doesn't lie to you about the result of the sync. Here is an excerpt about Digg running into that issue: They had problems with their
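
"You need the sync" can be made concrete in plain Java with the JDK's FileChannel.force(true). Its documented guarantee is that, for a file on a local device, all changes made through the channel have been written to that device by the time the call returns. The sketch below is a minimal illustration, not Lucene's Directory code, and the names DurableWrite and writeAndSync are hypothetical. It also depends on the caveat in the message: a disk that acknowledges the write before the data is actually persistent defeats it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: write a file and force it to stable storage before returning. */
final class DurableWrite {
    private DurableWrite() {}

    static void writeAndSync(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // A single write() may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) also flushes file metadata (fsync-style). A drive with a
            // volatile write cache that reports success early can still lose the data.
            ch.force(true);
        }
    }
}
```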

Re: detected corrupted index / performance improvement

2008-02-07 Thread Michael McCandless
But then you're back to syncing in a BG thread, right? We've come full circle. Asynchronously syncing gives the best performance we've seen so far, and so that's the current patch on LUCENE-1044 (using CMS's, i.e. ConcurrentMergeScheduler's, threads). Using a transaction log would also require async. syncing, but then would
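
The background-sync idea can be sketched as follows. Each finished file is handed to a small thread pool that calls FileChannel.force(true) on it, so the indexing thread does not block, and commit() waits for every outstanding sync before a new commit point may be written. This is a hypothetical illustration, not the LUCENE-1044 patch (which, per the message, reuses ConcurrentMergeScheduler's threads). The names BackgroundSyncer, fileFinished and commit are invented here, and the pool size of 2 is arbitrary.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical: syncs finished files on a background pool; commit() waits for them. */
final class BackgroundSyncer implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<Future<?>> pending = new ArrayList<>();

    /** Called when a file is complete (e.g. after a merge); returns immediately. */
    synchronized void fileFinished(Path file) {
        pending.add(pool.submit(() -> {
            // WRITE access is requested because some platforms refuse to flush read-only handles.
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true);
            }
            return null;
        }));
    }

    /** Blocks until every file handed to fileFinished() so far has been synced. */
    void commit() throws IOException, InterruptedException {
        List<Future<?>> toWait;
        synchronized (this) {
            toWait = new ArrayList<>(pending);
            pending.clear();
        }
        for (Future<?> f : toWait) {
            try {
                f.get();
            } catch (ExecutionException e) {
                throw new IOException("background sync failed", e.getCause());
            }
        }
        // Only at this point would it be safe to write the new commit point.
    }

    @Override
    public void close() {
        pool.shutdown(); // already-submitted syncs still run to completion
    }
}
```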

Re: Background Merges

2008-02-07 Thread suresh guvvala
I think I have a test case that reproduces the java.io.IOException: read past EOF exception while merging. The attached code throws this exception when run. Suresh. On 12/19/07, Michael McCandless [EMAIL PROTECTED] wrote: Grant Ingersoll wrote: The field that is causing the problem

[jira] Commented: (LUCENE-1166) A tokenfilter to decompose compound words

2008-02-07 Thread Thomas Peuss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566568#action_12566568 ] Thomas Peuss commented on LUCENE-1166: -- A Swedish hyphenation grammar is available at
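
For readers new to LUCENE-1166, the simplest form of compound decomposition is a dictionary scan that emits the original token plus every dictionary word found inside it. A hyphenation grammar, such as the Swedish one mentioned here, can narrow the candidates to substrings that begin and end at hyphenation points. The sketch below skips that refinement. It is a minimal, hypothetical illustration (DictionaryDecompounder, minSub and maxSub are invented names), not the code attached to the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical brute-force, dictionary-only decompounder. */
final class DictionaryDecompounder {
    private DictionaryDecompounder() {}

    /**
     * Returns the token itself followed by every dictionary word found inside it,
     * scanning all substrings whose length lies in [minSub, maxSub].
     */
    static List<String> decompose(String token, Set<String> dictionary, int minSub, int maxSub) {
        List<String> out = new ArrayList<>();
        out.add(token);
        String lower = token.toLowerCase(Locale.ROOT);
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minSub; len <= maxSub && start + len <= lower.length(); len++) {
                String candidate = lower.substring(start, start + len);
                // Skip the whole token so it is not emitted twice.
                if (len < lower.length() && dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }
}
```

For example, with the dictionary {donau, dampf, schiff, dampfschiff} and lengths 2 to 15, "Donaudampfschiff" yields the token itself, then donau, dampf, dampfschiff and schiff.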

Re: detected corrupted index / performance improvement

2008-02-07 Thread Michael McCandless
Good idea; I'll call this out in the javadocs with LUCENE-1044 (if your hardware ignores the sync() call, then you're in trouble). Mike Mark Miller wrote: We should probably mention it in the JavaDoc when the issue is done. I think both Yonik and Robert pointed it out, and ever

[jira] Updated: (LUCENE-1141) WikipediaTokenizer incorrectly splits certain syntax into multiple tokens

2008-02-07 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1141: Attachment: LUCENE-1141-test.patch Here's a test case for the problem

Re: detected corrupted index / performance improvement

2008-02-07 Thread robert engels
This is simply not true. Two different issues are at play. You cannot have a true 'commit' unless it is synchronous! LUCENE-1044 might allow the index to be brought back to a consistent state, but not one that is consistent with a synchronization point. For example, I write three documents

[jira] Reopened: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe reopened LUCENE-1084: - Assignee: Steven Rowe (was: Michael McCandless) Lucene Fields: [New, Patch Available]

[jira] Updated: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-1084: Attachment: LUCENE-1084.part2.take2.patch The javadoc description of the MaxFieldLength parameter

[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)

2008-02-07 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566715#action_12566715 ] Doron Cohen commented on LUCENE-1157: - Ok great! Now we can link to this page, and

[jira] Assigned: (LUCENE-1169) Search with Filter does not work!

2008-02-07 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch reassigned LUCENE-1169: - Assignee: Michael Busch Search with Filter does not work!

[jira] Commented: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566717#action_12566717 ] Michael McCandless commented on LUCENE-1084: This looks good! Thanks Steven.

[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)

2008-02-07 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566738#action_12566738 ] Hoss Man commented on LUCENE-1157: -- note the commit mesg: Separate project's web site

[jira] Updated: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-1084: Attachment: LUCENE-1084.part3.patch Patch replacing [IW constructor, MFL setter] sequences with

postings without position information ?

2008-02-07 Thread robert engels
I think there are many uses of Lucene that would benefit from 'enum' fields, aka categories. When classifying documents, they are often in one or more categories. Lucene could write these postings very efficiently using VINT and RLE (run length encoding) if the positions information was not
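
Here is why dropping positions (and frequencies) makes category postings so compact. Store the sorted doc ids as runs of consecutive ids, and write each run as two VInts: the gap since the previous run and the run length. The VInt layout (7 data bits per byte, high bit set when another byte follows) is the one Lucene's file formats use. The run-length layout itself is a hypothetical format for illustration, not anything Lucene writes, and DocOnlyPostings is an invented name.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical doc-id-only postings: (gap, runLength) pairs, each written as a VInt. */
final class DocOnlyPostings {
    private DocOnlyPostings() {}

    /** VInt: low 7 bits first; high bit set on every byte except the last. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes strictly increasing, non-negative doc ids. */
    static byte[] encode(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prevEnd = -1; // last doc id of the previous run
        int i = 0;
        while (i < docs.length) {
            int start = docs[i];
            int run = 1;
            while (i + run < docs.length && docs[i + run] == start + run) {
                run++;
            }
            writeVInt(out, start - prevEnd - 1); // doc ids skipped since the previous run
            writeVInt(out, run);                 // length of this run of consecutive ids
            prevEnd = start + run - 1;
            i += run;
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] bytes) {
        List<Integer> docs = new ArrayList<>();
        int[] pos = {0};
        int prevEnd = -1;
        while (pos[0] < bytes.length) {
            int start = prevEnd + 1 + readVInt(bytes, pos);
            int run = readVInt(bytes, pos);
            for (int k = 0; k < run; k++) {
                docs.add(start + k);
            }
            prevEnd = start + run - 1;
        }
        int[] result = new int[docs.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = docs.get(k);
        }
        return result;
    }

    private static int readVInt(byte[] bytes, int[] pos) {
        int b = bytes[pos[0]++] & 0xFF;
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos[0]++] & 0xFF;
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}
```

For the doc ids 3, 4, 5, 6, 100, 101, 1000 this produces 7 bytes: two for the run 3..6, two for 100..101, and three for the single id 1000 (its gap of 898 needs two VInt bytes).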

[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)

2008-02-07 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566723#action_12566723 ] Steven Rowe commented on LUCENE-1157: - bq. But developer-resuorces.xml was deleted

[jira] Commented: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566728#action_12566728 ] Steven Rowe commented on LUCENE-1084: - Mike, I see you added a test for the

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-07 Thread Doug Cutting
Ning, I am also interested in starting a new project in this area. The approach I have in mind is slightly different, but hopefully we can come to some agreement and collaborate. My current thinking is that the Solr search API is the appropriate model. Solr's facets are an important

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-07 Thread Andrzej Bialecki
Doug Cutting wrote: Ning, I am also interested in starting a new project in this area. The approach I have in mind is slightly different, but hopefully we can come to some agreement and collaborate. I'm interested in this too. My current thinking is that the Solr search API is the

[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)

2008-02-07 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566745#action_12566745 ] Doron Cohen commented on LUCENE-1157: - Thanks Hoss, got it now. Formatable changes

Re: postings without position information ?

2008-02-07 Thread Grant Ingersoll
Search the archive for flexible indexing. There have been a number of discussions on things like this. I don't know whether your specific issue was ever covered, but it seems to fit that model. I think there was even a patch at one point in time. -Grant On Feb 7, 2008, at

Re: postings without position information ?

2008-02-07 Thread eks dev
yap, also without frequencies. This should not be all that difficult (imho), especially now that we have DocIdSetIterator as a superclass; as a matter of fact, you could even today get a DocIdSetIterator from TermDocs or whatever and use it as a Filter, as a lightweight, in-memory solution ... real

Re: detected corrupted index / performance improvement

2008-02-07 Thread robert engels
I might be misunderstanding 1044. There were several approaches, and I am not certain what was the final??? I reread the bug and am still a bit unclear. If the segments are sync'd as part of the commit, then yes, that would suffice. The merges don't need to commit, you just can't delete

[jira] Created: (LUCENE-1169) Search with Filter does not work!

2008-02-07 Thread Eks Dev (JIRA)
Search with Filter does not work! - Key: LUCENE-1169 URL: https://issues.apache.org/jira/browse/LUCENE-1169 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Eks Dev

[jira] Commented: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566624#action_12566624 ] Michael McCandless commented on LUCENE-1084: Yeah, the intention here was that

[jira] Commented: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566791#action_12566791 ] Michael McCandless commented on LUCENE-1084: Good -- I'll commit. Thanks!

[jira] Resolved: (LUCENE-1084) increase default maxFieldLength?

2008-02-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1084. Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch

Re: detected corrupted index / performance improvement

2008-02-07 Thread Michael McCandless
robert engels wrote: I might be misunderstanding 1044. There were several approaches, and I am not certain what was the final??? The final approach (take 7) is to make the index consistent (sync the files) after finishing a merge. Also, a new method (commit) is added which will force
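
The ordering constraint being discussed can be modeled without any real I/O. New files must be synced before a commit point references them, and the files a merge replaced must survive until a later commit no longer needs them. The sketch below only records the order of operations. CommitModel and its methods are hypothetical, "write segments" stands in for writing the new segments file, and this is not how take 7 of LUCENE-1044 is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of commit ordering; it logs operations instead of doing I/O. */
final class CommitModel {
    private final Set<String> onDisk = new TreeSet<>(); // every file currently present
    private Set<String> committed = new TreeSet<>();    // files referenced by the last commit
    final List<String> log = new ArrayList<>();

    /** A flush or merge writes a new file; nothing is synced or deleted yet. */
    void write(String file) {
        onDisk.add(file);
        log.add("write " + file);
    }

    /** Makes 'live' the new commit point. */
    void commit(Set<String> live) {
        // 1. Sync every file the new commit references that the previous commit did not.
        for (String f : new TreeSet<>(live)) {
            if (!committed.contains(f)) {
                log.add("sync " + f);
            }
        }
        // 2. Only after all syncs have returned, publish the new commit point.
        log.add("write segments");
        committed = new TreeSet<>(live);
        // 3. Files the (now durable) commit no longer references may be deleted.
        for (Iterator<String> it = onDisk.iterator(); it.hasNext(); ) {
            String f = it.next();
            if (!committed.contains(f)) {
                it.remove();
                log.add("delete " + f);
            }
        }
    }

    Set<String> filesOnDisk() {
        return Collections.unmodifiableSet(onDisk);
    }
}
```

For example, writing _a and _b, committing them, writing a merged _c and then committing {_c} logs: write _a, write _b, sync _a, sync _b, write segments, write _c, sync _c, write segments, delete _a, delete _b. The merge inputs stay on disk until the second commit is durable.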

[jira] Updated: (LUCENE-1169) Search with Filter does not work!

2008-02-07 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1169: -- Attachment: lucene-1169.patch The problem is that in IndexSearcher#search() scorer.skipTo()
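
Filtered search comes down to intersecting two increasing streams of doc ids, the filter's and the scorer's, by leapfrogging: whichever iterator is behind calls skipTo() with the other's current doc until both land on the same doc. The sketch below uses a made-up DocIterator interface rather than Lucene's Scorer or DocIdSetIterator classes. It fixes one assumption explicitly: skipTo(target) never moves backward and stays put if the iterator is already at or past target. It is not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal iterator over increasing doc ids (not Lucene's API). */
interface DocIterator {
    int NO_MORE = Integer.MAX_VALUE;
    int doc();              // current doc, -1 before the first call, NO_MORE when exhausted
    int next();             // advance by one doc
    int skipTo(int target); // advance to the first doc >= target; never moves backward
}

final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int idx = -1;
    private int doc = -1;

    ArrayDocIterator(int... docs) { this.docs = docs; }

    public int doc() { return doc; }

    public int next() {
        idx++;
        doc = idx < docs.length ? docs[idx] : NO_MORE;
        return doc;
    }

    public int skipTo(int target) {
        while (doc < target) {
            next();
        }
        return doc;
    }
}

final class FilteredSearch {
    private FilteredSearch() {}

    /** Leapfrog intersection: the iterator that is behind skips to the other's doc. */
    static List<Integer> intersect(DocIterator filter, DocIterator scorer) {
        List<Integer> hits = new ArrayList<>();
        int f = filter.next();
        int s = scorer.skipTo(f);
        while (f != DocIterator.NO_MORE && s != DocIterator.NO_MORE) {
            if (f == s) {
                hits.add(s);            // both agree: a filtered hit
                f = filter.next();
                s = scorer.skipTo(f);
            } else if (f < s) {
                f = filter.skipTo(s);   // filter is behind
            } else {
                s = scorer.skipTo(f);   // scorer is behind
            }
        }
        return hits;
    }
}
```

Intersecting the filter docs 1, 3, 5, 7, 9 with the scorer docs 2, 3, 4, 9, 10 yields 3 and 9.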

[jira] Commented: (LUCENE-1169) Search with Filter does not work!

2008-02-07 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566840#action_12566840 ] Paul Elschot commented on LUCENE-1169: -- The patch looks correct to me; I missed the