[ https://issues.apache.org/jira/browse/LUCENE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-1096:
Fix Version/s: 2.3
Lucene Fields: [New, Patch Available] (was: [New])
> Deleting docs of all
[ https://issues.apache.org/jira/browse/LUCENE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-1096:
Attachment: lucene-1096.patch
Patch with tests for the two scenarios described above, and a fix fo
[ https://issues.apache.org/jira/browse/LUCENE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553817 ]
Doron Cohen commented on LUCENE-1096:
-
It seems that this is a serious problem with Hits based search. An applic
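A minimal sketch of the failing pattern this issue describes (index path and field name are hypothetical): Hits caches only an initial batch of results and transparently re-runs the search as iteration proceeds, so deleting every returned doc through the same reader shrinks the re-run's results and hits.id(i) can throw ArrayIndexOutOfBoundsException.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class DeleteAllHits {
  public static void main(String[] args) throws IOException {
    IndexReader reader = IndexReader.open("/path/to/index"); // hypothetical path
    IndexSearcher searcher = new IndexSearcher(reader);
    Hits hits = searcher.search(new TermQuery(new Term("type", "stale"))); // hypothetical field
    for (int i = 0; i < hits.length(); i++) {
      // Deleting through the same reader mutates the index mid-iteration;
      // once Hits re-executes the search past its cached batch, fewer
      // results come back and hits.id(i) blows up.
      reader.deleteDocument(hits.id(i));
    }
    reader.close();
  }
}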
[ https://issues.apache.org/jira/browse/LUCENE-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1097:
---
Attachment: LUCENE-1097.patch
Patch attached. I plan to commit in a day or two.
I
How about defaulting to a max token size of 16K in StandardTokenizer, so
that it never causes an IndexWriter exception, with an option to reduce
that size?
The backward incompatibility is limited then - tokens exceeding 16K will
NOT cause an IndexWriter exception. In 3.0 we can reduce that d
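Until such a default exists inside StandardTokenizer itself, the same cap can be approximated at the application level with the existing LengthFilter. A sketch, assuming a 16K cap as proposed above (the analyzer name is hypothetical):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LengthFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Analyzer that silently drops any token longer than a configurable cap,
// so oversized "tokens" (e.g. embedded binary blobs) never reach IndexWriter.
public class CappedAnalyzer extends Analyzer {
  private final int maxTokenLength;

  public CappedAnalyzer(int maxTokenLength) {
    this.maxTokenLength = maxTokenLength;
  }

  public TokenStream tokenStream(String fieldName, Reader reader) {
    // LengthFilter keeps only tokens whose length is in [1, maxTokenLength].
    return new LengthFilter(new StandardTokenizer(reader), 1, maxTokenLength);
  }
}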
[ https://issues.apache.org/jira/browse/LUCENE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553778 ]
Daniel Naber commented on LUCENE-770:
-
I think there's a small issue which is also in IndexReader.main: the javad
Gabi Steinberg wrote:
On balance, I think that dropping the document makes sense. I
think Yonik is right in that ensuring that keys are useful - and
indexable - is the tokenizer's job.
StandardTokenizer, in my opinion, should behave similarly to a
person looking at a document and decidin
OK I will take this approach... create TermTooLongException
(subclasses RuntimeException), listed in the javadocs but not the
throws clause of add/updateDocument. DW throws this if it encounters
any term >= 16383 chars in length.
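A sketch of what that exception might look like; only the name and the RuntimeException parent come from the plan above, the constructor is illustrative:

// Unchecked, so it is documented in the javadocs of add/updateDocument
// rather than declared in their throws clauses.
public class TermTooLongException extends RuntimeException {
  public TermTooLongException(String field, int length, int maxLength) {
    super("term in field \"" + field + "\" has length " + length
        + ", which exceeds the maximum of " + maxLength);
  }
}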
Whenever that exception (or others) is thrown from within
This is now complete. Please let me know if you see any problems
with Hudson.
Nige
On Dec 18, 2007, at 10:59 PM, Nigel Daley wrote:
I'd like to upgrade Hudson (http://lucene.zones.apache.org:8080/hudson/)
from 1.136 to 1.161 tomorrow (Dec 19). I'll also be
upgrading some existing plugin
On Dec 20, 2007 2:25 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Makes sense. I wasn't sure whether declaring new exceptions to be
> thrown violates back-compat or not (even if they are runtime
> exceptions)
That's a good question... I know that declared RuntimeExceptions are
containe
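For reference, the Java rule at issue: a throws clause entry for an unchecked exception is documentation only, so adding one later does not break compiling callers. A tiny illustration (names hypothetical):

public class Demo {
  // Listing an unchecked exception here is purely informational; the
  // compiler never forces callers to catch or redeclare it, so adding
  // it later is source- and binary-compatible.
  static void addDocument(String doc) throws IllegalStateException {
    if (doc == null) {
      throw new IllegalStateException("no document");
    }
  }

  public static void main(String[] args) {
    addDocument("hello"); // compiles with or without the throws clause
  }
}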
Makes sense. I wasn't sure whether declaring new exceptions to be
thrown violates back-compat or not (even if they are runtime
exceptions)
On Dec 20, 2007, at 1:47 PM, Yonik Seeley wrote:
On Dec 20, 2007 1:36 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
But, I can see the value i
On Dec 20, 2007 1:36 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> But, I can see the value in the throw the exception
> case too, except I think the API should declare the exception is being
> thrown. It could throw an extension of IOException.
To be robust, user indexing code needs to catch
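A sketch of that robustness pattern at the application level (class and method names hypothetical): catch RuntimeException per document so one bad document does not abort the whole run.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class RobustIndexer {
  public static void addAll(IndexWriter writer, List docs) throws IOException {
    for (Iterator it = docs.iterator(); it.hasNext();) {
      Document doc = (Document) it.next();
      try {
        writer.addDocument(doc);
      } catch (RuntimeException e) {
        // Analysis blew up on this document only: log it and move on.
        System.err.println("skipping document: " + e.getMessage());
      }
    }
  }
}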
On Dec 20, 2007, at 11:57 AM, Michael McCandless wrote:
Yonik Seeley wrote:
On Dec 20, 2007 11:33 AM, Gabi Steinberg
<[EMAIL PROTECTED]> wrote:
It might be a bit harsh to drop the document if it has a very long
token in it.
There are really two issues here.
For long tokens, one could ei
On balance, I think that dropping the document makes sense. I think
Yonik is right in that ensuring that keys are useful - and indexable -
is the tokenizer's job.
StandardTokenizer, in my opinion, should behave similarly to a person
looking at a document and deciding which tokens should be in
On Dec 20, 2007 10:07 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Hmmm, I will have to take a look at Token.clone. I must admit I don't
> know a lot about the perf. differences between clone and new, but I
> would think the cost should be on par, if not a little cheaper,
> otherwise what's th
On 20 Dec 2007, at 16:07, Grant Ingersoll wrote:
I must admit I don't know a lot about the perf. differences between
clone and new, but I would think the cost should be on par, if not a
little cheaper, otherwise what's the point?
My guess is that clone() is a convenience implementation that to
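For what it's worth, the two copying styles being compared look roughly like this on a Token-shaped class (illustrative only): Object.clone() is a native shallow field copy, while a copy constructor runs ordinary Java code, so which is faster is JVM-dependent.

public class MyToken implements Cloneable {
  private char[] termBuffer;
  private int start;
  private int end;

  public MyToken(char[] termBuffer, int start, int end) {
    this.termBuffer = termBuffer;
    this.start = start;
    this.end = end;
  }

  // Copy constructor: explicit, no Cloneable machinery involved.
  public MyToken(MyToken other) {
    this(other.termBuffer, other.start, other.end);
  }

  // clone(): native shallow copy of all fields via Object.clone().
  public Object clone() {
    try {
      return super.clone();
    } catch (CloneNotSupportedException e) {
      throw new RuntimeException(e); // cannot happen: we implement Cloneable
    }
  }
}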
On Dec 20, 2007 11:57 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> > On Dec 20, 2007 11:33 AM, Gabi Steinberg
> > <[EMAIL PROTECTED]> wrote:
> >> It might be a bit harsh to drop the document if it has a very long
> >> token in it.
> >
> > There are really two issues
Yonik Seeley wrote:
On Dec 20, 2007 11:33 AM, Gabi Steinberg
<[EMAIL PROTECTED]> wrote:
It might be a bit harsh to drop the document if it has a very long
token in it.
There are really two issues here.
For long tokens, one could either ignore them or generate an
exception.
I can see th
On Dec 20, 2007 11:33 AM, Gabi Steinberg <[EMAIL PROTECTED]> wrote:
> It might be a bit harsh to drop the document if it has a very long token
> in it.
There are really two issues here.
For long tokens, one could either ignore them or generate an exception.
For all exceptions generated while index
On Dec 20, 2007, at 10:55 AM, Yonik Seeley wrote:
On Dec 20, 2007 9:41 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
I'm wondering if the IndexWriter should throw an explicit exception in
this case as opposed to a RuntimeException,
RuntimeExceptions can happen in analysis components durin
Yonik Seeley wrote:
On Dec 20, 2007 11:15 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Though ... we could simply immediately delete the document when any
exception occurs during its processing. So if we think that whenever a
doc hits an exception it should be deleted, it's not so ha
It might be a bit harsh to drop the document if it has a very long token
in it. I can imagine documents with embedded binary data, where the
text around the binary data is still useful for search.
My feeling is that long tokens (longer than 128 or 256 bytes) are not
useful for search, and sho
On Dec 20, 2007 11:15 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Though ... we could simply immediately delete the document when any
> exception occurs during its processing. So if we think that whenever a
> doc hits an exception it should be deleted, it's not so hard to
> implement th
Yonik Seeley wrote:
On Dec 20, 2007 9:41 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
I'm wondering if the IndexWriter should throw an explicit exception in
this case as opposed to a RuntimeException,
RuntimeExceptions can happen in analysis components during indexing
anyway, so it seems
Yonik Seeley wrote:
as it seems to me really
long tokens should be handled more gracefully. It seems strange that
the message says the terms were skipped (which the code does in fact
do), but then there is a RuntimeException thrown which usually
indicates to me the issue is not recoverable.
On Dec 20, 2007 9:41 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> I'm wondering if the IndexWriter should throw an explicit exception in
> this case as opposed to a RuntimeException,
RuntimeExceptions can happen in analysis components during indexing
anyway, so it seems like indexing code shou
Hmmm, I will have to take a look at Token.clone. I must admit I don't
know a lot about the perf. differences between clone and new, but I
would think the cost should be on par, if not a little cheaper,
otherwise what's the point? It also seems like we shouldn't have to
go through nulling
IndexWriter.close(false) does not actually stop background merge threads
Key: LUCENE-1097
URL: https://issues.apache.org/jira/browse/LUCENE-1097
Project: Lucene - Java
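The call in question, for context (index path hypothetical): close(false) is documented to return without waiting for outstanding background merges, and this issue reports that the running merge threads are not actually stopped.

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class CloseNoWait {
  public static void main(String[] args) throws IOException {
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    try {
      // ... addDocument calls that kick off concurrent background merges ...
    } finally {
      // Should abort running merges rather than wait for them to finish;
      // per this issue, the merge threads kept running.
      writer.close(false);
    }
  }
}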
On Dec 20, 2007 9:41 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> I am getting the following exception when running against trunk:
> java.lang.IllegalArgumentException: at least one term (length 20079)
> exceeds max term length 16383; these terms were skipped
> at org.apache.lucene.ind
I am getting the following exception when running against trunk:
java.lang.IllegalArgumentException: at least one term (length 20079)
exceeds max term length 16383; these terms were skipped
at org.apache.lucene.index.IndexWriter.checkMaxTermLength(IndexWriter.java:1545)
at org.apac
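A minimal sketch that reproduces the check (the term length and the 16383 limit come from the message above; everything else is illustrative):

import java.io.IOException;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class LongTermRepro {
  public static void main(String[] args) throws IOException {
    // A single whitespace-free "word" of 20079 chars becomes one term,
    // exceeding the 16383-char maximum.
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < 20079; i++) {
      sb.append('x');
    }
    IndexWriter writer =
        new IndexWriter(new RAMDirectory(), new WhitespaceAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("body", sb.toString(), Field.Store.NO, Field.Index.TOKENIZED));
    // On trunk at the time, indexing this document trips
    // IndexWriter.checkMaxTermLength with the IllegalArgumentException above.
    writer.addDocument(doc);
    writer.close();
  }
}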
[ https://issues.apache.org/jira/browse/LUCENE-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-1094.
Resolution: Fixed
> Exception in DocumentsWriter.addDocument can corrupt stored fi
[ https://issues.apache.org/jira/browse/LUCENE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-1096:
Attachment: TestSearchDelete.java
Test failing with this bug.
> Deleting docs of all returned Hit
[ https://issues.apache.org/jira/browse/LUCENE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-1096:
Description:
For background user discussion:
http://www.nabble.com/document-deletion-problem-to144
[ https://issues.apache.org/jira/browse/LUCENE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen reassigned LUCENE-1096:
---
Assignee: Doron Cohen
> Deletion from index causes ArrayIndexOutOfBoundsException
>