errorMsg("Invalid or corrupted index.");
return;
}
{code}
So, what Luke really means is there are 0 fields found in the
index, ie it's an empty index.
You're lucky that I spotted this message ... ;)
I'll fix it in the next minor release of Luke, pretty soon.
Howev
1454:
You're right, Luke is saying that!
But, that's a misleading error -- here are the sources in Luke for that error:
{code}
fn = ir.getFieldNames(IndexReader.FieldOption.ALL);
if (fn.size() == 0) {
errorMsg("Invalid or corrupted index.");
return;
}
{code}
So, what Lu
when
opening an empty index.
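For anyone reproducing this: a minimal sketch (assuming the Lucene 2.4-era API; the class name is illustrative) of how such a zero-field index arises -- a new index committed before any document is added:
{code}
import java.util.Collection;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class EmptyIndexDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    // Create a brand-new index and close it without adding any documents.
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.close();

    // The index opens fine; it simply has zero fields, which is the
    // condition Luke's check above mistakes for corruption.
    IndexReader ir = IndexReader.open(dir);
    Collection fn = ir.getFieldNames(IndexReader.FieldOption.ALL);
    System.out.println("fields: " + fn.size()); // prints: fields: 0
    ir.close();
  }
}
{code}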
Andrew Zhang - 16/Nov/08 05:02 AM
:) I'll close the jira. Thanks!
From the source of Luke.java, ~Ln 800:
{code}
fn = ir.getFieldNames(IndexReader.FieldOption.ALL);
if (fn.size() == 0) {
  errorMsg("Invalid or corrupted index.");
  return;
}
{code}
It seems that a normal empty index will be reported as "Invalid or corrupted
index".
I tried to create an empty index: CheckIndex shows the index is OK, while
Luke (both 0.8.1 and 0.9) shows "Invalid or corrupted index".
Luke "Tools -> Check index tool" also shows no problem of the index.
Looks like a bug of Luke "Open Index". I'll take a close look soon. Thanks
again!
I downloaded the attached index.zip and ran CheckIndex on it, and it did not
report any exception.
There is a leftover write.lock, which you'll need to remove before opening
another writer.
Can you post the full exception you're hitting?
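For reference, CheckIndex can be run from the command line like this (jar version and index path are illustrative):
{code}
java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /path/to/index

# if a crashed writer left a stale lock behind, remove it before
# opening another IndexWriter:
rm /path/to/index/write.lock
{code}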
[
https://issues.apache.org/jira/browse/LUCENE-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Zhang updated LUCENE-1454:
---------------------------------
Description:
Hi,
I found a corrupted index produced by lucene-2.4. I can't find a w
[
https://issues.apache.org/jira/browse/LUCENE-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Zhang updated LUCENE-1454:
---------------------------------
Attachment: index.zip
Corrupted index produced by lucene 2.4
--------------------------------------
Key: LUCENE-1454
URL: https://issues.apache.org/jira/browse/LUCENE-1454
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.4
Doug Cutting wrote:
The linux
kernel dynamically increases the readahead window based on the access
pattern: the more you read sequentially, the larger the readahead window.
Sorry, it appears that's in 2.6.23, which isn't yet broadly used.
http://kernelnewbies.org/Linux_2_6_23#head-102af26593
robert engels wrote:
But that would mean we should be using at least 250k buffers for the
IndexInput? Not the 16k or so that is the default.
Is the OS smart enough to figure out that the file is being
sequentially read, and adjust its physical read size to 256k, based
on the other concurrent IO operations?
Michael McCandless wrote:
Merging is far more IO intensive. With mergeFactor=10, we read from
40 input streams and write to 4 output streams when merging the
tii/tis/frq/prx files.
If your disk can transfer at 50MB/s, and takes 5ms/seek, then 250kB
reads and writes are the break-even point, where seek time equals
transfer time.
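Sanity-checking that break-even figure with the numbers above:

  0.005 s/seek x 50 MB/s = 250 kB/seek

i.e. each seek forgoes 250 kB worth of transfer, so below ~250 kB per
read the disk spends more time seeking than transferring, and above it
transfer time dominates.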
Mike, you're right: all lucene files are written sequentially
(flushing or merging).
It's just a matter of how many are open at once, and whether we are
also reading from the source file(s), which affects IO throughput far
less than truly random access writes.
Plus, as of LUCENE-843, bytes are wri
Oh, it certainly causes some random access--I don't deny that. I
just want to emphasize that this isn't at all the same as all "random
writes", which would be expected to perform an order-mag slower.
Just did a test where I wrote out a 1 gig file in 1K chunks. Then
wrote it out in 2 files,
I don't think that is true - but I'm probably wrong though :).
My understanding is that several files are written in parallel
(during the merge), causing random access. After the files are
written, then they are all reread and written as a CFS file
(essentially sequential - although the read
On 7-Feb-08, at 2:00 PM, robert engels wrote:
My point is that commit needs to be used in most applications, and
the commit in Lucene is very slow.
You don't have 2x the IO cost, mainly because only the log file needs
to be sync'd. The index only has to be sync'd eventually, in order
to prune the logfile - this can be done in the background.
robert engels wrote:
I might be misunderstanding 1044. There were several approaches,
and I am not certain what was the final???
The final approach (take 7) is to make the index consistent (sync the
files) after finishing a merge. Also, a new method ("commit") is
added which will force a sync of the index files.
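For concreteness, a rough sketch of that API from the caller's side (assuming the IndexWriter.commit() method as it eventually shipped in 2.4; the path and field names are illustrative):
{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class CommitDemo {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/path/to/index",
        new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);

    Document doc = new Document();
    doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);

    // commit() syncs the index files to stable storage; until it (or
    // close()) returns, a crash may lose the document added above.
    writer.commit();
    writer.close();
  }
}
{code}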
I might be misunderstanding 1044. There were several approaches, and
I am not certain what was the final???
I reread the bug and am still a bit unclear.
If the segments are sync'd as part of the commit, then yes, that
would suffice. The merges don't need to commit, you just can't delete
the old segment files until the new ones are sync'd.
This is simply not true. Two different issues are at play. You cannot
have a true 'commit' unless it is synchronous!
Lucene-1044 might allow the index to be brought back to a consistent
state, but not one that is consistent with a synchronization point.
For example, I write three documents
Good idea; I'll call this ("if your hardware ignores the sync() call
then you're in trouble") out in the javadocs with LUCENE-1044.
Mike
Mark Miller wrote:
We should really probably mention it in the JavaDoc when the issue is
done. I think both yonik and robert pointed it out, and ever since then
I have seen issues regarding it everywhere.
http://hardware.slashdot.org/article.pl?sid=05/05/13/0529252
Apparently, you're just not ACID unless you have
In fact this is exactly the approach in the final patch on
LUCENE-1044, and it gives far better performance than the simple
synchronous (original) approach of syncing every segment file on close.
Using a transaction log would also require periodic syncing.
LUCENE-1044 syncs files after every merge.
But then you're back to syncing in a BG thread, right? We've come
full circle.
Asynchronously syncing gives the best performance we've seen so far,
and so that's the current patch on LUCENE-1044 (using CMS's threads).
Using a transaction log would also require async. syncing, but then
would also
That is the problem, waiting for the full sync (of all of the segment
files) takes quite a while... syncing a single log file is much more
efficient.
On Feb 6, 2008, at 6:42 PM, Mark Miller wrote:
Hey DM,
Just to recap an earlier thread, you need the sync and you need hardware
that doesn't lie to you about the result of the sync.
Here is an excerpt about Digg running into that issue:
"They had problems with their storage system telling them writes were on
disk when they really weren't."
That doesn't help; with lazy writing/buffering by the OS, there is no
guarantee that if the last written block is ok, that earlier blocks
in the file are ok.
The OS/drive is going to physically write them in the most efficient
manner. Only after a sync would this hold true (which is what we
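To make the "only after a sync" point concrete, a minimal Java sketch of forcing a file's bytes to the device (standard java.io API; the path is illustrative):
{code}
import java.io.FileOutputStream;

public class SyncDemo {
  public static void main(String[] args) throws Exception {
    FileOutputStream out = new FileOutputStream("/tmp/segment.dat");
    out.write(new byte[] { 1, 2, 3 });
    // getFD().sync() blocks until the OS reports the data written to
    // the device; without it the bytes may sit in OS buffers and hit
    // the platter in whatever order the kernel finds efficient.
    out.getFD().sync();
    out.close();
  }
}
{code}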
Yes, but this pruning could be more efficient. On a background
thread, get the current segment from the segments file, call the
system-wide sync (e.g. System.exec("fsync")), then you can purge the
transaction logs for all segments up to that one. Since it is a
background operation, you are not blocking updates.
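Something like the following illustrates the design being proposed (purely a sketch of the suggestion above, not code that exists in Lucene; TransactionLog and SegmentTracker are hypothetical interfaces):
{code}
// Hypothetical background pruner for the proposed transaction-log design.
interface TransactionLog {
  void syncIndexFiles();          // force the index files to stable storage
  void pruneUpTo(String segment); // drop log entries the index now covers
}

interface SegmentTracker {
  String currentSegment();        // latest segment named in the segments file
}

class LogPruner extends Thread {
  private final TransactionLog log;
  private final SegmentTracker tracker;

  LogPruner(TransactionLog log, SegmentTracker tracker) {
    this.log = log;
    this.tracker = tracker;
    setDaemon(true); // background work: updates are never blocked on it
  }

  public void run() {
    while (!isInterrupted()) {
      String current = tracker.currentSegment();
      log.syncIndexFiles();   // the "system wide sync" step above
      log.pruneUpTo(current); // everything up to 'current' is now durable
      try {
        Thread.sleep(10000);  // prune interval is arbitrary
      } catch (InterruptedException e) {
        return;
      }
    }
  }
}
{code}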
On Feb 6, 2008, at 5:42 PM, Michael McCandless wrote:
robert engels wrote:
Do we have any way of determining if a segment is definitely OK/VALID?
The only way I know is the CheckIndex tool, and it's rather slow (and
it's not clear that it always catches all corruption).
I had a recent sidebar with another user, and it got me to thinking.
Do we have any way of determining if a segment is definitely OK/VALID?
If so, a much more efficient transactional system could be developed.
Serialize the updates to a log file. Sync the log. Update the lucene
index WITHOUT syncing it.
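A bare-bones sketch of the log half of that idea (again, illustrative of the proposal only, not existing Lucene code; the class and method names are hypothetical):
{code}
import java.io.DataOutputStream;
import java.io.FileOutputStream;

// Hypothetical write-ahead log: append each update, sync the log, and
// only then apply the update to the (unsynced) Lucene index.
public class UpdateLog {
  private final FileOutputStream file;
  private final DataOutputStream out;

  public UpdateLog(String path) throws Exception {
    file = new FileOutputStream(path, true); // append mode
    out = new DataOutputStream(file);
  }

  public void append(byte[] serializedUpdate) throws Exception {
    out.writeInt(serializedUpdate.length);
    out.write(serializedUpdate);
    out.flush();         // push buffered bytes to the OS
    file.getFD().sync(); // log is durable before the index is touched
  }
}
{code}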
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 02, 2002 11:51:42 GMT
To: lucene-dev@jakarta.apache.org
Cc: [EMAIL PROTECTED]
Subject: RE: corrupted index
Doug,
Yep, I think waiting until after 1.2 would be a good idea. As I find
time over the next couple of weeks, I'll try to start put