Sorry, for rtf it throws the following exception:
Unable to read entire header; 100 bytes read; expected 512 bytes
Is it an issue with POI or Lucene? If so, which build of POI contains a fix for
this problem, and where can I get it? Please tell me asap.
Thanks.
- Original Message
From:
[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541874 ]
Doug Cutting commented on LUCENE-1044:
--
Is a sync before every file close really needed [...] ?
It might be
robert engels wrote:
I don't think this would make any difference performance-wise, and might
actually be slower.
When you call FD.sync() it only needs to ensure that the dirty blocks
associated with that descriptor are saved.
The potential benefit is that you wouldn't have to wait for
robert engels wrote:
Would it not be simpler to do this in pure Java...
Add the descriptor that needs to be sync'd (and closed) to a Queue.
Start a Thread to sync/close descriptors.
In commit(), wait for all sync threads to terminate using join().
+1
Doug
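The queue/thread scheme robert describes can be sketched in pure Java. This is a minimal illustrative sketch, not Lucene's code: a single-threaded executor stands in for the "queue + sync thread", and all class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Descriptors that need a sync-then-close are handed to a background
// thread; commit() blocks until every pending sync has completed.
public class BackgroundSyncer {
    private final ExecutorService syncThread = Executors.newSingleThreadExecutor();
    private final List<Future<?>> pending = new ArrayList<>();

    // Enqueue a finished file to be fsync'd and closed in the background.
    public synchronized void scheduleSyncAndClose(RandomAccessFile file) {
        pending.add(syncThread.submit(() -> {
            try {
                file.getFD().sync(); // flush only this descriptor's dirty blocks
                file.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return null;
        }));
    }

    // commit(): wait for all queued syncs to terminate.
    public synchronized void commit() throws Exception {
        for (Future<?> f : pending) f.get();
        pending.clear();
    }

    public void shutdown() {
        syncThread.shutdown();
    }
}
```

The benefit is that index-writing threads never block on fsync; only commit() pays the wait.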
: Sorry, for rtf it throws the following exception:
: Unable to read entire header; 100 bytes read; expected 512 bytes
: Is it an issue with POI or Lucene? If so, which build of POI contains a fix
: for this problem, and where can I get it? Please tell me asap.
1) java-dev is for discussion
On Nov 12, 2007 1:41 PM, robert engels [EMAIL PROTECTED] wrote:
Would it not be simpler to do this in pure Java...
Add the descriptor that needs to be sync'd (and closed) to a Queue.
Start a Thread to sync/close descriptors.
In commit(), wait for all sync threads to terminate using join().
This would
I would be wary of the additional complexity of doing this.
My vote would be to make 'sync' an option, and if set, all files
are sync'd before close.
With proper hardware setup, this should be a minimal performance
penalty.
What about writing a marker at the end of each file? I am
robert engels [EMAIL PROTECTED] wrote:
I would be wary of the additional complexity of doing this.
My vote would be to make 'sync' an option, and if set, all files
are sync'd before close.
This is the way it is now: doSync is an option to FSDirectory,
which defaults to true.
I agree
The else clause in SegmentTermPositions.readDeltaPosition() is
redundant and could be removed, yes?
It's a pretty minor improvement, but this is very inner-loop stuff.
-Yonik
private final int readDeltaPosition() throws IOException {
int delta = proxStream.readVInt();
if
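The quoted method is truncated in the archive. Below is a self-contained, hedged sketch of the decoding pattern under discussion, assuming Lucene's VInt encoding and the payload low-bit convention; the field and method names mimic the quoted code, but this is not the actual SegmentTermPositions source. The `else` branch is the one Yonik calls redundant, since seek() resets the flag anyway.

```java
import java.io.DataInput;
import java.io.IOException;

public class DeltaPositionSketch {
    int payloadLength = -1;
    boolean needToLoadPayload = false;

    // Lucene-style VInt: 7 data bits per byte, high bit means "more bytes".
    static int readVInt(DataInput in) throws IOException {
        byte b = in.readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    // When the field stores payloads, the delta is left-shifted one bit and
    // the low bit flags "a payload-length VInt follows".
    int readDeltaPosition(DataInput proxStream, boolean fieldStoresPayloads)
            throws IOException {
        int delta = readVInt(proxStream);
        if (fieldStoresPayloads) {
            if ((delta & 1) != 0) {       // low bit set: payload length changed
                payloadLength = readVInt(proxStream);
            }
            delta >>>= 1;
            needToLoadPayload = true;
        } else {                          // the arguably redundant branch:
            needToLoadPayload = false;    // seek() resets this flag anyway
        }
        return delta;
    }
}
```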
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541955 ]
Michael McCandless commented on LUCENE-743:
---
I think the cause of the intermittent failure in the test is
Why doesn't reopen get the 'read' lock, since commit has the write
lock, it should wait...
On Nov 12, 2007, at 3:35 PM, Michael McCandless (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
Then how can the commit during reopen be an issue?
I am not very familiar with this new code, but it seems that you need
to write segments.XXX.new and then rename it to segments.XXX.
As long as the files are sync'd, even on NFS the reopen should not
see segments.XXX until it is ready.
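The write-then-rename commit robert describes can be sketched with java.nio. This is an illustrative sketch, not Lucene's code; names are invented, and note that ATOMIC_MOVE can fail on some platforms/filesystems and may need a retry.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicSegmentsWrite {
    // Write "name.new", force it to disk, then atomically rename it into
    // place so readers never observe a partially written segments file.
    public static void writeCommit(Path dir, String name, byte[] contents)
            throws IOException {
        Path tmp = dir.resolve(name + ".new");
        Path dest = dir.resolve(name);
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(contents));
            ch.force(true); // sync before the rename, as the thread insists
        }
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }
}
```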
robert engels [EMAIL PROTECTED] wrote:
Then how can the commit during reopen be an issue?
This is what happens:
* Reader opens latest segments_N reads all SegmentInfos
successfully.
* Writer writes new segments_N+1, and then deletes now un-referenced
files.
* Reader tries to
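One way a reader can survive that race is to retry the list-and-open when the writer deletes files underneath it. A hedged sketch, not Lucene's actual retry logic (which lives in SegmentInfos/IndexFileDeleter); picking the newest file lexicographically is a simplification, since real code parses the generation number.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class RetryingOpen {
    // Open the newest file matching prefix*, retrying if the writer
    // deletes it between our directory listing and the read.
    public static byte[] openLatest(Path dir, String prefix, int maxRetries)
            throws IOException {
        IOException last = null;
        for (int i = 0; i < maxRetries; i++) {
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, prefix + "*")) {
                Path newest = null;
                for (Path p : ds) {
                    if (newest == null || p.getFileName().toString()
                            .compareTo(newest.getFileName().toString()) > 0) {
                        newest = p;
                    }
                }
                if (newest != null) {
                    return Files.readAllBytes(newest); // may throw if deleted under us
                }
            } catch (NoSuchFileException | FileNotFoundException e) {
                last = e; // writer won the race; list again and retry
            }
        }
        throw last != null ? last : new FileNotFoundException(prefix);
    }
}
```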
But merging segments doesn't delete the old ones, it only creates new
ones, unless the segments meet the criteria for purging old ones.
A reopen() is supposed to open the latest version in the directory by
definition, so this seems rather a remote possibility.
If it occurs due to low system resources (meaning
Yonik Seeley [EMAIL PROTECTED] wrote:
On Nov 12, 2007 5:08 PM, robert engels [EMAIL PROTECTED] wrote:
As long as the files are sync'd, even on NFS the reopen should not
see segments.XXX until it is ready.
Right, but then there is a race on the other side... a reader may open
the segments
What are you basing the claim that rename is not reliable on Windows
on? That a virus scanner has the file open? If that is the case, that
should either be an incorrect setup, or the operation retried until it
completes.
Writing directly to a file that someone else can open for reading is
bound to
Not just virus scanners: any program that uses the Microsoft API for
being notified of file changes. I think TortoiseSVN was one such
example.
People who embed Lucene can't control what their users install on
their desktops. Virus scanners are naturally very common on
desktops. I think we want
Horse poo poo. If you are working in a local environment, the files
should be opened with exclusive access. This guarantees that the
operations will succeed for the calling process.
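For what "exclusive access" can look like from Java: FileChannel.tryLock() is the closest portable tool (the lock is mandatory on Windows, but only advisory on most Unixes, so there it protects only against cooperating processes). A hedged sketch, not how Lucene actually opens index files.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

public class ExclusiveOpen {
    // Run `work` against the file while holding an exclusive lock.
    // Returns false if another process already holds the lock.
    public static boolean withExclusiveLock(Path file, Consumer<FileChannel> work)
            throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock(); // null if someone else holds it
            if (lock == null) {
                return false;
            }
            try {
                work.accept(ch);
            } finally {
                lock.release();
            }
            return true;
        }
    }
}
```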
That NFS is a viable solution is highly debatable, and IMO shows a
lack of understanding of NFS and the
robert engels wrote:
I was talking about Windows in particular - as stated, unix/linux does
not have the problem - under Windows the delete will (should) fail.
As I said, delete does fail on Windows in that case, and the
IndexFileDeleter (called by the IndexWriter) catches the IOException
That is not true - at least it didn't use to be. If there were
readers open, the files/segments would not be deleted; they would be
deleted at the next open.
The purge criteria were based on the next commit sets. To make
this work, and be able to roll back or open a previous version, you
need
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541998 ]
Michael Busch commented on LUCENE-743:
--
I think the cause of the intermittent failure in the test is a missing
I would still argue that it is an incorrect setup - almost as bad as
not plugging the computer in.
If a user runs a virus scanner or file system indexer on the lucene
index directory, their system is going to slow to a crawl and
indexing will be abominably slow.
The installation guide
On Nov 12, 2007 7:19 PM, robert engels [EMAIL PROTECTED] wrote:
I would still argue that it is an incorrect setup - almost as bad as
not plugging the computer in.
A user themselves could even go in and look at the index files (I've
done so myself)... as could a backup program or whatever. It's
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch updated LUCENE-743:
-
Attachment: lucene-743-take8.patch
OK, all tests pass now, including the thread-safety test.
I
Doug Cutting wrote on 11/07/2007 09:26 AM:
Hadoop's MapFile is similar to Lucene's term index, and supports a
feature where only a subset of the index entries are loaded
(determined by io.map.index.skip). It would not be difficult to add
such a feature to Lucene by changing
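Doug's paragraph is truncated, but the subsampling idea is clear: like Hadoop's io.map.index.skip, keep only every (skip+1)-th index entry in memory and scan forward from the nearest kept entry, trading RAM for a slightly longer seek. A minimal illustrative sketch (names invented, not either project's code):

```java
import java.util.ArrayList;
import java.util.List;

public class SkippingIndex {
    // Keep entries[0], entries[skip+1], entries[2*(skip+1)], ...
    // Lookups land on the nearest kept entry and scan forward from there.
    public static List<String> load(List<String> entries, int skip) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += skip + 1) {
            kept.add(entries.get(i));
        }
        return kept;
    }
}
```

With skip=1 this halves the in-memory index at the cost of scanning at most one extra on-disk entry per lookup.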
Yonik Seeley wrote:
The else clause in SegmentTermPositions.readDeltaPosition() is
redundant and could be removed, yes?
It's a pretty minor improvement, but this is very inner-loop stuff.
-Yonik
Thanks, Yonik, you're right. We can safely remove those two lines.
TermPositions#seek() resets
Chris Hostetter wrote:
independent of the QueryParser aspects of your question, adding a
setSimilarity method to the Query class would be a complete 180 of how it
currently works right now.
Query classes have to have a getSimilarity method so that their
Weight/Scorer have a way to access the