Re: How to solve the issue Unable to read entire block; 72 bytes read; expected 512 bytes

2007-11-12 Durai murugan
Sorry, for RTF it throws the following exception: Unable to read entire header; 100 bytes read; expected 512 bytes. Is it an issue with POI or Lucene? If so, which build of POI contains a fix for this problem, and where can I get it? Please tell me asap. Thanks. - Original Message From:

[jira] Commented: (LUCENE-1044) Behavior on hard power shutdown

2007-11-12 Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541874 ] Doug Cutting commented on LUCENE-1044: -- Is a sync before every file close really needed [...] ? It might be

Re: [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown

2007-11-12 Doug Cutting
robert engels wrote: I don't think this would make any difference performance-wise, and it might actually be slower. When you call FD.sync() it only needs to ensure that the dirty blocks associated with that descriptor are saved. The potential benefit is that you wouldn't have to wait for

Re: [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown

2007-11-12 Doug Cutting
robert engels wrote: Would it not be simpler to pure Java... Add the descriptor that needs to be sync'd (and closed) to a Queue. Start a Thread to sync/close descriptors. In commit(), wait for all sync threads to terminate using join(). +1 Doug
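Robert's queue-and-join proposal might look roughly like the following sketch. It assumes one background thread per finished file (the message does not say how many threads to use), and `BackgroundSyncer` and `syncAndCloseLater` are hypothetical names, not Lucene API. `FileDescriptor.sync()` is the standard Java call that forces a descriptor's buffered data to the device.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not Lucene API): sync and close finished files on
// background threads, then have commit() join() them all.
class BackgroundSyncer {
    private final List<Thread> running = new ArrayList<>();
    private final List<IOException> failures =
            Collections.synchronizedList(new ArrayList<>());

    // Hand off a fully written file; a new thread fsyncs its descriptor, then closes it.
    synchronized void syncAndCloseLater(RandomAccessFile file) {
        Thread t = new Thread(() -> {
            try {
                file.getFD().sync();   // flush this descriptor's dirty blocks
            } catch (IOException e) {
                failures.add(e);
            } finally {
                try {
                    file.close();
                } catch (IOException e) {
                    failures.add(e);
                }
            }
        });
        running.add(t);
        t.start();
    }

    // Wait for every outstanding sync/close before the commit is published.
    void commit() throws IOException, InterruptedException {
        List<Thread> toJoin;
        synchronized (this) {
            toJoin = new ArrayList<>(running);
            running.clear();
        }
        for (Thread t : toJoin) {
            t.join();
        }
        synchronized (failures) {
            if (!failures.isEmpty()) {
                IOException first = failures.get(0);
                failures.clear();
                throw first;   // surface the first sync/close failure to the committer
            }
        }
    }
}
```

The appeal, as Doug's earlier reply notes, is that the writer need not block on each sync individually; it only waits once, in commit().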

Re: How to solve the issue Unable to read entire block; 72 bytes read; expected 512 bytes

2007-11-12 Ken Krugler
: Sorry, for RTF it throws the following exception: : Unable to read entire header; 100 bytes read; expected 512 bytes : Is it an issue with POI or Lucene? If so, which build of POI contains a fix : for this problem, and where can I get it? Please tell me asap. 1) java-dev is for discussing

Re: [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown

2007-11-12 Yonik Seeley
On Nov 12, 2007 1:41 PM, robert engels [EMAIL PROTECTED] wrote: Would it not be simpler to pure Java... Add the descriptor that needs to be sync'd (and closed) to a Queue. Start a Thread to sync/close descriptors. In commit(), wait for all sync threads to terminate using join(). This would

Re: [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown

2007-11-12 robert engels
I would be wary of the additional complexity of doing this. My vote would be to make 'sync' an option, and if set, all files are sync'd before close. With proper hardware setup, this should be a minimal performance penalty. What about writing a marker at the end of each file? I am

Re: [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown

2007-11-12 Michael McCandless
robert engels [EMAIL PROTECTED] wrote: I would be wary of the additional complexity of doing this. My vote would be to make 'sync' an option, and if set, all files are sync'd before close. This is the way it is now: doSync is an option to FSDirectory, which defaults to true. I agree
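This is a minimal sketch of the "sync before close" behaviour discussed in these two messages, assuming one plain `RandomAccessFile` per index file. The thread says FSDirectory already has a `doSync` option that defaults to true. The class here (`SyncOnCloseOutput`) is hypothetical and only illustrates the idea, using the standard `FileDescriptor.sync()` call.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical stand-alone class illustrating a "sync before close" option;
// it is not FSDirectory's code and its names are invented for this sketch.
class SyncOnCloseOutput implements AutoCloseable {
    private final RandomAccessFile file;
    private final boolean doSync;

    SyncOnCloseOutput(File path, boolean doSync) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.doSync = doSync;
    }

    void writeBytes(byte[] bytes) throws IOException {
        file.write(bytes);
    }

    @Override
    public void close() throws IOException {
        try {
            if (doSync) {
                // Ask the OS to push this descriptor's buffered data to the device.
                file.getFD().sync();
            }
        } finally {
            file.close();   // close even if the sync failed
        }
    }
}
```

Making the sync conditional keeps the penalty optional, which is the trade-off Robert raises: with good hardware (e.g. a battery-backed write cache) the cost should be small, but users on slow disks can turn it off.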

small improvement when no payloads?

2007-11-12 Yonik Seeley
The else clause in SegmentTermPositions.readDeltaPosition() is redundant and could be removed, yes? It's a pretty minor improvement, but this is very inner-loop stuff. -Yonik private final int readDeltaPosition() throws IOException { int delta = proxStream.readVInt(); if

[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541955 ] Michael McCandless commented on LUCENE-743: --- I think the cause of the intermittent failure in the test is

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 robert engels
Why doesn't reopen get the 'read' lock? Since commit has the write lock, it should wait... On Nov 12, 2007, at 3:35 PM, Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
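Robert's suggestion, that reopen take a read lock while commit holds the write lock, maps naturally onto `java.util.concurrent.locks.ReentrantReadWriteLock`. The sketch is hypothetical: `IndexState` and its generation counter stand in for segments_N. It also only coordinates threads inside a single JVM. A reader in another process, or on another machine over NFS, would not see this lock, which is one reason the rest of the thread turns to file-level races.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of "commit holds the write lock, reopen takes the read lock".
// Not IndexReader/IndexWriter code; the counter stands in for segments_N.
class IndexState {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long generation = 0;

    void commit() {
        lock.writeLock().lock();
        try {
            // In a real index: write segments_N+1, then delete unreferenced files.
            generation++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    long reopen() {
        lock.readLock().lock();   // blocks while a commit is in progress
        try {
            return generation;    // always observes a completed commit point
        } finally {
            lock.readLock().unlock();
        }
    }
}
```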

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 robert engels
Then how can the commit during reopen be an issue? I am not very familiar with this new code, but it seems that you need to write segments.XXX.new and then rename it to segments.XXX. As long as the files are sync'd, even on NFS the reopen should not see segments.XXX until it is ready.
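The write-then-rename scheme Robert describes could look like the following sketch. `AtomicPublish` is a hypothetical helper. It calls `FileChannel.force(true)` to sync before renaming and `Files.move` with `ATOMIC_MOVE` to publish. Two caveats apply. First, the `Files.move` documentation leaves it implementation-specific whether an existing target is replaced under `ATOMIC_MOVE`. Second, later messages in this thread question how reliable renames are on Windows.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: publish a file under its final name only after it is
// completely written and synced, by writing "<name>.new" and renaming it.
class AtomicPublish {
    static Path publish(Path dir, String name, byte[] contents) throws IOException {
        Path tmp = dir.resolve(name + ".new");
        Path target = dir.resolve(name);
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(contents);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);   // data and metadata reach the device before the rename
        }
        // Readers looking for `name` see either nothing or the complete file.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }
}
```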

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 Michael McCandless
robert engels [EMAIL PROTECTED] wrote: Then how can the commit during reopen be an issue? This is what happens:
* Reader opens the latest segments_N and reads all SegmentInfos successfully.
* Writer writes the new segments_N+1, and then deletes the now-unreferenced files.
* Reader tries to
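One way a reader might cope with this race is to retry. If a file referenced by segments_N has already been deleted, the reader looks up the newest generation again and reopens from that, giving up if no newer commit has appeared. The sketch is hypothetical (`RetryingOpener`, `Opener`, and the generation supplier are invented names) and is not presented as Lucene's actual implementation.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.util.function.LongSupplier;

// Hypothetical retry loop for the reader side of the race; not Lucene's code.
class RetryingOpener {
    interface Opener<T> {
        T open(long generation) throws IOException;
    }

    static <T> T openLatest(LongSupplier latestGeneration, Opener<T> opener,
                            int maxAttempts) throws IOException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts < 1");
        IOException last = null;
        long failedGeneration = -1;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long gen = latestGeneration.getAsLong();   // newest segments_N right now
            if (last != null && gen == failedGeneration) {
                throw last;   // no newer commit: the missing file is a real error
            }
            try {
                return opener.open(gen);
            } catch (FileNotFoundException | NoSuchFileException e) {
                // A writer likely committed segments_N+1 and deleted files we needed.
                last = e;
                failedGeneration = gen;
            }
        }
        throw last;
    }
}
```

Giving up when the generation has not changed keeps a genuinely corrupt or missing index from looping forever.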

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 robert engels
But merging segments doesn't delete the old, it only creates new, unless the segments meet the purge old criteria. A reopen() is supposed to open the latest version in the directory by definition, so this seems rather a remote possibility. If it occurs due to low system resources (meaning

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 Michael McCandless
Yonik Seeley [EMAIL PROTECTED] wrote: On Nov 12, 2007 5:08 PM, robert engels [EMAIL PROTECTED] wrote: As long as the files are sync'd, even on NFS the reopen should not see segments.XXX until it is ready. Right, but then there is a race on the other side... a reader may open the segments

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 robert engels
What are you basing "the rename is not reliable on Windows" on? That a virus scanner has the file open. If that is the case, either that is an incorrect setup, or the operation should be retried until it completes. Writing directly to a file that someone else can open for reading is bound to
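Robert's alternative, retrying the operation until it completes, could be sketched as a bounded retry loop. The attempt limit is an added assumption: a literal "until it completes" loop would hang if the other program never releases the file. `RetryingRename` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: retry a rename that may fail transiently, e.g. on
// Windows while another program briefly holds the file open.
class RetryingRename {
    static void renameWithRetry(Path from, Path to, int attempts, long sleepMillis)
            throws IOException, InterruptedException {
        if (attempts < 1) throw new IllegalArgumentException("attempts < 1");
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
                return;
            } catch (IOException e) {
                last = e;   // assume the failure is transient
                if (i + 1 < attempts) {
                    Thread.sleep(sleepMillis);   // give the other program time to let go
                }
            }
        }
        throw last;
    }
}
```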

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 Michael McCandless
Not just virus scanners: any program that uses the Microsoft API for being notified of file changes. I think TortoiseSVN was one such example. People who embed Lucene can't control what their users install on their desktops. Virus scanners are naturally very common on desktops. I think we want

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 robert engels
Horse poo poo. If you are working in a local environment, the files should be opened with exclusive access. This guarantees that the operations will succeed for the calling process. That NFS is a viable solution is highly debatable, and IMO shows a lack of understanding of NFS and the

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 Michael Busch
robert engels wrote: I was talking about Windows in particular - as stated, unix/linux does not have the problem - under Windows the delete will (should) fail. As I said, delete does fail on Windows in that case, and the IndexFileDeleter (called by the IndexWriter) catches the IOException
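Michael Busch attributes this behaviour to IndexFileDeleter: catch the IOException when Windows refuses a delete, and try again later. That can be illustrated with a small pending-delete list. `DeferredDeleter` is a hypothetical stand-alone class, not IndexFileDeleter's code. The test simulates a refused delete with a non-empty directory, because an open file does not block deletion on Unix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical deferred-delete list; not Lucene's IndexFileDeleter.
class DeferredDeleter {
    private final Set<Path> pending = new LinkedHashSet<>();

    // Try to delete now; if the OS refuses (e.g. a reader still has the file
    // open on Windows), remember the file instead of failing the commit.
    void delete(Path file) {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            pending.add(file);
        }
    }

    // Called at a later point, e.g. the next commit, to retry earlier failures.
    void retryPending() {
        Iterator<Path> it = pending.iterator();
        while (it.hasNext()) {
            Path file = it.next();
            try {
                Files.deleteIfExists(file);
                it.remove();
            } catch (IOException e) {
                // still undeletable; keep it for the next attempt
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```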

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 robert engels
That is not true - at least it didn't use to be. If there were readers open, the files/segments would not be deleted; they would be deleted at the next open. The purge criteria were based on the next commit sets. To make this work, and be able to roll back or open a previous version, you need

[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541998 ] Michael Busch commented on LUCENE-743: -- I think the cause of the intermittent failure in the test is a missing

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 robert engels
I would still argue that it is an incorrect setup - almost as bad as not plugging the computer in. If a user runs a virus scanner or file system indexer on the lucene index directory, their system is going to slow to a crawl and indexing will be abominably slow. The installation guide

Re: [jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-12 Yonik Seeley
On Nov 12, 2007 7:19 PM, robert engels [EMAIL PROTECTED] wrote: I would still argue that it is an incorrect setup - almost as bad as not plugging the computer in. A user themselves could even go in and look at the index files (I've done so myself)... as could a backup program or whatever. It's

[jira] Updated: (LUCENE-743) IndexReader.reopen()

2007-11-12 Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-743: - Attachment: lucene-743-take8.patch OK, all tests pass now, including the thread-safety test. I

Re: Term pollution from binary data

2007-11-12 Chuck Williams
Doug Cutting wrote on 11/07/2007 09:26 AM: Hadoop's MapFile is similar to Lucene's term index, and supports a feature where only a subset of the index entries are loaded (determined by io.map.index.skip). It would not be difficult to add such a feature to Lucene by changing
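Doug describes a MapFile-style approach: load only every (skip+1)-th index entry, trading a longer scan for less memory. It can be sketched as a sparse index over a sorted term list. `SparseTermIndex` is hypothetical, and the in-memory `String[]` stands in for the on-disk term dictionary. In Lucene itself the term index already holds only every Nth term, so a real change would thin that index further.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sparse index over a sorted term list, in the spirit of
// Hadoop's io.map.index.skip; not Lucene's TermInfosReader.
class SparseTermIndex {
    private final String[] terms;   // stands in for the sorted on-disk dictionary
    private final List<String> indexTerms = new ArrayList<>();
    private final List<Integer> indexPositions = new ArrayList<>();

    SparseTermIndex(String[] sortedTerms, int skip) {
        if (skip < 0) throw new IllegalArgumentException("skip < 0");
        this.terms = sortedTerms;
        // Keep one entry, then skip `skip` entries (skip = 0 keeps them all).
        for (int i = 0; i < sortedTerms.length; i += skip + 1) {
            indexTerms.add(sortedTerms[i]);
            indexPositions.add(i);
        }
    }

    int indexSize() {
        return indexTerms.size();
    }

    // Returns the position of `term` in the dictionary, or -1 if absent.
    int lookup(String term) {
        int slot = Collections.binarySearch(indexTerms, term);
        if (slot >= 0) return indexPositions.get(slot);
        int insertion = -slot - 1;
        if (insertion == 0) return -1;   // sorts before every term
        int start = indexPositions.get(insertion - 1);
        int end = insertion < indexPositions.size()
                ? indexPositions.get(insertion) : terms.length;
        for (int i = start; i < end; i++) {   // linear scan within one gap
            if (terms[i].equals(term)) return i;
        }
        return -1;
    }
}
```

With `skip = 2`, the in-memory index is about a third of its full size, and each miss in it costs a scan of at most three dictionary entries.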

Re: small improvement when no payloads?

2007-11-12 Michael Busch
Yonik Seeley wrote: The else clause in SegmentTermPositions.readDeltaPosition() is redundant and could be removed, yes? It's a pretty minor improvement, but this is very inner-loop stuff. -Yonik Thanks, Yonik, you're right. We can safely remove those two lines. TermPositions#seek() resets
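The redundancy Yonik points out can be illustrated with a simplified, hypothetical stand-in (`PositionReaderSketch`, not SegmentTermPositions itself). It assumes two things: first, as Michael's truncated reply appears to say, that seek() resets the payload state; second, that whether a field stores payloads is fixed per field. Under those assumptions an else branch re-assigning the same values on every position does no work. The payload branch models the low bit of the delta as a "new payload length follows" flag, following the .prx layout described in Lucene's file-format documentation.

```java
// Hypothetical, simplified positions reader; not Lucene's SegmentTermPositions.
class PositionReaderSketch {
    private final int[] prox;               // stands in for proxStream's VInts
    private final boolean storesPayloads;   // fixed per field
    private int next;
    int payloadLength;
    boolean needToLoadPayload;

    PositionReaderSketch(int[] prox, boolean storesPayloads) {
        this.prox = prox;
        this.storesPayloads = storesPayloads;
        seek();
    }

    // Reset per-term state once; for a field without payloads nothing below
    // ever changes these two fields again.
    void seek() {
        next = 0;
        payloadLength = 0;
        needToLoadPayload = false;
    }

    int readDeltaPosition() {
        int delta = prox[next++];
        if (storesPayloads) {
            if ((delta & 1) != 0) {
                payloadLength = prox[next++];   // odd delta: a new payload length follows
            }
            delta >>>= 1;
            needToLoadPayload = true;
        }
        // No else clause: seek() already set payloadLength = 0 and needToLoadPayload = false.
        return delta;
    }
}
```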

Re: setSimilarity on Query

2007-11-12 Shailesh Kochhar
Chris Hostetter wrote: independent of the QueryParser aspects of your question, adding a setSimilarity method to the Query class would be a complete 180 from how it currently works. Query classes have to have a getSimilarity method so that their Weight/Scorer have a way to access the