[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Busch updated LUCENE-743:
-
Attachment: lucene-743-take7.patch
Changes:
- Updated patch to current trunk (I just realized th
Dear All,
Using Lucene I'm indexing my documents. While indexing some Word
documents I got the following exception:
Unable to read entire block; 72 bytes read; expected 512 bytes
While indexing rtf documents I get the following exception:
Unable to read entire block; 72 bytes read; expect
Sorry, for rtf it throws the following exception:
Unable to read entire header; 100 bytes read; expected 512 bytes
Is it an issue with POI or Lucene? If so, which build of POI contains a fix
for this problem, and where can I get it? Please tell me ASAP.
Thanks.
- Original Message
From: Dura
: The problem is that I want to use QueryParser to construct the
: query for me. I am having to override the logic in QueryParser to
: construct my own derived class, which seems to me like a convoluted
: way of just setting the Similarity.
that's the basic design of the QueryParser class -
: Sorry, for rtf it throws the following exception:
: Unable to read entire header; 100 bytes read; expected 512 bytes
: Is it an issue with POI or Lucene? If so, which build of POI contains a fix
: for this problem, and where can I get it? Please tell me ASAP.
1) java-dev is for discussing developm
[
https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541874
]
Doug Cutting commented on LUCENE-1044:
--
> Is a sync before every file close really needed [...] ?
It might be
I don't think this would make any difference performance-wise, and
might actually be slower.
When you call FD.sync() it only needs to ensure that the dirty blocks
associated with that descriptor are saved.
On Nov 12, 2007, at 12:15 PM, Doug Cutting (JIRA) wrote:
[ https://issues.a
I'm putting together a Google Web Toolkit-based version of Luke:
http://www.inperspective.com/lucene/Luke.war
( Just add your version of lucene core jar to WEB-INF/lib subdirectory and you
should have the basis of a web-enabled Luke.)
The intention behind this is to port Luke to a wholly Apach
robert engels wrote:
I don't think this would make any difference performance-wise, and might
actually be slower.
When you call FD.sync() it only needs to ensure that the dirty blocks
associated with that descriptor are saved.
The potential benefit is that you wouldn't have to wait for thing
Would it not be simpler in pure Java...
Add the descriptor that needs to be sync'd (and closed) to a Queue.
Start a Thread to sync/close descriptors.
In commit(), wait for all sync threads to terminate using join().
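The three steps above can be sketched as follows. This is a minimal illustration of the proposal, not Lucene's actual implementation; the class and method names (`BackgroundSync`, `enqueue`, `startSync`) are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Sketch: queue descriptors for sync/close, drain them on a
 *  background thread, and have commit() join that thread. */
class BackgroundSync {
    private final Queue<FileChannel> pending = new ConcurrentLinkedQueue<>();
    private Thread syncThread;

    /** Add a descriptor that needs to be sync'd (and closed) to the queue. */
    void enqueue(FileChannel ch) {
        pending.add(ch);
    }

    /** Start a thread that forces and closes every queued descriptor. */
    synchronized void startSync() {
        syncThread = new Thread(() -> {
            FileChannel ch;
            while ((ch = pending.poll()) != null) {
                try {
                    ch.force(true); // flush this descriptor's dirty blocks
                    ch.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        });
        syncThread.start();
    }

    /** In commit(), wait for the sync thread to terminate using join(). */
    synchronized void commit() throws InterruptedException {
        if (syncThread != null) {
            syncThread.join();
        }
    }
}
```

The appeal of this shape is that writers never block on fsync; only commit() pays the latency, and it pays it once for all queued files.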
On Nov 12, 2007, at 12:34 PM, Doug Cutting wrote:
robert engels wrote:
I don
robert engels wrote:
Would it not be simpler in pure Java...
Add the descriptor that needs to be sync'd (and closed) to a Queue.
Start a Thread to sync/close descriptors.
In commit(), wait for all sync threads to terminate using join().
+1
Doug
On Nov 12, 2007 1:41 PM, robert engels <[EMAIL PROTECTED]> wrote:
> Would it not be simpler in pure Java...
>
> Add the descriptor that needs to be sync'd (and closed) to a Queue.
> Start a Thread to sync/close descriptors.
>
> In commit(), wait for all sync threads to terminate using join().
This
I'll look into this approach.
We must also sync/close the file before we can open it for reading, e.g.
for creating the compound file or if a merge kicks off.
Though if we are willing to not commit a new segments_N after saving a
segment and before creating its compound file then we don't need to
syn
I would be wary of the additional complexity of doing this.
My vote would be to make 'sync' an option, and if set, all files
are sync'd before close.
With proper hardware setup, this should be a minimal performance
penalty.
What about writing a marker at the end of each file? I am no
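The end-of-file marker idea could look something like this. A hedged sketch only: the `MAGIC` value and the `MarkerFile` helper names are assumptions for illustration, not anything Lucene actually writes.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Sketch: append a known sentinel when closing a file, and refuse to
 *  read a file that lacks it, which catches partially flushed writes. */
class MarkerFile {
    static final long MAGIC = 0x4C75634D61726BL; // arbitrary sentinel (assumption)

    static void writeWithMarker(Path p, byte[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(p))) {
            out.write(data);
            out.writeLong(MAGIC); // marker written last, after the payload
        }
    }

    static byte[] readChecked(Path p) throws IOException {
        byte[] all = Files.readAllBytes(p);
        if (all.length < 8) {
            throw new IOException("file truncated before marker");
        }
        long tail = ByteBuffer.wrap(all, all.length - 8, 8).getLong();
        if (tail != MAGIC) {
            throw new IOException("missing end marker; incomplete write?");
        }
        return Arrays.copyOf(all, all.length - 8);
    }
}
```

Note the limitation raised later in the thread: a marker proves the last bytes reached the file, but without an fsync it does not prove the middle blocks survived a crash.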
On Nov 12, 2007, at 1:21 PM, mark harwood wrote:
I'm putting together a Google Web Toolkit-based version of Luke:
http://www.inperspective.com/lucene/Luke.war
( Just add your version of lucene core jar to WEB-INF/lib
subdirectory and you should have the basis of a web-enabled Luke.)
Mark:
"robert engels" <[EMAIL PROTECTED]> wrote:
> I would be wary of the additional complexity of doing this.
>
> My vote would be to make 'sync' an option, and if set, all files
> are sync'd before close.
This is the way it is now: doSync is an option to FSDirectory,
which defaults to true.
I
The else clause in SegmentTermPositions.readDeltaPosition() is
redundant and could be removed, yes?
It's a pretty minor improvement, but this is very inner-loop stuff.
-Yonik
private final int readDeltaPosition() throws IOException {
  int delta = proxStream.readVInt();
  if (currentFieldSt
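For context, a small sketch of the encoding this method decodes (the class and method names here are illustrative, not the actual Lucene source): when a field stores payloads, the position delta is shifted left one bit and the low bit flags whether a payload length follows in the stream.

```java
/** Sketch of the delta/payload-flag bit packing under discussion. */
final class PositionDeltaCodec {
    /** Shift the delta left; set the LSB when a payload length follows. */
    static int encode(int delta, boolean payloadLengthFollows) {
        return (delta << 1) | (payloadLengthFollows ? 1 : 0);
    }

    /** Recover the delta by dropping the flag bit. */
    static int decode(int encoded) {
        return encoded >>> 1;
    }

    /** True if a payload-length VInt follows this delta in the stream. */
    static boolean payloadLengthFollows(int encoded) {
        return (encoded & 1) != 0;
    }
}
```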
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541955
]
Michael McCandless commented on LUCENE-743:
---
I think the cause of the intermittent failure in the test is a
Why doesn't reopen get the 'read' lock, since commit has the write
lock, it should wait...
On Nov 12, 2007, at 3:35 PM, Michael McCandless (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-743?
page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
tabpanel#action_125
On Nov 12, 2007 4:43 PM, robert engels <[EMAIL PROTECTED]> wrote:
> Why doesn't reopen get the 'read' lock, since commit has the write
> lock, it should wait...
After lockless commits, there is no read lock!
-Yonik
-
To unsubscr
Then how can the commit during reopen be an issue?
I am not very familiar with this new code, but it seems that you need
to write segments.XXX.new and then rename it to segments.XXX.
As long as the files are sync'd, even on NFS the reopen should not
see segments.XXX until it is ready.
Although
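The write-then-rename pattern described above can be sketched like so. The `SegmentsPublisher` name and method are assumptions for the example, not Lucene's actual code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Sketch: write segments.XXX.new, sync it, then rename so readers
 *  never observe a half-written segments.XXX. */
class SegmentsPublisher {
    static void publish(Path dir, String name, byte[] contents) throws IOException {
        Path tmp = dir.resolve(name + ".new");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(contents));
            ch.force(true); // sync before the rename makes the file visible
        }
        // Atomicity here is filesystem-dependent; as the thread notes,
        // NFS and Windows give weaker guarantees than local POSIX rename.
        Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```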
On Nov 12, 2007 5:08 PM, robert engels <[EMAIL PROTECTED]> wrote:
> As long as the files are sync'd, even on NFS the reopen should not
> see segments.XXX until it is ready.
Right, but then there is a race on the other side... a reader may open
the segments.XXX file and then start opening all the
robert engels <[EMAIL PROTECTED]> wrote:
> Then how can the commit during reopen be an issue?
This is what happens:
* Reader opens latest segments_N & reads all SegmentInfos
successfully.
* Writer writes new segments_N+1, and then deletes now un-referenced
files.
* Reader tries
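A common way to handle this reader-side race is a retry loop: if a file listed in segments_N vanishes because a concurrent commit deleted it, abandon the attempt and re-read the newer segments file. This is only a sketch; the `Opener` interface and `RetryingOpener` name are hypothetical, not Lucene's API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Sketch: retry opening when a referenced file was deleted between
 *  reading segments_N and opening the segments it lists. */
class RetryingOpener {
    interface Opener<T> {
        T open() throws IOException;
    }

    static <T> T openWithRetry(Opener<T> opener, int maxAttempts) throws IOException {
        FileNotFoundException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return opener.open();
            } catch (FileNotFoundException e) {
                last = e; // stale segments_N: loop and re-read the latest
            }
        }
        throw last; // give up after maxAttempts consecutive failures
    }
}
```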
But merging segments doesn't delete the old, it only creates new,
unless the segments meet the "purge old criteria".
A reopen() is supposed to open the latest version in the directory by
definition, so this seems rather a remote possibility.
If it occurs due to low system resources (meaning
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 12, 2007 5:08 PM, robert engels <[EMAIL PROTECTED]> wrote:
> > As long as the files are sync'd, even on NFS the reopen should not
> > see segments.XXX until it is ready.
>
> Right, but then there is a race on the other side... a reader may open
>
What are you basing the claim that "rename" is not reliable on Windows
on? That a virus scanner has the file open? If that is the case, that
should either be an incorrect setup, or the operation should be retried
until it completes.
Writing directly to a file that someone else can open for reading is
bound to
"robert engels" <[EMAIL PROTECTED]> wrote:
> But merging segments doesn't delete the old, it only creates new,
> unless the segments meet the "purge old criteria".
What's the "purge old criteria"?
Normally a segment merge once committed immediately deletes the
segments it had just merged.
> A
Not just virus scanners: any program that uses the Microsoft API for
being notified of file changes. I think TortoiseSVN was one such
example.
People who embed Lucene can't control what their users install on
their desktops. Virus scanners are naturally very common on
desktops. I think we want
Horse poo poo. If you are working in a local environment, the files
should be opened with exclusive access. This guarantees that the
operations will succeed for the calling process.
That NFS is a viable solution is highly debatable, and IMO shows a
lack of understanding of NFS and the unix
robert engels wrote:
>
> The commit "in flight" cannot (SHOULD NOT) be deleting segments if they
> are in use. That a caller could issue a reopen call means there are
> segments in use by definition (or they would have nothing to reopen).
>
Reopen still works correctly, even if there are no seg
I am not debating that reopen works (since that is supposed to get
the latest version). I am stating that commit cannot be deleting
segments if they are in use, which they must be at that time in order
to issue a reopen(), since to issue reopen() you must have an
instance of IndexReader ope
robert engels wrote:
>
> I was talking about Windows in particular - as stated, Unix/Linux does
> not have the problem - under Windows the delete will (should) fail.
>
As I said, delete does fail on Windows in that case, and the
IndexFileDeleter (called by the IndexWriter) catches the IOExceptio
That is not true - at least it didn't use to be. If there were
readers open, the files/segments would not be deleted; they would be
deleted at the next open.
The "purge criteria" was based on the next "commit" sets. To make
this work, and be able to roll back or open a previous "version", you
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541998
]
Michael Busch commented on LUCENE-743:
--
> I think the cause of the intermittent failure in the test is a missing
I would still argue that it is an incorrect setup - almost as bad as
"not plugging the computer in".
If a user runs a virus scanner or file system indexer on the lucene
index directory, their system is going to slow to a crawl and
indexing will be abominably slow.
The installation guide s
On Nov 12, 2007 7:19 PM, robert engels <[EMAIL PROTECTED]> wrote:
> I would still argue that it is an incorrect setup - almost as bad as
> "not plugging the computer in".
A user themselves could even go in and look at the index files (I've
done so myself)... as could a backup program or whatever.
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Busch updated LUCENE-743:
-
Attachment: lucene-743-take8.patch
OK, all tests pass now, including the thread-safety test.
I ra
Doug Cutting wrote on 11/07/2007 09:26 AM:
Hadoop's MapFile is similar to Lucene's term index, and supports a
feature where only a subset of the index entries are loaded
(determined by io.map.index.skip). It would not be difficult to add
such a feature to Lucene by changing TermInfosReader#ens
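The skip-entries idea can be sketched against a plain sorted key list: keep only every k-th key in memory, binary-search the sample, then scan forward at most one interval. A hedged illustration only; the class and field names are assumptions, not Hadoop's or Lucene's actual code.

```java
import java.util.Arrays;

/** Sketch of an io.map.index.skip-style sampled index: memory use
 *  shrinks by ~k at the cost of scanning up to k entries per lookup. */
class SampledIndex {
    private final String[] allKeys; // stands in for the on-disk term list
    private final String[] sample;  // in-memory subset: every k-th key
    private final int interval;

    SampledIndex(String[] sortedKeys, int interval) {
        this.allKeys = sortedKeys;
        this.interval = interval;
        int n = (sortedKeys.length + interval - 1) / interval;
        this.sample = new String[n];
        for (int i = 0; i < n; i++) {
            sample[i] = sortedKeys[i * interval];
        }
    }

    /** Position of key in the full list, or -1 if absent. */
    int lookup(String key) {
        int pos = Arrays.binarySearch(sample, key);
        // Floor entry in the sample, mapped back to a full-list offset.
        int start = pos >= 0 ? pos * interval
                             : Math.max(0, -pos - 2) * interval;
        // Scan at most one interval forward from the floor entry.
        for (int i = start; i < Math.min(start + interval, allKeys.length); i++) {
            if (allKeys[i].equals(key)) {
                return i;
            }
        }
        return -1;
    }
}
```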
Yonik Seeley wrote:
> The else clause in SegmentTermPositions.readDeltaPosition() is
> redundant and could be removed, yes?
> It's a pretty minor improvement, but this is very inner-loop stuff.
>
> -Yonik
>
Thanks, Yonik, you're right. We can safely remove those two lines.
TermPositions#seek() r
Chris Hostetter wrote:
independent of the QueryParser aspects of your question, adding a
setSimilarity method to the Query class would be a complete 180 from how
it currently works.
Query classes have to have a getSimilarity method so that their
Weight/Scorer have a way to access the
True. It seems that the Lucene code might be made a bit more resilient
here though, using the following:
1. open the segments file exclusively (if this fails, updates are
prohibited, and an exception is thrown)
2. write new segments
3. write segments.new including segments hash & sync
4. update s
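Step 1 of the list above, exclusive access to the segments file, can be sketched with an OS-level file lock. Names here are illustrative assumptions, not Lucene's locking implementation.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: take an exclusive lock on the segments file so a second
 *  writer fails fast instead of publishing conflicting updates. */
class ExclusiveSegments {
    /** Runs the update while holding the lock; throws if already locked. */
    static void withExclusiveLock(Path segments, Runnable update) throws IOException {
        try (FileChannel ch = FileChannel.open(segments,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.tryLock()) {
            if (lock == null) {
                // Another process holds the lock: updates are prohibited.
                throw new IOException("updates prohibited: segments file is locked");
            }
            update.run(); // steps 2-4 would run here, under the lock
        }
    }
}
```

One caveat: `FileLock` guards against other processes, but within a single JVM an overlapping lock attempt throws `OverlappingFileLockException` rather than returning null.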