[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857388#action_12857388 ] Shai Erera commented on LUCENE-2396: Robert, I think this is great! Can we move more analyzers from core here? I think, however, that a backwards section in CHANGES is important because it alerts users about those analyzers whose runtime behavior changed. Otherwise how would the poor users know that? It doesn't mean you need to maintain back compat support, but at least alert them when things change. Even if we eventually decide to remove API bw completely, a section in CHANGES will still be required to help users upgrade easily. remove version from contrib/analyzers. -- Key: LUCENE-2396 URL: https://issues.apache.org/jira/browse/LUCENE-2396 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Affects Versions: 3.1 Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-2396.patch Contrib/analyzers has no backwards-compatibility policy, so let's remove Version so the API is consumable. If you think we shouldn't do this, then instead explicitly state and vote on what the backwards compatibility policy for contrib/analyzers should be, or move it all to core. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.
[ https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857396#action_12857396 ] Shai Erera commented on LUCENE-2396: Static? Weren't you against that!? But if we remove back compat from analyzers why do we need Version? Or is this API bw that we remove? remove version from contrib/analyzers. -- Key: LUCENE-2396 URL: https://issues.apache.org/jira/browse/LUCENE-2396 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Affects Versions: 3.1 Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-2396.patch Contrib/analyzers has no backwards-compatibility policy, so let's remove Version so the API is consumable. if you think we shouldn't do this, then instead explicitly state and vote on what the backwards compatibility policy for contrib/analyzers should be instead, or move it all to core. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2397) SnapshotDeletionPolicy.snapshot() throws NPE if no commits happened
SnapshotDeletionPolicy.snapshot() throws NPE if no commits happened --- Key: LUCENE-2397 URL: https://issues.apache.org/jira/browse/LUCENE-2397 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.1 SDP throws NPE if no commits occurred and snapshot() was called. I will replace it w/ throwing IllegalStateException. I'll also move TestSDP from o.a.l to o.a.l.index. I'll post a patch soon.
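The proposed change (fail with IllegalStateException instead of an accidental NPE) might look roughly like the sketch below. This is not Lucene's SnapshotDeletionPolicy: the class SnapshotGuardSketch, its onCommit hook and the String commit name are all hypothetical stand-ins used only to show the guard.

```java
/** Hypothetical stand-in for a snapshot-capable deletion policy; not Lucene's class. */
final class SnapshotGuardSketch {
    // Name of the most recent commit; stays null until the first commit happens.
    private String lastCommit;

    /** Hypothetical hook, called whenever a commit completes. */
    void onCommit(String commitName) {
        lastCommit = commitName;
    }

    /** Returns the latest commit, or fails with a descriptive exception if there is none. */
    String snapshot() {
        if (lastCommit == null) {
            // Explicit check, so callers never see a NullPointerException here.
            throw new IllegalStateException("no index commits to snapshot");
        }
        return lastCommit;
    }

    public static void main(String[] args) {
        SnapshotGuardSketch policy = new SnapshotGuardSketch();
        try {
            policy.snapshot();
        } catch (IllegalStateException e) {
            System.out.println("before any commit: " + e.getMessage());
        }
        policy.onCommit("segments_1");
        System.out.println("after a commit: " + policy.snapshot());
    }
}
```

The point of the explicit check is that the caller gets a message describing the actual precondition (at least one commit must exist) rather than a stack trace from deep inside the policy.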
[jira] Resolved: (LUCENE-2316) Define clear semantics for Directory.fileLength
[ https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2316. Lucene Fields: [New, Patch Available] (was: [New]) Assignee: Shai Erera Resolution: Fixed Committed revision 933879.

Define clear semantics for Directory.fileLength --- Key: LUCENE-2316 URL: https://issues.apache.org/jira/browse/LUCENE-2316 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.1 Attachments: LUCENE-2316.patch

On this thread: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201003.mbox/%3c126142c1003121525v24499625u1589bbef4c079...@mail.gmail.com%3e it was mentioned that Directory's fileLength behavior is not consistent between Directory implementations if the given file name does not exist. FSDirectory returns a 0 length while RAMDirectory throws FNFE. The problem is that the semantics of fileLength() are not defined. As proposed in the thread, we'll define the following semantics:
* Returns the length of the file denoted by <code>name</code> if the file exists. The return value may be anything between 0 and Long.MAX_VALUE.
* Throws FileNotFoundException if the file does not exist. Note that you can call dir.fileExists(name) if you are not sure whether the file exists or not.

For backwards we'll create a new method w/ clear semantics. Something like:
{code}
/**
 * @deprecated the method will become abstract when #fileLength(name) has been removed.
 */
public long getFileLength(String name) throws IOException {
  long len = fileLength(name);
  if (len == 0 && !fileExists(name)) {
    throw new FileNotFoundException(name);
  }
  return len;
}
{code}
The first line just calls the current impl. If it throws exception for a non-existing file, we're ok. The second line verifies whether a 0 length is for an existing file or not and throws an exception appropriately.
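To make the proposed contract concrete, here is a minimal, self-contained sketch of a directory-like class that follows it: an existing file reports its length (possibly 0), and a missing file raises FileNotFoundException. FileLengthSketch and writeFile are hypothetical names for illustration only; this is not FSDirectory or RAMDirectory, just an in-memory map.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory "directory" that follows the proposed fileLength contract. */
final class FileLengthSketch {
    private final Map<String, byte[]> files = new HashMap<>();

    /** Hypothetical helper that stores a file's content under a name. */
    void writeFile(String name, byte[] content) {
        files.put(name, content.clone());
    }

    boolean fileExists(String name) {
        return files.containsKey(name);
    }

    /** Returns the length of an existing file (0 is a valid length); throws if it is missing. */
    long fileLength(String name) throws FileNotFoundException {
        byte[] content = files.get(name);
        if (content == null) {
            throw new FileNotFoundException(name);
        }
        return content.length;
    }

    public static void main(String[] args) throws FileNotFoundException {
        FileLengthSketch dir = new FileLengthSketch();
        dir.writeFile("empty", new byte[0]);
        System.out.println("empty exists with length " + dir.fileLength("empty"));
        try {
            dir.fileLength("missing");
        } catch (FileNotFoundException e) {
            System.out.println("missing file: FileNotFoundException");
        }
    }
}
```

With this contract a return value of 0 is unambiguous: it always means an existing, empty file, so callers no longer need the extra fileExists check that the transitional getFileLength wrapper performs.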
[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856845#action_12856845 ] Shai Erera commented on LUCENE-2159: This looks like a nice tool. But all it does is create multiple copies of the same segment(s), right? So what exactly do you want to test with it? What worries me is that we'll be multiplying the lexicon, posting lists, statistics etc., therefore I'm not sure how reliable the tests will be (whatever they are), except for measuring things related to a large number of segments (like merge performance). Am I right? I also think this class better fits in benchmark rather than misc, as it's really for perf. testing/measurements and not a generic utility ... You can create a Task out of it, like ExpandIndexTask, which one can include in his algorithm. Tool to expand the index for perf/stress testing. - Key: LUCENE-2159 URL: https://issues.apache.org/jira/browse/LUCENE-2159 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.0 Reporter: John Wang Attachments: ExpandIndex.java Sometimes it is useful to take a small-ish index and expand it into a large index with K segments for perf/stress testing. This tool does that. See attached class.
[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856877#action_12856877 ] Shai Erera commented on LUCENE-2159: bq. I understand having a general performance suite to test regression is a good thing. But we found having a more focused test for segmentation and merge is important. Are you saying that because of the benchmark proposal? I still think that an ExpandIndexTask will be useful for benchmark and fits better there, than in contrib/misc. We can have that task together w/ a predefined .alg for using it ... Tool to expand the index for perf/stress testing. - Key: LUCENE-2159 URL: https://issues.apache.org/jira/browse/LUCENE-2159 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.0 Reporter: John Wang Attachments: ExpandIndex.java Sometimes it is useful to take a small-ish index and expand it into a large index with K segments for perf/stress testing. This tool does that. See attached class. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856911#action_12856911 ] Shai Erera commented on LUCENE-2159: Which is fine - I think this would be a neat task to add to benchmark, w/ specific documentation on how to use it and for what purposes. If you can also write a sample .alg file which e.g. creates a small index and then expands it, that'd be great. I've looked at the different PerfTask implementations in benchmark, and I'm wondering whether we should do the following:
* Create an AddIndexesTask which receives one or more Directories as input and calls writer.addIndexesNoOptimize
* If one wants, he can add an OptimizeTask call afterwards.
* Write an expandIndex.alg which initially creates an index of size N from one content source and then calls the AddIndexesTask several times.

The .alg file is meant to be an example, and people can also change it to create bigger or smaller indexes, use other content sources and switch between RAM/FS directories. How's that sound?

Tool to expand the index for perf/stress testing. - Key: LUCENE-2159 URL: https://issues.apache.org/jira/browse/LUCENE-2159 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.0 Reporter: John Wang Attachments: ExpandIndex.java Sometimes it is useful to take a small-ish index and expand it into a large index with K segments for perf/stress testing. This tool does that. See attached class.
[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856917#action_12856917 ] Shai Erera commented on LUCENE-2159: bq. There is an excellent section on it in LIA2 Indeed ! Ok so to create a task, you just extend PerfTask. You can look under contrib/benchmark/src/java/o.a.l/benchmark/byTask/tasks for many examples. OptimizeTask seems relevant here (i.e. it calls an IW API and receives a parameter). For writing .alg files, that's SUPER simple, just look under contrib/benchmark/conf for many existing examples. You can post a patch once you feel comfortable enough with it and I can help you with the struggles (if you'll run into any). Another great source (besides LIA2) on writing .alg files is the package.html under contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask. Tool to expand the index for perf/stress testing. - Key: LUCENE-2159 URL: https://issues.apache.org/jira/browse/LUCENE-2159 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.0 Reporter: John Wang Attachments: ExpandIndex.java Sometimes it is useful to take a small-ish index and expand it into a large index with K segments for perf/stress testing. This tool does that. See attached class. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2386. Resolution: Fixed Committed revision 933613. (take #2) IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :).
[jira] Updated: (LUCENE-2316) Define clear semantics for Directory.fileLength
[ https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2316: --- Attachment: LUCENE-2316.patch Patch clarifies the contract, fixes the directories to adhere to it and adds a CHANGES entry under the backwards section. All tests pass.

Define clear semantics for Directory.fileLength --- Key: LUCENE-2316 URL: https://issues.apache.org/jira/browse/LUCENE-2316 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Priority: Minor Fix For: 3.1 Attachments: LUCENE-2316.patch

On this thread: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201003.mbox/%3c126142c1003121525v24499625u1589bbef4c079...@mail.gmail.com%3e it was mentioned that Directory's fileLength behavior is not consistent between Directory implementations if the given file name does not exist. FSDirectory returns a 0 length while RAMDirectory throws FNFE. The problem is that the semantics of fileLength() are not defined. As proposed in the thread, we'll define the following semantics:
* Returns the length of the file denoted by <code>name</code> if the file exists. The return value may be anything between 0 and Long.MAX_VALUE.
* Throws FileNotFoundException if the file does not exist. Note that you can call dir.fileExists(name) if you are not sure whether the file exists or not.

For backwards we'll create a new method w/ clear semantics. Something like:
{code}
/**
 * @deprecated the method will become abstract when #fileLength(name) has been removed.
 */
public long getFileLength(String name) throws IOException {
  long len = fileLength(name);
  if (len == 0 && !fileExists(name)) {
    throw new FileNotFoundException(name);
  }
  return len;
}
{code}
The first line just calls the current impl. If it throws exception for a non-existing file, we're ok. The second line verifies whether a 0 length is for an existing file or not and throws an exception appropriately.
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855870#action_12855870 ] Shai Erera commented on LUCENE-2386:

I'm not sure if we're arguing about the same thing here ... why, when I open an IW on an empty Directory, do I need an empty segment that's created, and from now on never changed, populated or even read? That just seems wrong to me ... when I fixed the tests to not rely on the buggy behavior, I noticed several which count the list of commits (especially the IDP ones) w/ documentation like "1 for opening + N for committing" ... It just looks weird that when you open IW a commit happens, a set of empty files is created, but from now on they are never modified, until IDP kicks in, after the second commit ... it's nothing like initing the Directory to be able to receive input ..

And I don't know what's the benefit of doing new IW() followed by IR.open() ... that IR will always see 0 documents, until you call reopen (if a commit happened in between). So what's the convenience here? That your code can call IR.open once, and from that point forward just 'reopen()'? That seems a small advantage to me, really.

Maybe what we should do is fix IR.open to return a null IR in case the directory hasn't been populated w/ anything yet. Then you can check easily if you should call open() (==null) or reopen (otherwise). Or create a blank stub of IR which emulates an empty Dir, and when reopen is called works well (if the Directory is not empty now) ...

BTW, FWIW, Solr's code did not break from this change at all ... it was the combination of FSDir and NoLF/SingleInstanceLF that broke some tests that used it ... I don't know how many apps out there are using that combination, but I'd bet it's small? I use that combination, however in my case an IR is opened only after a commit signal/event is raised (so I don't check isCurrent often or attempt to reopen()). What I'm trying to say is that this combination is dangerous, and the application needs to ensure that only one IW is open at any given time, and I'm sure such apps are more sophisticated than opening IW and then IR just for the convenience of it.

IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch

I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :).
[jira] Commented: (LUCENE-2316) Define clear semantics for Directory.fileLength
[ https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855873#action_12855873 ] Shai Erera commented on LUCENE-2316: Well ... dir.fileLength is also used by SegmentInfos.sizeInBytes to compute the size of all the files in the Directory. If we remove fileLength, then SI will need to call dir.openInput.length() and then close it? Seems like a lot of work to me, for just obtaining the length of the file. So I agree that if you have an IndexInput at hand, you should call its length() method rather than Dir.fileLength. But otherwise, if you just have a name at hand, a dir.fileLength is convenient? I'm also ok w/ the bw break rather than going through the new/deprecate cycle.

Define clear semantics for Directory.fileLength --- Key: LUCENE-2316 URL: https://issues.apache.org/jira/browse/LUCENE-2316 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Priority: Minor Fix For: 3.1

On this thread: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201003.mbox/%3c126142c1003121525v24499625u1589bbef4c079...@mail.gmail.com%3e it was mentioned that Directory's fileLength behavior is not consistent between Directory implementations if the given file name does not exist. FSDirectory returns a 0 length while RAMDirectory throws FNFE. The problem is that the semantics of fileLength() are not defined. As proposed in the thread, we'll define the following semantics:
* Returns the length of the file denoted by <code>name</code> if the file exists. The return value may be anything between 0 and Long.MAX_VALUE.
* Throws FileNotFoundException if the file does not exist. Note that you can call dir.fileExists(name) if you are not sure whether the file exists or not.

For backwards we'll create a new method w/ clear semantics. Something like:
{code}
/**
 * @deprecated the method will become abstract when #fileLength(name) has been removed.
 */
public long getFileLength(String name) throws IOException {
  long len = fileLength(name);
  if (len == 0 && !fileExists(name)) {
    throw new FileNotFoundException(name);
  }
  return len;
}
{code}
The first line just calls the current impl. If it throws exception for a non-existing file, we're ok. The second line verifies whether a 0 length is for an existing file or not and throws an exception appropriately.
[jira] Commented: (LUCENE-2392) Enable flexible scoring
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855875#action_12855875 ] Shai Erera commented on LUCENE-2392: Mike - it'll also be great if we can store the length of the document in a custom way. I think what I'm saying is that if we can open up the norms computation to custom code - that will do what I want, right? Maybe we can have a class like DocLengthProvider which apps can plug in if they want to customize how that length is computed. Wherever we write the norms, we'll call that impl, which by default will do what Lucene does today? I think though that it's not a field-level setting, but an IW one? Enable flexible scoring --- Key: LUCENE-2392 URL: https://issues.apache.org/jira/browse/LUCENE-2392 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2392.patch This is a first step (nowhere near committable!), implementing the design iterated to in the recent Baby steps towards making Lucene's scoring more flexible java-dev thread. The idea is (if you turn it on for your Field; it's off by default) to store full stats in the index, into a new _X.sts file, per doc (X field) in the index. And then have FieldSimilarityProvider impls that compute doc's boost bytes (norms) from these stats. The patch is able to index the stats, merge them when segments are merged, and provides an iterator-only API. It also has starting point for per-field Sims that use the stats iterator API to compute boost bytes. But it's not at all tied into actual searching! There's still tons left to do, eg, how does one configure via Field/FieldType which stats one wants indexed. All tests pass, and I added one new TestStats unit test. 
The stats I record now are:
- field's boost
- field's unique term count (a b c a a b -- 3)
- field's total term count (a b c a a b -- 6)
- total term count per-term (sum of total term count for all docs that have this term)

Still need at least the total term count for each field.
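As a small illustration of two of the per-field stats listed in the description, the sketch below computes the unique and total term counts for the example token sequence "a b c a a b". TermStatsSketch and its methods are hypothetical names; splitting on whitespace is an assumption standing in for a real analyzer, and nothing here reflects how the patch stores stats in the index.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that computes two per-field stats from a token array. */
final class TermStatsSketch {

    /** Total term count: every token occurrence is counted. */
    static int totalTermCount(String[] tokens) {
        return tokens.length;
    }

    /** Unique term count: each distinct token is counted once. */
    static int uniqueTermCount(String[] tokens) {
        Set<String> distinct = new HashSet<>();
        for (String token : tokens) {
            distinct.add(token);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        // Assumption: whitespace splitting stands in for real analysis.
        String[] tokens = "a b c a a b".split(" ");
        System.out.println("unique = " + uniqueTermCount(tokens)); // prints "unique = 3"
        System.out.println("total = " + totalTermCount(tokens));   // prints "total = 6"
    }
}
```

A similarity implementation could derive a length norm from either number, which is why keeping both in the stats file leaves the choice to search-time code.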
[jira] Commented: (LUCENE-2373) Change StandardTermsDictWriter to work with streaming and append-only filesystems
[ https://issues.apache.org/jira/browse/LUCENE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855877#action_12855877 ] Shai Erera commented on LUCENE-2373: I'd rather not count on file length as well ... so a put/getTermDictSize method on Codec will allow one to implement it however one wants, if running on HDFS for example?

Change StandardTermsDictWriter to work with streaming and append-only filesystems - Key: LUCENE-2373 URL: https://issues.apache.org/jira/browse/LUCENE-2373 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Andrzej Bialecki Fix For: 3.1

Since early 2.x times Lucene used a skip/seek/write trick to patch the length of the terms dict into a place near the start of the output data file. This however made it impossible to use Lucene with append-only filesystems such as HDFS. In the post-flex trunk the following code in StandardTermsDictWriter initiates this:
{code}
// Count indexed fields up front
CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT);
out.writeLong(0); // leave space for end index pointer
{code}
and completes this in close():
{code}
out.seek(CodecUtil.headerLength(CODEC_NAME));
out.writeLong(dirStart);
{code}
I propose to change this layout so that this pointer is stored simply at the end of the file. It's always 8 bytes long, and we know the final length of the file from Directory, so it's a single additional seek(length - 8) to read it, which is not much considering the benefits.
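The layout proposed in the description (pointer stored in the last 8 bytes, read back with a single seek to length - 8) can be sketched with plain java.io classes. This is only an illustration of the idea, not StandardTermsDictWriter: TrailerPointerSketch, its method names and the byte-array "header" and "directory" payloads are hypothetical, and it relies on the file length being known, which is exactly the dependency the comment above would prefer to avoid.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of an append-only layout with the pointer stored as a trailer. */
final class TrailerPointerSketch {

    /** Writes header, directory bytes, then an 8-byte pointer to the directory start. */
    static void write(File file, byte[] header, byte[] dirBytes) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            out.write(header);
            long dirStart = out.size(); // bytes written so far = offset of the directory
            out.write(dirBytes);
            out.writeLong(dirStart);    // trailer: appended last, no seek back needed
        }
    }

    /** Reads the pointer back with one seek to (length - 8). */
    static long readDirStart(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.seek(in.length() - 8);
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("termsdict", ".bin");
        file.deleteOnExit();
        write(file, new byte[] {1, 2, 3, 4, 5}, new byte[] {9, 9, 9});
        System.out.println("dirStart = " + readDirStart(file));
    }
}
```

The writer only ever appends, so it would work on a stream that cannot seek; the cost moves to the reader, which needs the final file length to locate the trailer.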
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855892#action_12855892 ] Shai Erera commented on LUCENE-2386:

bq. what is the proper way (after this fix) to open an IR over possibly-empty directory?

You can simply call commit() immediately after you open IW. If that's what you need then it will work for you. You're right that if I add docs, delete them and then commit, I'll get an empty segment. The same happens if you do new IW() and then iw.close() w/ no addDocument in between. The point here was that we should not create a commit unless the user has specifically asked for it. Calling close() means asking for a commit, per close semantics and contract. But if the app called new IW, added docs and crashed in the middle, the Directory will still remain empty ... which is sort of what, IMO, should happen.

I agree it's a matter of perspective. I think that when autoCommit was removed, this code should have been removed as well. I don't know if it was left behind for a good reason, or simply because when someone tried to do it, he found out it's not that simple (like I have :)).

IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch

I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :).
[jira] Commented: (LUCENE-2392) Enable flexible scoring
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855913#action_12855913 ] Shai Erera commented on LUCENE-2392: I'd like to withdraw my request from above. I had misunderstood: the stats I need are stored per-field per-doc, so that will allow me to compute the docLength as I want.

Enable flexible scoring --- Key: LUCENE-2392 URL: https://issues.apache.org/jira/browse/LUCENE-2392 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2392.patch

This is a first step (nowhere near committable!), implementing the design iterated to in the recent "Baby steps towards making Lucene's scoring more flexible" java-dev thread. The idea is (if you turn it on for your Field; it's off by default) to store full stats in the index, into a new _X.sts file, per doc (X field) in the index. And then have FieldSimilarityProvider impls that compute doc's boost bytes (norms) from these stats. The patch is able to index the stats, merge them when segments are merged, and provides an iterator-only API. It also has a starting point for per-field Sims that use the stats iterator API to compute boost bytes. But it's not at all tied into actual searching! There's still tons left to do, eg, how does one configure via Field/FieldType which stats one wants indexed. All tests pass, and I added one new TestStats unit test.

The stats I record now are:
- field's boost
- field's unique term count (a b c a a b -> 3)
- field's total term count (a b c a a b -> 6)
- total term count per-term (sum of total term count for all docs that have this term)

Still need at least the total term count for each field.
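The two per-field counts in the list above can be illustrated with a minimal sketch. It is plain Java and does not use the patch's stats API; the class and method names are hypothetical, and whitespace splitting stands in for whatever the real analyzer would produce.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the LUCENE-2392 patch.
class FieldStatsSketch {
    // Assumption: terms are separated by whitespace; a blank field has no terms.
    static String[] terms(String fieldText) {
        String trimmed = fieldText.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Total term count: every occurrence is counted.
    static int totalTermCount(String fieldText) {
        return terms(fieldText).length;
    }

    // Unique term count: each distinct term is counted once.
    static int uniqueTermCount(String fieldText) {
        Set<String> unique = new HashSet<>();
        for (String term : terms(fieldText)) {
            unique.add(term);
        }
        return unique.size();
    }

    public static void main(String[] args) {
        String field = "a b c a a b";
        System.out.println(uniqueTermCount(field)); // 3, as in the description
        System.out.println(totalTermCount(field));  // 6, as in the description
    }
}
```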
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855924#action_12855924 ] Shai Erera commented on LUCENE-2386: I don't think that people need to write that emptiness-detection-then-commit code ... if they care, they can simply call commit() immediately after they open IW.

bq. Isn't opening IW with CREATE* mode called "specifically asking for"?

It depends on how you interpret the mode ... for example, you cannot pass OpenMode.APPEND for an empty Directory, because IW throws an exception. The modes are just meant to tell IW how to behave:
* APPEND - I know there is an index in the Directory, and I'd like to append to it.
* CREATE - I don't care if there is an index in the Directory -- create a new one, zeroing out all segments.
* CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one.

So if you pass CREATE on an already populated index, IW doesn't do the implicit commit until you call commit() yourself. But if you pass CREATE on an empty index, IW suddenly calls commit()? That's just an inconsistency that's meant to allow you to open an IR immediately after the new IW() call, regardless of what was there? And if you open that IR, then if the index was populated you see the previous set of documents, but if it wasn't you see nothing, even though you meant to say "override what's there"?

I've checked what FileOutputStream does, using the following code:
{code}
File file = new File("d:/temp/tmpfile");
FileOutputStream fos = new FileOutputStream(file);
fos.write(3);
fos.close();
fos = new FileOutputStream(file);
FileInputStream fis = new FileInputStream(file);
System.out.println(fis.read());
{code}
* Second line creates an empty file immediately, not waiting for close() or flush() -- which resembles the behavior that you're suggesting we should take w/ IW (which is today's behavior).
* Fourth line closes the file, flushing and writing the content.
* Fifth line *recreates* the file, empty, again, w/o calling close. So it zeros out the file content immediately, even before you wrote a single byte to it.
* Sixth+Seventh line proves it by attempting to read from the file, and the output printed is -1.

I've wrapped the FOS w/ a BufferedOS and the behavior is still the same. What I'm trying to show is that we don't fully adhere to the CREATE mode, and rightfully if you ask me - we shouldn't zero out the segments until the application has called commit(). But we choose to adhere differently to the CREATE* mode if the index is already populated. That's an inconsistent behavior, at least from my perspective. It's also harder to explain and document, e.g. "you should call commit() if you used CREATE, in case you want to zero out everything immediately, and the Directory is not empty, but you don't need to call commit() if the directory was empty, Lucene will do it for you" -- so now how will the app know if it should call commit()? It will need to write a sort of emptiness-detection-then-commit?

I am willing to consider the following semantics:
* APPEND - assumes an index exists and opens it.
* CREATE - zeros out everything that's in the directory *immediately*, and also prepares an empty directory.
* CREATE_OR_APPEND - either loads an existing index, or is able to work on the empty directory. No implicit commit is done by IW if the index does not exist.

But I think that CREATE behavior is too dangerous, and so I prefer to stick w/ the change proposed in the patch so far -- if you open an index in CREATE*, you should call commit before you can read it. That will adhere to the semantics of what the application wanted, whether it meant to zero out an existing Directory, or create a new one from scratch.
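For anyone who wants to repeat the FileOutputStream experiment without a d:/temp directory, here is a minimal self-contained sketch of the same steps. The class and method names are made up for illustration, and a temp file replaces the hard-coded path; the stream calls are the same as in the snippet above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; only the java.io calls come from the comment above.
class TruncateOnOpenDemo {
    // Writes one byte, closes, reopens the file for writing (no append),
    // then returns what the first read() sees while the second stream is open.
    static int readAfterReopen() {
        try {
            File file = File.createTempFile("tmpfile", ".bin");
            file.deleteOnExit();
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(3); // content is on disk once the stream is closed
            }
            // Reopening truncates the file right away, before any write or close.
            try (FileOutputStream reopened = new FileOutputStream(file);
                 FileInputStream fis = new FileInputStream(file)) {
                return fis.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterReopen()); // -1: the file is already empty
    }
}
```

The point of the comparison is only that java.io treats "create" as "truncate now", which is the behavior the comment argues IW should not copy for an existing index.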
IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856063#action_12856063 ] Shai Erera commented on LUCENE-2386: So just call new IW(), then rollback and ensure dir.listAll() returns an empty list? Or also index stuff, making sure a flush occurs, and then rollback? I'm not sure that the latter is related to this issue ... IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :).
[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2386: --- Attachment: LUCENE-2386.patch Patch includes the proposed test in TestIndexWriter. I think this is ready for commit, if there are no more objections. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
[jira] Resolved: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2386. Lucene Fields: [New, Patch Available] (was: [New]) Resolution: Fixed Committed revision 932868. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855713#action_12855713 ] Shai Erera commented on LUCENE-1709: Committed revision 932878 with the following:
# benchmark tests force sequential run
# threadsPerProcessor defaults to 1 and can be overridden by -DthreadsPerProcessor=value
# A CHANGES entry

Parallelize Tests - Key: LUCENE-1709 URL: https://issues.apache.org/jira/browse/LUCENE-1709 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Assignee: Robert Muir Fix For: 3.1 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, runLuceneTests.py Original Estimate: 48h Remaining Estimate: 48h The Lucene tests can be parallelized to make for a faster testing system. This task from ANT can be used: http://ant.apache.org/manual/CoreTasks/parallel.html Previous discussion: http://www.gossamer-threads.com/lists/lucene/java-dev/69669 Notes from Mike M.: {quote} I'd love to see a clean solution here (the tests are embarrassingly parallelizable, and we all have machines with good concurrency these days)... I have a rather hacked up solution now, that uses -Dtestpackage=XXX to split the tests up. Ideally I would be able to say use N threads and it'd do the right thing... like the -j flag to make. {quote}
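As a rough illustration of what a threadsPerProcessor setting with a default of 1 amounts to, here is a sketch in plain Java. It is not the build logic that was committed (that lives in the Ant scripts); the class and method names are hypothetical, and it assumes the thread count is the per-processor value multiplied by the number of processors.

```java
// Hypothetical sketch; the committed change is in the Ant build, not in Java code.
class TestThreadsSketch {
    // Reads -DthreadsPerProcessor=value, falling back to 1 when it is not set,
    // and scales it by the processor count passed in.
    static int threadCount(int availableProcessors) {
        int perProcessor = Integer.getInteger("threadsPerProcessor", 1);
        return Math.max(1, perProcessor) * availableProcessors;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(threadCount(cores));
    }
}
```

Running it as java -DthreadsPerProcessor=2 TestThreadsSketch would double the printed count, under the assumption stated above.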
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855727#action_12855727 ] Shai Erera commented on LUCENE-2386: Committed revision 932917 for the revert. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2386: --- Attachment: LUCENE-2386.patch Fixes IndexFileDeleter, adds a proper test to TestIndexWriter. Haven't run all the tests yet though, but the added test passes now with the fix. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855767#action_12855767 ] Shai Erera commented on LUCENE-2386: About IndexReader.listCommits ... the javadocs state this: "There must be at least one commit in the Directory, else this method throws java.io.IOException." So I'll change the javadocs to reflect the exception type that is actually thrown (IndexNotFoundException) and revert the change to DirReader.listCommits which returns an empty list. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2386: --- Attachment: LUCENE-2386.patch Patch w/ proposed fixes. All tests pass, including Solr's :). IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2386: --- Attachment: LUCENE-2386.patch Patch updated to latest rev. + the proposed name change -- IndexNotFoundException. All tests pass. I plan to commit this later today. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855344#action_12855344 ] Shai Erera commented on LUCENE-2386: Ok I've added the following to DirReader:
{code}
try {
  latest.read(dir, codecs);
} catch (FileNotFoundException e) {
  if (e.getMessage().startsWith("no segments* file found in")) {
    // Might be that the Directory is empty, in which case just return an
    // empty collection.
    return Collections.emptyList();
  } else {
    throw e;
  }
}
{code}
And now that test passes. I'll continue discovering tests that fail ... probably backwards will have its share too :). IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855369#action_12855369 ] Shai Erera commented on LUCENE-2386: I already did that ... just didn't post back. Created SegmentsFileNotFoundException. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch
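The difference a dedicated exception type makes to the calling code can be sketched roughly as follows. Everything here is a stand-in written for illustration: the nested SegmentsFileNotFoundException only borrows the name mentioned above, its superclass is an assumption, and readCommits is a fake in place of the real segments-file read.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these types are the Lucene classes themselves.
class EmptyIndexSketch {
    // Stand-in for the dedicated exception; extending FileNotFoundException is assumed.
    static class SegmentsFileNotFoundException extends FileNotFoundException {
        SegmentsFileNotFoundException(String message) {
            super(message);
        }
    }

    // Fake reader: an empty directory has no segments file to read.
    static List<String> readCommits(boolean directoryIsEmpty) throws FileNotFoundException {
        if (directoryIsEmpty) {
            throw new SegmentsFileNotFoundException("no segments* file found");
        }
        return Collections.singletonList("segments_1");
    }

    // The caller dispatches on the exception type instead of parsing getMessage().
    static List<String> listCommitsOrEmpty(boolean directoryIsEmpty) {
        try {
            return readCommits(directoryIsEmpty);
        } catch (SegmentsFileNotFoundException e) {
            return Collections.emptyList(); // empty directory: nothing committed yet
        } catch (FileNotFoundException e) {
            throw new IllegalStateException(e); // any other missing file is a real error
        }
    }

    public static void main(String[] args) {
        System.out.println(listCommitsOrEmpty(true).size());
        System.out.println(listCommitsOrEmpty(false).size());
    }
}
```

Catching by type keeps working if the message wording ever changes, which a startsWith check on the message would not.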
[jira] Commented: (LUCENE-1879) Parallel incremental indexing
[ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855379#action_12855379 ] Shai Erera commented on LUCENE-1879: I have found such a version ... and it fails too :). At least the one I received. But never mind that ... as long as we both agree the implementation should change. I didn't mean to say anything bad about what you did .. I know the limitations you had to work with. Parallel incremental indexing - Key: LUCENE-1879 URL: https://issues.apache.org/jira/browse/LUCENE-1879 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Michael Busch Assignee: Michael Busch Fix For: 3.1 Attachments: parallel_incremental_indexing.tar A new feature that allows building parallel indexes and keeping them in sync on a docID level, independent of the choice of the MergePolicy/MergeScheduler. Find details on the wiki page for this feature: http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing Discussion on java-dev: http://markmail.org/thread/ql3oxzkob7aqf3jd
[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2386: --- Attachment: LUCENE-2386.patch Patch fixes all tests as well as changes to IndexWriter, IndexFileDeleter, DirectoryReader and SegmentInfos. I'd like to commit this shortly, before all the files get changed by a malicious other commit :). (kidding of course) IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855457#action_12855457 ] Shai Erera commented on LUCENE-2386: Ok sounds good. Is there a preferred package for exceptions? Or is o.a.l.index ok? IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch, LUCENE-2386.patch
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854885#action_12854885 ] Shai Erera commented on LUCENE-2074: Uwe, must this be coupled with that issue? This one has been waiting for a long time (why? for the JFlex 1.5 release?), and protecting against a huge buffer allocation can be a really quick and tiny fix. This one also focuses on getting Unicode 5 to work, which is unrelated to the buffer size. But the buffer size is not so critical an issue that we need to move fast on it ... so it's your call. I just thought they are two unrelated problems. Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves differently for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion.
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854887#action_12854887 ] Shai Erera commented on LUCENE-2074: bq. I plan to commit this soon! That's great news ! BTW - what are you going to do w/ the JFlex 1.5 binary? Are you going to check it in somewhere? because it hasn't been released last I checked. I'm asking for general knowledge, because I know the scripts are downloading it, or rely on it to exist somewhere. In that case, then yes, let's fix it here. Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854920#action_12854920 ] Shai Erera commented on LUCENE-1482: I still think that calling isDebugEnabled is better, because the message formatting stuff may do unnecessary things like casting, autoboxing etc. IMO, if logging is enabled, evaluating it twice is not a big deal ... it's a simple check. I'm glad someone here thinks logging will be useful though :). I wish there will be quorum here to proceed w/ that. Note that I also offered to not create any dependency on SLF4J, but rather extract infoStream to a static InfoStream class, which will avoid passing it around everywhere, and give the flexibility to output stuff from other classes which don't have an infoStream at hand. Replace infoSteram by a logging framework (SLF4J) - Key: LUCENE-1482 URL: https://issues.apache.org/jira/browse/LUCENE-1482 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Fix For: 3.1 Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar Lucene makes use of infoStream to output messages in its indexing code only. For debugging purposes, when the search application is run on the customer side, getting messages from other code flows, like search, query parsing, analysis etc can be extremely useful. There are two main problems with infoStream today: 1. It is owned by IndexWriter, so if I want to add logging capabilities to other classes I need to either expose an API or propagate infoStream to all classes (see for example DocumentsWriter, which receives its infoStream instance from IndexWriter). 2. I can either turn debugging on or off, for the entire code. Introducing a logging framework can allow each class to control its logging independently, and more importantly, allows the application to turn on logging for only specific areas in the code (i.e., org.apache.lucene.index.*). 
I've investigated SLF4J (Simple Logging Facade for Java) which is, as its name states, a facade over different logging frameworks. As such, you can include slf4j.jar in your application, and it recognizes at deploy time which logging framework you'd like to use. SLF4J comes with several adapters for Java logging, Log4j and others. If you know your application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in your classpath, and your logging statements will use Java logging under the covers. This makes the logging code very simple. For a class A the logger will be instantiated like this:
  public class A {
    private static final Logger logger = LoggerFactory.getLogger(A.class);
  }
And will later be used like this:
  public class A {
    private static final Logger logger = LoggerFactory.getLogger(A.class);
    public void foo() {
      if (logger.isDebugEnabled()) {
        logger.debug(message);
      }
    }
  }
That's all! Checking isDebugEnabled is very quick, at least using the JDK14 adapter (but I assume it's also fast over other logging frameworks). The important thing is that every class controls its own logger. Not all classes have to output logging messages, and we can improve Lucene's logging gradually, w/o changing the API, by adding more logging messages to interesting classes. I will submit a patch shortly.
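The guard check discussed in the LUCENE-1482 comment can be sketched as follows. This is only an illustration, not the attached patch: it uses java.util.logging (bundled with the JDK) instead of SLF4J so it runs without extra jars, with `isLoggable(Level.FINE)` standing in for `isDebugEnabled()`. The class name `GuardedLoggingSketch`, the `describe` helper and the `messagesBuilt` counter are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLoggingSketch {
  // One logger per class, as proposed in the issue description.
  private static final Logger logger =
      Logger.getLogger(GuardedLoggingSketch.class.getName());

  // Counts how often the (potentially expensive) message is actually built.
  static int messagesBuilt = 0;

  // Hypothetical message builder: concatenation and boxing happen here.
  static String describe(int docCount, long bytes) {
    messagesBuilt++;
    return "flushed " + docCount + " docs, " + bytes + " bytes";
  }

  static void flush(int docCount, long bytes) {
    // Guard: when FINE is disabled, the message is never built.
    if (logger.isLoggable(Level.FINE)) {
      logger.fine(describe(docCount, bytes));
    }
  }

  // Switches this class's logger between "debug on" and "debug off".
  static void setDebug(boolean on) {
    logger.setLevel(on ? Level.FINE : Level.INFO);
  }
}
```

With debug off, `flush` returns after the cheap level check and `describe` is never called, which is the cost the comment argues the guard avoids.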
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855020#action_12855020 ] Shai Erera commented on LUCENE-1709: Robert, I will commit the patch, seems good to do anyway. We can handle the ant jars separately later. And this hang behavior is exactly what I experience, including the FileInputStream thing. Only on my machine, when I took a thread dump, it showed that Ant waits on FIS.read() ... Robert - to remind you that even with the patch which forces junit to use a separate temp folder per thread, it still hung ... Parallelize Tests - Key: LUCENE-1709 URL: https://issues.apache.org/jira/browse/LUCENE-1709 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Assignee: Robert Muir Fix For: 3.1 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, runLuceneTests.py Original Estimate: 48h Remaining Estimate: 48h The Lucene tests can be parallelized to make for a faster testing system. This task from ANT can be used: http://ant.apache.org/manual/CoreTasks/parallel.html Previous discussion: http://www.gossamer-threads.com/lists/lucene/java-dev/69669 Notes from Mike M.: {quote} I'd love to see a clean solution here (the tests are embarrassingly parallelizable, and we all have machines with good concurrency these days)... I have a rather hacked up solution now, that uses -Dtestpackage=XXX to split the tests up. Ideally I would be able to say use N threads and it'd do the right thing... like the -j flag to make. {quote}
[jira] Created: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessarily, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2385: --- Attachment: LUCENE-2385.patch Move NoDeletionPolicy to core, adds javadocs + TestNoDeletionPolicy. Also includes the relevant changes to benchmark (algorithms + CreateIndexTask). I've fixed a typo I had in NoMergeScheduler - not related to this issue, but since it was just a typo, thought it's no harm to do it here. Tests pass. Planning to commit shortly. Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855131#action_12855131 ] Shai Erera commented on LUCENE-2386: Took a look at IndexFileDeleter, and located the offending code segment which is responsible for the CorruptIndexException:
{code}
if (currentCommitPoint == null) {
  // We did not in fact see the segments_N file
  // corresponding to the segmentInfos that was passed
  // in. Yet, it must exist, because our caller holds
  // the write lock. This can happen when the directory
  // listing was stale (eg when index accessed via NFS
  // client with stale directory listing cache). So we
  // try now to explicitly open this commit point:
  SegmentInfos sis = new SegmentInfos();
  try {
    sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs);
  } catch (IOException e) {
    throw new CorruptIndexException("failed to locate current segments_N file");
  }
{code}
Looks like this code protects against a real problem, which was raised on the list a couple of times already - stale NFS cache. So I'm reluctant to remove that check ... though I still think we should differentiate between a newly created index on a fresh Directory and a stale NFS problem. Maybe we can pass a boolean isNew or something like that to the ctor, and if it's a new index and the last commit point is missing, IFD will not throw the exception, but silently ignore that? So the code would become something like this:
{code}
if (currentCommitPoint == null && !isNew) { }
{code}
Does this make sense, or am I missing something? IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND.
This seems unnecessarily, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855140#action_12855140 ] Shai Erera commented on LUCENE-2385: I did that first, but then remembered that when I did that in the past, people were unable to apply my patches, w/o doing the svn move themselves. Anyway, for this file it's not really important I think - a very simple and tiny file, w/ no history to preserve? Is that ok for this file (b/c I have no idea how to do the svn move now ... after I've made all the changes already) :) Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855148#action_12855148 ] Shai Erera commented on LUCENE-2386: Looking at IFD again, I think a boolean ctor arg is not required. What I can do is check if any Lucene file has been seen (in the for-loop iteration on the Directory files), and if not, then deduce it's a new Directory, and skip that 'if' check. I'll give it a shot. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessarily, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
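The idea in the comment above (deduce a fresh Directory from the file listing instead of passing a boolean to the ctor) could look roughly like this. It is a sketch under loud assumptions, not the IndexFileDeleter code: `FreshDirectoryCheckSketch`, `looksLikeIndexFile` and `missingCommitIsCorruption` are hypothetical names, and the file-name test is deliberately simplified (real Lucene recognizes many more index file names).

```java
import java.util.List;

final class FreshDirectoryCheckSketch {
  // Simplified, hypothetical stand-in for Lucene's index file name filter.
  static boolean looksLikeIndexFile(String name) {
    return name.startsWith("segments")
        || name.endsWith(".cfs")
        || name.endsWith(".fdt");
  }

  // Returns true only when the commit point is missing AND the directory
  // already holds index files, i.e. the stale-listing case the existing
  // check protects against. A directory with no index files is treated
  // as new, so a missing commit point is not an error there.
  static boolean missingCommitIsCorruption(List<String> fileNames,
                                           boolean commitPointFound) {
    boolean sawIndexFile = false;
    for (String name : fileNames) {
      if (looksLikeIndexFile(name)) {
        sawIndexFile = true;
        break;
      }
    }
    return !commitPointFound && sawIndexFile;
  }
}
```

The design choice here is that the decision needs no extra constructor argument: the same loop that already walks the directory listing supplies the answer.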
[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2385: --- Attachment: LUCENE-2385.patch Is it better now? Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch, LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855155#action_12855155 ] Shai Erera commented on LUCENE-2385: Forgot to mention that the only move I made was of NoDeletionPolicy: svn move contrib/benchmark/src/java/org/apache/lucene/benchmark/utils/NoDeletionPolicy.java src/java/org/apache/lucene/index/NoDeletionPolicy.java I'll remember that in the future Uwe - thanks for the heads up ! Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch, LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2385. Resolution: Fixed Committed revision 932129. Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2385.patch, LUCENE-2385.patch As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2386: --- Attachment: LUCENE-2386.patch First stab at this. Patch still missing CHANGES entry, and I haven't run all the tests, just TestIndexWriter. With those changes it passes. One thing that I think should be fixed is testImmediateDiskFull - if I don't add writer.commit(), the test fails, because dir.getRecomputeActualSizeInBytes returns 0 (no RAMFiles yet), and then the test succeeds at adding one document. So maybe just change the test to set maxSizeInBytes to '1', always? TestNoDeletionPolicy is not covered by this patch (should be fixed as well, because now the number of commits is exactly N and not N+1). Will fix it tomorrow. Anyway, it's really late now, so hopefully some fresh eyes will look at it while I'm away, and comment on the proposed changes. I hope I got all the changes to the tests right. IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessarily, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. 
But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855265#action_12855265 ] Shai Erera commented on LUCENE-2386: bq. Maybe change testImmediateDiskFull to set max allowed size to max(1, current-usage)? Good idea ! Did it and it works. Now ... one thing I haven't mentioned is the bw break. This is a behavioral bw break, which specifically I'm not so sure we should care about, because I wonder how many apps out there rely on being able to open a reader before they ever commited on a fresh new index. So what do you think - do this change anyway, OR ... utilize Version to our aid? I.e., if the Version that was passed to IWC is before LUCENE_31, we keep the initial commit, otherwise we don't do it? Pros is that I won't need to change many of the tests because they still use the LUCENE_30 version (but that is not a strong argument), so it's a weak Pro. Cons is that IW will keep having that doCommit handling in its ctor, only now w/ added comments on why this is being kept around etc. What do you think? IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessarily, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. 
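The two options weighed in this comment can be sketched in a few lines. This is an illustration only: `MatchVersion` is a hypothetical enum that merely imitates the ordering of Lucene's Version constants, and `keepInitialCommit` and `maxSizeForDiskFullTest` are made-up helper names, not methods of IndexWriter or of the test.

```java
final class InitialCommitSketch {
  // Hypothetical stand-in for Lucene's Version constants, in release order.
  enum MatchVersion {
    LUCENE_29, LUCENE_30, LUCENE_31;

    boolean onOrAfter(MatchVersion other) {
      return compareTo(other) >= 0;
    }
  }

  // Version-gated behavior: apps that pass a version before LUCENE_31
  // keep the old initial (empty) commit; newer ones do not get it.
  static boolean keepInitialCommit(MatchVersion matchVersion) {
    return !matchVersion.onOrAfter(MatchVersion.LUCENE_31);
  }

  // The testImmediateDiskFull tweak: the simulated disk limit is
  // max(1, current usage), so it never drops to 0 on an empty directory.
  static long maxSizeForDiskFullTest(long currentUsageBytes) {
    return Math.max(1L, currentUsageBytes);
  }
}
```

The gating keeps old callers unchanged at the price the comment names: the doCommit handling stays in the ctor, now behind a version check.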
Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855277#action_12855277 ] Shai Erera commented on LUCENE-2386: Apparently, there are more tests that fail ... lost count but easy fixing. I tried writing the following test:
{code}
public void testNoCommits() throws Exception {
  // Tests that if we don't call commit(), the directory has 0 commits. This has
  // changed since LUCENE-2386, where before IW would always commit on a fresh
  // new index.
  Directory dir = new RAMDirectory();
  IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new WhitespaceAnalyzer(TEST_VERSION_CURRENT)));
  assertEquals("expected 0 commits!", 0, IndexReader.listCommits(dir).size());
  // No changes still should generate a commit, because it's a new index.
  writer.close();
  assertEquals("expected 1 commits!", 1, IndexReader.listCommits(dir).size());
}
{code}
Simple test - validates that no commits are present right after creating a fresh index, w/o closing or committing. However, IndexReader.listCommits fails w/ the following exception:
{code}
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.ramdirect...@2d262d26: files: []
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:652)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535)
  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:323)
  at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1033)
  at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1023)
  at org.apache.lucene.index.IndexReader.listCommits(IndexReader.java:1341)
  at org.apache.lucene.index.TestIndexWriter.testNoCommits(TestIndexWriter.java:4966)
{code}
The failure occurs when SegmentInfos attempts to find segments.gen and fails. So I wonder if I should fix DirectoryReader to catch that exception and simply return an empty Collection .. or I should fix SegmentInfos at this point -- notice the files: [] at the end - I think that by adding a check to the following code (SegmentInfos, line 652) which validates that there were any files before throwing the exception, it'll still work properly and safely (i.e. to detect a problematic Directory). I will probably need to break away from the while loop and I guess fix some other things in upper layers ... therefore I'm not sure whether I should simply catch that exception in DirectoryReader.listCommits w/ proper documentation and be done w/ it. After all, it's not supposed to be called ... ever? or hardly ever?
{code}
if (gen == -1) {
  // Neither approach found a generation
  throw new FileNotFoundException("no segments* file found in " + directory + ": files: " + Arrays.toString(files));
}
{code}
IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2386.patch I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessarily, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple.
But I think it's manageable, so I'll try to attack it (and IFD specifically !) back :). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
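The first option raised in the comment above (catch the exception in listCommits and return an empty Collection) might be sketched like this. It does not use Lucene classes: `CommitSource` is a hypothetical stand-in for the code that reads the segments_N files, and `listCommitsOrEmpty` is a made-up name, so treat it as an illustration of the control flow only.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;

final class ListCommitsSketch {
  // Hypothetical stand-in for reading commit points from a Directory.
  interface CommitSource {
    Collection<String> readCommits() throws IOException;
  }

  // Returns the commits, or an empty collection when no segments* file
  // exists yet. Any other IOException still propagates to the caller.
  static Collection<String> listCommitsOrEmpty(CommitSource source)
      throws IOException {
    try {
      return source.readCommits();
    } catch (FileNotFoundException e) {
      // Fresh index that was never committed: report "no commits".
      return Collections.emptyList();
    }
  }
}
```

One trade-off of this shape is that a directory whose listing is stale would also be reported as having no commits, which is why the comment also considers checking in SegmentInfos whether any files were seen at all.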
[jira] Updated: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1709: --- Attachment: LUCENE-1709-2.patch Since I had the changes on my local env. I thought it's best to generate a patch out of them, so they don't get lost. The patch doesn't cover the ant .jars, only the changes to common-build.xml as well as benchmark/build.xml Parallelize Tests - Key: LUCENE-1709 URL: https://issues.apache.org/jira/browse/LUCENE-1709 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Assignee: Robert Muir Fix For: 3.1 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, runLuceneTests.py Original Estimate: 48h Remaining Estimate: 48h The Lucene tests can be parallelized to make for a faster testing system. This task from ANT can be used: http://ant.apache.org/manual/CoreTasks/parallel.html Previous discussion: http://www.gossamer-threads.com/lists/lucene/java-dev/69669 Notes from Mike M.: {quote} I'd love to see a clean solution here (the tests are embarrassingly parallelizable, and we all have machines with good concurrency these days)... I have a rather hacked up solution now, that uses -Dtestpackage=XXX to split the tests up. Ideally I would be able to say use N threads and it'd do the right thing... like the -j flag to make. {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
[ https://issues.apache.org/jira/browse/LUCENE-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2377. Resolution: Fixed Committed revision 931502. Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark - Key: LUCENE-2377 URL: https://issues.apache.org/jira/browse/LUCENE-2377 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.1 Attachments: LUCENE-2377.patch Benchmark allows one to set the MP and MS to use, by defining the class name and then use reflection to instantiate them. However NoMP and NoMS are singletons and therefore reflection does not work for them. Easy fix in CreateIndexTask. I'll post a patch soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
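The problem described in LUCENE-2377 (a class with no usable constructor cannot be created by plain reflection) can be illustrated with a small sketch. The thread does not show the actual change to CreateIndexTask, so this is just one possible approach under stated assumptions: `SingletonAwareLoaderSketch`, `NoOpScheduler` and `PlainScheduler` are hypothetical classes, and the caller is assumed to know the name of the static field that holds the singleton.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class SingletonAwareLoaderSketch {
  // Hypothetical singleton: private ctor, instance exposed via a static field.
  static final class NoOpScheduler {
    public static final NoOpScheduler INSTANCE = new NoOpScheduler();
    private NoOpScheduler() {}
  }

  // Hypothetical ordinary class with a public no-arg ctor.
  static final class PlainScheduler {
    public PlainScheduler() {}
  }

  // Prefers a public static field of the class's own type (the singleton);
  // otherwise falls back to the no-arg constructor.
  static Object instantiate(Class<?> clazz, String singletonField)
      throws ReflectiveOperationException {
    try {
      Field field = clazz.getField(singletonField);
      if (Modifier.isStatic(field.getModifiers())
          && clazz.isInstance(field.get(null))) {
        return field.get(null);
      }
    } catch (NoSuchFieldException e) {
      // No such public field: fall through to the constructor.
    }
    return clazz.getDeclaredConstructor().newInstance();
  }
}
```

In a benchmark-style setup the `Class<?>` would come from `Class.forName(className)`, where the class name is read from the configuration.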
[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854588#action_12854588 ] Shai Erera commented on LUCENE-2353: Actually, we've reopened LUCENE-1709 to track that. This is not related to this issue's changes, but seems to be related to the benchmark tests specifically. Please have a look there at the patch I've posted, which forces the benchmark tests to run in sequential mode. Additionally, you can run 'ant test -Drunsequential=1' from the command line, in benchmark's root folder, to achieve the same. And it'd be great if you posted the above on LUCENE-1709 as well -- because now I know I'm not the only one running into this :).
Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2353.patch, LUCENE-2353.patch
I have no idea how no one ran into this so far, but I tried to execute an .alg file which used ReutersContentSource and referenced both docs.dir and work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the run reported an error of missing content under benchmark\work\something. I've traced the problem back to Config, where get(String, String) includes the following code:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
...
{code}
It detects ':' in the value and so it thinks it's a per-round property, thus stripping 'd:' from the value. The fix is very simple:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
} else if (sval.indexOf(":\\") >= 0) {
  // this previously messed up absolute path names on Windows. Assuming
  // there is no real value that starts with \\
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
{code}
I'll post a patch w/ the above fix + test shortly.
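The check described in this issue can be tried out in isolation. The following is a minimal, self-contained sketch of the logic, not the actual Config class: RoundPropertySketch and stripColumnName are hypothetical names, and the extra ":/" test reflects the later update on this issue that also matches 'c:/temp' like paths.

```java
// Sketch only: hypothetical class and method names, not the benchmark Config API.
final class RoundPropertySketch {

  /**
   * Returns the value with a leading "column:" label removed, unless the value
   * has no colon at all or looks like a Windows drive-letter path.
   */
  static String stripColumnName(String sval) {
    if (sval.indexOf(":") < 0) {
      return sval; // no colon: a plain, single-valued property
    }
    if (sval.indexOf(":\\") >= 0 || sval.indexOf(":/") >= 0) {
      return sval; // drive-letter path such as d:\something or c:/temp
    }
    int k = sval.indexOf(":");
    return sval.substring(k + 1); // drop the column label before the first colon
  }

  public static void main(String[] args) {
    System.out.println(stripColumnName("d:\\something"));
    System.out.println(stripColumnName("c:/temp"));
    System.out.println(stripColumnName("mrg:10:100"));
  }
}
```

Without the second check, "d:\something" would be cut at the first colon and only "\something" would remain, which matches the symptom reported above.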
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854348#action_12854348 ] Shai Erera commented on LUCENE-1709: One more thing - change benchmark tests to run sequentially (by adding the property). Robert, are you going to tackle that soon? Parallelize Tests - Key: LUCENE-1709 URL: https://issues.apache.org/jira/browse/LUCENE-1709 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Assignee: Robert Muir Fix For: 3.1 Attachments: LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, runLuceneTests.py Original Estimate: 48h Remaining Estimate: 48h
[jira] Created: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark - Key: LUCENE-2377 URL: https://issues.apache.org/jira/browse/LUCENE-2377 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.1 Benchmark allows one to set the MP and MS to use, by defining the class name and then use reflection to instantiate them. However NoMP and NoMS are singletons and therefore reflection does not work for them. Easy fix in CreateIndexTask. I'll post a patch soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
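To illustrate why reflection fails for singletons, here is a minimal Java sketch. It is one possible workaround, not the change made in CreateIndexTask: all class names are hypothetical stand-ins, and the fallback to a static field named INSTANCE is an assumption made for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Sketch only: hypothetical stand-in classes, not Lucene's merge policy/scheduler types.
final class SingletonAwareFactorySketch {

  /** Stand-in for a singleton: the only instance is exposed through a static field. */
  static final class NoOpScheduler {
    static final NoOpScheduler INSTANCE = new NoOpScheduler();
    private NoOpScheduler() {}
  }

  /** Stand-in for an ordinary class with a public no-arg constructor. */
  static final class PlainScheduler {
    public PlainScheduler() {}
  }

  /** Instantiates by class name, falling back to a static INSTANCE field for singletons. */
  static Object instantiate(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      try {
        // getConstructor() only finds public constructors, so it fails for singletons.
        return clazz.getConstructor().newInstance();
      } catch (NoSuchMethodException noPublicCtor) {
        Field instance = clazz.getDeclaredField("INSTANCE");
        if (!Modifier.isStatic(instance.getModifiers())) {
          throw new IllegalArgumentException(className + ".INSTANCE is not static");
        }
        return instance.get(null);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}
```

Comparing the configured class name against the known singleton classes and returning their instance directly would be an equally simple alternative.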
[jira] Updated: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
[ https://issues.apache.org/jira/browse/LUCENE-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2377: --- Attachment: LUCENE-2377.patch The patch includes both the fix to CreateIndexTask and the relevant tests in CreateIndexTaskTest. I plan to commit later today if there are no objections. Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark - Key: LUCENE-2377 URL: https://issues.apache.org/jira/browse/LUCENE-2377 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.1 Attachments: LUCENE-2377.patch
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851829#action_12851829 ] Shai Erera commented on LUCENE-2310: +1 for this simplification. Can we just name it Indexable, and omit Document from it? That way, it's both shorter and there's less chance of users directly linking it w/ Document. One thing I didn't understand though, is what will happen to the ir/is.doc() methods? Will those be deprecated in favor of some other class which receives an IR as a parameter and knows how to re-construct Indexable(Document)? Reduce Fieldable, AbstractField and Field complexity Key: LUCENE-2310 URL: https://issues.apache.org/jira/browse/LUCENE-2310 Project: Lucene - Java Issue Type: Sub-task Components: Index Reporter: Chris Male Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-DocumentGetFields-core.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch In order to move field-type-like functionality into its own class, we really need to try to tackle the hierarchy of Fieldable, AbstractField and Field. Currently AbstractField depends on Field, and does not provide much more functionality than storing fields, most of which are being moved over to FieldType. Therefore it seems ideal to try to deprecate AbstractField (and possibly Fieldable), moving much of the functionality into Field and FieldType.
[jira] Assigned: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera reassigned LUCENE-2353: -- Assignee: Shai Erera Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2353.patch, LUCENE-2353.patch
[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851836#action_12851836 ] Shai Erera commented on LUCENE-2353: Unless there are objections, I plan to commit this shortly. Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2353.patch, LUCENE-2353.patch
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851842#action_12851842 ] Shai Erera commented on LUCENE-2310: Right Earwin - agreed. I'd like to summarize a brief discussion we had on IRC around that: The idea is not to provide another interface/class for search purposes, but rather to expose the right API from IndexReader, even if it might be a bit low-level. APIs like getIndexedFields(docId) and getStoredFields(docId), both optionally taking a FieldSelector, should allow the application to re-construct its Indexable however it wants. And IR/IS don't need to know anything about that. To complete the picture for current users, we can have a static reconstruct() on Document which takes IR, docId and FieldSelector ... BTW, I'm not even sure getIndexedFields can be efficiently supported today. Just listing it here for completeness. Reduce Fieldable, AbstractField and Field complexity Key: LUCENE-2310 URL: https://issues.apache.org/jira/browse/LUCENE-2310 Project: Lucene - Java Issue Type: Sub-task Components: Index Reporter: Chris Male Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-DocumentGetFields-core.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch
[jira] Resolved: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2353. Resolution: Fixed Committed revision 929520. Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 Attachments: LUCENE-2353.patch, LUCENE-2353.patch
[jira] Updated: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2353: --- Attachment: LUCENE-2353.patch Updated to also match paths like 'c:/temp', which are also accepted on Windows. Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Fix For: 3.1 Attachments: LUCENE-2353.patch, LUCENE-2353.patch
[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850644#action_12850644 ] Shai Erera commented on LUCENE-2353: I don't have an account yet, so I cannot commit this on my own. Any volunteers? Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Fix For: 3.1 Attachments: LUCENE-2353.patch
[jira] Created: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Fix For: 3.1
I have no idea how no one ran into this so far, but I tried to execute an .alg file which used ReutersContentSource and referenced both docs.dir and work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the run reported an error of missing content under benchmark\work\something. I've traced the problem back to Config, where get(String, String) includes the following code:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
...
{code}
It detects ':' in the value and so it thinks it's a per-round property, thus stripping 'd:' from the value. The fix is very simple:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
} else if (sval.indexOf(":\\") >= 0) {
  // this previously messed up absolute path names on Windows. Assuming
  // there is no real value that starts with \\
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
{code}
I'll post a patch w/ the above fix + test shortly.
[jira] Updated: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2353: --- Attachment: LUCENE-2353.patch The fix is only relevant to get(String, String) and not to all other get(String, type) variants. The benchmark tests passed, but after I ran svn up (to include the latest parallel test thing) the test just sits idle (after finishing), waiting for something. If I run the tests in Eclipse they pass. So I'm guessing it's a problem w/ my env. or build.xml? I also tried 'ant clean test' from within benchmark, but it didn't help. I then tried 'ant clean' from root, and 'ant test' from benchmark, but the test just keeps waiting on WriteLineDocTaskTest, on this line:
[junit] config properties:
[junit] directory = RAMDirectory
[junit] doc.maker = org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTaskTest$JustDateDocMaker
[junit] line.file.out = D:\dev\lucene\lucene-trunk\build\contrib\benchmark\test\W\one-line
[junit] ---
I think this can go in (if it passes on someone else's machine), while I figure out what's wrong in my env. separately.
Config incorrectly handles Windows absolute pathnames - Key: LUCENE-2353 URL: https://issues.apache.org/jira/browse/LUCENE-2353 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Fix For: 3.1 Attachments: LUCENE-2353.patch
[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850075#action_12850075 ] Shai Erera commented on LUCENE-2345: Earwin, w/o knowing too much about the details of your work, I wanted to comment on "get rid of init/reinit/moreinit methods, moving the code to constructors". I'm working now on Parallel Index, and one of the things I do is extend IW. Currently, IW's ctor code performs the initialization, however I'm thinking of moving that code to an init method. The reason is to allow easy extensions of IW, such as LUCENE-2330. There I'm going to add a default ctor to IW, accompanied by an init method the extending class can call if needed. So what I'm trying to say is that init methods are not always bad, and sometimes ctors limit you. Perhaps it would make sense though in what you're trying to do ...
Make it possible to subclass SegmentReader -- Key: LUCENE-2345 URL: https://issues.apache.org/jira/browse/LUCENE-2345 Project: Lucene - Java Issue Type: Wish Components: Index Reporter: Tim Smith Fix For: 3.1 Attachments: LUCENE-2345_3.0.patch
I would like the ability to subclass SegmentReader for numerous reasons:
* to capture initialization/close events
* attach custom objects to an instance of a segment reader (caches, statistics, so on and so forth)
* override methods on segment reader as needed
Currently this isn't really possible. I propose adding a SegmentReaderFactory that would allow creating custom subclasses of SegmentReader. The default implementation would be something like:
{code}
public class SegmentReaderFactory {
  public SegmentReader get(boolean readOnly) {
    return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
  }

  public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
    // delegate to get() so subclasses only need to override one method
    return get(readOnly);
  }
}
{code}
It would then be made possible to pass a SegmentReaderFactory to IndexWriter (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, etc). I could prepare a patch if others think this has merit. Obviously, this API would be experimental/advanced/will change in future.
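To make the factory proposal concrete, here is a minimal, self-contained Java sketch of how an application could plug in its own reader subclass through such a factory. It deliberately uses hypothetical stand-in classes (Reader, ReaderFactory, TrackedReader, TrackingFactory) instead of SegmentReader, so it shows only the pattern and says nothing about the real Lucene internals.

```java
// Sketch only: hypothetical stand-ins for SegmentReader and the proposed factory.
final class ReaderFactorySketch {

  /** Stand-in for SegmentReader. */
  static class Reader {
    final boolean readOnly;
    Reader(boolean readOnly) { this.readOnly = readOnly; }
  }

  /** Default factory: creates plain readers, and reopen() delegates to get(). */
  static class ReaderFactory {
    Reader get(boolean readOnly) { return new Reader(readOnly); }
    Reader reopen(Reader old, boolean readOnly) { return get(readOnly); }
  }

  /** Custom reader that carries an extra object, here just a creation timestamp. */
  static final class TrackedReader extends Reader {
    final long createdAtNanos = System.nanoTime();
    TrackedReader(boolean readOnly) { super(readOnly); }
  }

  /** Custom factory that captures "initialization events" by counting created readers. */
  static final class TrackingFactory extends ReaderFactory {
    int created;
    @Override
    Reader get(boolean readOnly) {
      created++;
      return new TrackedReader(readOnly);
    }
  }
}
```

Because reopen() goes through get(), overriding get() alone is enough for the custom reader type to be used on both the open and reopen paths.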
[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850083#action_12850083 ] Shai Erera commented on LUCENE-2345: Thanks Uwe, I know that ctor is the preferred way, and in the process of introducing IWC I deleted IW.init, which all ctors called, and pulled all the code into the IW ctor. I will make that init() on IW final. But sometimes putting code in init() is not bad (and it's used elsewhere in Lucene too, e.g. PQ and, up until recently, IW). Make it possible to subclass SegmentReader -- Key: LUCENE-2345 URL: https://issues.apache.org/jira/browse/LUCENE-2345 Project: Lucene - Java Issue Type: Wish Components: Index Reporter: Tim Smith Fix For: 3.1 Attachments: LUCENE-2345_3.0.patch
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850086#action_12850086 ] Shai Erera commented on LUCENE-2215: Sure let's wait for the patch and some perf. results. paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 2.4, 3.0 Reporter: Adam Heinz Assignee: Grant Ingersoll Priority: Minor Attachments: IterablePaging.java, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850094#action_12850094 ] Shai Erera commented on LUCENE-2345: Earwin, I wholeheartedly agree with what you wrote. If we could refactor IW and extract it to a set of interfaces, then I agree (and Michael B. has an issue open for that). I think though that IW's API is already that interface (give or take a few methods). So perhaps this can be an easy refactoring - introduce an Indexer (a la Searcher) class (or interface) w/ all of IW's public methods, and then let PW extend/impl that class/interface as well as IW. We can also consider making IW itself final this way (though bw police will prevent it :)). Then when PW sets up the slices, it can create them as IW or any other IW-like implementation it needs them to impl. If it sounds good enough to become its own issue, I can open one and we can continue discussing it there (and leave that issue focused on extending SR). Then I'll hold off w/ LUCENE-2330, or simply rename it to reflect that Indexer API. Make it possible to subclass SegmentReader -- Key: LUCENE-2345 URL: https://issues.apache.org/jira/browse/LUCENE-2345 Project: Lucene - Java Issue Type: Wish Components: Index Reporter: Tim Smith Fix For: 3.1 Attachments: LUCENE-2345_3.0.patch
[jira] Commented: (LUCENE-1879) Parallel incremental indexing
[ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850313#action_12850313 ] Shai Erera commented on LUCENE-1879: The way I planned to support multi-threaded indexing is to do a two-phase addDocument. First, allocate a doc ID from DocumentsWriter (synchronized) and then add the Document to each Slice with that doc ID. DocumentsWriter was not supposed to know it is a parallel index ... something like the following. {code} int docId = obtainDocId(); for (IndexWriter slice : slices) { slice.addDocument(docId, document); } {code} That allows ParallelWriter to be really an orchestrator/manager of all slices, while each slice can be an IW on its own. Now, when you say ParallelDocumentsWriter, I assume you mean that this DocWriter will be aware of the slices? That I think is an interesting idea, which is unrelated to LUCENE-2324. I.e., ParallelWriter will invoke its addDocument code which will get down to ParallelDocumentsWriter, which will allocate the doc ID itself and call each slice's DocWriter.addDocument? And then LUCENE-2324 will just improve the performance of that process? This might require a bigger change to IW than I had anticipated, but perhaps it's worth it. What do you think? Parallel incremental indexing - Key: LUCENE-1879 URL: https://issues.apache.org/jira/browse/LUCENE-1879 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Michael Busch Assignee: Michael Busch Fix For: 3.1 Attachments: parallel_incremental_indexing.tar A new feature that allows building parallel indexes and keeping them in sync on a docID level, independent of the choice of the MergePolicy/MergeScheduler. Find details on the wiki page for this feature: http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing Discussion on java-dev: http://markmail.org/thread/ql3oxzkob7aqf3jd -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
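The two-phase addDocument described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SliceSketch` and `ParallelWriterSketch` are hypothetical stand-ins (not Lucene classes), documents are reduced to plain strings, and an `AtomicInteger` is swapped in for the synchronized doc ID allocation the comment mentions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one slice; in the proposal each slice would be an IndexWriter.
final class SliceSketch {
    private final Map<Integer, String> docsById = new ConcurrentHashMap<>();

    // Stores the slice's part of a document under an externally assigned doc ID.
    void addDocument(int docId, String fields) {
        docsById.put(docId, fields);
    }

    String get(int docId) {
        return docsById.get(docId);
    }
}

// Hypothetical orchestrator: it only hands out doc IDs and forwards to the slices.
final class ParallelWriterSketch {
    private final AtomicInteger nextDocId = new AtomicInteger();
    private final List<SliceSketch> slices;

    ParallelWriterSketch(List<SliceSketch> slices) {
        this.slices = slices;
    }

    // Phase 1: allocate one doc ID. Phase 2: add each part to its slice under that ID.
    int addDocument(List<String> perSliceFields) {
        if (perSliceFields.size() != slices.size()) {
            throw new IllegalArgumentException("need exactly one part per slice");
        }
        int docId = nextDocId.getAndIncrement();
        for (int i = 0; i < slices.size(); i++) {
            slices.get(i).addDocument(docId, perSliceFields.get(i));
        }
        return docId;
    }
}
```

Because the ID is allocated once and passed down, no slice needs to know that it is part of a parallel index, which is the property the comment is after.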
[jira] Commented: (LUCENE-1879) Parallel incremental indexing
[ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850336#action_12850336 ] Shai Erera commented on LUCENE-1879: Hi Grant - I believe what you describe is related to solving the incremental field updates problem, where someone might want to change the value of a specific document's field. But PI is not about that. Rather, PI is about updating a whole slice at once, i.e., changing a field's value across all docs, or adding a field to all docs (I believe such a question was asked on the user list a few days ago). I've listed above several scenarios where PI is useful, but unfortunately it is unrelated to incremental field updates. If I misunderstood you, then please clarify. Re incremental field updates, I think your direction is interesting and deserves discussion, but in a separate issue/thread? Parallel incremental indexing - Key: LUCENE-1879 URL: https://issues.apache.org/jira/browse/LUCENE-1879 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Michael Busch Assignee: Michael Busch Fix For: 3.1 Attachments: parallel_incremental_indexing.tar A new feature that allows building parallel indexes and keeping them in sync on a docID level, independent of the choice of the MergePolicy/MergeScheduler. Find details on the wiki page for this feature: http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing Discussion on java-dev: http://markmail.org/thread/ql3oxzkob7aqf3jd -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849728#action_12849728 ] Shai Erera commented on LUCENE-2345: bq. The IndexWriter now has a getter and setter for setting this If this is not expected to change during the lifetime of IW, I think it should be added to IWC when you upgrade the patch to 3.1. Make it possible to subclass SegmentReader -- Key: LUCENE-2345 URL: https://issues.apache.org/jira/browse/LUCENE-2345 Project: Lucene - Java Issue Type: Wish Components: Index Reporter: Tim Smith Fix For: 3.1 Attachments: LUCENE-2345_3.0.patch I would like the ability to subclass SegmentReader for numerous reasons: * to capture initialization/close events * attach custom objects to an instance of a segment reader (caches, statistics, so on and so forth) * override methods on segment reader as needed currently this isn't really possible I propose adding a SegmentReaderFactory that would allow creating custom subclasses of SegmentReader default implementation would be something like: {code} public class SegmentReaderFactory { public SegmentReader get(boolean readOnly) { return readOnly ? new ReadOnlySegmentReader() : new SegmentReader(); } public SegmentReader reopen(SegmentReader reader, boolean readOnly) { return newSegmentReader(readOnly); } } {code} It would then be made possible to pass a SegmentReaderFactory to IndexWriter (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, etc) I could prepare a patch if others think this has merit Obviously, this API would be experimental/advanced/will change in future -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850002#action_12850002 ] Shai Erera commented on LUCENE-2215: bq. since I think it's safe to say most applications implement paging Let's be careful about the semantics here Grant. Most if not all applications implement paging indeed, but I believe only FEW actually store user contexts between searches. PagingCollector relies on the application to store the lowest ranking doc that was returned previously, which means storing context between user's searches. I agree w/ Mike's statement that 99.9% of the searches would never run that code, which is why I've proposed a delegation/wrapper approach from the beginning. I also think that we should make some allowances here and there, for the non-common case, and introduce better software design rather than specialized code. A Collector filter approach for some rare (or even less common) cases seems very reasonable to me. Also, I think that if we add to TSDC a create method which takes into account the previously scored lowest doc, it will confuse people. Now they will need to think "where do I get this low score from?" - but perhaps after I see the code, it won't seem such a bad thing. I just have a feeling that TSDC and TFC should be left on their own, and extreme paging stuff should either be its own specialized collector, or a wrapper. 
paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 2.4, 3.0 Reporter: Adam Heinz Assignee: Grant Ingersoll Priority: Minor Attachments: IterablePaging.java, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849200#action_12849200 ] Shai Erera commented on LUCENE-2215: So what's the motivation of declaring PagingCollector a TopDocsCollector? Would you envision one to request for a TopDocsCollector but don't care if it's TSDC, TFC or PagingCollector? I would rather have it extend TDC directly, and then you won't need to throw UOE for the rest of the methods ... What about renaming it to TopScorePagingCollector? paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 2.4, 3.0 Reporter: Adam Heinz Assignee: Grant Ingersoll Priority: Minor Attachments: IterablePaging.java, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors
[ https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849384#action_12849384 ] Shai Erera commented on LUCENE-2343: In the patch you write: "topDocOrdered - Creates a TopDocCollector that requires in order docs" - did you mean TopScoreDocCollector? Because TopDocCollector is abstract ... I think the following: {code} + Class<? extends Collector> clazz = (Class<? extends Collector>) Class.forName(clnName); + collector = clazz.newInstance(); {code} can be written as Class.forName(clnName).asSubclass(Collector.class).newInstance(); Also, and it's a style issue, can you remove the '== true/false' from ifs? I'd change *if (clnName.equals("") == false)* to *if (clnName.length() > 0)*. Why does benchmark/build.xml now rely on the compiled classes/test (of core)? Add support for benchmarking Collectors --- Key: LUCENE-2343 URL: https://issues.apache.org/jira/browse/LUCENE-2343 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Attachments: LUCENE-2343.patch As the title says. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
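The asSubclass suggestion in the comment above can be illustrated with a small, self-contained sketch. `ReflectiveFactorySketch` and `newInstanceOf` are hypothetical names, and `java.util.List` stands in for Collector so the example runs without Lucene. On current JDKs `Class.newInstance()` is deprecated, so the sketch goes through `getDeclaredConstructor().newInstance()` instead; that substitution is mine, not part of the patch under review.

```java
// Hypothetical helper, not part of the benchmark patch.
final class ReflectiveFactorySketch {
    // Loads className, checks that it is a subtype of expectedType, and instantiates it
    // through its no-arg constructor. asSubclass throws ClassCastException when the
    // loaded class is not a subtype, so no unchecked cast is needed.
    static <T> T newInstanceOf(String className, Class<T> expectedType) {
        try {
            return Class.forName(className)
                    .asSubclass(expectedType)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            // Wrapped only to keep the sketch free of checked exceptions.
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

Compared with the explicit `(Class<? extends Collector>)` cast, this form avoids the unchecked-cast warning and fails with a ClassCastException before any instance is created when the configured class name is of the wrong type.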
[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors
[ https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849393#action_12849393 ] Shai Erera commented on LUCENE-2343: ok I won't argue about == true/false. It's a style thing and I'm not too fanatic about it :). Add support for benchmarking Collectors --- Key: LUCENE-2343 URL: https://issues.apache.org/jira/browse/LUCENE-2343 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Attachments: LUCENE-2343.patch As the title says. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors
[ https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849403#action_12849403 ] Shai Erera commented on LUCENE-2343: I wasn't talking about the name of the parameter but about the comment in the javadoc. TopDocsCollector is a typo - should have been TopScoreDocCollector. If you also want to change the name of the parameter in the .alg file that's ok as well, though I'm fine w/ topDocOrdered/Unordered. Add support for benchmarking Collectors --- Key: LUCENE-2343 URL: https://issues.apache.org/jira/browse/LUCENE-2343 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Attachments: LUCENE-2343.patch As the title says. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849404#action_12849404 ] Shai Erera commented on LUCENE-2339: Do we want to suppress only IOExceptions? What about any RuntimeExceptions - upon hitting any of them the code will fly away? Not saying it's a bad thing, but pointing it out. Other than that, the patch looks good. closeSafely is not exactly what I had in mind about closeNoException because it forces you to catch the IOE if you don't declare you throw it, or you need to move on, discarding it. But I guess this is a matter for another issue. Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors
[ https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849416#action_12849416 ] Shai Erera commented on LUCENE-2343: Looks good ! Add support for benchmarking Collectors --- Key: LUCENE-2343 URL: https://issues.apache.org/jira/browse/LUCENE-2343 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Attachments: LUCENE-2343.patch, LUCENE-2343.patch As the title says. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors
[ https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849435#action_12849435 ] Shai Erera commented on LUCENE-2343: I've just realized you haven't added a CHANGES entry (and I missed that in my previous review, sorry). Add support for benchmarking Collectors --- Key: LUCENE-2343 URL: https://issues.apache.org/jira/browse/LUCENE-2343 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1 Attachments: LUCENE-2343.patch, LUCENE-2343.patch As the title says. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2342) DisjunctionSumScorer explain
[ https://issues.apache.org/jira/browse/LUCENE-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848560#action_12848560 ] Shai Erera commented on LUCENE-2342: Took me a while to spot the typo :). Can you reproduce the problem w/ a nice test case? So that we won't run into this issue again in the future. DisjunctionSumScorer explain Key: LUCENE-2342 URL: https://issues.apache.org/jira/browse/LUCENE-2342 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Gary Yngve Priority: Minor Original Estimate: 0.17h Remaining Estimate: 0.17h The bottom of the explain method in DisjunctionSumScorer says if (nrMatchers >= minimumNrMatchers) { This is incorrect.. it should say if (nrMatches >= minimumNrMatchers) { nrMatchers is the instance variable used for advancing, whereas nrMatches is explain's local variable. Minor, because I don't think DSS's explain is ever called by anything (BooleanWeight has its own explain)? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848565#action_12848565 ] Shai Erera commented on LUCENE-2339: I personally haven't seen problems using NIO on Windows, but that's perhaps just because I haven't run into them yet :). I think your proposal makes sense - let's start w/ NIO bulk-copy and then we can disable it if people complain or report errors. Consistency is important, I agree. So let's keep Collection there. I just wanted to avoid converting arrays to a Collection, just so that they can be iterated on. Seems a waste to me, but not so much to argue about :). Re (7), I hate such libraries too. But I hate more the ones that just hide problems away from me :). The ideal thing would be for Lucene to use a logging mechanism (I once started it on LUCENE-1482) so that you could include the stacktrace print if logging is enabled. But currently the code just hides the problem away ... and I'd hate to debug such a thing, not realizing an IO exception is thrown from close(). So unless LUCENE-1482 springs back to life again, what do you suggest we do? Suppressing the exceptions seems wrong to me. Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848571#action_12848571 ] Shai Erera commented on LUCENE-1482: Well ... since Mark hasn't closed it yet (thanks Mark :)), I thought to try once more. Perhaps w/ the merge of Lucene/Solr this will look more reasonable now? I personally feel that just setting InfoStream on IW is not enough. I don't think we need to control logging per level either. I think it's important to introduce this in at least one of the following modes: # We add SLF4J and allow the application to control logging per package(s), but the logging level won't matter - as long as it's not OFF, we log. # We add a static factory LuceneLogger or something, which turns logging on/off, in which case all components/packages either log or not. I think (1) gives us greater flexibility (us as in the apps developers), but (2) is also acceptable. As long as we can introduce logging messages from more components w/o passing infoStream around ... On LUCENE-2339 for example, a closeSafely method was added which suppresses IOExceptions that may be caused by io.close(). You cannot print the stacktrace because that would be unacceptable w/ products that are not allowed to print anything unless logging has been enabled, but on the other hand suppressing the exception is not good either ... in this case, a LuceneLogger could have helped because you could print the stacktrace if logging was enabled. Replace infoSteram by a logging framework (SLF4J) - Key: LUCENE-1482 URL: https://issues.apache.org/jira/browse/LUCENE-1482 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Fix For: 3.1 Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar Lucene makes use of infoStream to output messages in its indexing code only. 
For debugging purposes, when the search application is run on the customer side, getting messages from other code flows, like search, query parsing, analysis etc can be extremely useful. There are two main problems with infoStream today: 1. It is owned by IndexWriter, so if I want to add logging capabilities to other classes I need to either expose an API or propagate infoStream to all classes (see for example DocumentsWriter, which receives its infoStream instance from IndexWriter). 2. I can either turn debugging on or off, for the entire code. Introducing a logging framework can allow each class to control its logging independently, and more importantly, allows the application to turn on logging for only specific areas in the code (e.g., org.apache.lucene.index.*). I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, as its name states, a facade over different logging frameworks. As such, you can include the slf4j.jar in your application, and it recognizes at deploy time what is the actual logging framework you'd like to use. SLF4J comes with several adapters for Java logging, Log4j and others. If you know your application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in your classpath, and your logging statements will use Java logging underneath the covers. This makes the logging code very simple. For a class A the logger will be instantiated like this: public class A { private static final Logger logger = LoggerFactory.getLogger(A.class); } And will later be used like this: public class A { private static final Logger logger = LoggerFactory.getLogger(A.class); public void foo() { if (logger.isDebugEnabled()) { logger.debug("message"); } } } That's all ! Checking for isDebugEnabled is very quick, at least using the JDK14 adapter (but I assume it's fast also over other logging frameworks). The important thing is, every class controls its own logger. 
Not all classes have to output logging messages, and we can improve Lucene's logging gradually, w/o changing the API, by adding more logging messages to interesting classes. I will submit a patch shortly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
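Option (2) from the comment above, a static factory that turns logging on or off for all components, might look roughly like the sketch below. Everything here is an assumption made for illustration: `LuceneLoggerSketch` and its methods are hypothetical names, and a plain `PrintStream` stands in for whatever sink a real implementation would use. No such class is claimed to exist in Lucene.

```java
import java.io.PrintStream;

// Hypothetical global on/off logger; a sketch of option (2), not an existing Lucene API.
final class LuceneLoggerSketch {
    // A null sink means logging is off for every component.
    private static volatile PrintStream sink = null;

    private LuceneLoggerSketch() {}

    static void enable(PrintStream out) {
        sink = out;
    }

    static void disable() {
        sink = null;
    }

    static boolean isEnabled() {
        return sink != null;
    }

    // Writes one message and, if given, the stack trace; does nothing while disabled.
    static void log(String component, String message, Throwable cause) {
        PrintStream out = sink; // read once so a concurrent disable() cannot cause an NPE
        if (out == null) {
            return;
        }
        out.println(component + ": " + message);
        if (cause != null) {
            cause.printStackTrace(out);
        }
    }
}
```

With something like this in place, a close helper could report a suppressed IOException when logging is enabled and stay silent otherwise, which is the trade-off raised for LUCENE-2339.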
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848606#action_12848606 ] Shai Erera commented on LUCENE-2339: Sorry ... I was confused w/ the for loop of Java 5 :). Let's keep it Collection then. Sorry for the hassle. Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848636#action_12848636 ] Shai Erera commented on LUCENE-2339: I don't want to block the issue. If LUCENE-1482 advances somewhere, we'll log a message in closeSafely. Otherwise, between always suppressing and always printing, I agree we should suppress. If someone does not want to suppress, he should call close(). Which makes me think we should call this method closeNoException because closeSafely is not exactly what it does :). Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848729#action_12848729 ] Shai Erera commented on LUCENE-2339: Mike, that's what I wrote above: "if someone does not want to suppress, he should call close". I think that closeSafely (or as I prefer it - closeNoException) should be called only when you know you've hit an exception and you want to close the stream suppressing any exceptions. Otherwise call close(). bq. can we add a boolean arg (suppressExceptions) to control that? That would defeat the purpose of the method, no? I mean, currently it does not throw any exception, not even declaring one, and if we add that boolean it will need to declare throws IOException, which will force the caller to try-catch that exception and ... suppress it or document // cannot happen because I've passed false? So how about we call it closeNoException, document that it does not throw any exception and intentionally suppresses them, and if you don't want them to be suppressed, you can call io.close() yourself? Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
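A minimal sketch of the closeNoException semantics argued for above, assuming a varargs signature (the actual signature in the patch is not shown in this thread). `CloseNoExceptionSketch` is a hypothetical holder class. Only IOException is swallowed here; a RuntimeException would still propagate, which matches the behaviour questioned elsewhere in this thread.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper class; the method name follows the proposal in the comment above.
final class CloseNoExceptionSketch {
    private CloseNoExceptionSketch() {}

    // Closes every non-null resource and intentionally swallows IOException.
    // Declares no checked exception, so callers need no try/catch of their own.
    static void closeNoException(Closeable... resources) {
        for (Closeable resource : resources) {
            if (resource == null) {
                continue;
            }
            try {
                resource.close();
            } catch (IOException suppressed) {
                // Intentionally ignored: the caller is assumed to be handling an
                // earlier exception already and wants no further ones from close().
            }
        }
    }
}
```

The intended call site is a catch or finally block that runs after something has already failed; code that cares about close() failures would call close() directly, as the comment says.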
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848753#action_12848753 ] Shai Erera commented on LUCENE-2339: bq. But there is still a need to close everything, but do throw the 1st exception you hit. Ohh I see what you mean. My assumption is that when you call closeNoException you already know that you've hit an exception and just want to close the stream w/o getting more exceptions. If you don't know that, don't call closeNoException? Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848777#action_12848777 ] Shai Erera commented on LUCENE-2339: Ok that's indeed different :). I guess we can introduce it now, in this issue (it's tiny and simple). A closeAll which documents it throws the first exception it hits. Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
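The closeAll variant mentioned above (close everything, then throw the first exception hit) could be sketched as follows. The varargs signature and the `CloseAllSketch` class name are assumptions for illustration; later failures are simply dropped here, which is one possible reading of "throws the first exception it hits".

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper class; only the method name closeAll comes from the thread.
final class CloseAllSketch {
    private CloseAllSketch() {}

    // Attempts to close every non-null resource, remembers the first IOException,
    // and rethrows it once all resources have been attempted.
    static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            if (resource == null) {
                continue;
            }
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e; // keep the first failure, keep closing the rest
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Unlike closeNoException, this form suits the normal (non-error) path: nothing is leaked when one close() fails, and the caller still learns that something went wrong.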
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896 ] Shai Erera commented on LUCENE-2215: I've reviewed PagingCollector.java and the first thing I have to say about it is that I really like it ! :) Saves lots of unnecessary heapify code, if the application can allow itself to store the lowest last SD. I have a few comments/questions. I don't understand what getLastScoreDoc is for? Is it just a utility method? Is it something the app can compute by itself? Anyway, it lacks javadocs, so perhaps if they existed I wouldn't need to ask ;). In collect(), there's the following code:
{code}
} else if (score == previousPassLowest.score && doc <= previousPassLowest.doc) {
  // if the scores are the same and the doc is less than or equal to the
  // previous pass lowest hit doc then skip because this collector favors
  // lower number documents.
  return;
{code}
I think there's a typo in the comment "favors lower number documents" .. while it seems to prefer higher doc IDs? The way I understand it, regardless of whether docs are collected in/out of order, HitQueue ensures that when scores are equal, the lowest IDs are favored. Thus the first round always keeps the lowest IDs among the docs whose scores match. The next round will favor the docs whose IDs come next, and so forth ... am I right? (just clarifying my understanding). If that's the case, I think it'll be good if it's spelled out in the comment, and also mention that it means that document has already been returned previously (like it's documented in the previous 'if'). The last 'else' really looks like TSDC's out-of-order version, which makes me wonder whether PagingCollector can be viewed as a filter on top of TSDC (and possibly even TopFieldCollector)? So if a hit should be collected, it just calls super.collect? 
I realize though that a Collector is a hotspot and we want to minimize 'if' let alone method call statements as much as possible. But it just feels so strong that it should be a filter ... :). And you wouldn't need to specifically handle in/out orderness ... and w/ the right design, it can also wrap a TFC or any other TDC implementation ... BTW, I've noticed that you don't track maxScore - is it assumed that the application stores it from the first round? If so I'd document it, because the application needs to know it should use TSDC the first round, and PagingCollector the second round. Also, PagingCollector offers a ctor which does not force the application to pass in a ScoreDoc. See my comment from above - it might be misleading, because if you use this collector right from the very first search, you lose the maxScore tracking. I also don't see why it should be allowed - if a dummy previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's. I think this collector should be used only from the second round, and a single ctor which forces a ScoreDoc to be passed would make more sense. If the application wishes to shoot itself in the leg (performance-wise), it can pass a dummy SD itself. paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 2.4, 3.0 Reporter: Adam Heinz Assignee: Grant Ingersoll Priority: Minor Attachments: IterablePaging.java, PagingCollector.java, TestingPagingCollector.java http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
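The "filter on top of another collector" idea from the comment above can be sketched without any Lucene types. Everything here is hypothetical: HitSink stands in for a collector's per-hit callback and PagingFilter is an invented name; the only thing taken from the discussion is the skip rule (a hit is skipped if it scored higher than the previous page's lowest hit, or tied with it and has a doc ID that is less than or equal).

```java
/** Hypothetical stand-in for a collector's per-hit callback; not a Lucene type. */
interface HitSink {
  void collect(int doc, float score);
}

/** Forwards only hits that rank strictly after the last hit of the previous page. */
final class PagingFilter implements HitSink {
  private final HitSink delegate;
  private final float prevLowestScore;
  private final int prevLowestDoc;

  PagingFilter(HitSink delegate, float prevLowestScore, int prevLowestDoc) {
    this.delegate = delegate;
    this.prevLowestScore = prevLowestScore;
    this.prevLowestDoc = prevLowestDoc;
  }

  public void collect(int doc, float score) {
    if (score > prevLowestScore) {
      return; // ranked above the previous page's last hit: already returned
    }
    if (score == prevLowestScore && doc <= prevLowestDoc) {
      return; // tie on score, lower or equal doc ID: already returned
    }
    delegate.collect(doc, score); // everything else belongs to a later page
  }
}
```

Wrapping costs one extra method call per surviving hit, which is the hotspot concern raised above; in exchange the wrapped collector needs no paging-specific code.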
[jira] Commented: (LUCENE-2215) paging collector
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848908#action_12848908 ] Shai Erera commented on LUCENE-2215: I must admit I don't like throwing UOE. I imagine the naive user calling one of these and hit w/ UOE out of nowhere really :). Perhaps it's a sign PagingCollector should not be a sub-class of TopDocsCollector? It does not benefit from it in any way because it overrides all the main methods, impls them or throws UOE for those it doesn't like. So perhaps it should just be a TopScorePagingCollector which copies some of the functionality of TSDC, but is not a TDC itself. It will have a topDocs() method, and only it (b/c I agree the rest don't make any sense). Notice the different name I propose - to make it clear it's a collector that can be used for paging through a scored list of results. I BTW liked that the if/else clauses were separated, b/c you could include meaningful documentation for each. Right now those are just very long lines. About in-order, I think the only thing you will save is the last 'else'. Read my comment above about wrapping TSDC ... not sure about it, but it will make it more elegant. I'll review the rest of the patch. Didn't yet understand what's PagingIterable for ... paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 2.4, 3.0 Reporter: Adam Heinz Assignee: Grant Ingersoll Priority: Minor Attachments: IterablePaging.java, LUCENE-2215.patch, PagingCollector.java, TestingPagingCollector.java http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 Somebody assign this to Aaron McCurry and we'll see if we can get enough votes on this issue to convince him to upload his patch. 
:) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2331: --- Attachment: LUCENE-2331.patch Sorry - new eclipse and project settings :). Should be ok now. Add NoOpMergePolicy --- Key: LUCENE-2331 URL: https://issues.apache.org/jira/browse/LUCENE-2331 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Shai Erera Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2331.patch, LUCENE-2331.patch I'd like to add a simple and useful MP implementation which does nothing ! :). I've came across many places where either the following is documented or implemented: if you want to prevent merges, set mergeFactor to a high enough value. I think a NoOpMergePolicy is just as good, and can REALLY allow you disable merges (except for maybe set mergeFactor to Int.MAX_VAL). As such, NoOpMergePolicy will be introduced as a singleton, and can be used for convenience purposes only. Also, for Parallel Index it's important, because I'd like the slices to never do any merges, unless ParallelWriter decides so. So they should be set w/ that MP. I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need to change it afterwards. About the name - I like the name, but suggestions are welcome. I thought of a NullMergePolicy, but I don't like 'Null' used for a NoOp. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2331) Add NoOpMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848113#action_12848113 ] Shai Erera commented on LUCENE-2331: bq. do you think we should allow instantiation of NoMergePolicy, allowing you to control if it uses CFS or not? You ask because of the useCompound* methods? I wanted NMP to be a singleton really, and I don't think those two really matter? Meaning, if you are using it, I guess you don't really care if it uses a cmpnd file or not? But if you think it's important, I can create 3 singletons: NO_COMPOUND_FILES_AND_STORE, COMPOUND_FILES, COMPOUND_FILES_AND_STORE (I really hate the long names though). We can settle w/ just two - (NO)COMPOUND_FILES ... Add NoOpMergePolicy --- Key: LUCENE-2331 URL: https://issues.apache.org/jira/browse/LUCENE-2331 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Shai Erera Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2331.patch, LUCENE-2331.patch I'd like to add a simple and useful MP implementation which does nothing ! :). I've came across many places where either the following is documented or implemented: if you want to prevent merges, set mergeFactor to a high enough value. I think a NoOpMergePolicy is just as good, and can REALLY allow you disable merges (except for maybe set mergeFactor to Int.MAX_VAL). As such, NoOpMergePolicy will be introduced as a singleton, and can be used for convenience purposes only. Also, for Parallel Index it's important, because I'd like the slices to never do any merges, unless ParallelWriter decides so. So they should be set w/ that MP. I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need to change it afterwards. About the name - I like the name, but suggestions are welcome. I thought of a NullMergePolicy, but I don't like 'Null' used for a NoOp. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
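For readers following along, the two-singleton variant discussed above might look something like this. It is a sketch only: MergePolicySketch is a heavily reduced, made-up stand-in for a merge policy, and the method names are assumptions rather than the API in the patch.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical, much-reduced stand-in for a merge policy; not Lucene's MergePolicy. */
abstract class MergePolicySketch {
  /** Returns the merges to run; an empty list means there is nothing to do. */
  abstract List<String> findMerges(List<String> segments);

  abstract boolean useCompoundFile();
}

/** Never selects any merge; only the compound-file preference differs per instance. */
final class NoMergePolicySketch extends MergePolicySketch {
  static final NoMergePolicySketch NO_COMPOUND_FILES = new NoMergePolicySketch(false);
  static final NoMergePolicySketch COMPOUND_FILES = new NoMergePolicySketch(true);

  private final boolean useCompoundFile;

  // private ctor: the two constants above are the only instances
  private NoMergePolicySketch(boolean useCompoundFile) {
    this.useCompoundFile = useCompoundFile;
  }

  @Override
  List<String> findMerges(List<String> segments) {
    return Collections.emptyList();
  }

  @Override
  boolean useCompoundFile() {
    return useCompoundFile;
  }
}
```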
[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2331: --- Attachment: LUCENE-2331.patch Patch includes NoMergePolicy.NO_COMPOUND_FILES and COMPOUND_FILES singletons. Add NoOpMergePolicy --- Key: LUCENE-2331 URL: https://issues.apache.org/jira/browse/LUCENE-2331 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Shai Erera Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2331.patch, LUCENE-2331.patch, LUCENE-2331.patch I'd like to add a simple and useful MP implementation which does nothing ! :). I've came across many places where either the following is documented or implemented: if you want to prevent merges, set mergeFactor to a high enough value. I think a NoOpMergePolicy is just as good, and can REALLY allow you disable merges (except for maybe set mergeFactor to Int.MAX_VAL). As such, NoOpMergePolicy will be introduced as a singleton, and can be used for convenience purposes only. Also, for Parallel Index it's important, because I'd like the slices to never do any merges, unless ParallelWriter decides so. So they should be set w/ that MP. I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need to change it afterwards. About the name - I like the name, but suggestions are welcome. I thought of a NullMergePolicy, but I don't like 'Null' used for a NoOp. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2331) Add NoOpMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848192#action_12848192 ] Shai Erera commented on LUCENE-2331: I think it's correct. The idea is to say that even w/ NMP, if you use NMS you ensure that no MS code is ever run (e.g. if you use NMP only, then CMS code [default] will always run but won't do anything). Add NoOpMergePolicy --- Key: LUCENE-2331 URL: https://issues.apache.org/jira/browse/LUCENE-2331 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Shai Erera Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2331.patch, LUCENE-2331.patch, LUCENE-2331.patch I'd like to add a simple and useful MP implementation which does nothing ! :). I've came across many places where either the following is documented or implemented: if you want to prevent merges, set mergeFactor to a high enough value. I think a NoOpMergePolicy is just as good, and can REALLY allow you disable merges (except for maybe set mergeFactor to Int.MAX_VAL). As such, NoOpMergePolicy will be introduced as a singleton, and can be used for convenience purposes only. Also, for Parallel Index it's important, because I'd like the slices to never do any merges, unless ParallelWriter decides so. So they should be set w/ that MP. I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need to change it afterwards. About the name - I like the name, but suggestions are welcome. I thought of a NullMergePolicy, but I don't like 'Null' used for a NoOp. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848341#action_12848341 ] Shai Erera commented on LUCENE-2328: Earwin, can you add a deprecation message to sync(String)? When I upgraded from 2.9 to 3.0 some methods were deprecated w/o any explanation as to what I should use instead. I think a message like "@deprecated use #sync(Collection) instead. For easy migration you can change your code to call sync(Collections.singleton(name))" ... or something along those lines. Other than that, patch looks great! I really like the code cleanup from IW.
IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data
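The kind of deprecation message asked for above could read as in the sketch below. SyncingDirSketch is a made-up class that only records file names; it exists to show the Javadoc wording and the one-line migration path, not how any real Directory syncs files.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical directory-like class, used only to illustrate the Javadoc wording. */
class SyncingDirSketch {
  /** Names passed to sync so far (a real implementation would fsync the files). */
  final List<String> synced = new ArrayList<String>();

  /** Syncs all the given files. */
  void sync(Collection<String> names) {
    synced.addAll(names);
  }

  /**
   * @deprecated use {@link #sync(Collection)} instead. For easy migration,
   *             change your code to call {@code sync(Collections.singleton(name))}.
   */
  @Deprecated
  void sync(String name) {
    sync(Collections.singleton(name));
  }
}
```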
[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied
[ https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848376#action_12848376 ] Shai Erera commented on LUCENE-2339: Patch looks good! A few comments:
# Is it safe to use NIO for all FSDirs? I thought that on Windows NIO has some bugs/limitations. In that case, would it be safer if just NIOFSDir used NIO?
# Can copyTo(Directory, Collection<String>) be changed to copyTo(Directory, Iterable<String>)? Unless we think that someone would want to use size() or something.
# I know it's a matter of style, but you import static Arrays.asList, and then use asList directly in copyTo(Dir). It confuses me because I expect asList to be a method declared on Dir, and so I prefer to see Arrays.asList. But it's just style, don't know how others feel about that.
# On copyTo(Dir), perhaps instead of converting the listAll() to a List and then removing elements from it, you can just iterate on whatever listAll() returns and add the files that pass the filter to a list? You can even optimize: if all the files Dir returned pass the filter, you can just pass the array to copyTo(Dir, Iterable), assuming we change the method to accept Iterable. But that's a minor optimization.
# copy(src, dest, boolean) - can you add a message to @deprecated so users will know what to replace it with more easily?
# I see that copy(src, dest) also accepts a boolean of whether to close the src directory. But copyTo(Dir) doesn't. I personally think it's ok, as someone can call close on src himself, but am wondering if it wouldn't be more convenient. I.e. a call to Directory.copy(src, dest, true) now has to become src.copyTo(dest) followed by src.close().
# closeSafely - perhaps print the stacktrace, even if you don't throw it?
Allow Directory.copy() to accept a collection of file names to be copied Key: LUCENE-2339 URL: https://issues.apache.org/jira/browse/LUCENE-2339 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Assignee: Michael McCandless Attachments: LUCENE-2339.patch, LUCENE-2339.patch Par example, I want to copy files pertaining to a certain commit, and not everything there is in a Directory.
[jira] Commented: (LUCENE-2337) DisjunctionSumScorer and ScorerDocQueue javadocs and one method name out of date after move from skipTo() to advance()
[ https://issues.apache.org/jira/browse/LUCENE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847916#action_12847916 ] Shai Erera commented on LUCENE-2337: Note that -1 is a valid return value in case doc() is called before nextDoc(). However it is not valid for nextDoc() and advance(). DisjunctionSumScorer and ScorerDocQueue javadocs and one method name out of date after move from skipTo() to advance() -- Key: LUCENE-2337 URL: https://issues.apache.org/jira/browse/LUCENE-2337 Project: Lucene - Java Issue Type: Improvement Components: Javadocs, Search Reporter: Paul Elschot Priority: Trivial Fix For: 3.1 Attachments: LUCENE-2337.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2333) Failures during contrib builds, when classes in core were changed without ant clean
[ https://issues.apache.org/jira/browse/LUCENE-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847415#action_12847415 ] Shai Erera commented on LUCENE-2333: This up-to-date thingy looks really cool and useful. So I guess you'd compare the .jar date and the build/classes/java date? This is sort of what javac does when it decides which classes to compile ... I guess. Failures during contrib builds, when classes in core were changed without ant clean --- Key: LUCENE-2333 URL: https://issues.apache.org/jira/browse/LUCENE-2333 Project: Lucene - Java Issue Type: Bug Components: Build Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: LUCENE-2333.patch, shai-compile-fix.patch, shai-compile-fix2.patch From java-dev by Shai Erera: {quote} I've noticed that sometimes, after I run test-core and test-contrib, and then change core code, test-contrib fail on NoSuchMethodError and stuff like that. I've noticed that core.jar exists under build, and I assumed it's used by test-contrib, and probably is not recreated after core code has changed. I verified it when looking in contrib-build.xml, which defines a property lucene.jar.present which is set to true if the jar is ... well, present. Which I believe is the reason for these failures. I've been thinking how to resolve that, and I can think of two ways: (1) have test-core always delete that file, but that has two issues: (1.1) It's redundant if the code hasn't changed. (1.2) It forces you to either jar-core or test-core before you test-contrib, if you want to make sure you run w/ the latest jar. or (2) have test-contrib always call jar-core, which will first delete the file and then re-create it by compiling first. Compiling should not do anything if the code hasn't changed. So the only waste would be to create the .jar, but I think that's quite fast? 
Does anyone, with more Ant skills than me, know of a better way to detect from test-contrib that core code has changed and only then rebuild the jar? {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847448#action_12847448 ] Shai Erera commented on LUCENE-2328: Earwin, I agree that sub-classing FSDir is not that easy. So I guess you'll add another piece of jdoc to createOutput, to notify Dir when it's closed? This seems reasonable. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847585#action_12847585 ] Shai Erera commented on LUCENE-2328:
bq. Trying to sync a file that hasn't yet been closed will be undefined
Can we avoid 'undefined'? We have an issue open about SegmentInfos.fileLength() not being clearly defined, and it causes confusion. If it's undefined, then someone might attempt to call sync before he closes the file, and only then close ... can we throw an exception in that case? We can have close(), sync() and closeAndSync(). Would the latter make sense? I prefer the API to be explicit, and I think that throwing an exception (StillOpenException?) if sync() is called before close() is very explicit, and reasonable if accompanied by a proper jdoc.
IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data
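To make the "explicit instead of undefined" suggestion above concrete, here is a minimal sketch. TrackingDirSketch and its methods are invented for the example, and StillOpenException is simply the name floated in the comment (made unchecked here for brevity); none of this is taken from the patch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical exception, named after the suggestion in the comment above. */
class StillOpenException extends IllegalStateException {
  StillOpenException(String name) {
    super("cannot sync a file that is still open: " + name);
  }
}

/** Hypothetical directory that tracks which outputs are open and which were synced. */
class TrackingDirSketch {
  private final Set<String> open = new HashSet<String>();
  private final Set<String> synced = new HashSet<String>();

  void createOutput(String name) {
    open.add(name);
  }

  void closeOutput(String name) {
    open.remove(name);
  }

  /** Fails loudly, rather than behaving in an undefined way, if the file is still open. */
  void sync(String name) {
    if (open.contains(name)) {
      throw new StillOpenException(name);
    }
    synced.add(name);
  }

  boolean isSynced(String name) {
    return synced.contains(name);
  }
}
```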
[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2331: --- Attachment: LUCENE-2331.patch Patch includes: * NoMergePolicy + TestNoMergePolicy * NoMergeScheduler + TestNoMergeScheduler * MergeScheduler - methods changed to public * CHANGES entry (New Features) Add NoOpMergePolicy --- Key: LUCENE-2331 URL: https://issues.apache.org/jira/browse/LUCENE-2331 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Shai Erera Fix For: 3.1 Attachments: LUCENE-2331.patch I'd like to add a simple and useful MP implementation which does nothing ! :). I've came across many places where either the following is documented or implemented: if you want to prevent merges, set mergeFactor to a high enough value. I think a NoOpMergePolicy is just as good, and can REALLY allow you disable merges (except for maybe set mergeFactor to Int.MAX_VAL). As such, NoOpMergePolicy will be introduced as a singleton, and can be used for convenience purposes only. Also, for Parallel Index it's important, because I'd like the slices to never do any merges, unless ParallelWriter decides so. So they should be set w/ that MP. I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need to change it afterwards. About the name - I like the name, but suggestions are welcome. I thought of a NullMergePolicy, but I don't like 'Null' used for a NoOp. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2336) off by one: DisjunctionSumScorer::advance
[ https://issues.apache.org/jira/browse/LUCENE-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847716#action_12847716 ] Shai Erera commented on LUCENE-2336: Hi Gary, This has been discussed before (I'm not sure if about DisjunctionSumScorer specifically), and therefore there is also a NOTE in advance() of DISI:
{code}
 * <b>NOTE:</b> certain implementations may return a different value (each
 * time) if called several times in a row with the same target.
{code}
Note the *may return a different value...* part. I remember while working on LUCENE-1614 that this has been discussed and thus we ended up w/ documenting that *may return* part. See here: https://issues.apache.org/jira/browse/LUCENE-1614?focusedCommentId=12710860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12710860 and read some above and below to see relevant discussion. I'll need to refresh my memory though why DisjunctionSumScorer works like that ... perhaps an oversight on my side from 1614, but perhaps there was a reason. Anyway, about the code example you gave above, why would you want to call advance w/ the same value many times? What's the use case? If you're only dealing w/ one DISI, then unless you really want to skip to a certain document, I don't see any reason for calling advance. The usage is typically if you have 2 or more DISIs, and one's nextDoc or advance returned a value that is greater than the other's doc() ... Also, it's risky to write the code you wrote, because some scorers, upon init, are already on a certain doc (I think the Disj. ones, but maybe also the Conj. one), and so by calling advance(1), you will actually *skip* over the first document and miss a hit. Can you clarify the usage then?
off by one: DisjunctionSumScorer::advance - Key: LUCENE-2336 URL: https://issues.apache.org/jira/browse/LUCENE-2336 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Gary Yngve Priority: Minor Original Estimate: 4h Remaining Estimate: 4h The bug is: "if (target <= currentDoc) {" should be "if (target < currentDoc) {" based on the comments for the method as well as the contract for DocIdSetIterator: "Advances to the first beyond the current". It can be demonstrated by:
{code}
assertEquals("advance(1) first match failed", 1, scorer.advance(1));
assertEquals("advance(1) second match failed", n, scorer.advance(1));
{code}
if docId: 1 is a hit and n is the next hit. (Tests all pass if this code change is made.) I'm not labeling it as major because the class is package-protected and currently passes spec. Relevant excerpt:
{code}
/**
 * Advances to the first match beyond the current whose document number is
 * greater than or equal to a given target. <br>
 * When this method is used the {@link #explain(int)} method should not be
 * used. <br>
 * The implementation uses the skipTo() method on the subscorers.
 *
 * @param target
 *          The target document number.
 * @return the document whose number is greater than or equal to the given
 *         target, or -1 if none exist.
 */
public int advance(int target) throws IOException {
  if (scorerDocQueue.size() < minimumNrMatchers) {
    return currentDoc = NO_MORE_DOCS;
  }
  if (target <= currentDoc) {
    return currentDoc;
  }
  do {
    if (scorerDocQueue.topDoc() >= target) {
      boolean b = advanceAfterCurrent();
      return b ? currentDoc : (currentDoc = NO_MORE_DOCS);
    } else if (!scorerDocQueue.topSkipToAndAdjustElsePop(target)) {
      if (scorerDocQueue.size() < minimumNrMatchers) {
        return currentDoc = NO_MORE_DOCS;
      }
    }
  } while (true);
}
{code}
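The two readings of advance() being debated above can be compared side by side on a toy iterator. SortedDocsSketch is a hypothetical class over a sorted int array, not Lucene's DocIdSetIterator; advanceLenient mimics the "target <= currentDoc, stay put" check from the quoted excerpt, while advanceStrict always moves past the current document, as the reporter expects.

```java
/** Hypothetical iterator over a sorted doc-ID array; not Lucene's DocIdSetIterator. */
final class SortedDocsSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;
  private int idx = -1; // -1 means "not positioned yet"

  SortedDocsSketch(int... docs) {
    this.docs = docs;
  }

  int docID() {
    if (idx < 0) {
      return -1;
    }
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  int nextDoc() {
    if (idx < docs.length) {
      idx++;
    }
    return docID();
  }

  /** Stays on the current doc when target <= current doc (like the quoted code). */
  int advanceLenient(int target) {
    if (idx >= 0 && target <= docID()) {
      return docID();
    }
    return advanceStrict(target);
  }

  /** Always moves beyond the current doc, then on to the first doc >= target. */
  int advanceStrict(int target) {
    int doc;
    do {
      doc = nextDoc();
    } while (doc < target);
    return doc;
  }
}
```

With docs 1, 4, 7, calling the strict variant twice with target 1 yields 1 and then 4, whereas the lenient variant yields 1 both times. Both outcomes are allowed by the NOTE quoted in the comment, which is why caller code should not rely on repeated calls with the same target.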
[jira] Updated: (LUCENE-2320) Add MergePolicy to IndexWriterConfig
[ https://issues.apache.org/jira/browse/LUCENE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2320: --- Attachment: LUCENE-2320.patch Fixed a copy-paste comment error in IndexWriter (introduced in LUCENE-2294).
Add MergePolicy to IndexWriterConfig Key: LUCENE-2320 URL: https://issues.apache.org/jira/browse/LUCENE-2320 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2320.patch, LUCENE-2320.patch, LUCENE-2320.patch, LUCENE-2320.patch, LUCENE-2320.patch Now that IndexWriterConfig is in place, I'd like to move MergePolicy to it as well. The change is not straightforward and so I've kept it for a separate issue. MergePolicy requires in its ctor an IndexWriter, however none can be passed to it before an IndexWriter actually exists. And today IW may create an MP just for it to be overridden by the application one line afterwards. I don't want to make the iw member of MP non-final, or settable by extending classes, however it needs to remain protected so they can access it directly. So the proposed changes are:
* Add a SetOnce object (to o.a.l.util), or Immutable, which can only be set once (hence its name). It'll have the signature SetOnce<T> w/ *synchronized set(T)* and *T get()*. T will be declared volatile, so that get() won't be synchronized.
* MP will define a *protected final SetOnce<IndexWriter> writer* instead of the current writer. *NOTE: this is a bw break*. Any suggestions are welcome.
* MP will offer a public default ctor, together with a set(IndexWriter).
* IndexWriter will set itself on MP using set(this). Note that if set is called more than once, it will throw an exception (AlreadySetException - or does someone have a better suggestion, preferably an already existing Java exception?).
That's the core idea. I'd like to post a patch soon, so I'd appreciate your review and proposals.
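A minimal sketch of the SetOnce holder as described in the proposal above: a synchronized set, an unsynchronized get over a volatile field, and an exception on the second set. The class is named SetOnceSketch to make clear it is an illustration of the description, not the code from the patch, and AlreadySetException is made unchecked here as an assumption.

```java
/** Hypothetical exception named in the proposal; unchecked here for brevity. */
class AlreadySetException extends IllegalStateException {
  AlreadySetException() {
    super("The object cannot be set twice!");
  }
}

/** Holder whose value can be assigned exactly once. */
final class SetOnceSketch<T> {
  private volatile T value; // volatile, so get() needs no synchronization
  private boolean set;      // only touched inside the synchronized set()

  /** Sets the value; throws AlreadySetException on any later call. */
  synchronized void set(T newValue) {
    if (set) {
      throw new AlreadySetException();
    }
    value = newValue;
    set = true;
  }

  /** Returns the value, or null if set(T) has not been called yet. */
  T get() {
    return value;
  }
}
```

Tracking a separate boolean rather than testing value for null means that even set(null) counts as the one allowed assignment.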