[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3867:
---
Lucene Fields: New,Patch Available (was: New)
Assignee: (was: Shai Erera)

Wow, what awesome improvements you guys have added! Uwe, +1 to commit. I unassigned myself - you and Dawid definitely deserve the credit!

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
--
Key: LUCENE-3867
URL: https://issues.apache.org/jira/browse/LUCENE-3867
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Reporter: Shai Erera
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote}
A single-dimension array is a single object. As expected, the array has the usual object header. However, this object head is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ...
{quote}

While on it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods? It's not perfect, there's some room for improvement I'm sure. Here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
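For anyone who wants to check the ARRAY_HEADER arithmetic from the issue description, here is a minimal sketch. It assumes the 32-bit figures implied by the quoted javamex page (8-byte object header, 4-byte int, 4-byte reference); the class name and the hard-coded constants are illustrative only and are not the values RamUsageEstimator determines at runtime.

```java
// Sketch only: constants are assumed 32-bit values implied by the quoted
// javamex page, not what RamUsageEstimator actually computes.
final class ArrayHeaderSketch {
    static final int NUM_BYTES_OBJECT_HEADER = 8;
    static final int NUM_BYTES_INT = 4;
    static final int NUM_BYTES_OBJECT_REF = 4;

    // Formula quoted in the issue: includes the extra object reference.
    static final int ARRAY_HEADER_WITH_REF =
        NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF;

    // Formula the issue argues for: object header plus the int length field.
    static final int ARRAY_HEADER_WITHOUT_REF =
        NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    // Same arithmetic as the proposed sizeOf(String), using the constants above.
    static int sizeOf(String str) {
        return 2 * str.length() + 6        // chars + alignment slack
            + 3 * NUM_BYTES_INT            // String maintains 3 integers
            + ARRAY_HEADER_WITHOUT_REF     // char[] array
            + NUM_BYTES_OBJECT_HEADER;     // String object
    }

    public static void main(String[] args) {
        System.out.println("header with ref:    " + ARRAY_HEADER_WITH_REF);
        System.out.println("header without ref: " + ARRAY_HEADER_WITHOUT_REF);
        System.out.println("sizeOf(\"abcd\"):     " + sizeOf("abcd"));
    }
}
```

With these assumed constants the header without the reference comes to 12 bytes, which is the figure the quoted page gives for an array header.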
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3867:
---
Attachment: LUCENE-3867.patch

Thanks Uwe! I ran the test, and now with both J9 (IBM) and Oracle I get this output (without enabling any flag):

{code}
[junit] NOTE: running test testReferenceSize
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: Reference size in this JVM: 8
{code}

* I renamed the test to testReferenceSize (was testCompressedOops).

I wrote this small test to print the differences between sizeOf(String) and estimateRamUsage(String):

{code}
public void testSizeOfString() throws Exception {
  String s = "abcdefgkjdfkdsjdskljfdskfjdsf";
  String sub = s.substring(0, 4);
  System.out.println("original=" + RamUsageEstimator.sizeOf(s));
  System.out.println("sub=" + RamUsageEstimator.sizeOf(sub));
  System.out.println("checkInterned=true(orig): " + new RamUsageEstimator().estimateRamUsage(s));
  System.out.println("checkInterned=false(orig): " + new RamUsageEstimator(false).estimateRamUsage(s));
  System.out.println("checkInterned=false(sub): " + new RamUsageEstimator(false).estimateRamUsage(sub));
}
{code}

It prints:

{code}
original=104
sub=56
checkInterned=true(orig): 0
checkInterned=false(orig): 98
checkInterned=false(sub): 98
{code}

So clearly estimateRamUsage factors in the sub-string's larger char[]. The difference in the sizes of 'orig' stems from AverageGuessMemoryModel, which computes the reference size to be 4 (hardcoded) and the array size to be 16 (hardcoded). I modified AverageGuess to use constants from RUE (they are best guesses themselves). The test still prints a difference, but now I think it's because sizeOf(String) aligns the size to mod 8, while estimateRamUsage doesn't. I fixed that in size(Object), and now the prints are the same.

* I also fixed sizeOfArray -- if array.length == 0 it returned 0, but it should return the array header size, aligned to mod 8 as well.
* I modified sizeOf(String[]) to sizeOf(Object[]), which computes the raw size only. I started to add sizeOf(String), fastSizeOf(String) and deepSizeOf(String[]), but reverted to avoid the hassle -- the documentation confuses even me :).
* Changed all sizeOf() to return long, and align() to take and return long.

I think this is ready to commit, though I'd appreciate a second look at the MemoryModel and size(Obj) changes. Also, how about renaming the MemoryModel methods to arrayHeaderSize(), classHeaderSize() and objReferenceSize(), to make them clearer and more accurate? For instance, getArraySize does not return the size of an array, but its object header ...
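The two alignment fixes mentioned in this update (rounding sizes up to a multiple of 8, and charging an empty array for its header instead of 0) can be sketched as follows. This is only an illustration under stated assumptions: the 8-byte alignment and 12-byte array header are hard-coded guesses, and the class and method names are hypothetical rather than the ones in the patch.

```java
// Sketch only: alignment and header size are assumed values, and the names
// are illustrative, not the patched RamUsageEstimator API.
final class AlignSketch {
    static final int OBJECT_ALIGNMENT = 8; // assumed alignment in bytes
    static final int ARRAY_HEADER = 12;    // assumed array header in bytes

    // Rounds size up to the next multiple of OBJECT_ALIGNMENT.
    static long align(long size) {
        long remainder = size % OBJECT_ALIGNMENT;
        return remainder == 0 ? size : size + (OBJECT_ALIGNMENT - remainder);
    }

    // Raw size of an array: header plus elements, aligned. An empty array
    // still costs its (aligned) header rather than 0.
    static long sizeOfArray(int length, int bytesPerElement) {
        return align(ARRAY_HEADER + (long) length * bytesPerElement);
    }

    public static void main(String[] args) {
        System.out.println("align(98) = " + align(98));
        System.out.println("empty int[] = " + sizeOfArray(0, 4));
        System.out.println("int[3] = " + sizeOfArray(3, 4));
    }
}
```

Taking and returning long, as the update proposes for align() and sizeOf(), avoids int overflow when length * bytesPerElement is large.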
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3867:
---
Attachment: LUCENE-3867.patch

Ok removed sizeOf(Object[]). One can compute it by using RUE.estimateRamSize to do a deep calculation. Geez Dawid, you took away all the reasons I originally opened the issue for ;). But at least AvgGuessMemoryModel and RUE.size() are more accurate now. And we have some useful utility methods.
[jira] [Updated] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3867:
---
Attachment: LUCENE-3867.patch

Patch adds RUE.sizeOf(String) and various sizeOf(arr[]) methods. Also fixes the ARRAY_HEADER. Uwe, I merged with your patch, with one difference -- the System.out prints in the test are printed only if VERBOSE.
[jira] [Updated] (LUCENE-3786) Create SearcherTaxoManager
[ https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3786:
---
Fix Version/s: (was: 3.6)

Removing 3.6 Fix version. If I make it by the release, I'll put it back.

Create SearcherTaxoManager
--
Key: LUCENE-3786
URL: https://issues.apache.org/jira/browse/LUCENE-3786
Project: Lucene - Java
Issue Type: New Feature
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 4.0

If an application wants to use an IndexSearcher and TaxonomyReader in a SearcherManager-like fashion, it cannot use a separate SearcherManager and, say, a TaxonomyReaderManager, because the IndexSearcher and TaxoReader instances need to be in sync. That is, the IS-TR pair must match, or otherwise the category ordinals that are encoded in the search index might not match the ones in the taxonomy index. This can happen if someone reopens the IndexSearcher's IndexReader but does not refresh the TaxonomyReader, and the category ordinals that exist in the reopened IndexReader are not yet visible to the TaxonomyReader instance.

I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which manages an IndexSearcher and TaxonomyReader pair. Then an application will call:

{code}
SearcherTaxoPair pair = manager.acquire();
try {
  IndexSearcher searcher = pair.searcher;
  TaxonomyReader taxoReader = pair.taxoReader;
  // do something with them
} finally {
  manager.release(pair);
  pair = null;
}
{code}
[jira] [Updated] (LUCENE-3793) Use ReferenceManager in DirectoryTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3793:
---
Fix Version/s: (was: 3.6)

Removing 3.6 Fix version. If I make it by the release, I'll put it back.

Use ReferenceManager in DirectoryTaxonomyReader
---
Key: LUCENE-3793
URL: https://issues.apache.org/jira/browse/LUCENE-3793
Project: Lucene - Java
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 4.0

DirTaxoReader uses hairy code to protect its indexReader instance from being modified while threads use it. It maintains a ReentrantLock (indexReaderLock) which is obtained on every 'read' access, while refresh() locks it for 'write' operations (refreshing the IndexReader).

Instead of all that, now that we have ReferenceManager in place, I think that we can write a ReaderManager<IndexReader> which will be used by DirTR. Every method that requires access to the indexReader will acquire/release (not too different from obtaining/releasing the read lock), and refresh() will call ReaderManager.maybeRefresh(). It will simplify the code and remove some rather long comments that go to great lengths explaining why the code looks the way it does.

This ReaderManager cannot be used for every IndexReader, because DirTR's refresh() logic is special -- it reopens the indexReader, and then verifies that the createTime still matches on the reopened reader as well. Otherwise, it closes the reopened reader and fails with an exception. Therefore, this ReaderManager.refreshIfNeeded will need to take the createTime into consideration and fail if they do not match.

And while we're at it ... I wonder if we should have a manager for an IndexReader/ParentArray pair? I think that it makes sense because we don't want DirTR to use a ParentArray that does not match the IndexReader. Today this can happen in refresh() if e.g. after the indexReader instance has been replaced, parentArray.refresh(indexReader) fails. DirTR will be left with a newer IndexReader instance, but an old (or worse, corrupt?) ParentArray ... I think it'll be good if we introduce clone() on ParentArray, or a new ctor which takes an int[].

I'll work on a patch once I finish with LUCENE-3786.
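The createTime check that refreshIfNeeded would need can be sketched roughly as below. This is a hedged illustration, not the DirTR code: Reader is a hypothetical stand-in for the reopened IndexReader, the version field stands in for "has anything changed", and IllegalStateException is just a placeholder for whatever exception the real refresh() throws.

```java
// Sketch only: Reader, its fields and the exception type are hypothetical
// stand-ins chosen for illustration.
final class CreateTimeCheckSketch {
    static final class Reader {
        final String createTime; // creation time recorded in the commit data
        final int version;       // changes whenever the index changes
        boolean closed;

        Reader(String createTime, int version) {
            this.createTime = createTime;
            this.version = version;
        }

        void close() { closed = true; }
    }

    private Reader current;

    CreateTimeCheckSketch(Reader initial) { current = initial; }

    synchronized Reader current() { return current; }

    // Swaps in the reopened reader only if it belongs to the same taxonomy;
    // otherwise closes it and fails, leaving the current reader untouched.
    synchronized boolean refreshIfNeeded(Reader reopened) {
        if (reopened == null || reopened.version == current.version) {
            return false; // nothing new to see
        }
        if (!current.createTime.equals(reopened.createTime)) {
            reopened.close();
            throw new IllegalStateException("taxonomy was recreated since the reader was opened");
        }
        Reader old = current;
        current = reopened;
        old.close();
        return true;
    }

    public static void main(String[] args) {
        CreateTimeCheckSketch mgr = new CreateTimeCheckSketch(new Reader("t1", 1));
        System.out.println(mgr.refreshIfNeeded(new Reader("t1", 2)));
    }
}
```

The point of closing the reopened reader before throwing is that a failed refresh should not leak the new reader or leave the manager half-updated.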
[jira] [Updated] (LUCENE-3138) IW.addIndexes should fail fast if an index is too old/new
[ https://issues.apache.org/jira/browse/LUCENE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3138:
---
Fix Version/s: (was: 3.6)

Removing 3.6 Fix Version.

IW.addIndexes should fail fast if an index is too old/new
-
Key: LUCENE-3138
URL: https://issues.apache.org/jira/browse/LUCENE-3138
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Shai Erera
Priority: Minor
Fix For: 4.0

Today IW.addIndexes (both Dir and IR versions) do not check the format of the incoming indexes. Therefore it could add a too old/new index and the app will discover that only later, maybe during commit() or segment merges. We should check that up front and fail fast. This issue is relevant only to 4.0 at the moment, which will not support 2.x indexes anymore.
[jira] [Updated] (LUCENE-2921) Now that we track the code version at the segment level, we can stop tracking it also in each file level
[ https://issues.apache.org/jira/browse/LUCENE-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2921:
---
Fix Version/s: (was: 3.6)

Removing 3.6 version. Since this requires an index format change, I think that it'd be good if we can resolve it by 4.0 Alpha.

Now that we track the code version at the segment level, we can stop tracking it also in each file level

Key: LUCENE-2921
URL: https://issues.apache.org/jira/browse/LUCENE-2921
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Shai Erera
Fix For: 4.0

Now that we track the code version that created the segment at the segment level, we can stop tracking versions in each file. This has several major benefits:

# Today the constant names that we use to track versions are confusing - they do not state the version from which each applies, and so it's harder to determine which formats we can stop supporting when working on the next major release.
# Those format numbers are usually negative, but in some cases positive (inconsistency) -- for the negative ones we need to remember to go one down rather than one up, which I always find confusing.
# It will remove the format tracking from all the *Writers, and the *Reader will receive the code format (String) and work w/ the appropriate constant (e.g. Constants.LUCENE_30). Centralizing version tracking to SegmentInfo is an advantage IMO.

It's not urgent that we do it for 3.1 (though it requires an index format change), because starting from 3.1 all segments track their version number anyway (or are migrated to track it), so we can safely release it in a follow-on 3x release.
[jira] [Updated] (LUCENE-3794) DirectoryTaxonomyWriter can lose the INDEX_CREATE_TIME property, causing DirTaxoReader.refresh() to falsely succeed (or fail)
[ https://issues.apache.org/jira/browse/LUCENE-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3794:
---
Attachment: LUCENE-3794.patch

Patch fixes the bug + adds a couple of test cases to ensure the correct behavior.

DirectoryTaxonomyWriter can lose the INDEX_CREATE_TIME property, causing DirTaxoReader.refresh() to falsely succeed (or fail)
-
Key: LUCENE-3794
URL: https://issues.apache.org/jira/browse/LUCENE-3794
Project: Lucene - Java
Issue Type: Bug
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.6, 4.0
Attachments: LUCENE-3794.patch

DirTaxoWriter sets createTime to null after putting it in the commit data once. But that's wrong, because if one calls commit(Map) twice, the second commit doesn't record the creation time. Also, in the ctor, if an index exists and OpenMode is not CREATE, the creation time property is not read.

I wrote a couple of unit tests that assert this, and modified DirTaxoWriter to always record the creation time (in every commit) -- that's the only safe way. Will upload a patch shortly.
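To illustrate the "always record the creation time" fix, here is a small sketch. It is not the DirTaxoWriter code: the class, the constructor arguments and the property key string are assumptions made for the example. It covers both halves of the bug: reading an existing creation time when opening an index, and writing it into every commit's data.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: class name, constructor shape and the key string are assumed
// for illustration; they are not taken from the patch.
final class CreateTimeCommitSketch {
    static final String INDEX_CREATE_TIME = "index.create.time"; // assumed key

    private final String createTime;

    // lastCommitData is the commit data of an existing index, or null when a
    // new index is created; 'now' is used only when no creation time exists.
    CreateTimeCommitSketch(Map<String, String> lastCommitData, String now) {
        String existing = lastCommitData == null ? null : lastCommitData.get(INDEX_CREATE_TIME);
        this.createTime = existing != null ? existing : now;
    }

    // Builds the data for one commit: the caller's entries plus the creation
    // time, which is added on every call instead of only the first.
    Map<String, String> commitData(Map<String, String> userData) {
        Map<String, String> data = new HashMap<>();
        if (userData != null) {
            data.putAll(userData);
        }
        data.put(INDEX_CREATE_TIME, createTime);
        return data;
    }

    public static void main(String[] args) {
        CreateTimeCommitSketch writer = new CreateTimeCommitSketch(null, "100");
        System.out.println(writer.commitData(null));
        System.out.println(writer.commitData(null)); // second commit keeps it
    }
}
```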
[jira] [Updated] (LUCENE-3761) Generalize SearcherManager
[ https://issues.apache.org/jira/browse/LUCENE-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3761:
---
Attachment: LUCENE-3761.patch

Updated patch:
- ThingyManager renamed to ReferenceManager
- Declared 'current' volatile (thanks Simon!)
- Added two tests to TestSM. While they could go under a new TestReferenceManager class, I didn't think that creating another class + a ReferenceManager extension is worth it.
- Added a CHANGES entry under back-compat (following the rename of maybeReopen to maybeRefresh).

If nobody objects, I'd like to rename maybeRefresh to just refresh, and commit it. Otherwise, I'll commit what I have. I've decided to deal with the SearcherTaxoManager in a different issue.

Generalize SearcherManager
--
Key: LUCENE-3761
URL: https://issues.apache.org/jira/browse/LUCENE-3761
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.6, 4.0
Attachments: LUCENE-3761.patch, LUCENE-3761.patch, LUCENE-3761.patch

I'd like to generalize SearcherManager to a class which can manage instances of a certain type of interface. The reason is that today SearcherManager knows how to handle IndexSearcher instances. I have a SearcherManager which manages an IndexSearcher and TaxonomyReader pair. Recently, a few concurrency bugs were fixed in SearcherManager, and I realized that I need to apply them to my version as well. Which led me to wonder why we can't have an SM version which is generic enough that both my version and Lucene's can benefit from it.

The way I see SearcherManager, it can be divided into two parts: (1) the part that manages the logic of acquire/release/maybeReopen (i.e., ensureOpen, protect from concurrency stuff etc.), and (2) the part which handles IndexSearcher, or my SearcherTaxoPair. I'm thinking that if we have an interface with incRef/decRef/tryIncRef/maybeRefresh, we can make SearcherManager a generic class which handles this interface.

I will post a patch with the initial idea, and we can continue from there.
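The split described in this issue (generic acquire/release/maybeRefresh logic on one side, type-specific ref counting and refreshing on the other) can be sketched as an abstract class. This is a simplified illustration and not the patch: RefManagerSketch, CountedThing and CountedThingManager are hypothetical names, and maybeRefresh is simply synchronized here, which may well differ from how the committed class coordinates concurrent refreshes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: all class names are hypothetical; this is not the patch code.
abstract class RefManagerSketch<G> {
    protected volatile G current; // volatile so acquire() sees the latest swap

    protected abstract boolean tryIncRef(G reference);
    protected abstract void decRef(G reference);
    // Returns a fresh reference, or null when the given one is still current.
    protected abstract G refreshIfNeeded(G reference);

    final G acquire() {
        G ref;
        do {
            ref = current;
            if (ref == null) throw new IllegalStateException("manager is closed");
        } while (!tryIncRef(ref)); // retry if ref was released under us
        return ref;
    }

    final void release(G reference) { decRef(reference); }

    final synchronized boolean maybeRefresh() {
        G old = acquire();
        try {
            G fresh = refreshIfNeeded(old);
            if (fresh == null) return false;
            current = fresh; // publish the new reference
            decRef(old);     // drop the manager's own reference to the old one
            return true;
        } finally {
            release(old);    // drop the reference taken by acquire() above
        }
    }
}

// A toy ref-counted resource, starting with one reference (the manager's).
final class CountedThing {
    final int generation;
    private final AtomicInteger refCount = new AtomicInteger(1);

    CountedThing(int generation) { this.generation = generation; }

    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) return true;
        }
        return false; // already fully released
    }

    void decRef() { refCount.decrementAndGet(); }
    int refCount() { return refCount.get(); }
}

final class CountedThingManager extends RefManagerSketch<CountedThing> {
    private volatile int latestGeneration;

    CountedThingManager() { current = new CountedThing(0); }

    void publish(int generation) { latestGeneration = generation; }

    @Override protected boolean tryIncRef(CountedThing t) { return t.tryIncRef(); }
    @Override protected void decRef(CountedThing t) { t.decRef(); }
    @Override protected CountedThing refreshIfNeeded(CountedThing t) {
        return t.generation == latestGeneration ? null : new CountedThing(latestGeneration);
    }
}
```

A subclass only supplies the three abstract methods, so a concrete manager for a searcher/taxonomy pair would not need to repeat the concurrency handling.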
[jira] [Updated] (LUCENE-3761) Generalize SearcherManager
[ https://issues.apache.org/jira/browse/LUCENE-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3761:
---
Attachment: LUCENE-3761.patch

Option #2: ThingyManager<G> is an abstract class which implements all the concurrency administration and exposes the abstract methods tryIncRef(), decRef() and refreshIfNeeded(). SearcherManager now extends ThingyManager<IndexSearcher> and implements just these 3 methods (in addition to isSearcherCurrent()).

What I like about this approach is that SearcherManager remains a concrete class, so that code can reference it and not ThingyManager. Also, IMO it's a simplified impl vs. the composite ThingyManager/Thingy. AND besides the rename of maybeReopen to maybeRefresh, NONE of the code was affected by this refactoring.

I've left the unneeded code commented out in SearcherManager for easy comparison, but it should go away. TestSM passes (as well as all core tests), so I think that ThingyManager handles all the concurrency cases that SearcherManager did. However, it could use another pair of eyes :).

As for the name -- now the name is less important b/c I don't think we'll reference ThingyManagers. I lean towards something like ReferenceManager / RefCountManager or remove Manager. Something simple. Suggestions are welcome.
[jira] [Updated] (LUCENE-3761) Generalize SearcherManager
[ https://issues.apache.org/jira/browse/LUCENE-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3761:
---
Attachment: LUCENE-3761.patch

Initial patch. Introduces a new package 'thingy' (a temporary one, this will eventually move to o.a.l.search) with the class ThingyManager, a Thingy interface and a SearcherThingy implementation. As far as I can tell (if there are no bugs), this can replace SearcherManager as-is, aside from a 'nocommit' which I know how to handle but didn't get to yet.

The approach is that ThingyManager receives a Thingy<G> instance and delegates calls to it. Robert and I discussed another approach - have ThingyManager abstract with a concrete (final) SearcherManager impl which overrides methods like incRef/decRef etc. I haven't tried to implement that approach yet; I think I'll give it a try later.

Oh, and BTW, ThingyManager (even though a cool name!) will not be its final name! :). It's just easier to progress like that, without thinking too much about the name.
[jira] [Updated] (LUCENE-3703) DirectoryTaxonomyReader.refresh misbehaves with ref counts
[ https://issues.apache.org/jira/browse/LUCENE-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3703:
---
Attachment: LUCENE-3703.patch

Patch addresses Doron's comments.

DirectoryTaxonomyReader.refresh misbehaves with ref counts
--
Key: LUCENE-3703
URL: https://issues.apache.org/jira/browse/LUCENE-3703
Project: Lucene - Java
Issue Type: Bug
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.6, 4.0
Attachments: LUCENE-3703.patch, LUCENE-3703.patch

DirectoryTaxonomyReader uses the internal IndexReader in order to track its own reference counting. However, when you call refresh(), it reopens the internal IndexReader, and from that point, all previous reference counting gets lost (since the new IndexReader's refCount is 1). The solution is to track reference counting in DTR itself. I wrote a simple unit test which exposes the bug (will be attached with the patch shortly).
[jira] [Updated] (LUCENE-3703) DirectoryTaxonomyReader.refresh misbehaves with ref counts
[ https://issues.apache.org/jira/browse/LUCENE-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3703:
---
Attachment: LUCENE-3703.patch

Patch fixes the bug by tracking the reference count in DTR itself. It also adds a test which covers the bug. While at it, I fixed close() to synchronize on this if the instance is not already closed; otherwise, when two threads call close() concurrently, one of them might fail in decRef(). I think it's ready to commit, will wait until tomorrow for review.

DirectoryTaxonomyReader.refresh misbehaves with ref counts
--
Key: LUCENE-3703
URL: https://issues.apache.org/jira/browse/LUCENE-3703
Project: Lucene - Java
Issue Type: Bug
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.6, 4.0
Attachments: LUCENE-3703.patch

DirectoryTaxonomyReader uses the internal IndexReader in order to track its own reference counting. However, when you call refresh(), it reopens the internal IndexReader, and from that point, all previous reference counting gets lost (since the new IndexReader's refCount is 1). The solution is to track reference counting in DTR itself. I wrote a simple unit test which exposes the bug (will be attached with the patch shortly).
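To illustrate why keeping the count in the wrapper survives a refresh, here is a small Java sketch. It is not the attached patch: `InnerReader` and `CountingTaxoReader` are made-up stand-ins for the internal IndexReader and for DTR, and the only point shown is that swapping the inner reader leaves the wrapper's own count untouched.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for the internal IndexReader; not the Lucene class. */
final class InnerReader {
  private boolean closed;
  void close() { closed = true; }
  boolean isClosed() { return closed; }
}

/** Hypothetical DTR-like wrapper that owns its reference count. */
final class CountingTaxoReader {
  private final AtomicInteger refCount = new AtomicInteger(1);
  private InnerReader inner = new InnerReader();

  void incRef() { refCount.incrementAndGet(); }

  /** Closes the inner reader only when the wrapper's own count reaches zero. */
  synchronized void decRef() {
    if (refCount.decrementAndGet() == 0) {
      inner.close();
    }
  }

  /** Replaces the inner reader; references held on the wrapper stay valid. */
  synchronized void refresh() {
    InnerReader old = inner;
    inner = new InnerReader();
    old.close();
  }

  int getRefCount() { return refCount.get(); }

  synchronized boolean isClosed() { return inner.isClosed(); }
}
```

With the count delegated to the inner reader instead, the `refresh()` above would silently reset it to 1, which is the behaviour the issue describes.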
[jira] [Updated] (LUCENE-3649) Facet userguide link is broken after ant javadocs-all
[ https://issues.apache.org/jira/browse/LUCENE-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3649:
---
Attachment: LUCENE-3649.patch

Patch against 3x:
* Move docs/ under src/java/org/apache/lucene/facet/doc-files -- that way the javadocs tool takes these files as they are.
* Fix references to the userguide in overview.html and o.a.l.facet/package.html.
* Remove 'javadocs' target from facet/build.xml.

I will commit this shortly.

Facet userguide link is broken after ant javadocs-all
---
Key: LUCENE-3649
URL: https://issues.apache.org/jira/browse/LUCENE-3649
Project: Lucene - Java
Issue Type: Bug
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 3.6, 4.0
Attachments: LUCENE-3649.patch, LUCENE-3649.patch

Spinoff from http://mail-archives.apache.org/mod_mbox/lucene-java-user/201112.mbox/%3CCAO9cvUaZePZ3faWo==xx7x8-5+snwlsbdqqjo_n-ycxr0lj...@mail.gmail.com%3E. When javadocs-all is run, the userguide is not copied at all, and therefore the link is broken. Two options: inline the userguide in package/overview.html or fix the Ant target to copy the userguide correctly. Thanks Lukas for reporting this.
[jira] [Updated] (LUCENE-3637) Make IndexReader.decRef() call refCount.decrementAndGet instead of getAndDecrement
[ https://issues.apache.org/jira/browse/LUCENE-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3637:
---
Attachment: LUCENE-3637.patch

Very trivial patch. If there are no objections, I'd like to commit this.

Make IndexReader.decRef() call refCount.decrementAndGet instead of getAndDecrement
--
Key: LUCENE-3637
URL: https://issues.apache.org/jira/browse/LUCENE-3637
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Reporter: Shai Erera
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: LUCENE-3637.patch

IndexReader.decRef() has this code:
{code}
final int rc = refCount.getAndDecrement();
if (rc == 1) {
{code}
I think it will be clearer if it was written like this:
{code}
final int rc = refCount.decrementAndGet();
if (rc == 0) {
{code}
It's a very simple change, which makes reading the code (at least IMO) easier. Will post a patch shortly.
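The two forms in LUCENE-3637 detect the same event, the release of the last reference. A small self-contained Java check (the class and method names are illustrative only and are not part of IndexReader):

```java
import java.util.concurrent.atomic.AtomicInteger;

final class DecRefStyles {
  /** Current form: the value before the decrement is compared with 1. */
  static boolean lastRefReleasedOld(AtomicInteger refCount) {
    final int rc = refCount.getAndDecrement();
    return rc == 1;
  }

  /** Proposed form: the value after the decrement is compared with 0. */
  static boolean lastRefReleasedNew(AtomicInteger refCount) {
    final int rc = refCount.decrementAndGet();
    return rc == 0;
  }
}
```

Both methods perform a single atomic decrement; they differ only in whether the returned value is the count before or after it, so the change is purely about readability.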
[jira] [Updated] (LUCENE-3635) Allow setting arbitrary objects on PerfRunData
[ https://issues.apache.org/jira/browse/LUCENE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3635:
---
Attachment: LUCENE-3635.patch

Patch (against trunk) adds perfObjects Map<String, Object> with matching set/get methods.

Allow setting arbitrary objects on PerfRunData
--
Key: LUCENE-3635
URL: https://issues.apache.org/jira/browse/LUCENE-3635
Project: Lucene - Java
Issue Type: Improvement
Components: modules/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.6, 4.0
Attachments: LUCENE-3635.patch

PerfRunData is used as the intermediary object between PerfRunTasks. Just like we can set IndexReader/Writer on it, it will be good if it allows setting other arbitrary objects that are e.g. created by one task and used by another. A recent example is the enhancement to the benchmark package following the addition of the facet module. We had to add TaxoReader/Writer. The proposal is to add a HashMap<String, Object> that custom PerfTasks can set()/get(). I do not propose to move IR/IW/TR/TW etc. into that map. If however people think that we should, I can do that as well.
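A minimal Java sketch of the proposal, assuming a plain string-keyed map with synchronized accessors. `RunDataSketch`, `setPerfObject` and `getPerfObject` are illustrative names; the actual method names in the patch are not shown in this message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for PerfRunData, reduced to the proposed object map. */
final class RunDataSketch {
  private final Map<String, Object> perfObjects = new HashMap<String, Object>();

  /** Stores an object created by one task so that a later task can pick it up. */
  synchronized void setPerfObject(String key, Object obj) {
    perfObjects.put(key, obj);
  }

  /** Returns the object stored under key, or null if no task has set one. */
  synchronized Object getPerfObject(String key) {
    return perfObjects.get(key);
  }
}
```

One task would call `setPerfObject("taxoWriter", writer)` and a later task would cast the result of `getPerfObject("taxoWriter")` back to the expected type; the key is whatever the two tasks agree on.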
[jira] [Updated] (LUCENE-3620) FilterIndexReader does not override all of IndexReader methods
[ https://issues.apache.org/jira/browse/LUCENE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3620:
---
Attachment: LUCENE-3620-trunk.patch

Patch adds the test to TestFilterIndexReader. Uwe asked that I do not commit these changes (test + FIR/IR fixes) until he merges in the branch on IR-read-only. We decided that Uwe will apply that patch to the branch, fix FIR/IR there and merge the branch afterwards.

FilterIndexReader does not override all of IndexReader methods
--
Key: LUCENE-3620
URL: https://issues.apache.org/jira/browse/LUCENE-3620
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.6, 4.0
Attachments: LUCENE-3620-trunk.patch, LUCENE-3620.patch, LUCENE-3620.patch, LUCENE-3620.patch

FilterIndexReader does not override all of IndexReader methods. We've hit an error in LUCENE-3573 (and fixed it). So I thought to write a simple test which asserts that FIR overrides all methods of IR (and we can filter out methods that we don't think that it should override). The test is very simple (attached), and it currently fails over these methods:
{code}
getRefCount
incRef
tryIncRef
decRef
reopen
reopen
reopen
reopen
clone
numDeletedDocs
document
setNorm
setNorm
termPositions
deleteDocument
deleteDocuments
undeleteAll
getIndexCommit
getUniqueTermCount
getTermInfosIndexDivisor
{code}
I didn't yet fix anything in FIR -- if you spot a method that you think we should not override and delegate, please comment.
[jira] [Updated] (LUCENE-3620) FilterIndexReader does not override all of IndexReader methods
[ https://issues.apache.org/jira/browse/LUCENE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3620:
---
Attachment: LUCENE-3620.patch

Attached patch against 3x adds the test to TestFilterIndexReader. Even if there are methods which you don't think need to be overridden by FIR, I prefer that we still override them and call super.<method>(), with a comment why we don't delegate.

FilterIndexReader does not override all of IndexReader methods
--
Key: LUCENE-3620
URL: https://issues.apache.org/jira/browse/LUCENE-3620
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.6, 4.0
Attachments: LUCENE-3620.patch

FilterIndexReader does not override all of IndexReader methods. We've hit an error in LUCENE-3573 (and fixed it). So I thought to write a simple test which asserts that FIR overrides all methods of IR (and we can filter out methods that we don't think that it should override). The test is very simple (attached), and it currently fails over these methods:
{code}
getRefCount
incRef
tryIncRef
decRef
reopen
reopen
reopen
reopen
clone
numDeletedDocs
document
setNorm
setNorm
termPositions
deleteDocument
deleteDocuments
undeleteAll
getIndexCommit
getUniqueTermCount
getTermInfosIndexDivisor
{code}
I didn't yet fix anything in FIR -- if you spot a method that you think we should not override and delegate, please comment.
[jira] [Updated] (LUCENE-3620) FilterIndexReader does not override all of IndexReader methods
[ https://issues.apache.org/jira/browse/LUCENE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3620:
---
Attachment: LUCENE-3620.patch

Patch against 3x:
* Adds a HashSet of methods that should not be overridden by FilterIndexReader.
** If a method does not appear there and is not overridden, it is an error.
** If a method appears there and is overridden, it is an error as well.
* Overrides more methods in FIR; see the previous comment for the rest of the methods.

FilterIndexReader does not override all of IndexReader methods
--
Key: LUCENE-3620
URL: https://issues.apache.org/jira/browse/LUCENE-3620
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.6, 4.0
Attachments: LUCENE-3620.patch, LUCENE-3620.patch

FilterIndexReader does not override all of IndexReader methods. We've hit an error in LUCENE-3573 (and fixed it). So I thought to write a simple test which asserts that FIR overrides all methods of IR (and we can filter out methods that we don't think that it should override). The test is very simple (attached), and it currently fails over these methods:
{code}
getRefCount
incRef
tryIncRef
decRef
reopen
reopen
reopen
reopen
clone
numDeletedDocs
document
setNorm
setNorm
termPositions
deleteDocument
deleteDocuments
undeleteAll
getIndexCommit
getUniqueTermCount
getTermInfosIndexDivisor
{code}
I didn't yet fix anything in FIR -- if you spot a method that you think we should not override and delegate, please comment.
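The kind of check discussed in LUCENE-3620 can be sketched with plain reflection. This is not the attached test: `BaseReader`, `DelegatingReader` and `OverrideChecker` are toy names, and the rule implemented is an assumption about the intent, namely that a method must be either overridden or listed in the "not overridden" set, but never both. Methods are matched by name in the set, so overloads share one entry.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

/** Toy base class standing in for IndexReader. */
class BaseReader {
  public int numDocs() { return 0; }
  public int getRefCount() { return 1; }
  public final void ensureOpen() {} // final, so it cannot be overridden
}

/** Toy filter class standing in for FilterIndexReader. */
class DelegatingReader extends BaseReader {
  protected final BaseReader in;
  DelegatingReader(BaseReader in) { this.in = in; }
  @Override public int numDocs() { return in.numDocs(); }
  // getRefCount() is deliberately left un-overridden
}

final class OverrideChecker {
  /** Names of overridable methods of base that break the assumed rule in sub. */
  static Set<String> violations(Class<?> base, Class<?> sub, Set<String> notOverridden) {
    Set<String> result = new TreeSet<String>();
    for (Method m : base.getDeclaredMethods()) {
      int mod = m.getModifiers();
      if (Modifier.isStatic(mod) || Modifier.isFinal(mod)
          || Modifier.isPrivate(mod) || m.isSynthetic()) {
        continue; // cannot or need not be overridden
      }
      boolean overridden;
      try {
        sub.getDeclaredMethod(m.getName(), m.getParameterTypes());
        overridden = true;
      } catch (NoSuchMethodException e) {
        overridden = false;
      }
      boolean exempt = notOverridden.contains(m.getName());
      if (overridden == exempt) {
        result.add(m.getName()); // missing override, or a stale exemption
      }
    }
    return result;
  }
}
```

Running the checker on the toy classes with an empty exemption set reports `getRefCount`; adding `getRefCount` to the set clears the report.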
[jira] [Updated] (LUCENE-3603) jar-src fails if ${build.dir} does not exist
[ https://issues.apache.org/jira/browse/LUCENE-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3603:
---
Attachment: LUCENE-3603.patch

Patch fixes jar-src to:
* not depend on init, as there's no need to compile anything (saves time)
* create ${build.dir}

I've decided not to modify the build.dir definitions in the other build.xmls for now, as it's more delicate. I intend to commit this soon.

jar-src fails if ${build.dir} does not exist
Key: LUCENE-3603
URL: https://issues.apache.org/jira/browse/LUCENE-3603
Project: Lucene - Java
Issue Type: Improvement
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.6, 4.0
Attachments: LUCENE-3603.patch

Simple fix -- make jar-src depend on a target which creates the build.dir. Also, I noticed that build.dir is set in multiple places across our build.xmls, so I'd like to improve that a bit (minor fixes as well).
[jira] [Updated] (LUCENE-3583) benchmark tests always fail on windows because directory cannot be removed
[ https://issues.apache.org/jira/browse/LUCENE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3583:
---
Attachment: LUCENE-3583.patch

Patch fixes the problem in LineDocSourceTest - add tasks.close() (otherwise LDS keeps a reader open on the file). I intend to commit shortly, after verifying all tests pass and no other such changes are required.

benchmark tests always fail on windows because directory cannot be removed
--
Key: LUCENE-3583
URL: https://issues.apache.org/jira/browse/LUCENE-3583
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 3.5, 4.0
Environment: Only fails for Lucene trunk
Reporter: Uwe Schindler
Attachments: LUCENE-3583.patch, LUCENE-3583.patch, benchmark-test-output.txt, io-event-log.txt

This seems to be a bug recently introduced. I have no idea what's wrong. Attached is a log file; it reproduces every time.
[jira] [Updated] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3269:
---
Attachment: LUCENE-3269.patch

Patch introduces the following:
* HashMap<Integer, SearchTaxoDirPair> which is initialized in beforeClass and maps a partition size to the pair of Directories.
* initIndex first checks the map for the partition size, and creates the indexes only if no matching pair is found.

The sampling tests do not benefit from that directly, as they only run one test method. However, if they run in the same JVM, they will reuse the already created indexes. Patch is against 3x and seems trivial, so I intend to commit this later today or tomorrow if there are no objections.

Speed up Top-K sampling tests
-
Key: LUCENE-3269
URL: https://issues.apache.org/jira/browse/LUCENE-3269
Project: Lucene - Java
Issue Type: Test
Components: modules/facet
Reporter: Robert Muir
Fix For: 3.5, 4.0
Attachments: LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch

speed up the top-k sampling tests (but make sure they are thorough on nightly etc still).
usually we would do this with use of atLeast(), but these tests are somewhat tricky, so maybe a different approach is needed.
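The caching idea in this patch, reduced to a self-contained Java sketch. `DirPair` and `IndexCache` are hypothetical names, the "directories" are plain strings rather than Lucene Directory objects, and the map is an instance field here, whereas the patch describes a map initialized in beforeClass.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the search/taxonomy directory pair. */
final class DirPair {
  final String searchDir;
  final String taxoDir;
  DirPair(String searchDir, String taxoDir) {
    this.searchDir = searchDir;
    this.taxoDir = taxoDir;
  }
}

/** Builds the index pair for a partition size at most once, then reuses it. */
final class IndexCache {
  private final Map<Integer, DirPair> dirsPerPartitionSize = new HashMap<Integer, DirPair>();
  private int builds; // how many times the expensive build actually ran

  synchronized DirPair initIndex(int partitionSize) {
    DirPair pair = dirsPerPartitionSize.get(partitionSize);
    if (pair == null) {
      pair = build(partitionSize);
      dirsPerPartitionSize.put(partitionSize, pair);
    }
    return pair;
  }

  /** Placeholder for the expensive indexing work. */
  private DirPair build(int partitionSize) {
    builds++;
    return new DirPair("search-" + partitionSize, "taxo-" + partitionSize);
  }

  synchronized int buildCount() { return builds; }
}
```

Test methods that ask for the same partition size get the same pair back, so the indexing cost is paid once per size instead of once per test method.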
[jira] [Updated] (LUCENE-3556) Make DirectoryTaxonomyWriter's indexWriter member private
[ https://issues.apache.org/jira/browse/LUCENE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3556:
---
Attachment: LUCENE-3556.patch

Trivial patch against trunk. I'd like to commit this shortly.

Make DirectoryTaxonomyWriter's indexWriter member private
-
Key: LUCENE-3556
URL: https://issues.apache.org/jira/browse/LUCENE-3556
Project: Lucene - Java
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
Fix For: 3.5, 4.0
Attachments: LUCENE-3556.patch

DirectoryTaxonomyWriter has a protected indexWriter member. As far as I can tell, for two reasons:
# protected openIndexWriter method which lets you open your own IW (e.g. with a custom IndexWriterConfig).
# protected closeIndexWriter which is a hook for letting you close the IW you opened in the previous one.

The fixes are trivial IMO:
# Modify the method to return IW, and have the calling code set DTW's indexWriter member.
# Eliminate closeIW. DTW already has a protected closeResources() which lets you clean other resources you've allocated, so I think that's enough.

I'll post a patch shortly.
[jira] [Updated] (LUCENE-3552) TaxonomyReader/Writer and their Lucene* implementation
[ https://issues.apache.org/jira/browse/LUCENE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3552:
---
Attachment: LUCENE-3552.patch

Patch renames LTW/R to DirectoryTW/TR. Also, I renamed LTW's openLuceneIndex/closeLuceneIndex to open/closeIndexWriter. I've also made TW extend TwoPhaseCommit. I think that it's ready to commit. I'll port the changes to trunk afterwards. I'll wait until tomorrow before I commit (the changes are trivial).

TaxonomyReader/Writer and their Lucene* implementation
--
Key: LUCENE-3552
URL: https://issues.apache.org/jira/browse/LUCENE-3552
Project: Lucene - Java
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
Fix For: 3.5, 4.0
Attachments: LUCENE-3552.patch

The facet module contains two interfaces TaxonomyWriter and TaxonomyReader, with two implementations Lucene*. We've never actually implemented two TaxonomyWriters/Readers, so I'm not sure if these interfaces are useful anymore. Therefore I'd like to propose that we do either of the following:
# Remove the interfaces and remove the Lucene part from the implementation classes (to end up with TW/TR impls). Or,
# Keep the interfaces, but rename the Lucene* impls to Directory*.

Whatever we do, I'd like to make the impls/interfaces also implement TwoPhaseCommit. Any preferences?
[jira] [Updated] (LUCENE-3549) Remove DocumentBuilder interface from facet module
[ https://issues.apache.org/jira/browse/LUCENE-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3549:
---
Attachment: LUCENE-3549.patch

Patch against 3x (but easy to apply on trunk as well). I will commit this soon.

Remove DocumentBuilder interface from facet module
--
Key: LUCENE-3549
URL: https://issues.apache.org/jira/browse/LUCENE-3549
Project: Lucene - Java
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
Fix For: 3.5, 4.0
Attachments: LUCENE-3549.patch

The facet module contains an interface called DocumentBuilder, which contains a single method, build(Document) (it's a builder API). We use it in my company to standardize how different modules populate a Document object. I've included it with the facet contribution so that things will compile with as few code changes as possible.

Now it's time to do some cleanup and I'd like to start with this interface. If people think that this interface is useful to reside in 'core', then I don't mind moving it there. But otherwise, let's remove it from the code. It has only one impl in the facet module: CategoryDocumentBuilder, and we can certainly do without the interface. More so, it's under o.a.l package which is inappropriate IMO. If it's moved to 'core', it should be under o.a.l.document.

If people see any problem with that, please speak up. I will do the changes and post a patch here shortly.
[jira] [Updated] (LUCENE-3522) TermsFilter.getDocIdSet(context) NPE on missing field
[ https://issues.apache.org/jira/browse/LUCENE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3522:
---
Fix Version/s: 3.5

Added 3.5 as a fix version as well.

TermsFilter.getDocIdSet(context) NPE on missing field
-
Key: LUCENE-3522
URL: https://issues.apache.org/jira/browse/LUCENE-3522
Project: Lucene - Java
Issue Type: Bug
Components: modules/other
Affects Versions: 4.0
Reporter: Dan Climan
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.5, 4.0
Attachments: LUCENE-3522.patch

If the context does not contain the field for a term when calling TermsFilter.getDocIdSet(AtomicReaderContext context) then a NullPointerException is thrown due to not checking for null Terms before getting iterator.
[jira] [Updated] (LUCENE-3522) TermsFilter.getDocIdSet(context) NPE on missing field
[ https://issues.apache.org/jira/browse/LUCENE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3522:
---
Fix Version/s: (was: 3.5)

Ah. I thought that we need the Fix Version to properly track which issues are part of a release. But you're right - if this bug didn't exist in 3.x, then we'd better not mark it as fixed there.

TermsFilter.getDocIdSet(context) NPE on missing field
-
Key: LUCENE-3522
URL: https://issues.apache.org/jira/browse/LUCENE-3522
Project: Lucene - Java
Issue Type: Bug
Components: modules/other
Affects Versions: 4.0
Reporter: Dan Climan
Assignee: Michael McCandless
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-3522.patch

If the context does not contain the field for a term when calling TermsFilter.getDocIdSet(AtomicReaderContext context) then a NullPointerException is thrown due to not checking for null Terms before getting iterator.
[jira] [Updated] (LUCENE-3485) LuceneTaxonomyReader.decRef() may close the inner IR, rendering the LTR in a limbo.
[ https://issues.apache.org/jira/browse/LUCENE-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3485:
---
Lucene Fields: New,Patch Available (was: New)
Fix Version/s: 4.0, 3.5

LuceneTaxonomyReader.decRef() may close the inner IR, rendering the LTR in a limbo.
-
Key: LUCENE-3485
URL: https://issues.apache.org/jira/browse/LUCENE-3485
Project: Lucene - Java
Issue Type: Bug
Components: modules/facet
Affects Versions: 3.4
Reporter: Gilad Barkai
Assignee: Shai Erera
Priority: Minor
Fix For: 3.5, 4.0
Attachments: LUCENE-3485.patch

TaxonomyReader, which supports ref-counting, has a decRef() method which delegates to an inner IndexReader and calls its .decRef(). The latter may close the reader (if the ref count reaches zero) but the taxonomy would remain 'open', which will fail many of its method calls. Also, the LTR's .close() method does not work in the same manner as IndexReader's, which calls decRef() and leaves the real closing logic to the decRef(). I believe this should be the right approach for the fix.
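The approach proposed in the description, where close() only drops the caller's reference and the real cleanup happens in decRef(), might look roughly like this in Java. `RefCountedCloseable` is an illustrative name and not the LTR code, and `doClose()` stands in for whatever resources the reader actually releases.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reader-like class whose close() is routed through decRef(). */
final class RefCountedCloseable {
  private final AtomicInteger refCount = new AtomicInteger(1);
  private boolean closeCalled; // guarded by this
  private volatile boolean released;

  void incRef() { refCount.incrementAndGet(); }

  /** The real closing logic runs only when the last reference goes away. */
  void decRef() {
    if (refCount.decrementAndGet() == 0) {
      doClose();
    }
  }

  /** Idempotent: drops the initial reference once, however often it is called. */
  synchronized void close() {
    if (!closeCalled) {
      closeCalled = true;
      decRef();
    }
  }

  private void doClose() { released = true; }

  boolean isReleased() { return released; }
}
```

A reader that was incRef'ed by another thread stays usable after close() and is released by that thread's matching decRef().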