[jira] [Created] (LUCENENET-500) Lucene fails to run in medium trust ASP.NET Application
Christopher Currens created LUCENENET-500:
---
Summary: Lucene fails to run in medium trust ASP.NET Application
Key: LUCENENET-500
URL: https://issues.apache.org/jira/browse/LUCENENET-500
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core
Affects Versions: Lucene.Net 3.0.3
Reporter: Simon Svensson
Assignee: Christopher Currens
Fix For: Lucene.Net 3.0.3

I'm having trouble upgrading a web application running under medium trust from 2.9.4 to 3.0.3. Code that previously worked now throws a SecurityException.

[SecurityException: Request for the permission of type 'System.Security.Permissions.SecurityPermission, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' failed.]
   Lucene.Net.Support.WeakKey`1..ctor(T key) +0
   Lucene.Net.Support.WeakDictionary`2.get_Item(TKey key) +113
   Lucene.Net.Util.DefaultAttributeFactory.GetClassForInterface() +178
   Lucene.Net.Util.DefaultAttributeFactory.CreateAttributeInstance() +95
   Lucene.Net.Util.AttributeSource.AddAttribute() +375
   Lucene.Net.Analysis.CharTokenizer..ctor(TextReader input) +126
   Lucene.Net.Analysis.WhitespaceTokenizer..ctor(TextReader in) +37

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (LUCENENET-500) Lucene fails to run in medium trust ASP.NET Application
[ https://issues.apache.org/jira/browse/LUCENENET-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens resolved LUCENENET-500.
---
Resolution: Fixed

The issue was in the WeakKey<T> class in WeakDictionary.cs. A generic wrapper for WeakReference was being used, but it inherited from WeakReference, which requires UnmanagedCode security permissions. Removing the wrapper and casting instead fixes the permissions issue.

Lucene fails to run in medium trust ASP.NET Application
---
Key: LUCENENET-500
URL: https://issues.apache.org/jira/browse/LUCENENET-500
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core
Affects Versions: Lucene.Net 3.0.3
Reporter: Simon Svensson
Assignee: Christopher Currens
Fix For: Lucene.Net 3.0.3
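The resolution comment does not include the changed code. As a rough sketch only, assuming a hypothetical WeakKeySketch<T> class (this is not the actual WeakDictionary.cs source), a key can hold a plain WeakReference in a field and cast its Target back to T, so that no type in the library derives from WeakReference:

```csharp
using System;

// Hypothetical illustration; not the actual code from WeakDictionary.cs.
public sealed class WeakKeySketch<T> where T : class
{
    // Composition instead of inheritance: nothing here derives from WeakReference.
    private readonly WeakReference reference;
    private readonly int hashCode;

    public WeakKeySketch(T key)
    {
        if (key == null) throw new ArgumentNullException("key");
        reference = new WeakReference(key);
        // Cache the hash code so it stays stable after the target is collected.
        hashCode = key.GetHashCode();
    }

    // Cast the untyped Target back to T; this is null once the key is collected.
    public T Target
    {
        get { return (T)reference.Target; }
    }

    public bool IsAlive
    {
        get { return reference.IsAlive; }
    }

    public override int GetHashCode()
    {
        return hashCode;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(this, obj)) return true;
        var other = obj as WeakKeySketch<T>;
        if (other == null) return false;
        T mine = Target;
        T theirs = other.Target;
        // Two keys are equal only while both targets are still alive.
        return mine != null && theirs != null && mine.Equals(theirs);
    }
}
```

The cast in Target is the only place where the untyped reference is converted, which keeps the rest of a dictionary implementation strongly typed.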
[jira] [Resolved] (LUCENENET-480) Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
[ https://issues.apache.org/jira/browse/LUCENENET-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens resolved LUCENENET-480.
---
Resolution: Fixed

This should be fixed now; there was an issue with the tests not running properly in the CI. Now that the issue is fixed, I think it's working properly, although the CI server is still occasionally failing (for different reasons)... a random IPC error was the last failure I saw for the poll-changes build. The nightly ran successfully last night.

Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
---
Key: LUCENENET-480
URL: https://issues.apache.org/jira/browse/LUCENENET-480
Project: Lucene.Net
Issue Type: Task
Components: Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Reporter: Christopher Currens
Assignee: Christopher Currens
Fix For: Lucene.Net 3.0.3
Attachments: SortedSet.cs

We need to investigate what needs to be done with the source to be able to support both a .NET 3.5 and 4.0 build. There was some concern from at least one member of the community ([see here|http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201202.mbox/%3C004b01cce111$871f4990$955ddcb0$@fr%3E]) that we've alienated some of our user base by making such a heavy jump from 2.0 to 4.0. There are major benefits to using 4.0 over the 2.0-based runtimes, specifically FAR superior memory management, particularly with the LOH, with which Lucene.NET has had major trouble in the past.

Based on what has been done with Lucene.NET 3.0.3, we can't (easily) drop the .NET 3.5/3.0 libraries and go back to 2.0. HashSet<T> and ISet<T> are used extensively in the code; removing them would be a major blocker. I suppose it could be possible, but there hasn't been a whole lot of talk about what runtimes we intend to support.

The big change between lucene 2.x and 3.x for java was also a change in runtimes, which allowed generics and enums to be used in the base code. We have a similar situation with the API (it's substantially different, with the addition of properties alone) for this next version of Lucene.NET, so I think it's reasonable to at least make a permanent move from 2.0 to 3.5, though that's only my opinion and hasn't been discussed with the committers. It seems that supporting 3.5 and 4.0 should be fairly easy, and is a decent compromise in supporting both a 2.0 and 4.0 runtime. At the very least, we should try our best to support it.
[jira] [Commented] (LUCENENET-421) Segment files ocasionaly disappearing making index corrupted
[ https://issues.apache.org/jira/browse/LUCENENET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417213#comment-13417213 ]

Christopher Currens commented on LUCENENET-421:
---
Can you reproduce the issue in 2.9.4 or trunk?

Segment files ocasionaly disappearing making index corrupted
---
Key: LUCENENET-421
URL: https://issues.apache.org/jira/browse/LUCENENET-421
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core
Affects Versions: Lucene.Net 2.9.2
Environment: Media Chase ECF50 in the MastermindToys.com online toy store, IIS 7 under Win 2008 R2, index on RAID 1
Reporter: Fedor Taiakin

IIS 7 under Win 2008 R2, index located on RAID 1.
The only operations are Add Document and Delete Document, optimize = false.
Occasionally the segment files disappear, corrupting the index.
No other exceptions prior to inability to open index:

'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. ---> System.IO.FileNotFoundException: Could not find file 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'.
File name: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'
   at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run()
   at Lucene.Net.Index.IndexReader.Open(Directory directory)
[jira] [Commented] (LUCENENET-421) Segment files ocasionaly disappearing making index corrupted
[ https://issues.apache.org/jira/browse/LUCENENET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415388#comment-13415388 ]

Christopher Currens commented on LUCENENET-421:
---
Jackie Wong, are you using 2.9.2 as well?

Segment files ocasionaly disappearing making index corrupted
---
Key: LUCENENET-421
URL: https://issues.apache.org/jira/browse/LUCENENET-421
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core
Affects Versions: Lucene.Net 2.9.2
Environment: Media Chase ECF50 in the MastermindToys.com online toy store, IIS 7 under Win 2008 R2, index on RAID 1
Reporter: Fedor Taiakin
[jira] [Updated] (LUCENENET-480) Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
[ https://issues.apache.org/jira/browse/LUCENENET-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-480:
---
Fix Version/s: Lucene.Net 3.0.3

I'm merging it back into trunk now. I just have to make sure all the unit tests still pass and that it can successfully be built. I also need to write up a little page on the wiki explaining how it works and how to add new projects that can also support multi-framework targeting.

Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
---
Key: LUCENENET-480
URL: https://issues.apache.org/jira/browse/LUCENENET-480
Project: Lucene.Net
Issue Type: Task
Components: Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Reporter: Christopher Currens
Assignee: Christopher Currens
Fix For: Lucene.Net 3.0.3
Attachments: SortedSet.cs
[jira] [Updated] (LUCENENET-337) TokenAttribute for Selectively Including Tokens in Length Norm
[ https://issues.apache.org/jira/browse/LUCENENET-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-337:
---
Affects Version/s: (was: Lucene.Net 2.9.2)
Fix Version/s: (was: Lucene.Net 3.0.3) Lucene.Net 3.6

Moving to 3.6. We need to evaluate community desire for this patch.

TokenAttribute for Selectively Including Tokens in Length Norm
---
Key: LUCENENET-337
URL: https://issues.apache.org/jira/browse/LUCENENET-337
Project: Lucene.Net
Issue Type: Improvement
Components: Lucene.Net Core
Reporter: Michael Garski
Priority: Minor
Fix For: Lucene.Net 3.6
Attachments: LengthNorm.patch

This patch adds functionality to Lucene.Net that allows a TokenFilter to mark a Token as not to be included in the length norm calculation through the use of a new TokenAttribute interface, LengthNormAttribute, and a corresponding implementation, LengthNormAttributeImpl. This functionality is useful to prevent the increase of the length norm during synonym injection, particularly in cases where there are a large number of synonyms in relation to the number of original tokens.

Following is an example of how to use the new attribute. Within your custom TokenFilter, define a field to persist a reference to the attribute and set its value in the constructor. When the stream advances to a new Token within the call to IncrementToken(), the value of the IncludeInLengthNorm property of the attribute is set to false for Tokens which should not be included in the length norm calculation. It defaults to true and is reset to true after each Token is consumed within DocInverterPerField.ProcessFields.

{code:title=CustomTokenFilter.cs|borderStyle=solid}
public class CustomTokenFilter : TokenFilter
{
    private LengthNormAttribute lnAttribute;

    public CustomTokenFilter(TokenStream input)
        : base(input)
    {
        this.lnAttribute = (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
    }

    public override bool IncrementToken()
    {
        if (input.IncrementToken())
        {
            // make determination that the token is not to be
            // included in the length norm value
            // this example marks all tokens to not be
            // included in the length norm value
            this.lnAttribute.IncludeInLengthNorm = false;
            return true;
        }
        else
        {
            return false;
        }
    }
}
{code}
[jira] [Resolved] (LUCENENET-484) Some possibly major tests intermittently fail
[ https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens resolved LUCENENET-484.
---
Resolution: Fixed

Thanks for your help with all of these, Luc. Thanks to your hard work, this issue is finally closed, and for the first time in a long time, the whole test suite should consistently pass!

Some possibly major tests intermittently fail
---
Key: LUCENENET-484
URL: https://issues.apache.org/jira/browse/LUCENENET-484
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Environment: All
Reporter: Christopher Currens
Fix For: Lucene.Net 3.0.3
Attachments: Lucenenet-484-FieldCacheImpl.patch, Lucenenet-484-NativeFSLockFactory.patch, Lucenenet-484-WeakDictionary.patch, Lucenenet-484-WeakDictionaryTests.patch

These tests will fail intermittently in Debug or Release mode, in the core test suite:
# Lucene.Net.Index:
#- TestConcurrentMergeScheduler.TestFlushExceptions -- *FIXED*
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks -- *FIXED*
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort -- *FIXED*
# Lucene.Net.Util:
#- TestFieldCacheSanityChecker.TestInsanity1 -- *FIXED*
#- TestFieldCacheSanityChecker.TestInsanity2 -- *FIXED*
# Lucene.Net.Support
#- TestWeakHashTableMultiThreadAccess.Test -- *FIXED*

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.
[jira] [Updated] (LUCENENET-484) Some possibly major tests intermittently fail
[ https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-484:
---
Description:
These tests will fail intermittently in Debug or Release mode, in the core test suite:
# Lucene.Net.Index:
#- TestConcurrentMergeScheduler.TestFlushExceptions -- *FIXED*
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks -- *FIXED*
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort -- *FIXED*
# Lucene.Net.Util:
#- TestFieldCacheSanityChecker.TestInsanity1 -- *FIXED*
#- TestFieldCacheSanityChecker.TestInsanity2 -- *FIXED*
# Lucene.Net.Support
#- TestWeakHashTableMultiThreadAccess.Test -- *FIXED*

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.

was:
These tests will fail intermittently in Debug or Release mode, in the core test suite:
# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# -Lucene.Net.Util:-
#- -TestFieldCacheSanityChecker.TestInsanity1-
#- -TestFieldCacheSanityChecker.TestInsanity2-
#- -(It's possible all of the insanity tests fail at one point or another)-
# -Lucene.Net.Support-
#- -TestWeakHashTableMultiThreadAccess.Test-

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.

Some possibly major tests intermittently fail
---
Key: LUCENENET-484
URL: https://issues.apache.org/jira/browse/LUCENENET-484
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Environment: All
Reporter: Christopher Currens
Fix For: Lucene.Net 3.0.3
Attachments: Lucenenet-484-FieldCacheImpl.patch, Lucenenet-484-NativeFSLockFactory.patch, Lucenenet-484-WeakDictionary.patch, Lucenenet-484-WeakDictionaryTests.patch
[jira] [Resolved] (LUCENENET-493) Make lucene.net culture insensitive (like the java version)
[ https://issues.apache.org/jira/browse/LUCENENET-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens resolved LUCENENET-493.
---
Resolution: Fixed

This should be fixed now. I added a test to LocalizedTestCase (different from the patch above) so that it will run some or all tests (if no specific tests were selected) under all installed cultures on the machine. This should be the same behavior as Java, except that it runs all tests in one test instead of individually. However, when a test fails, it will output which test failed and which culture it failed in. I discovered issues in DateTools.cs in the ar culture, where DateTimeToString was returning culture-specific formatting.

I think we can resolve this now that we can confirm that the tests pass. If any future culture-sensitive bugs appear, new issues can be created, and then specific tests can be written to check for those issues.

Make lucene.net culture insensitive (like the java version)
---
Key: LUCENENET-493
URL: https://issues.apache.org/jira/browse/LUCENENET-493
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Luc Vanlerberghe
Labels: patch
Fix For: Lucene.Net 3.0.3
Attachments: Lucenenet-493.patch, UpdatedLocalizedTestCase.patch

In Java, conversion of the basic types to and from strings is locale (culture) independent. For localized input/output one needs to use the classes in the java.text package.
In .Net, conversion of the basic types to and from strings depends on the default Culture. Otherwise you have to specify CultureInfo.InvariantCulture explicitly.
Some of the testcases in lucene.net fail if they are not run on a machine with culture set to US.

In the current version of lucene.net there are patches here and there that try to correct for some specific cases by using string replacement (like System.Double.Parse(s.Replace(".", CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator))), but that seems really ugly.
I submit a patch here that removes the old workarounds and replaces them by calls to classes in the Lucene.Net.Support namespace that try to handle the conversions in a compatible way.
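For illustration, a minimal sketch of culture-independent conversion follows. The class and method names (InvariantNumberSketch, ParseDouble, FormatDouble) are hypothetical and are not the helpers from the attached patch; the sketch only shows the general technique of passing CultureInfo.InvariantCulture explicitly instead of rewriting the decimal separator:

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not the Lucene.Net.Support classes from the patch.
public static class InvariantNumberSketch
{
    // Parses with the invariant culture, so "." is always the decimal separator
    // regardless of the current thread culture.
    public static double ParseDouble(string s)
    {
        return double.Parse(s, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // Formats with the invariant culture; "R" asks for a round-trippable string.
    public static string FormatDouble(double value)
    {
        return value.ToString("R", CultureInfo.InvariantCulture);
    }
}
```

Because both directions name the culture explicitly, the result does not depend on the machine's regional settings.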
[jira] [Updated] (LUCENENET-436) Refactor Deprecated Code inside of tests
[ https://issues.apache.org/jira/browse/LUCENENET-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-436:
---
Affects Version/s: Lucene.Net 3.0.3
Fix Version/s: (was: Lucene.Net 3.0.3) Lucene.Net 3.5

Refactor Deprecated Code inside of tests
---
Key: LUCENENET-436
URL: https://issues.apache.org/jira/browse/LUCENENET-436
Project: Lucene.Net
Issue Type: Sub-task
Components: Lucene.Net Test
Affects Versions: Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Reporter: michael herndon
Labels: refactoring, testing
Fix For: Lucene.Net 3.5

* We should still be testing deprecated methods, but we need to use #pragma warning disable/enable 0618 for testing those. Otherwise compiler warnings are too numerous to be anywhere near helpful.
* We should only be using deprecated methods in places where they are being explicitly tested. Other tests that need that functionality in order to validate those tests should be refactored to use methods that are not deprecated.

This is one place where we should probably deviate from the parent project and make sure that any deprecated code gets isolated to the tests designed only for the deprecated methods, and then use the newer API throughout the test suite. This should help move the project forward and remove deprecated APIs when the time comes.
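A small sketch of the suppression described in the first bullet is given here. LegacyApi, OldSum and NewSum are made-up names used only for the example. Note that the C# directive that re-enables a warning is spelled "#pragma warning restore", which is presumably what "enable" refers to in the issue text:

```csharp
using System;

// Made-up API with one deprecated member, for illustration only.
public static class LegacyApi
{
    [Obsolete("Use NewSum instead.")]
    public static int OldSum(int a, int b) { return a + b; }

    public static int NewSum(int a, int b) { return a + b; }
}

public static class DeprecatedApiTests
{
    // A test dedicated to the deprecated member: CS0618 is suppressed
    // only around the call that is being tested on purpose.
    public static int CallDeprecated()
    {
#pragma warning disable 0618
        int result = LegacyApi.OldSum(2, 3);
#pragma warning restore 0618
        return result;
    }

    // Every other test uses the current API and needs no suppression.
    public static int CallCurrent()
    {
        return LegacyApi.NewSum(2, 3);
    }
}
```

Keeping the disable/restore pair as tight as possible means any accidental use of a deprecated member elsewhere in the test suite still produces a warning.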
[jira] [Updated] (LUCENENET-434) Remove AnonymousXXXX classes to increase readablity
[ https://issues.apache.org/jira/browse/LUCENENET-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-434:
---
Fix Version/s: (was: Lucene.Net 3.0.3) Lucene.Net 3.5

Moving to 3.5. As ugly as they are, leaving them in there doesn't hurt anything except our eyes. As we port to 3.5, we can remove as many of these as we can.

Remove Anonymous classes to increase readablity
---
Key: LUCENENET-434
URL: https://issues.apache.org/jira/browse/LUCENENET-434
Project: Lucene.Net
Issue Type: Improvement
Components: Lucene.Net Core
Affects Versions: Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Reporter: Scott Lombard
Assignee: Scott Lombard
Priority: Minor
Fix For: Lucene.Net 3.5
Attachments: TeeSinkTokenFilter.patch
Original Estimate: 168h
Time Spent: 13h
Remaining Estimate: 155h

Replace Anonymous classes inherited from JLCA, which make the code impossible to read. Follow Digy's template to replace the single abstract method with Func or Action, like in FilterCache<T>:

from: protected abstract object MergeDeletes(IndexReader reader, object value);
to: Func<IndexReader, object, object> MergeDeletes;

Determine a solution for the classes with more than one abstract method without diverging much from Java.
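The from/to change above can be sketched as follows. This is an assumption-laden illustration, not the FilterCache<T> source: AbstractMerger and DelegateMerger are invented names, and a plain string stands in for IndexReader so the example compiles on its own.

```csharp
using System;

// Before: every caller needs a (formerly anonymous) subclass that overrides
// the single abstract method. A string stands in for IndexReader here.
public abstract class AbstractMerger
{
    protected abstract object MergeDeletes(string reader, object value);

    public object Run(string reader, object value)
    {
        return MergeDeletes(reader, value);
    }
}

// After: the single abstract method becomes a delegate supplied by the caller,
// so no subclass is needed.
public sealed class DelegateMerger
{
    private readonly Func<string, object, object> mergeDeletes;

    public DelegateMerger(Func<string, object, object> mergeDeletes)
    {
        if (mergeDeletes == null) throw new ArgumentNullException("mergeDeletes");
        this.mergeDeletes = mergeDeletes;
    }

    public object Run(string reader, object value)
    {
        return mergeDeletes(reader, value);
    }
}
```

A caller would then write something like new DelegateMerger((reader, value) => value) in place of a one-off subclass. Classes with more than one abstract method do not map onto a single delegate this cleanly, which is the open question the issue raises.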
[jira] [Updated] (LUCENENET-467) .NETify the public API where appropriate
[ https://issues.apache.org/jira/browse/LUCENENET-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-467:
---
Fix Version/s: (was: Lucene.Net 3.0.3) Lucene.Net 3.5

Moving to 3.5

.NETify the public API where appropriate
---
Key: LUCENENET-467
URL: https://issues.apache.org/jira/browse/LUCENENET-467
Project: Lucene.Net
Issue Type: Improvement
Components: Lucene.Net Contrib, Lucene.Net Core
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Environment: all
Reporter: Christopher Currens
Labels: refactoring
Fix For: Lucene.Net 3.5
Attachments: Lucenenet-467-create.patch

Although we haven't abandoned the line-by-line port of Java lucene, there are many idioms in Java that make little to no sense in a .NET assembly. The API can change to allow for a conventional .NET experience, while still keeping the porting of Java logic easy.

* Change Getxxx() and Setxxx() methods to .NET Properties
* Implement the [dispose pattern|http://msdn.microsoft.com/en-us/library/fs2xkftw.aspx] properly. Try, at all costs, to only use finalizers *when necessary*. They are expensive, and most of the classes used already have finalizers that will be called.
* Convert Java Iterator-style classes (see TermEnum, TermDocs and others) to implement IEnumerable<T>
* When rethrowing a caught exception, use *throw;* instead of *throw ex;* to maintain the stack trace
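To make the iterator bullet concrete, here is a rough sketch that adapts a Java-style iterator to IEnumerable<T>. JavaStyleTerms is an invented stand-in with a Next()/Term() shape loosely modelled on TermEnum; it is not the real Lucene.Net class, and the eventual API may look quite different:

```csharp
using System;
using System.Collections.Generic;

// Invented stand-in for a Java-style iterator: Next() advances, Term() reads.
public sealed class JavaStyleTerms
{
    private readonly string[] terms;
    private int position = -1;

    public JavaStyleTerms(string[] terms)
    {
        this.terms = terms;
    }

    public bool Next()
    {
        return ++position < terms.Length;
    }

    public string Term()
    {
        return terms[position];
    }
}

public static class JavaStyleTermsExtensions
{
    // Wraps the Next()/Term() protocol in an iterator block so that callers
    // can use foreach or LINQ instead of a hand-written while loop.
    public static IEnumerable<string> AsEnumerable(this JavaStyleTerms source)
    {
        while (source.Next())
        {
            yield return source.Term();
        }
    }
}
```

With this adapter, calling code reduces to foreach (var term in terms.AsEnumerable()) { ... }, while the ported Java logic underneath is left untouched.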
[jira] [Commented] (LUCENENET-480) Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
[ https://issues.apache.org/jira/browse/LUCENENET-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397690#comment-13397690 ]

Christopher Currens commented on LUCENENET-480:
---
Regarding the SortedSet implementation, I might consider using a {{SortedDictionary<T>}} internally instead of a {{SortedList<T>}}. It's faster at removals and insertions, at the cost of a little more memory. I think in the cases where SortedSet is used in Lucene, it won't make much of a difference at all in memory usage, but it could use the speed.

Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
---
Key: LUCENENET-480
URL: https://issues.apache.org/jira/browse/LUCENENET-480
Project: Lucene.Net
Issue Type: Task
Components: Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Reporter: Christopher Currens
Attachments: SortedSet.cs
[jira] [Commented] (LUCENENET-480) Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
[ https://issues.apache.org/jira/browse/LUCENENET-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397164#comment-13397164 ] Christopher Currens commented on LUCENENET-480: --- Change {{ISetT x = new HashSetT()}} to instantiate from a factory instead. So, it would become something like: {{ISetT x = Lucene.Net.Support.Compatibility.GetSetT()}}. In .NET 3.5 would return a {{WrappedHashSetT : Lucene.Net.Support.ISetT}} (which in turn just wraps a normal HashSetT). In .NET 4, it would just return a HashSetT. We would need to do the same with SortedSet, as well. Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible -- Key: LUCENENET-480 URL: https://issues.apache.org/jira/browse/LUCENENET-480 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3 Reporter: Christopher Currens Attachments: SortedSet.cs We need to investigate what needs to be done with the source to be able to support both a .NET 3.5 and 4.0 build. There was some concern from at least one member of the community ([see here|http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201202.mbox/%3C004b01cce111$871f4990$955ddcb0$@fr%3E]) that we've alienated some of our user base by making such a heavy jump from 2.0 to 4.0. There are major benefits to using 4.0 over the 2.0-based runtimes, specifically FAR superior memory management, particularly with the LOH, where Lucene.NET has had major trouble with in the past. Based on what has been done with Lucene.NET 3.0.3, we can't (easily) drop .NET 3.5/3.0 libraries and go back to 2.0. HashSetT and ISetT is used extensively in the code, that would be a major blocker to get done. I suppose it could be possible, but there hasn't been a whole lot of talk about what runtimes we intend to support. 
The big change between lucene 2.x and 3.x for java was also a change in runtimes, that allowed generics and enums to be used in the base code. We have a similar situation with the API (it's substantially different, with the addition of properties alone) for this next version of Lucene.NET, so I think it's reasonable to at least make a permanent move from 2.0 to 3.5, though that's only my opinion, hasn't been discussed with the committers. It seems that supporting 3.5 and 4.0 should be fairly easy, and is a decent compromise in supporting both a 2.0 and 4.0 runtime. At the very least, we should try our best to support it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
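The factory idea from the comment above can be sketched roughly as follows. This is only an illustration: the {{NET35}} compilation symbol, the {{CompatSketch}} namespace, and the shape of {{WrappedHashSet<T>}} are assumptions made for the example, not the code that was committed to Lucene.Net.

```csharp
using System.Collections.Generic;

namespace CompatSketch
{
#if NET35
    // Assumption: .NET 3.5 ships HashSet<T> but no ISet<T>, so a minimal
    // stand-in interface is declared here (hypothetical, reduced contract).
    public interface ISet<T> : ICollection<T>
    {
        new bool Add(T item);
    }

    // Hypothetical wrapper that forwards every call to a plain HashSet<T>.
    public class WrappedHashSet<T> : ISet<T>
    {
        private readonly HashSet<T> inner = new HashSet<T>();

        public bool Add(T item) { return inner.Add(item); }
        void ICollection<T>.Add(T item) { inner.Add(item); }
        public void Clear() { inner.Clear(); }
        public bool Contains(T item) { return inner.Contains(item); }
        public void CopyTo(T[] array, int index) { inner.CopyTo(array, index); }
        public bool Remove(T item) { return inner.Remove(item); }
        public int Count { get { return inner.Count; } }
        public bool IsReadOnly { get { return false; } }
        public IEnumerator<T> GetEnumerator() { return inner.GetEnumerator(); }
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return inner.GetEnumerator();
        }
    }
#endif

    // Hypothetical factory; the name mirrors Compatibility.GetSet<T>() from the comment.
    public static class Compatibility
    {
        public static ISet<T> GetSet<T>()
        {
#if NET35
            return new WrappedHashSet<T>();   // 3.5 build: wrapped set
#else
            return new HashSet<T>();          // 4.0 build: HashSet<T> already implements ISet<T>
#endif
        }
    }
}
```

Callers would then write {{ISet<string> x = Compatibility.GetSet<string>();}} in both builds, so only the factory and the wrapper need conditional compilation.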
[jira] [Resolved] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
[ https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens resolved LUCENENET-495. --- Resolution: Fixed Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations -- Key: LUCENENET-495 URL: https://issues.apache.org/jira/browse/LUCENENET-495 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4, Lucene.Net 3.0.3 Reporter: Christopher Currens Assignee: Christopher Currens Priority: Critical Fix For: Lucene.Net 3.0.3 This issue mostly just affects RAMDirectory. However, RAMFile and RAMOutputStream are used in other (all?) directory implementations, including FSDirectory types. In RAMOutputStream, the file last modified property for the RAMFile is updated when the stream is flushed. It's calculated using {{DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond}}. I've read before that Microsoft has regretted making DateTime.Now a property instead of a method, and after seeing what it's doing, I'm starting to understand why. DateTime.Now is returning local time. In order for it to calculate that, it has to get the UTC offset for the machine, which requires the creation of a _class_, System.Globalization.DaylightTime. This is bad for performance. Using code to write 10,000 small documents to an index (4 KB each), it created 1,570,157 of these DaylightTime objects, a total of 62MB of extra memory...clearly RAMOutputStream.Flush() is called a lot. A fix I'd like to propose is to change RAMFile to store the LastModified date in UTC instead of local time. DateTime.UtcNow doesn't create any additional objects and is very fast. For this small benchmark, the performance increase is 31%. I've set it to convert to local time when {{RAMDirectory.LastModified(string name)}} is called, to make sure it has the same behavior (tests fail otherwise). Are there any other side-effects to making this change? 
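A minimal sketch of the proposed change, assuming a simplified stand-in for RAMFile (the class and member names below are hypothetical, and the real class has more state and its own synchronization): the timestamp is stored from {{DateTime.UtcNow}} on every flush and converted to local time only when it is read.

```csharp
using System;

// Hypothetical, reduced stand-in for RAMFile; not the Lucene.Net class itself.
public sealed class RamFileSketch
{
    // Milliseconds derived from UTC ticks; written often, read rarely.
    private long lastModifiedUtcMillis = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMillisecond;

    // Called on every flush: UtcNow avoids the local-time offset lookup.
    public void Touch()
    {
        lastModifiedUtcMillis = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMillisecond;
    }

    // Called rarely: convert to local time here so callers see the old behavior.
    public long LastModifiedLocalMillis()
    {
        var utc = new DateTime(lastModifiedUtcMillis * TimeSpan.TicksPerMillisecond, DateTimeKind.Utc);
        return utc.ToLocalTime().Ticks / TimeSpan.TicksPerMillisecond;
    }
}
```

The design choice is to pay the conversion cost on the read path, which the issue describes as far less frequent than flushing.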
[jira] [Updated] (LUCENENET-493) Make lucene.net culture insensitive (like the java version)
[ https://issues.apache.org/jira/browse/LUCENENET-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens updated LUCENENET-493: -- Attachment: UpdatedLocalizedTestCase.patch I've updated LocalizedTestCase, so that it will actually run against all installed cultures. The workaround is unfortunate...there is now only 1 test that does all localization checks. It's done more or less the same way that java does it, however, instead of being able to override the {{runBare()}} method, I've created a test method that will run all methods in all installed cultures. If a method were to fail, it would list the method name, and the culture it failed in. I've attached a patch that shows the solution, so if anyone has a better solution, we can discuss that and possibly use it instead. Interestingly enough, the tests that Java Lucene has set to test, don't actually fail when using the older code that doesn't have localization changes in it. However, when I added {{TestBoost}} to the list in {{TestQueryParser}}, that one did fail before the push that Simon did. So, it concerns me that we don't have enough tests written that actually will cause it to fail, when run as a localized test. So, what I propose we do before we apply Luc's patch, is to write tests that *will fail* when using as a LocalizedTestCase and then make sure his patch makes all of the tests pass. Make lucene.net culture insensitive (like the java version) --- Key: LUCENENET-493 URL: https://issues.apache.org/jira/browse/LUCENENET-493 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core, Lucene.Net Test Affects Versions: Lucene.Net 3.0.3 Reporter: Luc Vanlerberghe Labels: patch Fix For: Lucene.Net 3.0.3 Attachments: Lucenenet-493.patch, UpdatedLocalizedTestCase.patch In Java, conversion of the basic types to and from strings is locale (culture) independent. For localized input/output one needs to use the classes in the java.text package. 
In .Net, conversion of the basic types to and from strings depends on the default Culture. Otherwise you have to specify CultureInfo.InvariantCulture explicitly. Some of the testcases in lucene.net fail if they are not run on a machine with culture set to US. In the current version of lucene.net there are patches here and there that try to correct for some specific cases by using string replacement (like System.Double.Parse(s.Replace(".", CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator))), but that seems really ugly. I submit a patch here that removes the old workarounds and replaces them by calls to classes in the Lucene.Net.Support namespace that try to handle the conversions in a compatible way.
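The "one test method that runs everything in every culture" workaround described in the comment could look roughly like the sketch below. It is not the attached patch: {{CultureRunner}} and {{RunInAllCultures}} are hypothetical names, and enumerating {{CultureTypes.SpecificCultures}} is an assumption standing in for "all installed cultures".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;

// Hypothetical helper; names are illustrative only.
public static class CultureRunner
{
    // Runs the given test body once per culture and collects
    // "<test name> failed in <culture>" messages instead of stopping at the first failure.
    public static List<string> RunInAllCultures(string testName, Action test)
    {
        var failures = new List<string>();
        CultureInfo original = Thread.CurrentThread.CurrentCulture;
        try
        {
            foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
            {
                Thread.CurrentThread.CurrentCulture = culture;
                try
                {
                    test();
                }
                catch (Exception ex)
                {
                    failures.Add(testName + " failed in " + culture.Name + ": " + ex.Message);
                }
            }
        }
        finally
        {
            // Always restore the culture the test run started with.
            Thread.CurrentThread.CurrentCulture = original;
        }
        return failures;
    }
}
```

A single NUnit test could call this for each test method it wants to localize and assert that the returned list is empty, reporting the method name and culture for every entry otherwise.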
[jira] [Commented] (LUCENENET-480) Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
[ https://issues.apache.org/jira/browse/LUCENENET-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393610#comment-13393610 ] Christopher Currens commented on LUCENENET-480: --- I would feel most comfortable leaving it as ISet<T> in most (all?) places, based on the one you've created, which I'm assuming follows the same contract as .NET 4? I can see Java possibly using more and more classes that aren't HashSet, that implement ISet. What did you wind up doing with SortedSet? Did you write a class for it? Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible -- Key: LUCENENET-480 URL: https://issues.apache.org/jira/browse/LUCENENET-480 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3 Reporter: Christopher Currens We need to investigate what needs to be done with the source to be able to support both a .NET 3.5 and 4.0 build. There was some concern from at least one member of the community ([see here|http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201202.mbox/%3C004b01cce111$871f4990$955ddcb0$@fr%3E]) that we've alienated some of our user base by making such a heavy jump from 2.0 to 4.0. There are major benefits to using 4.0 over the 2.0-based runtimes, specifically FAR superior memory management, particularly with the LOH, where Lucene.NET has had major trouble in the past. Based on what has been done with Lucene.NET 3.0.3, we can't (easily) drop .NET 3.5/3.0 libraries and go back to 2.0. HashSet<T> and ISet<T> are used extensively in the code; that would be a major blocker. I suppose it could be possible, but there hasn't been a whole lot of talk about what runtimes we intend to support. The big change between lucene 2.x and 3.x for java was also a change in runtimes, that allowed generics and enums to be used in the base code. 
We have a similar situation with the API (it's substantially different, with the addition of properties alone) for this next version of Lucene.NET, so I think it's reasonable to at least make a permanent move from 2.0 to 3.5, though that's only my opinion, hasn't been discussed with the committers. It seems that supporting 3.5 and 4.0 should be fairly easy, and is a decent compromise in supporting both a 2.0 and 4.0 runtime. At the very least, we should try our best to support it.
[jira] [Commented] (LUCENENET-337) TokenAttribute for Selectively Including Tokens in Length Norm
[ https://issues.apache.org/jira/browse/LUCENENET-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393615#comment-13393615 ] Christopher Currens commented on LUCENENET-337: --- I'm unsure about it. It's implemented directly into the DocInverterPerField class, which makes me slightly uncomfortable, but the default behavior won't be changed, since LengthNormAttribute.IncludeInLengthNorm is set to true by default. I think (but don't actually remember) that the API might be outdated, so it would have to be upgraded for 3.0.3. TokenAttribute for Selectively Including Tokens in Length Norm -- Key: LUCENENET-337 URL: https://issues.apache.org/jira/browse/LUCENENET-337 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Reporter: Michael Garski Priority: Minor Fix For: Lucene.Net 3.0.3 Attachments: LengthNorm.patch This patch adds functionality to Lucene.Net that allows a TokenFilter to mark a Token as not to be included in the length norm calculation through the use of a new TokenAttribute interface LengthNormAttribute and a corresponding implementation LengthNormAttributeImpl. This functionality is useful to prevent the increase of the length norm during synonym injection, particularly in cases where there are a large number of synonyms in relation to the number of original tokens. Following is an example of how to use the new attribute. Within your custom TokenFilter, define a field to persist a reference to the attribute and set its value in the constructor. When the stream advances to a new Token within the call to IncrementToken(), the value of the IncludeInLengthNorm property of the attribute is set to false for Tokens which should not be included in the length norm calculation. It defaults to true and is reset to true after each Token is consumed within DocInverterPerField.ProcessFields.
{code:title=CustomTokenFilter.cs|borderStyle=solid}
public class CustomTokenFilter : TokenFilter
{
    private LengthNormAttribute lnAttribute;

    public CustomTokenFilter(TokenStream input) : base(input)
    {
        this.lnAttribute = (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
    }

    public override bool IncrementToken()
    {
        if (input.IncrementToken())
        {
            // make determination that the token is not to be
            // included in the length norm value
            // this example marks all tokens to not be
            // included in the length norm value
            this.lnAttribute.IncludeInLengthNorm = false;
            return true;
        }
        else
        {
            return false;
        }
    }
}
{code}
[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
[ https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393624#comment-13393624 ] Christopher Currens commented on LUCENENET-495: --- Yeah, I noticed that new structure a few weeks ago. Definitely more powerful, in that it keeps track of the UTC offset. DateTimeOffset.Now actually just calls {{new DateTimeOffset(DateTime.Now)}}, so it doesn't help in this case. Interestingly enough, this improves speed in Lucene enough that it has exposed other thread-safety issues in Lucene. Fortunately, I think it's only affecting code specific to the test suite. Well, it's actually code in CollectionHelpers, in Lucene.Net.Support, on AddIfNotContains(Hashtable, object). However, the only usages I could find for that particular method are in Lucene.Net.Test. Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations -- Key: LUCENENET-495 URL: https://issues.apache.org/jira/browse/LUCENENET-495 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4, Lucene.Net 3.0.3 Reporter: Christopher Currens Assignee: Christopher Currens Priority: Critical Fix For: Lucene.Net 3.0.3 This issue mostly just affects RAMDirectory. However, RAMFile and RAMOutputStream are used in other (all?) directory implementations, including FSDirectory types. In RAMOutputStream, the file last modified property for the RAMFile is updated when the stream is flushed. It's calculated using {{DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond}}. I've read before that Microsoft has regretted making DateTime.Now a property instead of a method, and after seeing what it's doing, I'm starting to understand why. DateTime.Now is returning local time. In order for it to calculate that, it has to get the UTC offset for the machine, which requires the creation of a _class_, System.Globalization.DaylightTime. This is bad for performance. 
Using code to write 10,000 small documents to an index (4 KB each), it created 1,570,157 of these DaylightTime objects, a total of 62MB of extra memory...clearly RAMOutputStream.Flush() is called a lot. A fix I'd like to propose is to change RAMFile to store the LastModified date in UTC instead of local time. DateTime.UtcNow doesn't create any additional objects and is very fast. For this small benchmark, the performance increase is 31%. I've set it to convert to local time when {{RAMDirectory.LastModified(string name)}} is called, to make sure it has the same behavior (tests fail otherwise). Are there any other side-effects to making this change?
[jira] [Commented] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
[ https://issues.apache.org/jira/browse/LUCENENET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393631#comment-13393631 ] Christopher Currens commented on LUCENENET-495: --- I did fix the thread-safety bug. The code was checking if a key existed in the (synchronized) Hashtable, and then tried to add it. Because there was no locking around the two calls, two threads could check for the key at the same time and then both add it within a few instructions of each other, causing one to throw an ArgumentException because the key already existed. In all of the core code, we are using the correct types we should be (I think). This is code in the test suite that hasn't ever been updated. In fact, it really should be a HashSet and not a Hashtable. We were using it because at the time, it was pre-.NET 3.0, and the only way to match the java code. We could change it, but IMO, it's not really worth it right now, because it's ONLY used in the test code. In the next version we're porting, the testing code is significantly different, so I don't want to spend _too_ much time cleaning it up if it works. Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations -- Key: LUCENENET-495 URL: https://issues.apache.org/jira/browse/LUCENENET-495 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4, Lucene.Net 3.0.3 Reporter: Christopher Currens Assignee: Christopher Currens Priority: Critical Fix For: Lucene.Net 3.0.3 This issue mostly just affects RAMDirectory. However, RAMFile and RAMOutputStream are used in other (all?) directory implementations, including FSDirectory types. In RAMOutputStream, the file last modified property for the RAMFile is updated when the stream is flushed. It's calculated using {{DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond}}. 
I've read before that Microsoft has regretted making DateTime.Now a property instead of a method, and after seeing what it's doing, I'm starting to understand why. DateTime.Now is returning local time. In order for it to calculate that, it has to get the UTC offset for the machine, which requires the creation of a _class_, System.Globalization.DaylightTime. This is bad for performance. Using code to write 10,000 small documents to an index (4 KB each), it created 1,570,157 of these DaylightTime objects, a total of 62MB of extra memory...clearly RAMOutputStream.Flush() is called a lot. A fix I'd like to propose is to change RAMFile to store the LastModified date in UTC instead of local time. DateTime.UtcNow doesn't create any additional objects and is very fast. For this small benchmark, the performance increase is 31%. I've set it to convert to local time when {{RAMDirectory.LastModified(string name)}} is called, to make sure it has the same behavior (tests fail otherwise). Are there any other side-effects to making this change?
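The check-then-add race described in the comment can be closed by holding one lock across both calls. The sketch below is an assumption about the shape of such a fix, not the committed change: the class name and the {{bool}} return value are hypothetical, and only the {{AddIfNotContains(Hashtable, object)}} signature comes from the thread.

```csharp
using System.Collections;

// Hypothetical stand-in for the helper in Lucene.Net.Support.
public static class CollectionHelpersSketch
{
    // Returns true if the item was added, false if it was already present.
    // A synchronized Hashtable makes each call atomic on its own, but not the
    // Contains + Add pair, so both are wrapped in a single lock on SyncRoot.
    public static bool AddIfNotContains(Hashtable table, object item)
    {
        lock (table.SyncRoot)
        {
            if (table.Contains(item))
            {
                return false;
            }
            table.Add(item, item);
            return true;
        }
    }
}
```

Locking on {{SyncRoot}} is chosen here because a wrapper created with {{Hashtable.Synchronized}} uses the same object for its own locking, so the helper and the wrapper cannot interleave.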
[jira] [Created] (LUCENENET-495) Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations
Christopher Currens created LUCENENET-495: - Summary: Use of DateTime.Now causes huge amount of System.Globalization.DaylightTime object allocations Key: LUCENENET-495 URL: https://issues.apache.org/jira/browse/LUCENENET-495 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4, Lucene.Net 3.0.3 Reporter: Christopher Currens Assignee: Christopher Currens Priority: Critical Fix For: Lucene.Net 3.0.3 This issue mostly just affects RAMDirectory. However, RAMFile and RAMOutputStream are used in other (all?) directory implementations, including FSDirectory types. In RAMOutputStream, the file last modified property for the RAMFile is updated when the stream is flushed. It's calculated using {{DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond}}. I've read before that Microsoft has regretted making DateTime.Now a property instead of a method, and after seeing what it's doing, I'm starting to understand why. DateTime.Now is returning local time. In order for it to calculate that, it has to get the UTC offset for the machine, which requires the creation of a _class_, System.Globalization.DaylightTime. This is bad for performance. Using code to write 10,000 small documents to an index (4 KB each), it created 1,570,157 of these DaylightTime objects, a total of 62MB of extra memory...clearly RAMOutputStream.Flush() is called a lot. A fix I'd like to propose is to change RAMFile to store the LastModified date in UTC instead of local time. DateTime.UtcNow doesn't create any additional objects and is very fast. For this small benchmark, the performance increase is 31%. I've set it to convert to local time when {{RAMDirectory.LastModified(string name)}} is called, to make sure it has the same behavior (tests fail otherwise). Are there any other side-effects to making this change?
[jira] [Resolved] (LUCENENET-490) QueryParser is culture-sensitive
[ https://issues.apache.org/jira/browse/LUCENENET-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens resolved LUCENENET-490. --- Resolution: Fixed That is how it used to be. I believe I caused this regression in LUCENENET-478 when I re-ported QueryParser. If the LocalizedTestCase class worked correctly, we would have caught this earlier, I think. QueryParser is culture-sensitive Key: LUCENENET-490 URL: https://issues.apache.org/jira/browse/LUCENENET-490 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 3.0.3 Environment: CurrentCulture = sv-SE, CurrentUICulture = en-US. Reporter: Simon Svensson Priority: Minor Fix For: Lucene.Net 3.0.3 The QueryParser calls Single.Parse which is culture-sensitive. This will fail on cultures using other decimal separators (e.g. Swedish [sv-SE]). This does not affect 2.9.4; it calls SupportClass.Single.Parse. This looks related to LUCENENET-285. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
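For illustration, culture-independent parsing of a boost value can be done by passing {{CultureInfo.InvariantCulture}} explicitly. The {{BoostParser}} helper below is a hypothetical name used only for this sketch; it is not the QueryParser code or the SupportClass method mentioned in the issue.

```csharp
using System.Globalization;

// Hypothetical helper showing the culture-insensitive call.
public static class BoostParser
{
    // "1.5" is read as one and a half regardless of the thread's CurrentCulture,
    // because the invariant culture always uses '.' as the decimal separator.
    public static float ParseBoost(string text)
    {
        return float.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

By contrast, the single-argument {{float.Parse(text)}} uses the current culture, which is why a query such as {{term^1.5}} can be misread or rejected under sv-SE.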
[jira] [Commented] (LUCENENET-480) Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible
[ https://issues.apache.org/jira/browse/LUCENENET-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295239#comment-13295239 ] Christopher Currens commented on LUCENENET-480: --- I'm assuming we're going to define some sort of symbol to compile code if it's 3.5 vs 4.0. If so, we can do a few things to make it easier. 1. We can. The only reason we're using the interface is because Java is. I can see in the future this might be a problem if we had to use a set class that was not HashSet...but at least now it's not a problem. Alternatively, we can write our own ISet interface based on the .NET 4.0 one, and use a class that wraps HashSet and implements ISet. 2. I think the only way to do this one is write our own, as you said. 3. We can just define these in the support class, when using .NET 3.5:
{code}
public delegate TResult Func<T1, T2, T3, T4, T5, T6, T7, T8, T9, TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9);
public delegate TResult Func<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10);
{code}
4. We can either use Digy's (I think it uses a WeakHashTable) or we can write our own (more work), using Thread.AllocateDataSlot(). I believe that is how it is done internally in .NET 4. 5. ParallelMultiSearcher does have the biggest changes between .NET 3.5 (and actually Java's version of the class, because of the use of Tasks instead of manually spawning threads). I feel like we could remove it from the 3.5 version (at least for now), or have two versions of 3.5, where one has a dependency on the TPL for 3.5.
Investigate what needs to happen to make both .NET 3.5 and 4.0 builds possible -- Key: LUCENENET-480 URL: https://issues.apache.org/jira/browse/LUCENENET-480 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3 Reporter: Christopher Currens We need to investigate what needs to be done with the source to be able to support both a .NET 3.5 and 4.0 build. There was some concern from at least one member of the community ([see here|http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201202.mbox/%3C004b01cce111$871f4990$955ddcb0$@fr%3E]) that we've alienated some of our user base by making such a heavy jump from 2.0 to 4.0. There are major benefits to using 4.0 over the 2.0-based runtimes, specifically FAR superior memory management, particularly with the LOH, where Lucene.NET has had major trouble in the past. Based on what has been done with Lucene.NET 3.0.3, we can't (easily) drop .NET 3.5/3.0 libraries and go back to 2.0. HashSet<T> and ISet<T> are used extensively in the code; that would be a major blocker. I suppose it could be possible, but there hasn't been a whole lot of talk about what runtimes we intend to support. The big change between lucene 2.x and 3.x for java was also a change in runtimes, that allowed generics and enums to be used in the base code. We have a similar situation with the API (it's substantially different, with the addition of properties alone) for this next version of Lucene.NET, so I think it's reasonable to at least make a permanent move from 2.0 to 3.5, though that's only my opinion, hasn't been discussed with the committers. It seems that supporting 3.5 and 4.0 should be fairly easy, and is a decent compromise in supporting both a 2.0 and 4.0 runtime. At the very least, we should try our best to support it.
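Point 4 above (a home-grown thread-local for the 3.5 build, based on {{Thread.AllocateDataSlot()}}) could be sketched as follows. {{SlotThreadLocal<T>}} is a hypothetical name and this is only one possible shape; it makes no claim about how .NET 4 implements {{ThreadLocal<T>}} or about what Lucene.Net eventually shipped.

```csharp
using System;
using System.Threading;

// Hypothetical ThreadLocal<T> substitute for a .NET 3.5 build.
public sealed class SlotThreadLocal<T>
{
    // One unnamed slot per instance; every thread sees its own value in it.
    private readonly LocalDataStoreSlot slot = Thread.AllocateDataSlot();
    private readonly Func<T> valueFactory;

    public SlotThreadLocal(Func<T> valueFactory)
    {
        this.valueFactory = valueFactory;
    }

    public T Value
    {
        get
        {
            object boxed = Thread.GetData(slot);
            if (boxed == null)
            {
                // First access on this thread: create and remember the value.
                boxed = new Holder(valueFactory());
                Thread.SetData(slot, boxed);
            }
            return ((Holder)boxed).Value;
        }
        set
        {
            Thread.SetData(slot, new Holder(value));
        }
    }

    // Wrapper so that a stored null or default(T) is distinguishable from "not set".
    private sealed class Holder
    {
        public readonly T Value;
        public Holder(T value) { Value = value; }
    }
}
```

Usage mirrors the .NET 4 type: construct it once with a factory, then read or assign {{Value}} from any thread.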
[jira] [Commented] (LUCENENET-493) Make lucene.net culture insensitive (like the java version)
[ https://issues.apache.org/jira/browse/LUCENENET-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293189#comment-13293189 ] Christopher Currens commented on LUCENENET-493: --- This is rather annoying, actually. Java has tests for different cultures wired into the test suite. Interestingly enough, so do we, but because of the differences between JUnit and NUnit (namely attribute based test discovery), we can't override the test running implementation in the same way java does. So, the code we've ported for testing cultures does not work...period. NUnit supports changing the cultures via attributes, but only a single culture. MbUnit allows multiple cultures and will run the test each time in that culture. We should find a workaround. Make lucene.net culture insensitive (like the java version) --- Key: LUCENENET-493 URL: https://issues.apache.org/jira/browse/LUCENENET-493 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core, Lucene.Net Test Affects Versions: Lucene.Net 3.0.3 Reporter: Luc Vanlerberghe Labels: patch Fix For: Lucene.Net 3.0.3 Attachments: Lucenenet-493.patch In Java, conversion of the basic types to and from strings is locale (culture) independent. For localized input/output one needs to use the classes in the java.text package. In .Net, conversion of the basic types to and from strings depends on the default Culture. Otherwise you have to specify CultureInfo.InvariantCulture explicitly. Some of the testcases in lucene.net fail if they are not run on a machine with culture set to US. In the current version of lucene.net there are patches here and there that try to correct for some specific cases by using string replacement (like System.Double.Parse(s.Replace(".", CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator))), but that seems really ugly. 
I submit a patch here that removes the old workarounds and replaces them by calls to classes in the Lucene.Net.Support namespace that try to handle the conversions in a compatible way.
[jira] [Updated] (LUCENENET-490) QueryParser is culture-sensitive
[ https://issues.apache.org/jira/browse/LUCENENET-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens updated LUCENENET-490: -- Fix Version/s: Lucene.Net 3.0.3 QueryParser is culture-sensitive Key: LUCENENET-490 URL: https://issues.apache.org/jira/browse/LUCENENET-490 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 3.0.3 Environment: CurrentCulture = sv-SE, CurrentUICulture = en-US. Reporter: Simon Svensson Priority: Minor Fix For: Lucene.Net 3.0.3 The QueryParser calls Single.Parse which is culture-sensitive. This will fail on cultures using other decimal separators (e.g. Swedish [sv-SE]). This does not affect 2.9.4; it calls SupportClass.Single.Parse. This looks related to LUCENENET-285. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (LUCENENET-464) The Lucene.Net.FastVectorHighligher.dll of the latest release 2.9.4 breaks any ASP.NET application
[ https://issues.apache.org/jira/browse/LUCENENET-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens resolved LUCENENET-464. --- Resolution: Fixed Fix Version/s: Lucene.Net 3.0.3 Fixed in trunk. Doesn't look like we'll be doing a hotfix, since contrib is now tied into the 3.x API. The Lucene.Net.FastVectorHighligher.dll of the latest release 2.9.4 breaks any ASP.NET application -- Key: LUCENENET-464 URL: https://issues.apache.org/jira/browse/LUCENENET-464 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4 Environment: Windows 7 64.bit System with ASP.NET 4.0 / C# Reporter: Jörg Lang Labels: patch Fix For: Lucene.Net 3.0.3 Original Estimate: 1h Remaining Estimate: 1h After I included Lucene and the Contrib modules via NuGet to my project, the web application would not even start anymore. Before the first page was shown, I got the message I added at the end of this text. By trial and error I found out, that when the Lucene.Net.FastVectorHighligher.dll is deleted from the bin directory, the application runs again. I then looked at the source code of Lucene.Net.FastVectorHighligher.dll and found that in file support.cs the following code is located that causes the problem. namespace System.Runtime.CompilerServices { [AttributeUsage(AttributeTargets.Method)] public sealed class ExtensionAttribute : Attribute { public ExtensionAttribute() { } } } After removing and recompiling the dll everything works, meaning my application starts and also the highlighting is working correctly. It would be cool if this could be fixed in a patch or hotfix, so the NuGet would deliver a corrected version. Regards Jörg Server Error in '/' Application. Compilation Error Description: An error occurred during the compilation of a resource required to service this request. Please review the following specific error details and modify your source code appropriately. 
Compiler Error Message: BC30560: 'ExtensionAttribute' is ambiguous in the namespace 'System.Runtime.CompilerServices'. Source Error: [No relevant source lines] Source File: InternalXmlHelper.vbLine: 9 Show Detailed Compiler Output: C:\Program Files (x86)\IIS Express C:\Windows\Microsoft.NET\Framework\v4.0.30319\vbc.exe /t:library /utf8output /R:C:\Windows\assembly\GAC_MSIL\IKVM.OpenJDK.Util\0.46.0.1__13235d27fcbfff58\IKVM.OpenJDK.Util.dll /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET Files\root\abef18a3\584f3aa\assembly\dl3\51d01989\a92a29aa_d2c4cc01\Lucene.Net.Contrib.Core.DLL /R:C:\Windows\assembly\GAC_MSIL\IKVM.OpenJDK.Text\0.46.0.1__13235d27fcbfff58\IKVM.OpenJDK.Text.dll /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET Files\root\abef18a3\584f3aa\assembly\dl3\74156559\d79ffad5_b7a2cc01\Xml.Schema.Linq.DLL /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET Files\root\abef18a3\584f3aa\assembly\dl3\3a057033\005cb977_cb62ca01\DeepZoomTools.DLL /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET Files\root\abef18a3\584f3aa\assembly\dl3\6b08d27a\eac1b816_1898cc01\Microsoft.Practices.ServiceLocation.DLL /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET Files\root\abef18a3\584f3aa\assembly\dl3\2fe0a669\00eae495_b440cc01\Telerik.Windows.RadUploadHandler.DLL /R:C:\Windows\assembly\GAC_MSIL\IKVM.OpenJDK.Core\0.46.0.1__13235d27fcbfff58\IKVM.OpenJDK.Core.dll /R:C:\Windows\Microsoft.Net\assembly\GAC_32\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll /R:C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel.Web\v4.0_4.0.0.0__31bf3856ad364e35\System.ServiceModel.Web.dll /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET Files\root\abef18a3\584f3aa\assembly\dl3\c2392a3f\9a92e9c6_95c0cc01\Microsoft.Practices.EnterpriseLibrary.Logging.DLL /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET 
Files\root\abef18a3\584f3aa\assembly\dl3\b9daec08\db8bd216_1898cc01\Microsoft.Practices.Unity.DLL /R:C:\Users\jlang.EVELIX\AppData\Local\Temp\Temporary ASP.NET Files\root\abef18a3\584f3aa\assembly\dl3\a9da4c25\00c6435b_cfa2cc01\Telerik.Web.UI.Skins.DLL /R:C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Runtime.Serialization\v4.0_4.0.0.0__b77a5c561934e089\System.Runtime.Serialization.dll /R:C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel.Activities\v4.0_4.0.0.0__31bf3856ad364e35\System.ServiceModel.Activities.dll
[jira] [Resolved] (LUCENENET-445) Lucene.Net.Index.TestIndexWriter.TestFutureCommit() Fails
[ https://issues.apache.org/jira/browse/LUCENENET-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens resolved LUCENENET-445.
Resolution: Fixed
Fix Version/s: Lucene.Net 3.0.3

I saw this fail when trunk was on 2.9, but not since trunk was merged with the 3.x branch, so some change in that merge evidently made this test pass. I think it's safe to close this issue.

Lucene.Net.Index.TestIndexWriter.TestFutureCommit() Fails

Key: LUCENENET-445
URL: https://issues.apache.org/jira/browse/LUCENENET-445
Project: Lucene.Net
Issue Type: Bug
Affects Versions: Lucene.Net 2.9.4
Reporter: Prescott Nasser
Priority: Minor
Fix For: Lucene.Net 3.0.3

Fails with the following error:

System.Collections.Generic.KeyNotFoundException was unhandled by user code
Message=The given key was not present in the dictionary.
Source=mscorlib
StackTrace:
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at Lucene.Net.Index.TestIndexWriter.TestFutureCommit() in C:\Users\GeoBMX540\Desktop\Trunk\test\core\Index\TestIndexWriter.cs:line 5969
InnerException:

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (LUCENENET-487) Remove Obsolete Members, Fields that are marked as obsolete and to be removed in 3.0
[ https://issues.apache.org/jira/browse/LUCENENET-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens resolved LUCENENET-487.
Resolution: Fixed
Fix Version/s: Lucene.Net 3.0.3

I did a quick search across the Lucene source, and I think LongParser and DoubleParser are the only two items in the entire source marked obsolete for removal in 3.0. There is one other object that is marked Obsolete for removal in 3.x, but it is still in the 3.0.3 Java code.

I'm going to remove the Obsolete attributes from those two interfaces, because the differences between Java and C# (not .NET) mean we can't match this API. Specifically, even though Java and .NET allow static members and nested types (both static and instance types) in interfaces, C# does not support them, so the equivalent code is invalid C#. Either way, since we can't conform to the exact API that Java has for the FieldCache, it wouldn't be a bad idea to clean it up.

Remove Obsolete Members, Fields that are marked as obsolete and to be removed in 3.0

Key: LUCENENET-487
URL: https://issues.apache.org/jira/browse/LUCENENET-487
Project: Lucene.Net
Issue Type: Task
Affects Versions: Lucene.Net 3.0.3
Reporter: Prescott Nasser
Priority: Minor
Fix For: Lucene.Net 3.0.3
Original Estimate: 24h
Remaining Estimate: 24h

We have several places where things are marked as obsolete and to be removed in 3.0. We should remove those. Also, care should be taken to document what was removed (and what should be used in its place), so that we can have complete, good-quality release notes.
[jira] [Commented] (LUCENENET-467) .NETify the public API where appropriate
[ https://issues.apache.org/jira/browse/LUCENENET-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291130#comment-13291130 ]

Christopher Currens commented on LUCENENET-467:

Thanks. Patch applied in revision 1347715.

.NETify the public API where appropriate

Key: LUCENENET-467
URL: https://issues.apache.org/jira/browse/LUCENENET-467
Project: Lucene.Net
Issue Type: Improvement
Components: Lucene.Net Contrib, Lucene.Net Core
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Environment: all
Reporter: Christopher Currens
Labels: refactoring
Fix For: Lucene.Net 3.0.3
Attachments: Lucenenet-467-create.patch

Although we haven't abandoned the line-by-line port of Java Lucene, there are many idioms in Java that make little to no sense in a .NET assembly. The API can change to allow for a conventional .NET experience, while still keeping it easy to port Java logic.

* Change Getxxx() and Setxxx() methods to .NET properties
* Implement the [dispose pattern|http://msdn.microsoft.com/en-us/library/fs2xkftw.aspx] properly. Try, at all costs, to only use finalizers *when necessary*. They are expensive, and most of the classes used already have finalizers that will be called.
* Convert Java Iterator-style classes (see TermEnum, TermDocs and others) to implement IEnumerable<T>
* When rethrowing a caught exception, use *throw;* instead of *throw ex;* to maintain the stack trace
[jira] [Commented] (LUCENENET-484) Some possibly major tests intermittently fail
[ https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286699#comment-13286699 ]

Christopher Currens commented on LUCENENET-484:

Thanks Luc. This is great stuff. I'll run the patch on my local box and double check everything. Your help with this is appreciated by all of us!

Some possibly major tests intermittently fail

Key: LUCENENET-484
URL: https://issues.apache.org/jira/browse/LUCENENET-484
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens
Fix For: Lucene.Net 3.0.3
Attachments: Lucenenet-484-WeakDictionary.patch, Lucenenet-484-WeakDictionaryTests.patch

These tests will fail intermittently in Debug or Release mode, in the core test suite:
# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# Lucene.Net.Util:
#- TestFieldCacheSanityChecker.TestInsanity1
#- TestFieldCacheSanityChecker.TestInsanity2
#- (It's possible all of the insanity tests fail at one point or another)
# Lucene.Net.Support
#- TestWeakHashTableMultiThreadAccess.Test

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.
[jira] [Updated] (LUCENENET-484) Some possibly major tests intermittently fail
[ https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-484:

Description:
These tests will fail intermittently in Debug or Release mode, in the core test suite:
# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# -Lucene.Net.Util:-
#- -TestFieldCacheSanityChecker.TestInsanity1-
#- -TestFieldCacheSanityChecker.TestInsanity2-
#- -(It's possible all of the insanity tests fail at one point or another)-
# -Lucene.Net.Support-
#- -TestWeakHashTableMultiThreadAccess.Test-

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.

was:
These tests will fail intermittently in Debug or Release mode, in the core test suite:
# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# Lucene.Net.Util:
#- TestFieldCacheSanityChecker.TestInsanity1
#- TestFieldCacheSanityChecker.TestInsanity2
#- (It's possible all of the insanity tests fail at one point or another)
# Lucene.Net.Support
#- TestWeakHashTableMultiThreadAccess.Test

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.

Environment: All

Applied the patches. Getting closer to resolving this issue.

Some possibly major tests intermittently fail

Key: LUCENENET-484
URL: https://issues.apache.org/jira/browse/LUCENENET-484
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Environment: All
Reporter: Christopher Currens
Fix For: Lucene.Net 3.0.3
Attachments: Lucenenet-484-WeakDictionary.patch, Lucenenet-484-WeakDictionaryTests.patch

These tests will fail intermittently in Debug or Release mode, in the core test suite:
# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# -Lucene.Net.Util:-
#- -TestFieldCacheSanityChecker.TestInsanity1-
#- -TestFieldCacheSanityChecker.TestInsanity2-
#- -(It's possible all of the insanity tests fail at one point or another)-
# -Lucene.Net.Support-
#- -TestWeakHashTableMultiThreadAccess.Test-

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.
[jira] [Commented] (LUCENENET-484) Some possibly major tests intermittently fail
[ https://issues.apache.org/jira/browse/LUCENENET-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264032#comment-13264032 ]

Christopher Currens commented on LUCENENET-484:

Some of the failures I can only reproduce if a) I run in Release mode, AND b) I run the *entire* test suite, not just the single test.

Some possibly major tests intermittently fail

Key: LUCENENET-484
URL: https://issues.apache.org/jira/browse/LUCENENET-484
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Core, Lucene.Net Test
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens
Fix For: Lucene.Net 3.0.3

These tests will fail intermittently in Debug or Release mode, in the core test suite:
# -Lucene.Net.Index:-
#- -TestConcurrentMergeScheduler.TestFlushExceptions-
# Lucene.Net.Store:
#- TestLockFactory.TestStressLocks
# Lucene.Net.Search:
#- TestSort.TestParallelMultiSort
# Lucene.Net.Util:
#- TestFieldCacheSanityChecker.TestInsanity1
#- TestFieldCacheSanityChecker.TestInsanity2
#- (It's possible all of the insanity tests fail at one point or another)
# Lucene.Net.Support
#- TestWeakHashTableMultiThreadAccess.Test

TestWeakHashTableMultiThreadAccess should be fine to remove along with the WeakHashTable in the Support namespace, since it's been replaced with WeakDictionary.
[jira] [Commented] (LUCENENET-486) Wildcard queries are not analyzed
[ https://issues.apache.org/jira/browse/LUCENENET-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260708#comment-13260708 ]

Christopher Currens commented on LUCENENET-486:

I think this affects other languages more than it does English. At the very least it affects the German analyzer, since it does umlaut conversions. While I don't think a design change to Lucene.NET is necessary, it might be beneficial to expose the logic that converts umlauts in terms, so that developers can manually sanitize the terms in the query themselves (even by overriding methods in QueryParser) and get the same behavior. I think that might be a reasonable compromise, and it only affects the GermanAnalyzer in Contrib.

Wildcard queries are not analyzed

Key: LUCENENET-486
URL: https://issues.apache.org/jira/browse/LUCENENET-486
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Contrib, Lucene.Net Core
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4
Environment: Windows 7, Visual Studio 2010, .net 4.0
Reporter: Björn
Attachments: LuceneTest.zip

The Lucene 'QueryParser' doesn't analyze wildcard queries. The function 'GetPrefixQuery' (QueryParser.cs) returns the string without any analysis. I have performed some queries to show the problem. The analyzer is the 'Contrib.Analyzers.DE.GermanAnalyzer'.

-- indexed word: 'Häuser'; in the index stemmed as: 'hau' --
query: Hau*; hit: yes
query: Hause*; hit: no; This should be a hit.

-- indexed word: 'Angebote'; in the index stemmed as: 'angebo' --
query: Angebo*; hit: yes
query: Angebot*; hit: no; This should be a hit.
query: Angebote*; hit: no; This should be a hit.

-- indexed word: 'Björn'; in the index stemmed as: 'bjor' --
query: Bjor*; hit: yes
query: Björ*; hit: no; This should be a hit.
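The compromise suggested in the comment above (fold the wildcard term yourself before it reaches the index) can be sketched roughly as follows. This is a language-neutral illustration in Python, not Lucene.Net code: `fold_term`, `normalize_prefix_query` and `prefix_matches` are hypothetical helpers, and the folding is only lowercasing plus diacritic removal, which is an assumption and not the GermanAnalyzer's actual stemmer.

```python
import unicodedata


def fold_term(term: str) -> str:
    """Hypothetical stand-in for index-time folding: lowercase and strip diacritics.

    This is NOT the GermanAnalyzer's stemmer; it only covers the umlaut case.
    """
    term = term.lower().replace("ß", "ss")
    decomposed = unicodedata.normalize("NFD", term)
    # Drop combining marks, so that "ö" (o + U+0308) becomes "o".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_prefix_query(query: str) -> str:
    """Fold the prefix part of a trailing-wildcard query and keep the '*'."""
    if query.endswith("*"):
        return fold_term(query[:-1]) + "*"
    return fold_term(query)


def prefix_matches(query: str, indexed_term: str) -> bool:
    """True if the folded query prefix is a prefix of the indexed term."""
    normalized = normalize_prefix_query(query)
    return indexed_term.startswith(normalized.rstrip("*"))


print(normalize_prefix_query("Björ*"))
print(prefix_matches("Björ*", "bjor"))
```

Folding of this kind only addresses the 'Björ*' case from the report. Queries such as 'Angebot*' against the stemmed term 'angebo' fail because of stemming, which a prefix cannot be run through safely, so they would still miss.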
[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values
[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049331#comment-13049331 ]

Christopher Currens commented on LUCENENET-417:

Also, SimpleFSDirectory doesn't really support stream indexing as much as I would hope. The issue is that SimpleFSDirectory creates a RAMOutputStream that it uses before the data is flushed to disk. The PerDoc class keeps the entire thing in memory before flushing to disk. I'm assuming it does this so indexes aren't corrupted.

A good idea may be to create a new Directory implementation with a special IndexOutput that buffers to disk when a certain limit is hit, to prevent OOM exceptions when indexing huge amounts of data. However, I'm not sure that falls within the scope of Lucene.Net... maybe contrib? I have some ideas on how to do this without leaving behind any artifacts, like temp files. It seems the easiest way would be using MemoryMappedFile, as GC frees the file, even under early termination of a program. Unfortunately, that's a .NET 4 only class.

implement streams as field values

Key: LUCENENET-417
URL: https://issues.apache.org/jira/browse/LUCENENET-417
Project: Lucene.Net
Issue Type: New Feature
Components: Lucene.Net Core
Reporter: Christopher Currens
Attachments: StreamValues.patch

Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process but also reduce the memory footprint.

-Java Lucene has the ability to use a TextReader to both analyze and store text in the index.- Lucene.NET lacks the ability to store string data in the index via streams. This should be a feature added to Lucene.NET as well.

My thoughts are to add another Field constructor, Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored in the index. Comments about this approach are greatly appreciated.
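The "buffer to disk when a certain limit is hit" idea from the comment above can be sketched in a few lines. This is only an illustration of the technique in Python, not a Lucene.Net IndexOutput: `SpillingBuffer` is a hypothetical name, and it spills to an anonymous temporary file instead of the MemoryMappedFile approach mentioned in the comment.

```python
import io
import tempfile


class SpillingBuffer:
    """Write buffer that stays in memory up to `limit` bytes, then moves to a temp file.

    Hypothetical sketch; append-only writes are assumed.
    """

    def __init__(self, limit: int):
        self.limit = limit
        self._buf = io.BytesIO()
        self.spilled = False

    def write(self, data: bytes) -> None:
        if not self.spilled and self._buf.tell() + len(data) > self.limit:
            # TemporaryFile is deleted automatically once it is closed.
            disk = tempfile.TemporaryFile()
            disk.write(self._buf.getvalue())
            self._buf.close()
            self._buf = disk
            self.spilled = True
        self._buf.write(data)

    def copy_to(self, out) -> int:
        """Stream the buffered bytes to `out` in chunks; return the byte count."""
        self._buf.seek(0)
        total = 0
        while True:
            chunk = self._buf.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
        self._buf.seek(0, io.SEEK_END)  # restore the append position
        return total

    def close(self) -> None:
        self._buf.close()


buf = SpillingBuffer(limit=8)
buf.write(b"1234")       # still in memory
buf.write(b"56789")      # crosses the limit, so the data moves to disk
sink = io.BytesIO()
buf.copy_to(sink)
buf.close()
```

Python's standard library ships the same idea as `tempfile.SpooledTemporaryFile`; the hand-written class is used here only to make the switch-over point explicit.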
[Lucene.Net] [jira] [Commented] (LUCENENET-423) QueryParser differences between Java and .NET
[ https://issues.apache.org/jira/browse/LUCENENET-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049334#comment-13049334 ]

Christopher Currens commented on LUCENENET-423:

.NET is far better at parsing date strings, but it's the inconsistency between the Java version and the .NET version that I'm worried about. Search the index with one query from Java and you get different results than with the same query in .NET. How compatible do we want to be with Java?

QueryParser differences between Java and .NET

Key: LUCENENET-423
URL: https://issues.apache.org/jira/browse/LUCENENET-423
Project: Lucene.Net
Issue Type: Bug
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g
Reporter: Christopher Currens

When trying to do a RangeQuery that uses dates in a certain format, .NET behaves differently from its Java counterpart. The code is the same between them, but as far as I can tell, the difference is in the way Java parses dates vs how .NET parses dates.

To reproduce:
{code:java}
var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "FullText", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
var query = queryParser.Parse("Field:[2001-01-17 TO 2001-01-20]");
{code}

You'll notice that the query looks like the old DateField format (e.g. 0g1d64542). If you do the same query in Java (or Luke), you'll notice the query gets parsed as if it were a RangeQuery of strings. AFAIK, Java cannot parse a string formatted in that way. If you change the string to use / instead of - in Java, you'll get a query that uses DateResolutions and DateTools.DateToString().

It seems an appropriate fix for this, if we wanted to keep this behavior similar to Java, would be to write our own DateTime parser that behaves the same way as Java's parser.
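The fix proposed in the report (a date parser that is as strict as Java's) boils down to: accept one explicit format, and fall back to a plain string range bound for everything else. A minimal sketch of that decision in Python, under stated assumptions: `parse_range_bound` is a hypothetical helper, the "/"-separated pattern is an assumed stand-in for whatever format Java's parser actually accepts in a given locale, and the `yyyyMMdd` output merely imitates a day-resolution date term.

```python
from datetime import datetime

# Assumed strict pattern; the real Java format depends on the parser's locale.
STRICT_DATE_FORMAT = "%m/%d/%Y"


def parse_range_bound(text: str):
    """Return ("date", "yyyyMMdd") if `text` matches the strict format,
    otherwise ("string", text) so the bound is treated as a plain term."""
    try:
        parsed = datetime.strptime(text, STRICT_DATE_FORMAT)
    except ValueError:
        return ("string", text)
    return ("date", parsed.strftime("%Y%m%d"))


print(parse_range_bound("2001-01-17"))
print(parse_range_bound("01/17/2001"))
```

With a lenient parser both inputs would become dates, which is the .NET behavior the report describes; the strict version keeps the "-" form as a string bound, matching what the report observed in Java.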
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049363#comment-13049363 ]

Christopher Currens commented on LUCENENET-425:

On a 1.18GB index of only one text field:

MMap Dir: 74
FS Dir: 34
---
MMap Dir: 79
FS Dir: 29
Press any key to continue . . .

Same index, order of search changed:

FS Dir: 25
MMap Dir: 110
---
FS Dir: 112
MMap Dir: 78
Press any key to continue . . .

On a 241MB index of text and binary data (used a field selector to only get the text field):

FS Dir: 151
MMap Dir: 679
---
FS Dir: 130
MMap Dir: 627
Press any key to continue . . .

Same index, order of search changed:

MMap Dir: 867
FS Dir: 134
---
MMap Dir: 600
FS Dir: 135
Press any key to continue . . .

The second index, while smaller, requires a lot more seeking, due to the number of fields per doc (anywhere from 15-30 fields per doc). It seems it would be a more realistic index to search.

MMapDirectory implementation

Key: LUCENENET-425
URL: https://issues.apache.org/jira/browse/LUCENENET-425
Project: Lucene.Net
Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4g
Reporter: Digy
Priority: Trivial
Fix For: Lucene.Net 2.9.4g
Attachments: MMapDirectory.patch

Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as
{code}
public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { }
{code}
If a memory map cannot be created (for example, if the file is too big to fit in the 32-bit address range), it will default to FSDirectory.FSIndexInput.

In my tests, I didn't see any performance gain in a 32-bit environment, and I consider it better than nothing. I would be happy if someone could send test results from a 64-bit platform.

DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Issue Comment Edited] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049363#comment-13049363 ]

Christopher Currens edited comment on LUCENENET-425 at 6/14/11 7:36 PM:

On a 1.18GB index of only one text field:
{panel}
MMap Dir: 74
FS Dir: 34
---
MMap Dir: 79
FS Dir: 29
Press any key to continue . . .
{panel}

Same index, order of search changed:
{panel}
FS Dir: 25
MMap Dir: 110
---
FS Dir: 112
MMap Dir: 78
Press any key to continue . . .
{panel}

On a 241MB index of text and binary data (used a field selector to only get the text field):
{panel}
FS Dir: 151
MMap Dir: 679
---
FS Dir: 130
MMap Dir: 627
Press any key to continue . . .
{panel}

Same index, order of search changed:
{panel}
MMap Dir: 867
FS Dir: 134
---
MMap Dir: 600
FS Dir: 135
Press any key to continue . . .
{panel}

The second index, while smaller, requires a lot more seeking, due to the number of fields per doc (anywhere from 15-30 fields per doc). It seems it would be a more realistic index to search.

was (Author: ccurrens):
The same comment and figures as above, without the {panel} markup around each block of output.

MMapDirectory implementation

Key: LUCENENET-425
URL: https://issues.apache.org/jira/browse/LUCENENET-425
Project: Lucene.Net
Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4g
Reporter: Digy
Priority: Trivial
Fix For: Lucene.Net 2.9.4g
Attachments: MMapDirectory.patch

Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as
{code}
public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { }
{code}
If a memory map cannot be created (for example, if the file is too big to fit in the 32-bit address range), it will default to FSDirectory.FSIndexInput.

In my tests, I didn't see any performance gain in a 32-bit environment, and I consider it better than nothing. I would be happy if someone could send test results from a 64-bit platform.

DIGY
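The figures above change noticeably when the search order is swapped, which suggests that whichever directory runs first absorbs one-time costs (cold caches, JIT). One common way to reduce that bias is an untimed warm-up pass followed by timing every candidate in every order. A rough Python sketch of such a harness, where `time_in_all_orders` and the candidate names are hypothetical and not part of the benchmark that produced the numbers above:

```python
import time
from itertools import permutations


def time_in_all_orders(candidates, rounds=2, warmup=True):
    """Time each zero-argument callable in every execution order.

    `candidates` maps a name to a callable; the result maps each name to a
    list of elapsed times in milliseconds.
    """
    results = {name: [] for name in candidates}
    if warmup:
        for fn in candidates.values():
            fn()  # untimed run, so first-use costs are not charged to one side
    for _ in range(rounds):
        for order in permutations(candidates):
            for name in order:
                start = time.perf_counter()
                candidates[name]()
                results[name].append((time.perf_counter() - start) * 1000.0)
    return results


# Placeholder workloads standing in for "search with FS Dir" / "search with MMap Dir".
timings = time_in_all_orders({
    "fs": lambda: sum(range(10_000)),
    "mmap": lambda: sum(range(10_000)),
})
for name, samples in timings.items():
    print(name, [round(ms, 3) for ms in samples])
```

Comparing the per-name samples across orders shows whether a difference belongs to the implementation or to the position in the run.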
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049359#comment-13049359 ]

Christopher Currens commented on LUCENENET-425:

On a 1.18GB index of only text:

FS Reader: 27
MMap Reader: 90
---
FS Reader: 38
MMap Reader: 77
Press any key to continue . . .

MMapDirectory implementation

Key: LUCENENET-425
URL: https://issues.apache.org/jira/browse/LUCENENET-425
Project: Lucene.Net
Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4g
Reporter: Digy
Priority: Trivial
Fix For: Lucene.Net 2.9.4g
Attachments: MMapDirectory.patch

Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as
{code}
public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { }
{code}
If a memory map cannot be created (for example, if the file is too big to fit in the 32-bit address range), it will default to FSDirectory.FSIndexInput.

In my tests, I didn't see any performance gain in a 32-bit environment, and I consider it better than nothing. I would be happy if someone could send test results from a 64-bit platform.

DIGY
[Lucene.Net] [jira] [Updated] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-425:

Comment: was deleted

(was: On a 1.18GB index of only text:

FS Reader: 27
MMap Reader: 90
---
FS Reader: 38
MMap Reader: 77
Press any key to continue . . .)

MMapDirectory implementation

Key: LUCENENET-425
URL: https://issues.apache.org/jira/browse/LUCENENET-425
Project: Lucene.Net
Issue Type: New Feature
Affects Versions: Lucene.Net 2.9.4g
Reporter: Digy
Priority: Trivial
Fix For: Lucene.Net 2.9.4g
Attachments: MMapDirectory.patch

Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as
{code}
public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { }
{code}
If a memory map cannot be created (for example, if the file is too big to fit in the 32-bit address range), it will default to FSDirectory.FSIndexInput.

In my tests, I didn't see any performance gain in a 32-bit environment, and I consider it better than nothing. I would be happy if someone could send test results from a 64-bit platform.

DIGY
[Lucene.Net] [jira] [Created] (LUCENENET-417) implement streams as field values
implement streams as field values

Key: LUCENENET-417
URL: https://issues.apache.org/jira/browse/LUCENENET-417
Project: Lucene.Net
Issue Type: New Feature
Components: Lucene.Net Core
Reporter: Christopher Currens

Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process but also reduce the memory footprint.

Java Lucene has the ability to use a TextReader to both analyze and store text in the index. .NET lacks the ability to store the data in the index, because .NET TextReaders cannot seek or reset the position of the stream. This should be a feature added to Lucene.NET as well.

My thoughts are to add another Field constructor, Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored in the index. Comments about this approach are greatly appreciated.
[Lucene.Net] [jira] [Updated] (LUCENENET-417) implement streams as field values
[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens updated LUCENENET-417:

Attachment: BinaryStream.patch

This patch allows StreamValue to be used with binary data.

implement streams as field values

Key: LUCENENET-417
URL: https://issues.apache.org/jira/browse/LUCENENET-417
Project: Lucene.Net
Issue Type: New Feature
Components: Lucene.Net Core
Reporter: Christopher Currens
Attachments: BinaryStream.patch

Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process but also reduce the memory footprint.

Java Lucene has the ability to use a TextReader to both analyze and store text in the index. .NET lacks the ability to store the data in the index, because .NET TextReaders cannot seek or reset the position of the stream. This should be a feature added to Lucene.NET as well.

My thoughts are to add another Field constructor, Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored in the index. Comments about this approach are greatly appreciated.
[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values
[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039436#comment-13039436 ]

Christopher Currens commented on LUCENENET-417:

Good call. I think I was confusing storing the whole field with storing the term vectors, which Lucene.Net can do. I still think that, at the very least, being able to store binary values via a stream is a necessary addition to Lucene.Net. Making strings streamable is less of an issue, to me at least. However, I can see the benefit when indexing large items, which is really all this is attempting to solve.

Being forced to load large quantities of data into memory to perform any sort of indexing operation on them creates speed and memory issues. This may not be a terribly large use case for some people, but anyone trying to write a multi-threaded indexing system would certainly enjoy the benefits of a lower memory footprint and a speed increase.

implement streams as field values

Key: LUCENENET-417
URL: https://issues.apache.org/jira/browse/LUCENENET-417
Project: Lucene.Net
Issue Type: New Feature
Components: Lucene.Net Core
Reporter: Christopher Currens
Attachments: BinaryStream.patch

Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process but also reduce the memory footprint.

Java Lucene has the ability to use a TextReader to both analyze and store text in the index. .NET lacks the ability to store the data in the index, because .NET TextReaders cannot seek or reset the position of the stream. This should be a feature added to Lucene.NET as well.

My thoughts are to add another Field constructor, Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored in the index. Comments about this approach are greatly appreciated.
[Lucene.Net] [jira] [Updated] (LUCENENET-417) implement streams as field values
[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens updated LUCENENET-417: -- Description: Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process, but reducing the memory footprint as well. -Java lucene has the ability to use a TextReader the both analyze and store text in the index.- Lucene.NET lacks the ability to store string data in the index via streams. This should be a feature added into Lucene .NET as well. My thoughts are to add another Field constructor, that is Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored into the index. Comments about this approach are greatly appreciated. was: Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process, but reducing the memory footprint as well. Java lucene has the ability to use a TextReader the both analyze and store text in the index. .NET lacks the ability to store the data in the index, due to the fact that .net TextReaders cannot seek or reset the position of the stream. This should be a feature added into Lucene.NET as well. My thoughts are to add another Field constructor, that is Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored into the index. Comments about this approach are greatly appreciated. 
implement streams as field values - Key: LUCENENET-417 URL: https://issues.apache.org/jira/browse/LUCENENET-417 Project: Lucene.Net Issue Type: New Feature Components: Lucene.Net Core Reporter: Christopher Currens Attachments: BinaryStream.patch
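The memory argument in the description can be illustrated with a small sketch. It is written in plain Python rather than C#, and it is not Lucene.Net code: `iter_chunks`, `iter_text`, `largest_chunk` and the chunk size are names and values invented here for illustration. The text variant loosely mirrors the shape of the proposed Field(string name, Stream stream, Encoding encoding) constructor by pairing a byte stream with an encoding and decoding it incrementally, so that only one chunk needs to be held at a time.

```python
import codecs
import io


def iter_chunks(stream, chunk_size=8192):
    """Yield successive byte chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def iter_text(stream, encoding, chunk_size=8192):
    """Decode a binary stream incrementally, yielding pieces of text.

    The incremental decoder buffers a multi-byte character that happens to be
    split across two chunks, so chunk boundaries do not corrupt the text.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in iter_chunks(stream, chunk_size):
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush anything the decoder was still holding at end of stream.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def largest_chunk(stream, chunk_size=8192):
    """Return the size of the largest chunk seen, i.e. the peak buffer size."""
    return max((len(c) for c in iter_chunks(stream, chunk_size)), default=0)


if __name__ == "__main__":
    payload = io.BytesIO(b"x" * 20000)
    # Peak buffer is bounded by the chunk size, not by the payload size.
    print(largest_chunk(payload, chunk_size=8192))
```

Whether an index writer could actually consume a field value this way depends on its internals; the sketch only shows that the caller's side need not materialize the whole value.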
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034961#comment-13034961 ] Christopher Currens commented on PYLUCENE-9: We can close it. Thanks for the help.
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031259#comment-13031259 ] Christopher Currens commented on PYLUCENE-9: I've posted a question to the java-lucene list; however, I'm sure it won't help at all. The simple fact is that the Lucene 3.0 jar parses the query as ft:"calendar item msg". The *same* Lucene 3.0 jar, when invoked from pylucene, produces ft:"calendar item ? msg" for me, on both Windows and Ubuntu boxes. I suppose this might just be an issue with JCC? I've been able to reproduce this both on my boxes at work and on my box at home, all producing the incorrect output. Mostly I'm curious whether this can be reproduced by any pylucene developer, or whether it's just some crazy environment issue happening on my boxes and those of everyone else I know.
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13031284#comment-13031284 ] Christopher Currens commented on PYLUCENE-9: Hmm, the code I have is nearly identical, and when I pull it out of the containing code, it behaves as it should. I can't post the whole code, but I suppose the issue must be a lingering Version.LUCENE_24 somewhere. I'll try figuring it out on my own. I'm glad to see it's something idiotic I've done. :)
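The thread does not state a root cause beyond the suspected lingering Version.LUCENE_24. One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that the `?` is not a wildcard at all but the placeholder a phrase query prints for a position left empty by a removed stop word, and that whether such gaps are preserved depends on the version constant passed to the analyzer and parser. The toy model below is plain Python, not PyLucene or Lucene code; `analyze`, `phrase_to_string` and the `keep_gaps` flag are invented names standing in for that version-dependent behaviour.

```python
def analyze(text, stop_words, keep_gaps):
    """Lowercase, split on whitespace, and drop stop words.

    Returns (term, position) pairs. With keep_gaps=True a removed stop word
    still consumes a position, so later terms keep their original offsets.
    """
    result = []
    position = 0
    for token in text.lower().split():
        if token in stop_words:
            if keep_gaps:
                position += 1  # leave a hole where the stop word was
            continue
        result.append((token, position))
        position += 1
    return result


def phrase_to_string(field, positioned_terms):
    """Render a phrase, printing '?' for every position that has no term."""
    if not positioned_terms:
        return '%s:""' % field
    last = max(pos for _, pos in positioned_terms)
    slots = ["?"] * (last + 1)
    for term, pos in positioned_terms:
        slots[pos] = term
    return '%s:"%s"' % (field, " ".join(slots))


if __name__ == "__main__":
    stop_words = {"as"}
    for keep_gaps in (False, True):
        terms = analyze("Calendar Item as Msg", stop_words, keep_gaps)
        print(keep_gaps, phrase_to_string("FullText", terms))
```

If this reading is right, both outputs come from the same stop-word removal and differ only in whether the gap is kept, which would explain why the same jar can produce either form depending on configuration.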
[jira] [Created] (PYLUCENE-9) QueryParser replacing stop words with wildcards
QueryParser replacing stop words with wildcards
---
Key: PYLUCENE-9 URL: https://issues.apache.org/jira/browse/PYLUCENE-9 Project: PyLucene Issue Type: Bug Environment: Windows XP 32-bit SP3, Ubuntu 10.04.2 LTS i686 GNU/Linux, jdk1.6.0_23 Reporter: Christopher Currens

I was using the query parser to build a query. In Java Lucene (as well as Lucene.Net), the query "Calendar Item as Msg" (quotes included) is parsed properly as FullText:"calendar item msg". In pylucene, it is parsed as FullText:"calendar item ? msg". This causes obvious problems when comparing search results from Python, Java and .NET. Initially, I thought it was the Analyzer I was using, but I've tried the StandardAnalyzer and StopAnalyzer, which work properly in Java and .NET, but not in pylucene.

Here is the code I've used to reproduce the issue:

    >>> from lucene import StandardAnalyzer, StopAnalyzer, QueryParser, Version
    >>> analyzer = StandardAnalyzer(Version.LUCENE_30)
    >>> query = QueryParser(Version.LUCENE_30, "FullText", analyzer)
    >>> parsedQuery = query.parse("\"Calendar Item as Msg\"")
    >>> parsedQuery
    <Query: FullText:"calendar item ? msg">
    >>> analyzer = StopAnalyzer(Version.LUCENE_30)
    >>> query = QueryParser(Version.LUCENE_30, "FullText", analyzer)
    >>> parsedQuery = query.parse("\"Calendar Item as Msg\"")
    >>> parsedQuery
    <Query: FullText:"calendar item ? msg">

I've noticed this in pylucene 2.9.4, 2.9.3, and 3.0.3
[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards
[ https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029674#comment-13029674 ] Christopher Currens commented on PYLUCENE-9: I was very hesitant to report this as a bug, since pylucene isn't a port, just a recompile. I am positive I am comparing the correct versions (I'm a committer on Lucene.Net). Here are all the configurations I've tried:

    Lucene.Net 2.9.2 - Valid
    Lucene.Net 2.9.4 - Valid
    Java Lucene (via Luke 1.0.1, which uses Lucene 2.9.4) - Valid
    Java Lucene (via Luke 3.1.0, which uses Lucene 3.0) - Valid
    pyLucene (Lucene 2.9.2) - Invalid, replaced by single wildcard ('?')
    pyLucene (Lucene 2.9.4) - Invalid, replaced by single wildcard ('?')
    pyLucene (Lucene 3.0.3) - Invalid, replaced by single wildcard ('?')

Those tests were all run on the 32-bit Windows XP machine. The Ubuntu box I used was running pyLucene with Lucene 2.9.2. One thing I hadn't considered, though, was to see whether it can be replicated outside of the many machines I've used myself to test, specifically whether there's an issue with how we build it via JCC, or something in our environment. But considering I've tried it at work and at home, there's no real other place I can test it.
[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995120#comment-12995120 ] Christopher Currens commented on LUCENENET-379: --- I would be happy to move away from the old logo. The project's goals have certainly changed from those of the previous project, and I think it deserves to look new. Not to mention that the .net part of the old lucene.net logo looks funky. Clean up Lucene.Net website --- Key: LUCENENET-379 URL: https://issues.apache.org/jira/browse/LUCENENET-379 Project: Lucene.Net Issue Type: Task Reporter: George Aroush Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is still based on the out-of-date incubation design. This JIRA task is to bring it up to date with other ASF projects' web pages. The existing website is here: https://svn.apache.org/repos/asf/lucene/lucene.net/site/ See http://www.apache.org/dev/project-site.html to get started. It would be best to start by cloning an existing ASF project's website and adapting it for Lucene.Net. Some examples: https://svn.apache.org/repos/asf/lucene/pylucene/site/ and https://svn.apache.org/repos/asf/lucene/java/site/