[jira] [Closed] (LUCENENET-479) QueryParser.SetEnablePositionIncrements(false) doesn't work
[ https://issues.apache.org/jira/browse/LUCENENET-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens closed LUCENENET-479. - Resolution: Fixed

This was fixed along with re-porting the parser in LUCENENET-478. Additionally, SetEnablePositionIncrements and GetEnablePositionIncrements now use a bool instead of a class, and are now exposed as a public property with a getter and setter (EnablePositionIncrements).

QueryParser.SetEnablePositionIncrements(false) doesn't work --- Key: LUCENENET-479 URL: https://issues.apache.org/jira/browse/LUCENENET-479 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Christopher Currens Fix For: Lucene.Net 3.0.3

Trying to disable position increments via SetEnablePositionIncrements(false) has no effect, at least on phrase queries. By default, the query parsed from the input "Query with Stopwords" should be a phrase query that, converted to a string, looks similar to: query ? stopwords, where ? is a null term in the phrase query. With EnablePositionIncrements set to false, the resulting query should instead be similar to: query stopwords. However, calling SetEnablePositionIncrements(false) has no effect on the resulting query.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (LUCENENET-466) optimisation for the GermanStemmer.vb
[ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens closed LUCENENET-466. - Resolution: Fixed

I've added a new stemmer in trunk called GermanDIN2Stemmer. You can make GermanAnalyzer use it via some new constructors that take a bool indicating whether to use the DIN 5007-2 stemmer instead of the default DIN 5007-1 stemmer. This doesn't break compatibility for users who want the old default DIN 5007-1 stemmer, but lets anyone opt into the other.

optimisation for the GermanStemmer.vb -- Key: LUCENENET-466 URL: https://issues.apache.org/jira/browse/LUCENENET-466 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3 Reporter: Prescott Nasser Priority: Minor Fix For: Lucene.Net 3.0.3

I have a little optimisation for the GermanStemmer.vb (in Contrib.Analyzers) class. At the moment the function Substitute converts the German umlauts ä to a, ö to o and ü to u. This is not the correct German transliteration: they must be converted to ae, oe and ue. So I can write the name Björn or Bjoern, but not Bjorn. With this optimisation a user can search for Björn and also find Bjoern. Here is the optimised code snippet:

else if ( buffer[c] == 'ä' ) { buffer[c] = 'a'; buffer.Insert(c + 1, 'e'); }
else if ( buffer[c] == 'ö' ) { buffer[c] = 'o'; buffer.Insert(c + 1, 'e'); }
else if ( buffer[c] == 'ü' ) { buffer[c] = 'u'; buffer.Insert(c + 1, 'e'); }

Thank You Björn
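For reference, a Java transcription of the proposed substitution (a sketch assuming a StringBuilder buffer; the original snippet targets Lucene.Net's C# GermanStemmer, so names here are illustrative only):

```java
// Sketch of the DIN-5007-2-style substitution: expand each umlaut to its
// two-letter form so that "Björn" and "Bjoern" stem to the same term.
public class UmlautDigraphs {
    static String substitute(String term) {
        StringBuilder buffer = new StringBuilder(term);
        for (int c = 0; c < buffer.length(); c++) {
            char ch = buffer.charAt(c);
            if (ch == 'ä') { buffer.setCharAt(c, 'a'); buffer.insert(c + 1, 'e'); }
            else if (ch == 'ö') { buffer.setCharAt(c, 'o'); buffer.insert(c + 1, 'e'); }
            else if (ch == 'ü') { buffer.setCharAt(c, 'u'); buffer.insert(c + 1, 'e'); }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("Björn")); // prints "Bjoern"
    }
}
```

The inserted 'e' is skipped on the next loop iteration because the character at position c has already been replaced, so the scan cannot loop forever.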
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2042 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2042/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest Error Message: ERROR: SolrIndexSearcher opens=93 closes=92 Stack Trace: junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=93 closes=92 at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$3.addError(JUnitTestRunner.java:974) at junit.framework.TestResult.addError(TestResult.java:38) at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51) at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:100) at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:41) at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:97) at org.junit.internal.runners.model.EachTestNotifier.addFailure(EachTestNotifier.java:26) at org.junit.runners.ParentRunner.run(ParentRunner.java:306) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743) Caused by: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=93 closes=92 at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:211) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:100) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:36) at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:37) at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75) at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:38) at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:39) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) ... 4 more Build Log (for compile errors): [...truncated 9494 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235413#comment-13235413 ] Tommaso Teofili commented on SOLR-2983: --- I just noticed the toIndexWriter method should also be explicitly tested; I'm going to work on it and attach a new patch.

Unable to load custom MergePolicy - Key: SOLR-2983 URL: https://issues.apache.org/jira/browse/SOLR-2983 Project: Solr Issue Type: Bug Reporter: Mathias Herberts Assignee: Tommaso Teofili Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2983.patch

As part of a recent upgrade to Solr 3.5.0 we encountered an error related to our use of LinkedIn's ZoieMergePolicy. It seems the code that loads a custom MergePolicy was at some point moved into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was copied verbatim it now contains a bug:

try {
  policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName, null, new Class[]{IndexWriter.class}, new Object[]{this});
} catch (Exception e) {
  policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName);
}

'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call to newInstance will always throw an exception and the catch clause will be executed. If the custom MergePolicy does not have a default constructor (which is the case for ZoieMergePolicy), the second attempt to create the MergePolicy will also fail and Solr won't start.
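For illustration only (a hypothetical `Loader` helper, not Solr's actual SolrResourceLoader API), the two-step construction pattern and its failure mode can be sketched with plain Java reflection:

```java
import java.lang.reflect.Constructor;

// Try a one-argument constructor first, fall back to the default constructor.
// If `arg` has the wrong runtime type (like SolrIndexConfig passed where an
// IndexWriter is expected), the first attempt always throws, and any class
// without a default constructor cannot be loaded at all -- the bug above.
public class Loader {
    static <T> T newInstance(Class<T> clazz, Class<?> argType, Object arg) {
        try {
            Constructor<T> ctor = clazz.getConstructor(argType);
            return ctor.newInstance(arg); // throws if arg's type doesn't match
        } catch (Exception e) {
            try {
                return clazz.getConstructor().newInstance(); // default ctor fallback
            } catch (Exception e2) {
                throw new RuntimeException("no usable constructor for " + clazz.getName(), e2);
            }
        }
    }
}
```

With a compatible argument the one-arg path succeeds; with an incompatible one, only classes that happen to have a default constructor survive the fallback.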
[jira] [Commented] (LUCENE-3877) Lucene should not call System.out.println
[ https://issues.apache.org/jira/browse/LUCENE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235422#comment-13235422 ] Dawid Weiss commented on LUCENE-3877: - bq. I have seen it not work in the past for obscure reasons

Most likely the reasons were incorrect pointcut definitions? These can be tricky, I agree. Nonetheless, I've been using AspectJ for a long time and it has always fit my needs and expectations. I'm not saying it doesn't have any bugs -- I'm sure it has. But the right tool for the right job: it took me about 5 minutes to write and apply that aspect (with follow-ups; I sent an e-mail to the mailing list since JIRA didn't work at the time). I'm not advocating for any tool, really. To me AspectJ is a fast tool for expressing where I want a given snippet of code to be injected (or what I want excluded), and for such tasks I don't see a faster or more pleasant alternative. Oh, I've been using asmlib too, extensively in fact, so it's not a lack of knowledge about the tool itself.

Lucene should not call System.out.println - Key: LUCENE-3877 URL: https://issues.apache.org/jira/browse/LUCENE-3877 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: IllegalSystemTest.java, IllegalSystemTest.java, SystemPrintCheck.java

We seem to have accumulated a few random sops... Eg, PairOutputs.java (oal.util.fst) and MultiDocValues.java, at least. Can we somehow detect (eg, have a test failure) if we accidentally leave errant System.out.println's (leftover from debugging)...?
[jira] [Commented] (LUCENE-3877) Lucene should not call System.out.println
[ https://issues.apache.org/jira/browse/LUCENE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235424#comment-13235424 ] Dawid Weiss commented on LUCENE-3877: - My AspectJ experiments from yesterday when JIRA was dead. I applied that aspect just to see what happens.

{noformat}
ajc -sourceroots aspects \
    -inpath lucene-core-3.6-SNAPSHOT.jar \
    -d none \
    -cp aspectjrt.jar \
    -showWeaveInfo
{noformat}

Here's what I got:

{noformat}
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.analysis.PorterStemmer' (PorterStemmer.java:529) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.analysis.PorterStemmer' (PorterStemmer.java:534) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.analysis.PorterStemmer' (PorterStemmer.java:542) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:989) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:996) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1003) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1012) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1013) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1038) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1043) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1047) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1056) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1057) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1062) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1071) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1073) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1074) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1077) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1079) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1081) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1082) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream java.lang.System.out)' in Type 'org.apache.lucene.index.CheckIndex' (CheckIndex.java:1085) advised by before advice from 'spikes.NoSysOuts' (NoSysOuts.aj:6)
Join point 'field-get(java.io.PrintStream
[jira] [Commented] (LUCENE-3877) Lucene should not call System.out.println
[ https://issues.apache.org/jira/browse/LUCENE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235428#comment-13235428 ] Dawid Weiss commented on LUCENE-3877: - Oh, btw. I think a FindBugs rule for detecting sysouts/syserrs would be a great addition to FindBugs -- you should definitely file it as an improvement there. In reality at least class-level exclusions will be needed to avoid legitimate matches like the ones shown above (main methods, exception handlers), but these can be lived with.
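As a hedged illustration of the check being discussed (a hypothetical class, not the attached SystemPrintCheck.java or any actual FindBugs rule), a source-level scan with a class-level allow-list might look like:

```java
import java.util.Set;
import java.util.regex.Pattern;

// Flag raw System.out/System.err use, with an allow-list for files where
// console output is legitimate (e.g. command-line tools like CheckIndex).
public class SysoutCheck {
    private static final Pattern SYSOUT = Pattern.compile("System\\.(out|err)\\.");

    // Hypothetical allow-list for illustration only.
    private static final Set<String> ALLOWED = Set.of("CheckIndex.java");

    static boolean violates(String fileName, String source) {
        return !ALLOWED.contains(fileName) && SYSOUT.matcher(source).find();
    }
}
```

A build task would walk the source tree, call `violates` per file, and fail the build on any hit; the allow-list plays the role of the class-level exclusions mentioned above.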
[jira] [Created] (LUCENE-3901) Add katakana filter to better deal with katakana spelling variants
Add katakana filter to better deal with katakana spelling variants -- Key: LUCENE-3901 URL: https://issues.apache.org/jira/browse/LUCENE-3901 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Fix For: 3.6, 4.0

Many Japanese katakana words end in a long sound that is sometimes optional. For example, パーティー and パーティ are both perfectly valid for "party". Similarly we have センター and センタ that are variants of "center", as well as サーバー and サーバ for "server". I'm proposing that we add a katakana stemmer that removes this long sound if the terms are longer than a configurable length. It's also possible to add the variant as a synonym, but I think stemming is preferred from a ranking point of view.
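A minimal sketch of the proposed rule (the length threshold and all names here are assumptions for illustration, not values from the issue or from the eventual Kuromoji filter):

```java
// Drop a trailing prolonged sound mark (U+30FC) from katakana terms longer
// than a configurable minimum, so e.g. サーバー and サーバ index identically.
public class KatakanaStem {
    static final char PROLONGED_SOUND_MARK = '\u30FC'; // ー

    static String stem(String term, int minLength) {
        if (term.length() > minLength
                && term.charAt(term.length() - 1) == PROLONGED_SOUND_MARK) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

Only the final long sound is stripped; internal prolonged sound marks (as in パーティー) are untouched, and short terms are left alone so the threshold guards against over-stemming.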
[jira] [Updated] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3901: --- Summary: Add katakana stem filter to better deal with certain katakana spelling variants (was: Add katakana filter to better deal with katakana spelling variants)
[JENKINS] Solr-trunk - Build # 1801 - Still Failing
Build: https://builds.apache.org/job/Solr-trunk/1801/ 1 tests failed. FAILED: org.apache.solr.TestDistributedSearch.testDistribSearch Error Message: Uncaught exception by thread: Thread[Thread-662,5,] Stack Trace: org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[Thread-662,5,] at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:84) at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:618) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:37) at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75) at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:38) at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:39) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743) Caused by: java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: http://localhost:53923/solr at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:374) Caused by: org.apache.solr.client.solrj.SolrServerException: http://localhost:53923/solr at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:496) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:312) at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:369) Caused by: org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 100 ms at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:155) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:426) ... 4 more Caused by: java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at
[jira] [Commented] (LUCENE-3897) KuromojiTokenizer fails with large docs
[ https://issues.apache.org/jira/browse/LUCENE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235493#comment-13235493 ] Michael McCandless commented on LUCENE-3897: Thanks Christian!

KuromojiTokenizer fails with large docs --- Key: LUCENE-3897 URL: https://issues.apache.org/jira/browse/LUCENE-3897 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Assignee: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3897.patch

just shoving largeish random docs triggers asserts like:

{noformat}
[junit] Caused by: java.lang.AssertionError: backPos=4100 vs lastBackTracePos=5120
[junit] at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.backtrace(KuromojiTokenizer.java:907)
[junit] at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.parse(KuromojiTokenizer.java:756)
[junit] at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.incrementToken(KuromojiTokenizer.java:403)
[junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:404)
{noformat}

But, you get no seed... I'll commit the test case and @Ignore it.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235494#comment-13235494 ] Dawid Weiss commented on LUCENE-3867: - I've been experimenting a bit with the new code. Field offsets for three classes in a hierarchy with unalignable fields (byte, long combinations at all levels). Note the unaligned reordering of the byte field in JRockit - nice.

{noformat}
JVM: [JVM: HotSpot, Sun Microsystems Inc., 1.6.0_31] (compressed OOPs)
@12  4 Super.superByte
@16  8 Super.subLong
@24  8 Sub.subLong
@32  4 Sub.subByte
@36  4 SubSub.subSubByte
@40  8 SubSub.subSubLong
@48 sizeOf(SubSub.class instance)

JVM: [JVM: HotSpot, Sun Microsystems Inc., 1.6.0_31] (normal OOPs)
@16  8 Super.subLong
@24  8 Super.superByte
@32  8 Sub.subLong
@40  8 Sub.subByte
@48  8 SubSub.subSubLong
@56  8 SubSub.subSubByte
@64 sizeOf(SubSub.class instance)

JVM: [JVM: J9, IBM Corporation, 1.6.0]
@24  8 Super.subLong
@32  4 Super.superByte
@36  4 Sub.subByte
@40  8 Sub.subLong
@48  8 SubSub.subSubLong
@56  8 SubSub.subSubByte
@64 sizeOf(SubSub.class instance)

JVM: [JVM: JRockit, Oracle Corporation, 1.6.0_26] (64-bit JVM!)
@ 8  8 Super.subLong
@16  1 Super.superByte
@17  7 Sub.subByte
@24  8 Sub.subLong
@32  8 SubSub.subSubLong
@40  8 SubSub.subSubByte
@48 sizeOf(SubSub.class instance)
{noformat}

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch

RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote}

While on it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE as static, stateless methods? It's not perfect, there's some room for improvement I'm sure; here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).
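As a worked example of the formula above (the constant values below are assumptions for illustration; the real values are JVM-dependent, which is the crux of this issue):

```java
// Self-contained version of the proposed sizeOf(String), with placeholder
// header/field-size constants standing in for RamUsageEstimator's values.
public class StringSize {
    static final int NUM_BYTES_OBJECT_HEADER = 8;  // assumed, JVM-dependent
    static final int NUM_BYTES_ARRAY_HEADER = 12;  // assumed, JVM-dependent
    static final int NUM_BYTES_INT = 4;

    static int sizeOf(String str) {
        return 2 * str.length() + 6       // chars + slack for array alignment
             + 3 * NUM_BYTES_INT          // String's three int fields
             + NUM_BYTES_ARRAY_HEADER     // backing char[] header
             + NUM_BYTES_OBJECT_HEADER;   // String object header
    }
}
```

For a 4-character string this works out to 2*4 + 6 + 12 + 12 + 8 = 46 bytes under these assumed constants; swapping in the per-JVM constants changes the result, which is exactly why the estimator must measure them rather than hard-code them.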
[jira] [Commented] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235497#comment-13235497 ] Christian Moen commented on LUCENE-3901: Patch for this coming up shortly.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235500#comment-13235500 ] Uwe Schindler commented on LUCENE-3867: --- Thanks for the insight. Thinking about the reordering, I am a little bit afraid about the optimization in the shallow sizeOf(Class<?>). This optimization does not recurse into superclasses: it assumes that all field offsets are greater than those of the superclass, so finding the maximum does not need to recurse up (it exits early). This is generally true (also in the above printout), but not guaranteed. E.g. JRockit does it partly (it reuses space inside the superclass area to locate the byte from the subclass). In the above example the order of fields is still always Super-Sub-SubSub, but suppose the ordering in the JRockit example were like:

{noformat}
@ 8  1 Super.superByte
@ 9  7 Sub.subByte
@16  8 Super.subLong
@24  8 Sub.subLong
@32  8 SubSub.subSubLong
@40  8 SubSub.subSubByte
@48 sizeOf(SubSub.class instance)
{noformat}

The only thing the JVM cannot change is field offsets between subclasses (the field offset of the superclass is inherited), but it could happen that *new* fields are located between the super's fields (see above - it's unused space in the superclass). This would also still allow casting and so on. Unfortunately, with that reordering the maximum field offset in the subclass is no longer guaranteed to be greater than the superclass's. I would suggest that we remove the optimization in the shallow class size method. In my opinion it's too risky to underestimate the size when the maximum offset actually lives in the superclass. I hope my explanation was understandable... :-) Dawid, what do you think, should we remove the optimization? Patch is easy.
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While on it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE as static, stateless methods? It's not perfect, there's some room for improvement I'm sure; here it is:
{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}
If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
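The corrected constant in the report above — an array header is the object header plus a four-byte length, with no object-reference term — can be checked with back-of-the-envelope arithmetic. This is a sketch assuming a typical compressed-oops HotSpot layout (8-byte object header, 8-byte object alignment); the actual values vary by JVM:

```java
// Illustrative arithmetic for the corrected array header: object header plus
// a 4-byte length field, with the total object size rounded up to the JVM's
// 8-byte alignment. The constants assume a compressed-oops HotSpot layout and
// are for illustration only; real JVMs differ.
public class ArrayHeaderSketch {
    static final int NUM_BYTES_OBJECT_HEADER = 8;
    static final int NUM_BYTES_INT = 4;
    // Note: no NUM_BYTES_OBJECT_REF term, per the fix discussed in the issue.
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    static long align(long size) {
        return (size + 7) & ~7L; // round up to 8-byte object granularity
    }

    static long sizeOfByteArray(int length) {
        return align(NUM_BYTES_ARRAY_HEADER + (long) length);
    }

    public static void main(String[] args) {
        System.out.println(sizeOfByteArray(0)); // 16: the 12-byte header aligned up
        System.out.println(sizeOfByteArray(5)); // 24: 12 + 5 = 17, aligned to 24
    }
}
```

The alignment step is why an empty byte[] still occupies 16 bytes under these assumptions: the 12-byte header alone is rounded up to the next multiple of 8.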
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235501#comment-13235501 ] Dawid Weiss commented on LUCENE-3867: - bq. I hope my explanation was understandable... Perfectly well. Yes, I agree, it's possible to fill in the holes packing them with fields from subclasses. It would be a nice vm-level optimization in fact! I'm still experimenting on this code and cleaning / adding javadocs -- I'll patch this and provide a complete patch once I'm done, ok?
[jira] [Issue Comment Edited] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235502#comment-13235502 ] Uwe Schindler edited comment on LUCENE-3867 at 3/22/12 11:12 AM: - OK. All you have to remove is the if (fieldFound || useUnsafe) check and always recurse. fieldFound itself can also be removed. was (Author: thetaphi): OK. All you have to remove is the if (fieldFound || useUnsafe) check and always recurse. fieldFound itself can also be removed.
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235502#comment-13235502 ] Uwe Schindler commented on LUCENE-3867: --- OK. All you have to remove is the if (fieldFound || useUnsafe) check and always recurse. fieldFound itself can also be removed.
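The "always recurse" fix can be illustrated with a small reflection sketch: walking every superclass, instead of exiting at the first class that declares instance fields, guarantees that no field is missed even if a JVM packs subclass fields into superclass padding. Class names here are made up for illustration; real offset comparisons would need sun.misc.Unsafe, which this sketch deliberately avoids:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the "always recurse" behavior discussed above:
// instead of stopping at the first class in the hierarchy that declares
// instance fields (the early exit that JRockit-style field packing can
// defeat), walk every superclass and collect all instance fields. Real
// field *offsets* require sun.misc.Unsafe; only the traversal is shown.
public class ShallowSizeSketch {
    static List<Field> allInstanceFields(Class<?> clazz) {
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) { // no early exit
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    fields.add(f);
                }
            }
        }
        return fields;
    }

    // Toy hierarchy mirroring the Super/Sub example in the comments.
    static class Super { byte superByte; long superLong; }
    static class Sub extends Super { byte subByte; long subLong; }

    public static void main(String[] args) {
        // All four instance fields are visible, including Super's two.
        System.out.println(allInstanceFields(Sub.class).size()); // 4
    }
}
```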
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235505#comment-13235505 ] Uwe Schindler commented on LUCENE-3867: --- JRockit could even compress like this, it would still allow casting as all holes are solely used by one sub-class:
{noformat}
@ 8  1 Super.superByte
@ 9  1 Sub.subByte
@10  6 SubSub.subSubByte
@16  8 Super.subLong
@24  8 Sub.subLong
@32  8 SubSub.subSubLong
@40    sizeOf(SubSub.class instance)
{noformat}
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235506#comment-13235506 ] Dawid Weiss commented on LUCENE-3867: - Maybe it does such things already. I didn't check extensively.
[jira] [Reopened] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-3867: --- We have to remove the shallow size optimization in 3.x and trunk.
[jira] [Commented] (LUCENE-3897) KuromojiTokenizer fails with large docs
[ https://issues.apache.org/jira/browse/LUCENE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235517#comment-13235517 ] Christian Moen commented on LUCENE-3897: Committed revision 1303739 on {{trunk}}. Backporting to {{branch_3x}}. KuromojiTokenizer fails with large docs --- Key: LUCENE-3897 URL: https://issues.apache.org/jira/browse/LUCENE-3897 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Assignee: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3897.patch just shoving largeish random docs triggers asserts like:
{noformat}
[junit] Caused by: java.lang.AssertionError: backPos=4100 vs lastBackTracePos=5120
[junit]   at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.backtrace(KuromojiTokenizer.java:907)
[junit]   at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.parse(KuromojiTokenizer.java:756)
[junit]   at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.incrementToken(KuromojiTokenizer.java:403)
[junit]   at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:404)
{noformat}
But, you get no seed... I'll commit the test case and @Ignore it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
3.6 branching
Hello, I propose for 3.6 that we don't create a release branch but just use our branch_3x as the release branch. We can 'svn mv' it to 'lucene_solr_3_6' when the release is ready. Normally we would branch and open up branch_3x as 3.7 for changes, but from previous discussions we intend to release 4.0 next (and put 3.x in maintenance mode). As Hossman noted in his last email: we are doing some JIRA reorganization etc to get things organized. Also related to this: because we intend for this to be the last 3.x release, I want to make sure people have a few more days to get their changes in. New features are fine, as are of course bugfixes, tests, and docs, but since we are trying to get things in shape I only ask a few extra things at this stage:
* please ensure any new classes have at least one sentence as the class javadocs
* please ensure any new packages have a package.html with at least a description of what the package is
* please ensure any added files have the apache license header
thoughts? objections? -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 3.6 branching
I propose for 3.6 that we don't create a release branch but just use our branch_3x as the release branch. We can 'svn mv' it to 'lucene_solr_3_6' when the release is ready. Normally we would branch and open up branch_3x as 3.7 for changes, but from previous discussions we intend to release 4.0 next (and put 3.x in maintenance mode). +1 This is fine with me.
[jira] [Updated] (SOLR-3255) OpenExchangeRates.Org Exchange Rate Provider
[ https://issues.apache.org/jira/browse/SOLR-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3255: -- Attachment: SOLR-3255.patch Here's the provider implementation with tests. See http://wiki.apache.org/solr/CurrencyField for documentation. Highlights:
* Uses an open, free exchange rates REST API
* Plugs into CurrencyField in schema.xml
* Can load rates json from any URL or through ResourceLoader
* Configurable refresh of rates, enforces max every 60 min (since that's the update rate of the API)
This patch also changes the ExchangeRateProvider interface slightly:
* Instead of listCurrencies() returning FROM,TO pairs (which would be 25,000 lines for all available pairs for this provider), it takes an argument, so that listCurrencies(false) returns a list of supported currencies, while listCurrencies(true) returns a list of pairs
Known limitations/questions:
* The reflection for the providerClass param uses Class.forName() to instantiate the provider. But then the solr.MyClass alias does not work. How do we solve this?
* Is o.a.s.schema the correct location for these providers, or should we make a new package somewhere else?
OpenExchangeRates.Org Exchange Rate Provider Key: SOLR-3255 URL: https://issues.apache.org/jira/browse/SOLR-3255 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 3.6, 4.0 Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: CurrencyField Fix For: 3.6, 4.0 Attachments: SOLR-3255.patch An exchange rate provider for CurrencyField using the freely available feed from http://openexchangerates.org/ -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
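The listCurrencies(boolean) change described in the patch notes above can be sketched as follows: for n currencies there are n*(n-1) ordered FROM,TO pairs, which is why listing pairs explodes to tens of thousands of lines. The class and method shapes here are illustrative, not Solr's actual ExchangeRateProvider API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the listCurrencies(boolean) idea: with expandPairs=false return
// the supported currency codes; with expandPairs=true return every ordered
// FROM,TO pair (n*(n-1) entries). Names are illustrative, not Solr's API.
public class CurrencyListSketch {
    static List<String> listCurrencies(List<String> codes, boolean expandPairs) {
        if (!expandPairs) {
            return new ArrayList<>(codes);
        }
        List<String> pairs = new ArrayList<>();
        for (String from : codes) {
            for (String to : codes) {
                if (!from.equals(to)) {
                    pairs.add(from + "," + to); // ordered pair, e.g. "USD,EUR"
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> codes = List.of("USD", "EUR", "NOK");
        System.out.println(listCurrencies(codes, false).size()); // 3 currencies
        System.out.println(listCurrencies(codes, true).size());  // 6 = 3 * 2 pairs
    }
}
```

At roughly 160 currencies, the pair expansion yields about 160 * 159 ≈ 25,000 entries, matching the size concern raised in the patch description.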
Re: 3.6 branching
+1, makes my life easier On Thu, Mar 22, 2012 at 7:51 AM, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: I propose for 3.6 that we don't create a release branch but just use our branch_3x as the release branch. We can 'svn mv' it to 'lucene_solr_3_6' when the release is ready. Normally we would branch and open up branch_3x as 3.7 for changes, but from previous discussions we intend to release 4.0 next (and put 3.x in maintenance mode). +1 This fine with me. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3897) KuromojiTokenizer fails with large docs
[ https://issues.apache.org/jira/browse/LUCENE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen resolved LUCENE-3897. Resolution: Fixed Thanks a lot, Mike and Robert!
[jira] [Commented] (LUCENE-3897) KuromojiTokenizer fails with large docs
[ https://issues.apache.org/jira/browse/LUCENE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235531#comment-13235531 ] Christian Moen commented on LUCENE-3897: Committed revision 1303744 on {{branch_3x}}.
[jira] [Updated] (SOLR-3255) OpenExchangeRates.Org Exchange Rate Provider
[ https://issues.apache.org/jira/browse/SOLR-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3255: -- Attachment: SOLR-3255.patch Slightly improved Noggit JSON parsing loop. Removed a few unnecessary imports. Fixed order of assertEquals() params.
[jira] [Commented] (LUCENE-3897) KuromojiTokenizer fails with large docs
[ https://issues.apache.org/jira/browse/LUCENE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235540#comment-13235540 ] Robert Muir commented on LUCENE-3897: - Thanks guys! The last of the fallout from LUCENE-3894 I think :) I ran 'ant test -Dtests.nightly=true -Dtests.multiplier=5 -Dtests.iter=10' to simulate 10 nightly builds and (after 2 hours) everything looks ok :) KuromojiTokenizer fails with large docs --- Key: LUCENE-3897 URL: https://issues.apache.org/jira/browse/LUCENE-3897 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Assignee: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3897.patch just shoving largeish random docs triggers asserts like: {noformat} [junit] Caused by: java.lang.AssertionError: backPos=4100 vs lastBackTracePos=5120 [junit] at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.backtrace(KuromojiTokenizer.java:907) [junit] at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.parse(KuromojiTokenizer.java:756) [junit] at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.incrementToken(KuromojiTokenizer.java:403) [junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:404) {noformat} But, you get no seed... I'll commit the test case and @Ignore it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3887) 'ant javadocs' should fail if a package is missing a package.html
[ https://issues.apache.org/jira/browse/LUCENE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3887: --- Attachment: LUCENE-3887.patch Another iteration, this time working I think :) 'ant javadocs' should fail if a package is missing a package.html - Key: LUCENE-3887 URL: https://issues.apache.org/jira/browse/LUCENE-3887 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir Attachments: LUCENE-3887.patch, LUCENE-3887.patch While reviewing the javadocs I noticed many packages are missing a basic package.html. For 3.x I committed some package.html files where they were missing (I will port forward to trunk). I think all packages should have this... really all public/protected classes/methods/constants, but this would be a good step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3887) 'ant javadocs' should fail if a package is missing a package.html
[ https://issues.apache.org/jira/browse/LUCENE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235543#comment-13235543 ] Uwe Schindler commented on LUCENE-3887: --- Some unrelated fixes in the patch, otherwise ok for smokeTesting. I would just disagree to add python requirements to our official ant script... 'ant javadocs' should fail if a package is missing a package.html - Key: LUCENE-3887 URL: https://issues.apache.org/jira/browse/LUCENE-3887 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir Attachments: LUCENE-3887.patch, LUCENE-3887.patch While reviewing the javadocs I noticed many packages are missing a basic package.html. For 3.x I committed some package.html files where they were missing (I will port forward to trunk). I think all packages should have this... really all public/protected classes/methods/constants, but this would be a good step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3887) 'ant javadocs' should fail if a package is missing a package.html
[ https://issues.apache.org/jira/browse/LUCENE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235546#comment-13235546 ] Robert Muir commented on LUCENE-3887: - Uwe well we can discuss integration into the official ant build later? For now personally I would like to have an automated check in the smokeTester script, that would help me clean the stuff up rather than manually eyeballing everything. Its a step. 'ant javadocs' should fail if a package is missing a package.html - Key: LUCENE-3887 URL: https://issues.apache.org/jira/browse/LUCENE-3887 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir Attachments: LUCENE-3887.patch, LUCENE-3887.patch While reviewing the javadocs I noticed many packages are missing a basic package.html. For 3.x I committed some package.html files where they were missing (I will port forward to trunk). I think all packages should have this... really all public/protected classes/methods/constants, but this would be a good step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3887) 'ant javadocs' should fail if a package is missing a package.html
[ https://issues.apache.org/jira/browse/LUCENE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235549#comment-13235549 ] Uwe Schindler commented on LUCENE-3887: --- Did I say anything else? 'ant javadocs' should fail if a package is missing a package.html - Key: LUCENE-3887 URL: https://issues.apache.org/jira/browse/LUCENE-3887 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir Attachments: LUCENE-3887.patch, LUCENE-3887.patch While reviewing the javadocs I noticed many packages are missing a basic package.html. For 3.x I committed some package.html files where they were missing (I will port forward to trunk). I think all packages should have this... really all public/protected classes/methods/constants, but this would be a good step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
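The check Robert asks for above — flag packages that ship Java sources but no package.html before the smokeTester ever runs javadoc — is easy to sketch. The following is an illustrative stand-alone version, not the actual checkJavaDocs.py; the function name and the allowance for package-info.java are my own assumptions:

```python
import os

def find_packages_missing_docs(src_root):
    """Walk a Java source tree and report packages (directories containing
    .java files) that have neither a package.html nor a package-info.java."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # A directory counts as a package if it holds at least one class source.
        has_java = any(f.endswith(".java") and f != "package-info.java"
                       for f in filenames)
        has_doc = "package.html" in filenames or "package-info.java" in filenames
        if has_java and not has_doc:
            missing.append(dirpath)
    return sorted(missing)
```

A release script could simply exit non-zero when the returned list is non-empty, which gives the automated check without touching the official ant build.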
[jira] [Created] (LUCENE-3902) public classes with no javadocs
public classes with no javadocs --- Key: LUCENE-3902 URL: https://issues.apache.org/jira/browse/LUCENE-3902 Project: Lucene - Java Issue Type: Improvement Components: general/javadocs Reporter: Robert Muir Here is a list of public classes with no javadocs. I think even some simple javadocs can be valuable for all javadocs classes: * in various summaries, we don't see an empty summary for what the class does * easier to work with the source in various IDEs that present this stuff on hover, etc * better documentation for developers to know what all these classes do. Maybe we don't have time to fix this for 3.x, but it would be great if anybody has good knowledge of these classes and could commit any useful stuff to the javadocs. Here is the list from Mike's tool on LUCENE-3887 {noformat} rmuir@beast:~/workspace/lucene-branch3x2/dev-tools/scripts$ python checkJavaDocs.py ../../lucene/build/docs/api Check... ../../lucene/build/docs/api/all/org/tartarus/snowball/package-summary.html missing: Among missing: TestApp ../../lucene/build/docs/api/all/org/apache/lucene/spatial/tier/package-summary.html missing: DistanceHandler.Precision ../../lucene/build/docs/api/all/org/apache/lucene/index/package-summary.html missing: MergePolicy.MergeAbortedException ../../lucene/build/docs/api/all/org/apache/lucene/index/pruning/package-summary.html missing: CarmelTopKTermPruningPolicy.ByDocComparator missing: CarmelUniformTermPruningPolicy.ByDocComparator ../../lucene/build/docs/api/all/org/apache/lucene/util/package-summary.html missing: ByteBlockPool.Allocator missing: ByteBlockPool.DirectAllocator missing: ByteBlockPool.DirectTrackingAllocator missing: BytesRefHash.BytesStartArray missing: BytesRefHash.DirectBytesStartArray missing: BytesRefIterator.EmptyBytesRefIterator missing: DoubleBarrelLRUCache.CloneableKey missing: English missing: OpenBitSetDISI missing: PagedBytes.Reader missing: StoreClassNameRule missing: SystemPropertiesInvariantRule missing: 
UncaughtExceptionsRule.UncaughtExceptionEntry missing: UnicodeUtil.UTF16Result missing: UnicodeUtil.UTF8Result ../../lucene/build/docs/api/all/org/apache/lucene/queryParser/core/nodes/package-summary.html missing: TextableQueryNode missing: PathQueryNode.QueryText missing: PhraseSlopQueryNode missing: ProximityQueryNode.ProximityType missing: ModifierQueryNode.Modifier missing: ParametricQueryNode.CompareOperator missing: ProximityQueryNode.Type ../../lucene/build/docs/api/all/org/apache/lucene/queryParser/core/parser/package-summary.html missing: EscapeQuerySyntax.Type ../../lucene/build/docs/api/all/org/apache/lucene/queryParser/standard/builders/package-summary.html missing: AnyQueryNodeBuilder ../../lucene/build/docs/api/all/org/apache/lucene/queryParser/standard/config/package-summary.html missing: FuzzyConfig missing: StandardQueryConfigHandler.ConfigurationKeys missing: DefaultOperatorAttribute.Operator missing: StandardQueryConfigHandler.Operator ../../lucene/build/docs/api/all/org/apache/lucene/queryParser/standard/parser/package-summary.html missing: EscapeQuerySyntaxImpl missing: StandardSyntaxParser ../../lucene/build/docs/api/all/org/apache/lucene/queryParser/surround/query/package-summary.html missing: DistanceSubQuery missing: SimpleTerm.MatchingTermVisitor missing: AndQuery missing: BasicQueryFactory missing: ComposedQuery missing: DistanceQuery missing: FieldsQuery missing: NotQuery missing: OrQuery missing: SimpleTerm missing: SpanNearClauseFactory missing: SrndPrefixQuery missing: SrndQuery missing: SrndTermQuery missing: SrndTruncQuery missing: TooManyBasicQueries ../../lucene/build/docs/api/all/org/apache/lucene/store/package-summary.html missing: FSDirectory.FSIndexOutput missing: NativePosixUtil missing: NIOFSDirectory.NIOFSIndexInput missing: RAMFile missing: SimpleFSDirectory.SimpleFSIndexInput missing: SimpleFSDirectory.SimpleFSIndexInput.Descriptor missing: WindowsDirectory.WindowsIndexInput missing: MockDirectoryWrapper.Throttling 
../../lucene/build/docs/api/all/org/apache/lucene/xmlparser/package-summary.html missing: FilterBuilder missing: CorePlusExtensionsParser missing: DOMUtils missing: FilterBuilderFactory missing: QueryBuilderFactory missing: ParserException ../../lucene/build/docs/api/all/org/apache/lucene/xmlparser/builders/package-summary.html missing: SpanQueryBuilder missing: BooleanFilterBuilder missing: BooleanQueryBuilder missing: BoostingQueryBuilder missing: BoostingTermBuilder missing: ConstantScoreQueryBuilder missing: DuplicateFilterBuilder missing: FilteredQueryBuilder missing: FuzzyLikeThisQueryBuilder missing: LikeThisQueryBuilder missing: MatchAllDocsQueryBuilder missing: RangeFilterBuilder missing: SpanBuilderBase missing:
Re: 3.6 branching
+1. Keep it simple -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 22. mars 2012, at 12:48, Robert Muir wrote: Hello, I propose for 3.6 that we don't create a release branch but just use our branch_3x as the release branch. We can 'svn mv' it to 'lucene_solr_3_6' when the release is ready. Normally we would branch and open up branch_3x as 3.7 for changes, but from previous discussions we intend to release 4.0 next (and put 3.x in maintenance mode). As Hossman noted in his last email: we are doing some JIRA reorganization etc to get things organized. Also related to this: because we intend for this to be the last 3.x release, I want to make sure people have a few more days to get their changes in. New features are fine, of course bugfixes, tests, and docs, but since we are trying to get things in shape I only ask a few extra things at this stage: * please ensure any new classes have at least one sentence as the class javadocs * please ensure any new packages have a package.html with at least a description of what the package is * please ensure any added files have the apache license header thoughts? objections? -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
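All three checklist items in the proposal above are mechanically checkable. As one illustration — a hypothetical helper, not part of the Lucene build — the Apache license header of an added file can be probed like this:

```python
def has_apache_license_header(path, probe_bytes=2048):
    """Return True if the top of the file mentions the ASF license grant.

    Heuristic only: real headers vary slightly between file types, so this
    just scans the first couple of kilobytes for the canonical phrase.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        head = f.read(probe_bytes)
    return "Licensed to the Apache Software Foundation" in head
```

Run over `svn status`'s added files, this catches missing headers before a release candidate is cut rather than during review.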
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235570#comment-13235570 ] Dawid Weiss commented on LUCENE-3867: - I confirmed that this packing indeed takes place. Wrote a pseudo-random test with lots of classes and fields. Here's an offender on J9 for example (Wild_{inheritance-level}_{field-number}): {noformat} @24 4 Wild_0_92.fld_0_0_92 @28 4 Wild_0_92.fld_1_0_92 @32 4 Wild_0_92.fld_2_0_92 @36 4 Wild_0_92.fld_3_0_92 @40 4 Wild_0_92.fld_4_0_92 @44 4 Wild_0_92.fld_5_0_92 @48 4 Wild_0_92.fld_6_0_92 @52 4 Wild_2_5.fld_0_2_5 @56 8 Wild_1_85.fld_0_1_85 @64 8 Wild_1_85.fld_1_1_85 @72sizeOf(Wild_2_5 instance) {noformat} HotSpot and JRockit don't seem to do this (at least it didn't fail on the example). RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like that: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object head is 12 bytes to accommodate a four-byte array length. 
Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While on it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods? It's not perfect, there's some room for improvement I'm sure, here it is:
{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6                      // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT        // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER   // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}
If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
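For a concrete feel of the numbers, here is the same estimate in Python with typical 32-bit HotSpot constants plugged in. The constant values below are illustrative assumptions for this sketch, not necessarily what RamUsageEstimator reports on a given JVM:

```python
# Assumed 32-bit HotSpot layout constants (illustrative only).
NUM_BYTES_OBJECT_HEADER = 8
NUM_BYTES_INT = 4
NUM_BYTES_ARRAY_HEADER = 12  # object header plus 4-byte length, per the quote above

def size_of_string(s):
    """Mirror of the proposed sizeOf(String) estimate."""
    return (2 * len(s) + 6                # chars + alignment slack
            + 3 * NUM_BYTES_INT           # String's three int fields
            + NUM_BYTES_ARRAY_HEADER      # char[] array header
            + NUM_BYTES_OBJECT_HEADER)    # String object header
```

Under these assumptions a five-character string is estimated at 48 bytes, of which only 10 are the characters themselves — most of the footprint is headers and bookkeeping.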
[jira] [Commented] (LUCENE-3887) 'ant javadocs' should fail if a package is missing a package.html
[ https://issues.apache.org/jira/browse/LUCENE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235574#comment-13235574 ] Michael McCandless commented on LUCENE-3887: You can also just run the javadoc checker directly in a source checkout, like this: {noformat} python -u dev-tools/scripts/checkJavaDocs.py /lucene/3x/lucene/build {noformat} You have to ant javadocs first yourself. Right now it only checks for missing sentences in the package-summary.html... I'll see if I can fix it to also detect missing package.html's... Here's what it reports on 3.x right now: {noformat} /lucene/3x/lucene/build/docs/api/contrib-highlighter/org/apache/lucene/search/highlight/package-summary.html missing: TokenStreamFromTermPositionVector /lucene/3x/lucene/build/docs/api/contrib-highlighter/org/apache/lucene/search/vectorhighlight/package-summary.html missing: BoundaryScanner missing: BaseFragmentsBuilder missing: FieldFragList.WeightedFragInfo missing: FieldFragList.WeightedFragInfo.SubInfo missing: FieldPhraseList.WeightedPhraseInfo missing: FieldPhraseList.WeightedPhraseInfo.Toffs missing: FieldQuery.QueryPhraseMap missing: FieldTermStack.TermInfo missing: ScoreOrderFragmentsBuilder.ScoreComparator missing: SimpleBoundaryScanner /lucene/3x/lucene/build/docs/api/contrib-spatial/org/apache/lucene/spatial/tier/package-summary.html missing: DistanceHandler.Precision /lucene/3x/lucene/build/docs/api/contrib-spellchecker/org/apache/lucene/search/suggest/package-summary.html missing: Lookup.LookupPriorityQueue /lucene/3x/lucene/build/docs/api/contrib-spellchecker/org/apache/lucene/search/suggest/jaspell/package-summary.html missing: JaspellLookup /lucene/3x/lucene/build/docs/api/contrib-spellchecker/org/apache/lucene/search/suggest/tst/package-summary.html missing: TSTAutocomplete missing: TSTLookup /lucene/3x/lucene/build/docs/api/contrib-pruning/org/apache/lucene/index/pruning/package-summary.html missing: 
CarmelTopKTermPruningPolicy.ByDocComparator missing: CarmelUniformTermPruningPolicy.ByDocComparator /lucene/3x/lucene/build/docs/api/contrib-facet/org/apache/lucene/facet/taxonomy/writercache/lru/package-summary.html missing: LruTaxonomyWriterCache.LRUType /lucene/3x/lucene/build/docs/api/contrib-facet/org/apache/lucene/facet/index/package-summary.html missing: FacetsPayloadProcessorProvider.FacetsDirPayloadProcessor /lucene/3x/lucene/build/docs/api/core/org/apache/lucene/store/package-summary.html missing: FSDirectory.FSIndexOutput missing: NIOFSDirectory.NIOFSIndexInput missing: RAMFile missing: SimpleFSDirectory.SimpleFSIndexInput missing: SimpleFSDirectory.SimpleFSIndexInput.Descriptor /lucene/3x/lucene/build/docs/api/core/org/apache/lucene/index/package-summary.html missing: MergePolicy.MergeAbortedException /lucene/3x/lucene/build/docs/api/core/org/apache/lucene/search/package-summary.html missing: FieldCache.CreationPlaceholder missing: FieldComparator.NumericComparatorlt;T extends Numbergt; missing: FieldValueHitQueue.Entry missing: QueryTermVector missing: ScoringRewritelt;Q extends Querygt; missing: SpanFilterResult.PositionInfo missing: SpanFilterResult.StartEnd missing: TimeLimitingCollector.TimerThread /lucene/3x/lucene/build/docs/api/core/org/apache/lucene/util/package-summary.html missing: ByteBlockPool.Allocator missing: ByteBlockPool.DirectAllocator missing: ByteBlockPool.DirectTrackingAllocator missing: BytesRefHash.BytesStartArray missing: BytesRefHash.DirectBytesStartArray missing: BytesRefIterator.EmptyBytesRefIterator missing: DoubleBarrelLRUCache.CloneableKey missing: OpenBitSetDISI missing: PagedBytes.Reader missing: UnicodeUtil.UTF16Result missing: UnicodeUtil.UTF8Result /lucene/3x/lucene/build/docs/api/contrib-analyzers/org/tartarus/snowball/package-summary.html missing: Among missing: TestApp /lucene/3x/lucene/build/docs/api/contrib-xml-query-parser/org/apache/lucene/xmlparser/package-summary.html missing: FilterBuilder missing: 
CorePlusExtensionsParser missing: DOMUtils missing: FilterBuilderFactory missing: QueryBuilderFactory missing: ParserException /lucene/3x/lucene/build/docs/api/contrib-xml-query-parser/org/apache/lucene/xmlparser/builders/package-summary.html missing: SpanQueryBuilder missing: BooleanFilterBuilder missing: BooleanQueryBuilder missing: BoostingQueryBuilder missing: BoostingTermBuilder missing: ConstantScoreQueryBuilder missing: DuplicateFilterBuilder missing: FilteredQueryBuilder missing: FuzzyLikeThisQueryBuilder missing: LikeThisQueryBuilder missing: MatchAllDocsQueryBuilder missing: RangeFilterBuilder missing: SpanBuilderBase missing: SpanFirstBuilder missing: SpanNearBuilder missing: SpanNotBuilder missing: SpanOrBuilder missing: SpanOrTermsBuilder missing: SpanQueryBuilderFactory missing: SpanTermBuilder
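The condition Mike's checker reports — a class row in package-summary.html whose description cell is empty — can be sketched with a small scraper. This is an illustrative simplification of the idea, not the real checkJavaDocs.py logic, and the regex assumes javadoc's simple two-cell summary rows:

```python
import re

def missing_summaries(html):
    """Return class names whose summary-table description cell is empty.

    Each summary row has a first cell holding the class link and a second
    cell holding the one-sentence summary; an empty summary renders as
    whitespace or &nbsp;.
    """
    missing = []
    for m in re.finditer(
            r'<td[^>]*><a[^>]*>([^<]+)</a></td>\s*<td[^>]*>(.*?)</td>',
            html, re.IGNORECASE | re.DOTALL):
        name, summary = m.group(1), m.group(2)
        if not re.sub(r'&nbsp;|\s+', '', summary):
            missing.append(name)
    return missing
```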
[jira] [Created] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance - Key: SOLR-3265 URL: https://issues.apache.org/jira/browse/SOLR-3265 Project: Solr Issue Type: Test Affects Versions: 3.6, 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor When running ant test from the command line in 3.x, if you have a Solr server running then TestSolrEntityProcessorEndToEnd fails since it uses the default port (stack trace with address already in use). This should use some other port, especially as 3.x ant test is taking 50+ minutes and I often open up a server to look at something else. In 4.0, some of the cloud tests also use 8983 as a port. Should these be changed too? And just to make my life *especially* interesting, at least one test puts the string 8983 in a document, which doesn't have to be changed G... Of course one can start your local server on a different port, but this seems trappy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
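The usual fix for this kind of collision is to stop hardcoding 8983 and ask the OS for a free port when the test starts. The tests involved are Java, but the technique is identical in any language; a minimal sketch:

```python
import socket

def free_port():
    """Ask the kernel for an unused TCP port instead of hardcoding one."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))   # port 0 = let the OS pick an ephemeral port
        return s.getsockname()[1]
```

There is a small race between closing this probe socket and the test server binding the port, but in practice it is far less trappy than colliding with a developer's running instance on the default port.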
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235577#comment-13235577 ] Uwe Schindler commented on LUCENE-3867: --- Thanks, in that case shallowSizeOf(Wild_2_5.class) would incorrectly return 56 because of the short-circuit - so let's fix this. RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like that: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object head is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While on it, I wrote a sizeOf(String) impl, and I wonder how do people feel about including such helper methods in RUE, as static, stateless, methods? 
It's not perfect, there's some room for improvement I'm sure, here it is:
{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6                      // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT        // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER   // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}
If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 3.6 branching
+1, much easier. Tommaso 2012/3/22 Robert Muir rcm...@gmail.com Hello, I propose for 3.6 that we don't create a release branch but just use our branch_3x as the release branch. We can 'svn mv' it to 'lucene_solr_3_6' when the release is ready. Normally we would branch and open up branch_3x as 3.7 for changes, but from previous discussions we intend to release 4.0 next (and put 3.x in maintenance mode). As Hossman noted in his last email: we are doing some JIRA reorganization etc to get things organized. Also related to this: because we intend for this to be the last 3.x release, I want to make sure people have a few more days to get their changes in. New features are fine, of course bugfixes, tests, and docs, but since we are trying to get things in shape I only ask a few extra things at this stage: * please ensure any new classes have at least one sentence as the class javadocs * please ensure any new packages have a package.html with at least a description of what the package is * please ensure any added files have the apache license header thoughts? objections? -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
[ https://issues.apache.org/jira/browse/SOLR-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235580#comment-13235580 ] Martijn van Groningen commented on SOLR-3265: - This is trappy! This should be changed. TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance - Key: SOLR-3265 URL: https://issues.apache.org/jira/browse/SOLR-3265 Project: Solr Issue Type: Test Affects Versions: 3.6, 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor When running ant test from the command line in 3.x, if you have a Solr server running then TestSolrentityProcessorEndToEnd fails since it uses the default port (stack trace with address already in use). This should use some other port, especially as 3.x ant test is taking 50+ minutes and I often open up a server to look at something else. In 4.0, some of the cloud tests also use 8983 as a port. Should these be changed too? And just to make my life *especially* interesting, at least one test puts the string 8983 in a document, which doesn't have to be changed G... Of course one can start your local server on a different port, but this seems trappy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
[ https://issues.apache.org/jira/browse/SOLR-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235581#comment-13235581 ] Robert Muir commented on SOLR-3265: --- {quote} especially as 3.x ant test is taking 50+ minutes {quote} Erick do you have a 386? TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance - Key: SOLR-3265 URL: https://issues.apache.org/jira/browse/SOLR-3265 Project: Solr Issue Type: Test Affects Versions: 3.6, 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor When running ant test from the command line in 3.x, if you have a Solr server running then TestSolrentityProcessorEndToEnd fails since it uses the default port (stack trace with address already in use). This should use some other port, especially as 3.x ant test is taking 50+ minutes and I often open up a server to look at something else. In 4.0, some of the cloud tests also use 8983 as a port. Should these be changed too? And just to make my life *especially* interesting, at least one test puts the string 8983 in a document, which doesn't have to be changed G... Of course one can start your local server on a different port, but this seems trappy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
No, I have an OS X about 3 years old. Sometimes it only feels like a 386 G... On Thu, Mar 22, 2012 at 9:56 AM, Robert Muir (Commented) (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235581#comment-13235581 ] Robert Muir commented on SOLR-3265: --- {quote} especially as 3.x ant test is taking 50+ minutes {quote} Erick do you have a 386? TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance - Key: SOLR-3265 URL: https://issues.apache.org/jira/browse/SOLR-3265 Project: Solr Issue Type: Test Affects Versions: 3.6, 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor When running ant test from the command line in 3.x, if you have a Solr server running then TestSolrentityProcessorEndToEnd fails since it uses the default port (stack trace with address already in use). This should use some other port, especially as 3.x ant test is taking 50+ minutes and I often open up a server to look at something else. In 4.0, some of the cloud tests also use 8983 as a port. Should these be changed too? And just to make my life *especially* interesting, at least one test puts the string 8983 in a document, which doesn't have to be changed G... Of course one can start your local server on a different port, but this seems trappy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235588#comment-13235588 ] Dawid Weiss commented on LUCENE-3867: - Yep, that assumption was wrong -- indeed: {noformat} WildClasses.Wild_2_5 wc = new WildClasses.Wild_2_5(); wc.fld_6_0_92 = 0x1122; wc.fld_0_2_5 = Float.intBitsToFloat(0xa1a2a3a4); wc.fld_0_1_85 = Double.longBitsToDouble(0xb1b2b3b4b5b6b7L); System.out.println(ExpMemoryDumper.dumpObjectMem(wc)); {noformat} results in: {noformat} 0x b0 3d 6f 01 00 00 00 00 0e 80 79 01 00 00 00 00 0x0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0030 22 11 00 00 a4 a3 a2 a1 b7 b6 b5 b4 b3 b2 b1 00 0x0040 00 00 00 00 00 00 00 00 {noformat} And you can see they are reordered and longs are aligned. I'll provide a cumulative patch of changes in the evening, there's one more thing I wanted to add (cache of fields) because this affects processing speed. RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like that: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. 
The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

{quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote}

While on it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods? It's not perfect, there's some room for improvement I'm sure; here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]).

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
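The proposed family of helpers is easy to sketch. The following is a hedged, self-contained illustration of what sizeOf(int[]) and sizeOf(String) could look like; the class name, the alignObjectSize() helper, and the constant values (typical 32-bit JVM: 8-byte object header, 12-byte array header, 4-byte int/reference) are assumptions for the example, not the actual RamUsageEstimator API:

```java
// Hedged sketch of the proposed static sizeOf() helpers. Constants assume a
// typical 32-bit JVM layout; real JVMs differ (compressed oops, 64-bit, etc.).
public class SizeEstimators {
    public static final int NUM_BYTES_OBJECT_HEADER = 8;
    public static final int NUM_BYTES_ARRAY_HEADER = 12; // object header + 4-byte length
    public static final int NUM_BYTES_OBJECT_REF = 4;
    public static final int NUM_BYTES_INT = 4;

    // JVMs align object sizes to 8-byte boundaries; round up to that.
    public static int alignObjectSize(int size) {
        return (size + 7) & ~7;
    }

    // Shallow size of a primitive int[]: array header plus 4 bytes per element.
    public static int sizeOf(int[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_INT * arr.length);
    }

    // Approximate size of a String: the char[] object plus the String object
    // itself (header, reference to the char[], and 3 ints: offset/count/hash).
    public static int sizeOf(String str) {
        int charArray = alignObjectSize(NUM_BYTES_ARRAY_HEADER + 2 * str.length());
        int stringObject = alignObjectSize(
            NUM_BYTES_OBJECT_HEADER + NUM_BYTES_OBJECT_REF + 3 * NUM_BYTES_INT);
        return charArray + stringObject;
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(new int[10])); // 12 + 40 = 52, aligned up to 56
        System.out.println(sizeOf("lucene"));    // 24 + 24 = 48
    }
}
```

Note that alignment replaces the "+ 6 safeness" fudge in the snippet above with an explicit round-up, which is closer to what the JVM actually does.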
Re: [jira] [Commented] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
:-) If I run the tests on my personal MacBook it also takes a very, very, very long time to complete. This MacBook is 5 years old... Luckily I do have another, faster machine.

On 22 March 2012 15:00, Erick Erickson erickerick...@gmail.com wrote:
No, I have an OS X box about 3 years old. Sometimes it only feels like a 386 <G>...

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org

--
Kind regards, Martijn van Groningen
[jira] [Commented] (LUCENE-3847) LuceneTestCase should check for modifications on System properties
[ https://issues.apache.org/jira/browse/LUCENE-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235592#comment-13235592 ]

Robert Muir commented on LUCENE-3847:
-
Strangely, I trip the timezone issue when running any Solr tests from Eclipse... but not Lucene tests? E.g. if I run TestDemo from lucene it's fine, but if I run TestRussianFilter (org.apache.solr.analysis) then I hit:

{noformat}
java.lang.AssertionError: System properties invariant violated. Different values: [old]user.timezone= [new]user.timezone=America/New_York
at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:46)
at org.junit.rules.RunRules.evaluate(RunRules.java:18)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
{noformat}

LuceneTestCase should check for modifications on System properties
--
Key: LUCENE-3847
URL: https://issues.apache.org/jira/browse/LUCENE-3847
Project: Lucene - Java
Issue Type: Improvement
Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
Fix For: 3.6, 4.0
Attachments: LUCENE-3847.patch

- fail the test if changes have been detected.
- revert the state of system properties before the suite.
- cleanup after the suite.

--
This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
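The invariant the rule enforces can be sketched in a few lines. This is an illustrative, self-contained version, not the actual SystemPropertiesInvariantRule internals (the class and method names here are made up): snapshot the properties before the suite, snapshot again after, and report every key whose value changed.

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch of a system-properties invariant check.
public class SysPropsInvariant {

    // Deep-copy the current system properties so later mutations don't alias.
    public static Properties snapshot() {
        Properties copy = new Properties();
        copy.putAll(System.getProperties());
        return copy;
    }

    // Keys whose values differ between the two snapshots (added, removed, or changed).
    public static Set<String> changedKeys(Properties before, Properties after) {
        Set<String> names = new TreeSet<String>(before.stringPropertyNames());
        names.addAll(after.stringPropertyNames());
        Set<String> changed = new TreeSet<String>();
        for (String name : names) {
            String oldVal = before.getProperty(name);
            String newVal = after.getProperty(name);
            if (oldVal == null ? newVal != null : !oldVal.equals(newVal)) {
                changed.add(name);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Properties before = snapshot();
        System.setProperty("demo.timezone.prop", "America/New_York"); // simulate a leaky test
        Properties after = snapshot();
        // Mimics the shape of the rule's failure message quoted above.
        System.out.println("System properties invariant violated. Different keys: "
            + changedKeys(before, after));
    }
}
```

The Eclipse symptom above is consistent with such a check: something in the Solr test path resolves and sets user.timezone, so the "after" snapshot differs from the "before" one.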
[jira] [Updated] (LUCENE-3883) Analysis for Irish
[ https://issues.apache.org/jira/browse/LUCENE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3883: Attachment: LUCENE-3883.patch Same patch but with the solr pieces too (factory/test for the lowercasefilter, text_ga fieldtype, resources synced, etc). Analysis for Irish -- Key: LUCENE-3883 URL: https://issues.apache.org/jira/browse/LUCENE-3883 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Jim Regan Assignee: Robert Muir Priority: Trivial Labels: analysis, newbie Attachments: LUCENE-3883.patch, LUCENE-3883.patch, LUCENE-3883.patch, irish.sbl Adds analysis for Irish. The stemmer is generated from a snowball stemmer. I've sent it to Martin Porter, who says it will be added during the week. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3260) Improve exception handling / logging for ScriptTransformer.init()
[ https://issues.apache.org/jira/browse/SOLR-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235600#comment-13235600 ]

Steven Rowe commented on SOLR-3260:
---
James, the trunk Maven build is still unhappy:

{noformat}
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/433/
1 tests failed.
FAILED: org.apache.solr.handler.dataimport.TestScriptTransformer.testOneparam
Error Message: Cannot load Script Engine for language: JavaScript
Stack Trace: org.apache.solr.handler.dataimport.DataImportHandlerException: Cannot load Script Engine for language: JavaScript at org.apache.solr.handler.dataimport.ScriptTransformer.initEngine(ScriptTransformer.java:76)
{noformat}

Improve exception handling / logging for ScriptTransformer.init()
-
Key: SOLR-3260
URL: https://issues.apache.org/jira/browse/SOLR-3260
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 3.5, 4.0
Reporter: James Dyer
Assignee: James Dyer
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: SOLR-3260.patch

This came up on the user-list. ScriptTransformer logs the same "need a >= 1.6 JRE" message for several problems, making debugging difficult for users.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3260) Improve exception handling / logging for ScriptTransformer.init()
[ https://issues.apache.org/jira/browse/SOLR-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235609#comment-13235609 ]

James Dyer commented on SOLR-3260:
--
I missed one. Sorry about that. Should be fixed now.

Improve exception handling / logging for ScriptTransformer.init()
-
Key: SOLR-3260
URL: https://issues.apache.org/jira/browse/SOLR-3260
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 3.5, 4.0
Reporter: James Dyer
Assignee: James Dyer
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: SOLR-3260.patch

This came up on the user-list. ScriptTransformer logs the same "need a >= 1.6 JRE" message for several problems, making debugging difficult for users.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 3.6 branching
Robert, I think this is a very good idea. +1.

Christian
http://atilika.com

On Mar 22, 2012, at 8:48 PM, Robert Muir wrote:

Hello, I propose for 3.6 that we don't create a release branch but just use our branch_3x as the release branch. We can 'svn mv' it to 'lucene_solr_3_6' when the release is ready. Normally we would branch and open up branch_3x as 3.7 for changes, but from previous discussions we intend to release 4.0 next (and put 3.x in maintenance mode). As Hossman noted in his last email: we are doing some JIRA reorganization etc. to get things organized.

Also related to this: because we intend for this to be the last 3.x release, I want to make sure people have a few more days to get their changes in. New features are fine, of course bugfixes, tests, and docs, but since we are trying to get things in shape I only ask a few extra things at this stage:
* please ensure any new classes have at least one sentence as the class javadocs
* please ensure any new packages have a package.html with at least a description of what the package is
* please ensure any added files have the apache license header

thoughts? objections?

--
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: 3.6 branching
Yeah, renaming the branch after release is a good idea. We should use the current 3.x branch to work on the release, and there should be no 3.7 anymore. Technically, branching 3.6 and deleting branch_3x is no different in the sense of SVN from renaming (a rename is an atomic copy+add plus delete). If we have major bugs in 3.6, we can still release 3.6.1, but this would simply be a new TAG in the 3.6 branch.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, March 22, 2012 12:48 PM
To: dev@lucene.apache.org
Subject: 3.6 branching

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3903) javadocs very very ugly if you generate with java7
javadocs very very ugly if you generate with java7
--
Key: LUCENE-3903
URL: https://issues.apache.org/jira/browse/LUCENE-3903
Project: Lucene - Java
Issue Type: Bug
Components: general/javadocs
Affects Versions: 3.6, 4.0
Reporter: Robert Muir

Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css, which is a combination of the java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, it's like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235628#comment-13235628 ]

Robert Muir commented on LUCENE-3903:
-
I really think we should fix this for 3.6: it's not just that it's ugly, it actually looks broken.

javadocs very very ugly if you generate with java7
--
Key: LUCENE-3903
URL: https://issues.apache.org/jira/browse/LUCENE-3903
Project: Lucene - Java
Issue Type: Bug
Components: general/javadocs
Affects Versions: 3.6, 4.0
Reporter: Robert Muir

Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css, which is a combination of the java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, it's like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
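The "conditionalize a property in ant based on java version" idea from the issue description could be wired up roughly as below. This is a hedged sketch only: the property name javadoc.css, the destdir, and the exact file names are illustrative assumptions, not the project's actual build file.

```xml
<!-- Hypothetical sketch: choose a stylesheet per JDK using Ant's built-in
     ant.java.version property ("1.6", "1.7", ...). -->
<condition property="javadoc.css" value="stylesheet7+prettify.css"
           else="stylesheet+prettify.css">
  <equals arg1="${ant.java.version}" arg2="1.7"/>
</condition>

<javadoc destdir="${javadoc.dir}" stylesheetfile="${javadoc.css}">
  <!-- sourcepath, packages, classpath, etc. elided -->
</javadoc>
```

Ant's <condition> task with value/else keeps the build file small; the alternative of two near-duplicate javadoc targets is harder to keep in sync.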
RE: svn commit: r1303828 - in /lucene/dev/trunk: lucene/ modules/analysis/ modules/benchmark/ modules/facet/ modules/grouping/ modules/join/ modules/queries/ modules/queryparser/ modules/suggest/ solr
Oh, it's already 2012?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-Original Message-
From: rm...@apache.org [mailto:rm...@apache.org]
Sent: Thursday, March 22, 2012 4:21 PM
To: comm...@lucene.apache.org
Subject: svn commit: r1303828 - in /lucene/dev/trunk: lucene/ modules/analysis/ modules/benchmark/ modules/facet/ modules/grouping/ modules/join/ modules/queries/ modules/queryparser/ modules/suggest/ solr/

Author: rmuir
Date: Thu Mar 22 15:21:17 2012
New Revision: 1303828
URL: http://svn.apache.org/viewvc?rev=1303828&view=rev
Log: happy new year

Modified:
lucene/dev/trunk/lucene/NOTICE.txt
lucene/dev/trunk/modules/analysis/NOTICE.txt
lucene/dev/trunk/modules/benchmark/NOTICE.txt
lucene/dev/trunk/modules/facet/NOTICE.txt
lucene/dev/trunk/modules/grouping/NOTICE.txt
lucene/dev/trunk/modules/join/NOTICE.txt
lucene/dev/trunk/modules/queries/NOTICE.txt
lucene/dev/trunk/modules/queryparser/NOTICE.txt
lucene/dev/trunk/modules/suggest/NOTICE.txt
lucene/dev/trunk/solr/NOTICE.txt

Modified: lucene/dev/trunk/lucene/NOTICE.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/NOTICE.txt?rev=1303828&r1=1303827&r2=1303828&view=diff
==
--- lucene/dev/trunk/lucene/NOTICE.txt (original)
+++ lucene/dev/trunk/lucene/NOTICE.txt Thu Mar 22 15:21:17 2012
@@ -1,5 +1,5 @@
 Apache Lucene
-Copyright 2011 The Apache Software Foundation
+Copyright 2012 The Apache Software Foundation
 This product includes software developed by The Apache Software Foundation (http://www.apache.org/).
Modified: lucene/dev/trunk/modules/analysis/NOTICE.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/NOTICE.txt?rev=1303828&r1=1303827&r2=1303828&view=diff
==
--- lucene/dev/trunk/modules/analysis/NOTICE.txt (original)
+++ lucene/dev/trunk/modules/analysis/NOTICE.txt Thu Mar 22 15:21:17 2012
@@ -1,5 +1,5 @@
 Apache Lucene
-Copyright 2011 The Apache Software Foundation
+Copyright 2012 The Apache Software Foundation
 This product includes software developed by The Apache Software Foundation (http://www.apache.org/).

Modified: lucene/dev/trunk/modules/benchmark/NOTICE.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/benchmark/NOTICE.txt?rev=1303828&r1=1303827&r2=1303828&view=diff
==
--- lucene/dev/trunk/modules/benchmark/NOTICE.txt (original)
+++ lucene/dev/trunk/modules/benchmark/NOTICE.txt Thu Mar 22 15:21:17 2012
@@ -1,5 +1,5 @@
 Apache Lucene Benchmark
-Copyright 2011 The Apache Software Foundation
+Copyright 2012 The Apache Software Foundation
 This product includes software developed by The Apache Software Foundation (http://www.apache.org/).

Modified: lucene/dev/trunk/modules/facet/NOTICE.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/facet/NOTICE.txt?rev=1303828&r1=1303827&r2=1303828&view=diff
==
--- lucene/dev/trunk/modules/facet/NOTICE.txt (original)
+++ lucene/dev/trunk/modules/facet/NOTICE.txt Thu Mar 22 15:21:17 2012
@@ -1,5 +1,5 @@
 Apache Lucene Facets
-Copyright 2011 The Apache Software Foundation
+Copyright 2012 The Apache Software Foundation
 This product includes software developed by The Apache Software Foundation (http://www.apache.org/).

Modified: lucene/dev/trunk/modules/grouping/NOTICE.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/grouping/NOTICE.txt?rev=1303828&r1=1303827&r2=1303828&view=diff
==
--- lucene/dev/trunk/modules/grouping/NOTICE.txt (original)
+++ lucene/dev/trunk/modules/grouping/NOTICE.txt Thu Mar 22 15:21:17 2012
@@ -1,5 +1,5 @@
 Apache Lucene Grouping
-Copyright 2011 The Apache Software Foundation
+Copyright 2012 The Apache Software Foundation
 This product includes software developed by The Apache Software Foundation (http://www.apache.org/).

Modified: lucene/dev/trunk/modules/join/NOTICE.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/join/NOTICE.txt?rev=1303828&r1=1303827&r2=1303828&view=diff
==
--- lucene/dev/trunk/modules/join/NOTICE.txt (original)
+++ lucene/dev/trunk/modules/join/NOTICE.txt Thu Mar 22 15:21:17 2012
@@ -1,5 +1,5 @@
 Apache Lucene Join
-Copyright 2011 The Apache Software Foundation
+Copyright 2012 The Apache Software Foundation
 This product includes software developed by The Apache Software Foundation (http://www.apache.org/).

Modified: lucene/dev/trunk/modules/queries/NOTICE.txt
URL:
[jira] [Commented] (LUCENE-3887) 'ant javadocs' should fail if a package is missing a package.html
[ https://issues.apache.org/jira/browse/LUCENE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235635#comment-13235635 ] Michael McCandless commented on LUCENE-3887: OK I committed the basic checking for smoke tester... I'll leave this open for having ant javadocs fail when things are missing... 'ant javadocs' should fail if a package is missing a package.html - Key: LUCENE-3887 URL: https://issues.apache.org/jira/browse/LUCENE-3887 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir Attachments: LUCENE-3887.patch, LUCENE-3887.patch While reviewing the javadocs I noticed many packages are missing a basic package.html. For 3.x I committed some package.html files where they were missing (I will port forward to trunk). I think all packages should have this... really all public/protected classes/methods/constants, but this would be a good step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
[ https://issues.apache.org/jira/browse/SOLR-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235642#comment-13235642 ]

Luca Cavanna commented on SOLR-3265:

Looks like this has already been fixed on trunk some time ago. Erick, if you haven't yet started working on this, I can provide a patch for 3x soon.

TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
-
Key: SOLR-3265
URL: https://issues.apache.org/jira/browse/SOLR-3265
Project: Solr
Issue Type: Test
Affects Versions: 3.6, 4.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor

When running ant test from the command line in 3.x, if you have a Solr server running then TestSolrEntityProcessorEndToEnd fails since it uses the default port (stack trace with "address already in use"). This should use some other port, especially as 3.x ant test is taking 50+ minutes and I often open up a server to look at something else. In 4.0, some of the cloud tests also use 8983 as a port. Should these be changed too? And just to make my life *especially* interesting, at least one test puts the string 8983 in a document, which doesn't have to be changed <G>... Of course one can start your local server on a different port, but this seems trappy.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3903:

Attachment: java7docs.jpg

javadocs very very ugly if you generate with java7
--
Key: LUCENE-3903
URL: https://issues.apache.org/jira/browse/LUCENE-3903
Project: Lucene - Java
Issue Type: Bug
Components: general/javadocs
Affects Versions: 3.6, 4.0
Reporter: Robert Muir
Attachments: java7docs.jpg

Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css, which is a combination of the java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, it's like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
[ https://issues.apache.org/jira/browse/SOLR-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Cavanna updated SOLR-3265: --- Attachment: SOLR-3265.patch Patch against 3.x branch to solve the TestSolrEntityProcessorEndToEnd port problem. Trunk is already ok. TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance - Key: SOLR-3265 URL: https://issues.apache.org/jira/browse/SOLR-3265 Project: Solr Issue Type: Test Affects Versions: 3.6, 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Attachments: SOLR-3265.patch When running ant test from the command line in 3.x, if you have a Solr server running then TestSolrentityProcessorEndToEnd fails since it uses the default port (stack trace with address already in use). This should use some other port, especially as 3.x ant test is taking 50+ minutes and I often open up a server to look at something else. In 4.0, some of the cloud tests also use 8983 as a port. Should these be changed too? And just to make my life *especially* interesting, at least one test puts the string 8983 in a document, which doesn't have to be changed G... Of course one can start your local server on a different port, but this seems trappy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
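The usual cure for this class of failure is to stop hardcoding 8983 and ask the OS for a free ephemeral port instead. The following is a hedged sketch of that technique, under the assumption that the test server can then be started on the returned port; the class and method names are illustrative, not what the SOLR-3265 patch actually does:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Bind to port 0: the OS picks an unused ephemeral port, which the test
// can then use, so it never collides with a Solr instance already on 8983.
public class FreePortFinder {

    public static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort(); // OS-assigned, currently unused port
        } catch (IOException e) {
            throw new RuntimeException("could not grab a free port", e);
        }
    }

    public static void main(String[] args) {
        int port = findFreePort();
        System.out.println("test server could bind to port " + port + " instead of 8983");
    }
}
```

There is a small race between closing the probe socket and the test server binding the port, but for test isolation this is normally good enough; binding the server to 0 directly and reading the port back is even safer when the harness allows it.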
[jira] [Resolved] (LUCENE-3778) Create a grouping convenience class
[ https://issues.apache.org/jira/browse/LUCENE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen resolved LUCENE-3778. --- Resolution: Fixed Lucene Fields: (was: New) Committed to trunk. Feature work (distributed grouping, grouped facets etc.) will be done in new issues. Create a grouping convenience class --- Key: LUCENE-3778 URL: https://issues.apache.org/jira/browse/LUCENE-3778 Project: Lucene - Java Issue Type: Improvement Components: modules/grouping Reporter: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-3778.patch, LUCENE-3778.patch, LUCENE-3778.patch, LUCENE-3778.patch Currently the grouping module has many collector classes with a lot of different options per class. I think it would be a good idea to have a GroupUtil (Or another name?) convenience class. I think this could be a builder, because of the many options (sort,sortWithinGroup,groupOffset,groupCount and more) and implementations (term/dv/function) grouping has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
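The "builder, because of the many options" idea from the description could take a shape along these lines. Everything below (class name, option names, the describe() stand-in for collector creation) is a hypothetical illustration of the pattern, not the API that was committed:

```java
// Hypothetical sketch of a grouping convenience builder: accumulate the many
// options fluently, then hand them to the grouping collectors in one place.
public class GroupingBuilder {
    private String groupField;
    private int groupOffset = 0;
    private int groupCount = 10;

    public GroupingBuilder groupField(String field) { this.groupField = field; return this; }
    public GroupingBuilder groupOffset(int offset) { this.groupOffset = offset; return this; }
    public GroupingBuilder groupCount(int count) { this.groupCount = count; return this; }

    // In the real module this step would create the first/second-pass
    // collectors; here we just render the accumulated configuration.
    public String describe() {
        return "group by " + groupField + " [" + groupOffset + ".." + (groupOffset + groupCount) + ")";
    }

    public static void main(String[] args) {
        String conf = new GroupingBuilder()
            .groupField("author").groupOffset(5).groupCount(20).describe();
        System.out.println(conf); // group by author [5..25)
    }
}
```

The payoff of the builder shape is that sort, sortWithinGroup, and the term/dv/function variants become additional fluent setters rather than a combinatorial explosion of constructors.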
[jira] [Commented] (SOLR-3260) Improve exception handling / logging for ScriptTransformer.init()
[ https://issues.apache.org/jira/browse/SOLR-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235688#comment-13235688 ]

Steven Rowe commented on SOLR-3260:
---
bq. I missed one. Sorry about that. Should be fixed now.

Thanks James, I think it's fixed - just now in the console output from the Jenkins Maven trunk job (still running as I write this), I saw:

{noformat}
Running org.apache.solr.handler.dataimport.TestScriptTransformer
NOTE: Assume failed in 'testCheckScript(org.apache.solr.handler.dataimport.TestScriptTransformer)' (ignored): got: org.apache.lucene.util.InternalAssumptionViolatedException: failed assumption: This JVM does not have Rhino installed. Test Skipped., expected: null
NOTE: Assume failed in 'testBasic(org.apache.solr.handler.dataimport.TestScriptTransformer)' (ignored): got: org.apache.lucene.util.InternalAssumptionViolatedException: failed assumption: This JVM does not have Rhino installed. Test Skipped., expected: null
NOTE: Assume failed in 'testOneparam(org.apache.solr.handler.dataimport.TestScriptTransformer)' (ignored): got: org.apache.lucene.util.InternalAssumptionViolatedException: failed assumption: This JVM does not have Rhino installed. Test Skipped., expected: null
Tests run: 4, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.023 sec
{noformat}

Improve exception handling / logging for ScriptTransformer.init()
-
Key: SOLR-3260
URL: https://issues.apache.org/jira/browse/SOLR-3260
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 3.5, 4.0
Reporter: James Dyer
Assignee: James Dyer
Priority: Trivial
Fix For: 3.6, 4.0
Attachments: SOLR-3260.patch

This came up on the user-list. ScriptTransformer logs the same "need a >= 1.6 JRE" message for several problems, making debugging difficult for users.

--
This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
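The skipped tests above hinge on whether the JVM ships a JavaScript engine. A minimal probe for that availability, using the standard {{javax.script}} API (the class and method names below are illustrative, not Solr's code), looks like this:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Illustrative sketch (not Solr's actual code) of the availability check
// behind the skipped tests above: probe for a JavaScript engine (Rhino, on
// the JVMs discussed here) via the standard javax.script API. Tests skip
// rather than fail when the probe returns null.
class ScriptEngineProbe {
    static ScriptEngine javascriptEngineOrNull() {
        // Returns null when no engine is registered under this name.
        return new ScriptEngineManager().getEngineByName("JavaScript");
    }

    static boolean hasJavaScriptEngine() {
        return javascriptEngineOrNull() != null;
    }
}
```

A transformer could use such a probe up front to log one specific "no script engine available" message instead of a generic error, which is the kind of debugging improvement this issue asks for.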
[jira] [Assigned] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-3903: - Assignee: Uwe Schindler javadocs very very ugly if you generate with java7 -- Key: LUCENE-3903 URL: https://issues.apache.org/jira/browse/LUCENE-3903 Project: Lucene - Java Issue Type: Bug Components: general/javadocs Affects Versions: 3.6, 4.0 Reporter: Robert Muir Assignee: Uwe Schindler Attachments: LUCENE-3903.patch, java7docs.jpg Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css which is a combination of java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, its like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3903: -- Attachment: LUCENE-3903.patch Patch that fixes the issue(s): - Simply append the prettify.css to the one created by javadoc itself (as a post-javadoc <concat/> task) - Fix javascript issues caused by Java 7: The code that triggered prettyprint was relying on an implementation-specific javascript function name that no longer exists in Java 7. I changed the window.onload handler to dynamically append the 2nd handler. javadocs very very ugly if you generate with java7 -- Key: LUCENE-3903 URL: https://issues.apache.org/jira/browse/LUCENE-3903 Project: Lucene - Java Issue Type: Bug Components: general/javadocs Affects Versions: 3.6, 4.0 Reporter: Robert Muir Attachments: LUCENE-3903.patch, java7docs.jpg Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css which is a combination of java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, its like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-2382. -- Resolution: Fixed commit to 3.x: r1303792 ( r1303822 - license headers) DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Assignee: James Dyer Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter_standalone.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-properties.patch, SOLR-2382-properties.patch, SOLR-2382-solrwriter-verbose-fix.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382_3x.patch, TestCachedSqlEntityProcessor.java-break-where-clause.patch, TestCachedSqlEntityProcessor.java-fix-where-clause-by-adding-cachePk-and-lookup.patch, TestCachedSqlEntityProcessor.java-wrong-pk-detected-due-to-lack-of-where-support.patch, TestThreaded.java.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. 
Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. 
- NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity
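The pluggable-cache idea in point 1 of the implementation details can be sketched as a small interface plus an in-memory implementation. The names and signatures below are illustrative only, not the actual Solr DIHCache API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch, not the actual Solr API: a key -> rows cache behind a
// minimal interface. The in-memory implementation below is a loose analogue
// of SortedMapBackedCache; a disk-backed implementation (like the
// BerkleyBackedCache described above) would implement the same interface.
interface CacheSketch {
    void add(Object key, Map<String, Object> row);
    List<Map<String, Object>> lookup(Object key);
}

class SortedMapCacheSketch implements CacheSketch {
    // One key may map to several child-entity rows, hence the list of rows.
    private final SortedMap<Object, List<Map<String, Object>>> data = new TreeMap<>();

    public void add(Object key, Map<String, Object> row) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    public List<Map<String, Object>> lookup(Object key) {
        return data.getOrDefault(key, Collections.<Map<String, Object>>emptyList());
    }
}
```

Decoupling the entity processors from any one cache class is what lets the same join logic run against an in-memory map, a disk store, or a persistent cache fed back in as entity input.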
[jira] [Updated] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3901: --- Attachment: LUCENE-3901.patch Add katakana stem filter to better deal with certain katakana spelling variants --- Key: LUCENE-3901 URL: https://issues.apache.org/jira/browse/LUCENE-3901 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3901.patch Many Japanese katakana words end in a long sound that is sometimes optional. For example, パーティー and パーティ are both perfectly valid for party. Similarly we have センター and センタ that are variants of center as well as サーバー and サーバ for server. I'm proposing that we add a katakana stemmer that removes this long sound if the terms are longer than a configurable length. It's also possible to add the variant as a synonym, but I think stemming is preferred from a ranking point of view. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENENET-477) NullReferenceException in ThreadLocal when Lucene.Net compiled for .Net 2.0
[ https://issues.apache.org/jira/browse/LUCENENET-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens resolved LUCENENET-477. --- Resolution: Fixed Fix Version/s: Lucene.Net 3.0.3 Thanks for the patch. It's been applied to trunk for version 3.0.3. NullReferenceException in ThreadLocal when Lucene.Net compiled for .Net 2.0 --- Key: LUCENENET-477 URL: https://issues.apache.org/jira/browse/LUCENENET-477 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4g Environment: .Net 2.0 Reporter: Andrew Sampson Fix For: Lucene.Net 3.0.3 Attachments: CloseableThreadLocal.cs.patch A NullReferenceException occurs in Lucene.Net.Util.ThreadLocal. This class is only included when Lucene is compiled for .Net 2.0. The cause is that the threadstatic slots variable is lazily-initialized, but there is no null-check in the dispose. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (LUCENENET-179) SnowballFilter speed improvement
[ https://issues.apache.org/jira/browse/LUCENENET-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens closed LUCENENET-179. - Resolution: Invalid It's been so long since this patch was submitted (2009) that it's no longer needed. The new version of the SnowballFilter from 3.0.3 only uses reflection in the constructor to create the filter (as does the patch). It's too bad this didn't make it into 2.9.4, where it could have really been used. SnowballFilter speed improvement --- Key: LUCENENET-179 URL: https://issues.apache.org/jira/browse/LUCENENET-179 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.2 Reporter: Arian Bär Fix For: Lucene.Net 3.0.3 Attachments: FailOverSnowballFilter.cs I'm using Lucene.Net along with snowball stemming to index text from a database. The class Lucene.Net.Analysis.Snowball.SnowballFilter uses the reflection API and the invoke method to call the stem methods of snowball. I have written a Snowball filter which creates a delegate and uses this delegate to stem the words afterwards. This approach improves the indexing speed of my indexing program by about 10%. I would be happy if you include this code into lucene.net. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
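The optimization at stake here is hoisting the reflective lookup out of the per-token path. The issue concerns C# (Lucene.Net), where the patch bound a delegate; as a language-neutral illustration in Java (class and method names are my own, not from the patch), the same idea looks like this:

```java
import java.lang.reflect.Method;

// Generic illustration of the idea in this issue, not the actual patch: the
// reflective Method lookup by name happens exactly once, in the constructor.
// Each subsequent call reuses the bound Method rather than re-resolving it,
// which is the analogue of the C# patch's delegate binding.
class OnceReflectedCaller {
    private final Object target;
    private final Method method; // resolved once, up front

    OnceReflectedCaller(Object target, String methodName) throws NoSuchMethodException {
        this.target = target;
        this.method = target.getClass().getMethod(methodName, String.class);
    }

    String call(String arg) throws Exception {
        return (String) method.invoke(target, arg);
    }
}
```

Per-call name resolution is the expensive part of reflection in a tight tokenization loop; caching the resolved handle (or, in C#, converting it to a delegate) removes that cost from the hot path.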
[jira] [Commented] (LUCENE-3847) LuceneTestCase should check for modifications on System properties
[ https://issues.apache.org/jira/browse/LUCENE-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235710#comment-13235710 ] Dawid Weiss commented on LUCENE-3847: - Well... something is changing it, the question is what it is. I'll take a look. LuceneTestCase should check for modifications on System properties -- Key: LUCENE-3847 URL: https://issues.apache.org/jira/browse/LUCENE-3847 Project: Lucene - Java Issue Type: Improvement Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3847.patch - fail the test if changes have been detected. - revert the state of system properties before the suite. - cleanup after the suite. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (LUCENENET-372) NLS pack for Lucene.NET: BR, CJK, CN, CZ, DE, FR, NL, RU analyzers
[ https://issues.apache.org/jira/browse/LUCENENET-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens closed LUCENENET-372. - Resolution: Won't Fix Assignee: (was: Prescott Nasser) We're not doing a separate contrib release for these, and it's already ported into 3.0.3. Closing issue as won't fix. I apologize that this didn't make it into the official release of 2.9.4. I hope this doesn't discourage you from contributing in the future. NLS pack for Lucene.NET: BR, CJK, CN, CZ, DE, FR, NL, RU analyzers -- Key: LUCENENET-372 URL: https://issues.apache.org/jira/browse/LUCENENET-372 Project: Lucene.Net Issue Type: New Feature Components: Lucene.Net Contrib Reporter: Pasha Bizhan Priority: Minor Labels: Analyzers Attachments: lucene-net-nls.zip Port of java analyzers. Sorry for 1.4 version, it's from 2005 year Update to 2.9.2/2.9.4 compatibility for 2.9.4 release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3901: --- Attachment: LUCENE-3901.patch Add katakana stem filter to better deal with certain katakana spelling variants --- Key: LUCENE-3901 URL: https://issues.apache.org/jira/browse/LUCENE-3901 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3901.patch, LUCENE-3901.patch Many Japanese katakana words end in a long sound that is sometimes optional. For example, パーティー and パーティ are both perfectly valid for party. Similarly we have センター and センタ that are variants of center as well as サーバー and サーバ for server. I'm proposing that we add a katakana stemmer that removes this long sound if the terms are longer than a configurable length. It's also possible to add the variant as a synonym, but I think stemming is preferred from a ranking point of view. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1303792 [1/2] - in /lucene/dev/branches/branch_3x/solr: contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/ contrib/dataimporthandler/src/java/org/apache/sol
I'm a little worried about doing anything automated (I think it would be bad to stamp a wrong license on something or whatever). That's why that task doesn't touch anything it cannot recognize and reports it. I used ant rat-sources to find these problems though, so detecting them is automated... Ok. D. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235735#comment-13235735 ] Robert Muir commented on LUCENE-3903: - +1 Tested on branch_3x with Java5, 6, and 7 (just patch --merge) javadocs very very ugly if you generate with java7 -- Key: LUCENE-3903 URL: https://issues.apache.org/jira/browse/LUCENE-3903 Project: Lucene - Java Issue Type: Bug Components: general/javadocs Affects Versions: 3.6, 4.0 Reporter: Robert Muir Assignee: Uwe Schindler Attachments: LUCENE-3903.patch, java7docs.jpg Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css which is a combination of java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, its like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12842 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12842/ All tests passed Build Log (for compile errors): [...truncated 22095 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235746#comment-13235746 ] Christian Moen commented on LUCENE-3901: Find attached a patch for this. The stemming is done by {{KuromojiKatakanaStemFilter}}, which has been added to {{KuromojiAnalyzer}}, and a corresponding {{KuromojiKatakanaStemFilterFactory}} has been added to the {{text_ja}} field type in {{schema.xml}}. Note that this stemming is now turned on by default and I think it makes good sense to do so. The minimum length of a token considered for stemming is configurable and I've made the default of 4 explicit in {{schema.xml}} to convey that it's there. The stemmer only supports full-width katakana and should be used in combination with a {{CJKWidthFilter}} if stemming half-width characters is required and you're doing your own wiring. Both {{text_ja}} and {{KuromojiAnalyzer}} take care of this, and the default overall processing is the same. There are some test cases in {{TestKuromojiKatakanaStemFilter}}, but I've added a case to {{TestKuromojiAnalyzer}} that demonstrates how the stemming works in combination with katakana compound splitting. In Japanese, manager can be written both as マネージャー and マネージャ (and probably also マネジャー), and for the compound シニアプロジェクトマネージャー (senior project manager), we now get tokens シニア (senior) プロジェクト (project) マネージャ (manager), and we've stemmed the last token by removing the trailing ー. Kuromoji also makes the compound シニアプロジェクトマネージャ a synonym to シニア, and ー is also removed for the synonym compound. Tests pass and I've also tested this end-to-end in a Solr trunk build. 
Add katakana stem filter to better deal with certain katakana spelling variants --- Key: LUCENE-3901 URL: https://issues.apache.org/jira/browse/LUCENE-3901 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3901.patch, LUCENE-3901.patch Many Japanese katakana words end in a long sound that is sometimes optional. For example, パーティー and パーティ are both perfectly valid for party. Similarly we have センター and センタ that are variants of center as well as サーバー and サーバ for server. I'm proposing that we add a katakana stemmer that removes this long sound if the terms are longer than a configurable length. It's also possible to add the variant as a synonym, but I think stemming is preferred from a ranking point of view. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
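The per-term rule described in the comment — drop a trailing prolonged-sound mark (ー, U+30FC) from katakana terms longer than a configurable minimum — can be sketched as follows. This is a standalone illustration of the rule, not the actual token-stream filter; assume the exact length check and katakana test in the real filter may differ:

```java
// Minimal sketch of the stemming rule described above, not the actual
// KuromojiKatakanaStemFilter: drop a trailing prolonged-sound mark (U+30FC,
// the katakana long-vowel mark) from all-katakana terms longer than a
// configurable minimum length. The real filter operates on a TokenStream.
class KatakanaStemSketch {
    static final char PROLONGED_SOUND_MARK = '\u30FC'; // the trailing mark in パーティー
    final int minLength;

    KatakanaStemSketch(int minLength) { this.minLength = minLength; }

    // True if every char falls in the Unicode Katakana block (U+30A0..U+30FF).
    static boolean isKatakana(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '\u30A0' || c > '\u30FF') return false;
        }
        return true;
    }

    String stem(String term) {
        if (term.length() > minLength
                && term.charAt(term.length() - 1) == PROLONGED_SOUND_MARK
                && isKatakana(term)) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

With a minimum length of 4, パーティー (5 chars) stems to パーティ, while short terms and non-katakana terms pass through unchanged, which matches the variant-conflation behavior the issue is after.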
[jira] [Commented] (LUCENE-3847) LuceneTestCase should check for modifications on System properties
[ https://issues.apache.org/jira/browse/LUCENE-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235749#comment-13235749 ] Dawid Weiss commented on LUCENE-3847: - I know what's changing it. Eh. So -- there is a warning being printed: {noformat} Mar 22, 2012 6:20:33 PM org.apache.solr.core.Config parseLuceneVersionString WARNING: You should not use LUCENE_CURRENT as luceneMatchVersion property: if you use this setting, and then Solr upgrades to a newer release of Lucene, sizable changes may happen. If precise back compatibility is important then you should instead explicitly specify an actual Lucene version. Mar 22, 2012 6:20:33 PM org.apache.solr.analysis.BaseTokenStreamFactory warnDeprecated WARNING: RussianLetterTokenizerFactory is deprecated. Use StandardTokenizerFactory instead. {noformat} These warnings go through Java logging and this in turn is localized (date format, warning info, etc.). This in turn asks for the default TimeZone and this in turn sets the system property (I mentioned it a while ago). I suggest that we just ignore user.timezone as it is triggered from multiple locations and doesn't seem that important? LuceneTestCase should check for modifications on System properties -- Key: LUCENE-3847 URL: https://issues.apache.org/jira/browse/LUCENE-3847 Project: Lucene - Java Issue Type: Improvement Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3847.patch - fail the test if changes have been detected. - revert the state of system properties before the suite. - cleanup after the suite. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
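The detection-plus-ignore approach discussed in this issue — snapshot system properties, run the suite, report what changed, but skip known benign keys like {{user.timezone}} — can be sketched like this (an illustration of the idea, not the LuceneTestCase implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Sketch of the detection idea from LUCENE-3847 (not the actual
// LuceneTestCase code): snapshot system properties up front, then report keys
// that were added or modified, ignoring known benign side effects such as
// user.timezone, which the JVM itself can set when TimeZone.getDefault() runs.
class SysPropWatcher {
    private static final Set<String> IGNORED =
            new HashSet<>(Arrays.asList("user.timezone"));
    private final Map<String, String> before = snapshot();

    static Map<String, String> snapshot() {
        Map<String, String> copy = new HashMap<>();
        for (String key : System.getProperties().stringPropertyNames()) {
            copy.put(key, System.getProperty(key));
        }
        return copy;
    }

    /** Keys added or modified since construction, minus the ignored ones. */
    List<String> changedKeys() {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> e : snapshot().entrySet()) {
            if (IGNORED.contains(e.getKey())) continue;
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```

A test framework would build the watcher before the suite, fail the suite if {{changedKeys()}} is non-empty afterwards, and restore the snapshot as cleanup — exactly the three bullet points in the issue description.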
[jira] [Commented] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235753#comment-13235753 ] Robert Muir commented on LUCENE-3901: - patch looks great! Add katakana stem filter to better deal with certain katakana spelling variants --- Key: LUCENE-3901 URL: https://issues.apache.org/jira/browse/LUCENE-3901 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3901.patch, LUCENE-3901.patch Many Japanese katakana words end in a long sound that is sometimes optional. For example, パーティー and パーティ are both perfectly valid for party. Similarly we have センター and センタ that are variants of center as well as サーバー and サーバ for server. I'm proposing that we add a katakana stemmer that removes this long sound if the terms are longer than a configurable length. It's also possible to add the variant as a synonym, but I think stemming is preferred from a ranking point of view. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3847) LuceneTestCase should check for modifications on System properties
[ https://issues.apache.org/jira/browse/LUCENE-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235755#comment-13235755 ] Robert Muir commented on LUCENE-3847: - {quote} I suggest that we just ignore user.timezone as it is triggered from multiple locations and doesn't seem that important? {quote} +1, we know its a side effect of our testcase itself randomizing the locale/timezone... LuceneTestCase should check for modifications on System properties -- Key: LUCENE-3847 URL: https://issues.apache.org/jira/browse/LUCENE-3847 Project: Lucene - Java Issue Type: Improvement Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3847.patch - fail the test if changes have been detected. - revert the state of system properties before the suite. - cleanup after the suite. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3903: -- Attachment: LUCENE-3903.patch Minor tweaks: - Moved the javascript to the bottom, as it's then not duplicated multiple times - Fixed attribute corruption in the CDATA. I will commit this later! javadocs very very ugly if you generate with java7 -- Key: LUCENE-3903 URL: https://issues.apache.org/jira/browse/LUCENE-3903 Project: Lucene - Java Issue Type: Bug Components: general/javadocs Affects Versions: 3.6, 4.0 Reporter: Robert Muir Assignee: Uwe Schindler Attachments: LUCENE-3903.patch, LUCENE-3903.patch, java7docs.jpg Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css which is a combination of java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, its like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
java7-style docs for the website for 3.6?
Hello, After Uwe fixes https://issues.apache.org/jira/browse/LUCENE-3903, it's possible to build the nice-looking java7-style javadocs for lucene. We could pass the java5 bootclasspath so that it's all linked up with java5 and not confusing in any way; it just looks nicer (less geocities-like). Any opinions? -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Proposal - a high performance Key-Value store based on Lucene APIs/concepts
I've been spending quite a bit of time recently benchmarking various Key-Value stores for a demanding project and been largely disappointed with the results. However, I have developed a promising implementation based on these concepts: http://www.slideshare.net/MarkHarwood/lucene-kvstore The code needs some packaging before I can release it but the slide deck should give a good overview of the design. Is this something that is likely to be of interest as a contrib module here? I appreciate this is a departure from the regular search focus but it builds on some common ground in Lucene core and may have some applications here. Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3903: -- Attachment: LUCENE-3903.patch Robert and I noticed a small issue: Javadoc does not regenerate the stylesheet if it's already there. This leads to appending the same prettify.css all the time. I added a delete for this file before running javadocs, so it's regenerated. Now it's final :-) javadocs very very ugly if you generate with java7 -- Key: LUCENE-3903 URL: https://issues.apache.org/jira/browse/LUCENE-3903 Project: Lucene - Java Issue Type: Bug Components: general/javadocs Affects Versions: 3.6, 4.0 Reporter: Robert Muir Assignee: Uwe Schindler Attachments: LUCENE-3903.patch, LUCENE-3903.patch, LUCENE-3903.patch, java7docs.jpg Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css which is a combination of java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, its like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-3011: - Attachment: SOLR-3011.patch Here is a cleaned-up version of the last patch. - simplified TestThreaded. - Added a logged deprecation warning that threads will be removed in a future release. - ran the DIH tests a few times and everything passed. This I will commit shortly to the 3.x branch. DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5 Reporter: Mikhail Khludnev Priority: Minor Fix For: 3.6 Attachments: SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, patch-3011-EntityProcessorBase-iterator.patch, patch-3011-EntityProcessorBase-iterator.patch current DIH design is not thread safe. see last comments at SOLR-2382 and SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly it's a SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Proposal - a high performance Key-Value store based on Lucene APIs/concepts
+1 The one potential problem is the use of Trove for primitives On Thu, Mar 22, 2012 at 10:42 AM, mark harwood markharw...@yahoo.co.uk wrote: I've been spending quite a bit of time recently benchmarking various Key-Value stores for a demanding project and been largely disappointed with results However, I have developed a promising implementation based on these concepts: http://www.slideshare.net/MarkHarwood/lucene-kvstore The code needs some packaging before I can release it but the slide deck should give a good overview of the design. Is this something that it is likely to be of interest as a contrib module here? I appreciate this is a departure from the regular search focus but it builds on some common ground in Lucene core and may have some applications here. Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: java7-style docs for the website for 3.6?
I created this (java7 using bootclasspath of java5) and uploaded it here so you can see: http://people.apache.org/~rmuir/java7-style-javadocs/ On Thu, Mar 22, 2012 at 1:36 PM, Robert Muir rcm...@gmail.com wrote: Hello, After Uwe fixes https://issues.apache.org/jira/browse/LUCENE-3903, its possible to build the nice looking java7-style javadocs for lucene. we could pass the java5 bootclasspath so that its all linked up with java5 and not confusing in any way, just looks nicer (less geocities-like). Any opinions? -- lucidimagination.com -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-3903. --- Resolution: Fixed Fix Version/s: 4.0 3.6 Committed trunk revision: 1303916 Committed 3.x revision: 1303922 javadocs very very ugly if you generate with java7 -- Key: LUCENE-3903 URL: https://issues.apache.org/jira/browse/LUCENE-3903 Project: Lucene - Java Issue Type: Bug Components: general/javadocs Affects Versions: 3.6, 4.0 Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3903.patch, LUCENE-3903.patch, LUCENE-3903.patch, java7docs.jpg Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css which is a combination of java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, its like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235799#comment-13235799 ] Mikhail Khludnev commented on SOLR-3011: James, I'm glad to hear it. Let me know if you like me to refresh patches at SOLR-2961 and SOLR-2804. They are also blockers for using threads. DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5 Reporter: Mikhail Khludnev Priority: Minor Fix For: 3.6 Attachments: SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, patch-3011-EntityProcessorBase-iterator.patch, patch-3011-EntityProcessorBase-iterator.patch current DIH design is not thread safe. see last comments at SOLR-2382 and SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly it's a SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3903) javadocs very very ugly if you generate with java7
[ https://issues.apache.org/jira/browse/LUCENE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235812#comment-13235812 ] Robert Muir commented on LUCENE-3903: - Thanks Uwe! javadocs very very ugly if you generate with java7 -- Key: LUCENE-3903 URL: https://issues.apache.org/jira/browse/LUCENE-3903 Project: Lucene - Java Issue Type: Bug Components: general/javadocs Affects Versions: 3.6, 4.0 Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3903.patch, LUCENE-3903.patch, LUCENE-3903.patch, java7docs.jpg Java7 changes its javadocs to look much nicer, but this involves different CSS styles. Lucene overrides the CSS with stylesheet+prettify.css which is a combination of java5/6 stylesheet + google prettify: but there are problems because java7 has totally different styles. So if you generate javadocs with java7, its like you have no stylesheet at all. A solution might be to make stylesheet7+prettify.css and conditionalize a property in ant based on java version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2921: - Attachment: SOLR-2921-trunk.patch SOLR-2921-3x.patch 3x r:1303937 Trunk r: 1303939 Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch, SOLR-2921-3x.patch, SOLR-2921-trunk.patch SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, why didn't my wildcard query automatically lower-case (or accent fold or) my terms? will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial, see the classes above for examples. 
Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know. ArabicNormalizationFilterFactory GreekLowerCaseFilterFactory HindiNormalizationFilterFactory ICUFoldingFilterFactory ICUNormalizer2FilterFactory ICUTransformFilterFactory IndicNormalizationFilterFactory ISOLatin1AccentFilterFactory PersianNormalizationFilterFactory RussianLowerCaseFilterFactory TurkishLowerCaseFilterFactory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
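[Editorial note: as Erick says, implementing the interface is usually a one-method affair. Solr's real interface is MultiTermAwareComponent with a getMultiTermComponent() method on the analysis factories; the self-contained stand-in below only illustrates the pattern, and every type name in it is hypothetical, not Solr's actual API. The idea: a factory reports which analysis component should also run on multi-term (wildcard/prefix) query fragments, typically returning itself when its transformation is safe on partial terms.]

```java
// Stand-in types to illustrate the MultiTermAwareComponent pattern.
// These are NOT Solr's classes; they only mirror the shape of the contract.
interface AnalysisFactoryDemo {
    String process(String term);
}

interface MultiTermAwareDemo {
    // Return the component to apply to wildcard/prefix fragments at query time.
    AnalysisFactoryDemo getMultiTermComponent();
}

public class LowerCaseFilterFactoryDemo
        implements AnalysisFactoryDemo, MultiTermAwareDemo {

    @Override
    public String process(String term) {
        return term.toLowerCase(java.util.Locale.ROOT);
    }

    // Lowercasing is safe on a wildcard fragment like "Foo*",
    // so the factory simply returns itself.
    @Override
    public AnalysisFactoryDemo getMultiTermComponent() {
        return this;
    }
}
```

With this in place, the query-analysis chain can normalize the text part of a wildcard query the same way it normalizes indexed terms, which is exactly the "why didn't my wildcard query lower-case my terms?" fix described above.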
[jira] [Resolved] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-2921. -- Resolution: Fixed Fix Version/s: 4.0 3.6 Let's open up any further issues in a new JIRA? Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch, SOLR-2921-3x.patch, SOLR-2921-trunk.patch SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, why didn't my wildcard query automatically lower-case (or accent fold or) my terms? will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial, see the classes above for examples. 
Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know. ArabicNormalizationFilterFactory GreekLowerCaseFilterFactory HindiNormalizationFilterFactory ICUFoldingFilterFactory ICUNormalizer2FilterFactory ICUTransformFilterFactory IndicNormalizationFilterFactory ISOLatin1AccentFilterFactory PersianNormalizationFilterFactory RussianLowerCaseFilterFactory TurkishLowerCaseFilterFactory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235824#comment-13235824 ] James Dyer commented on SOLR-3011: -- That would be great if you can. Lucene/Solr 3.6 is going to be the last 3.x release and it is closing for new functionality soon. SOLR-2804 for sure looks like something that should be there. Is SOLR-2961 just for Tika? DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5 Reporter: Mikhail Khludnev Priority: Minor Fix For: 3.6 Attachments: SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, patch-3011-EntityProcessorBase-iterator.patch, patch-3011-EntityProcessorBase-iterator.patch current DIH design is not thread safe. see last comments at SOLR-2382 and SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly it's a SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Proposal - a high performance Key-Value store based on Lucene APIs/concepts
Mark, can you share more on what K-V (NoSQL) stores you've been benchmarking and what the results have been? Did you try all the well known ones? http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis -- J On Thu, Mar 22, 2012 at 10:42 AM, mark harwood markharw...@yahoo.co.uk wrote: I've been spending quite a bit of time recently benchmarking various Key-Value stores for a demanding project and been largely disappointed with results However, I have developed a promising implementation based on these concepts: http://www.slideshare.net/MarkHarwood/lucene-kvstore The code needs some packaging before I can release it but the slide deck should give a good overview of the design. Is this something that it is likely to be of interest as a contrib module here? I appreciate this is a departure from the regular search focus but it builds on some common ground in Lucene core and may have some applications here. Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235829#comment-13235829 ] Mikhail Khludnev commented on SOLR-3011: bq. Is SOLR-2961 just for Tika? yep. it seems so. Why do you ask, we don't need to support it further? DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5 Reporter: Mikhail Khludnev Priority: Minor Fix For: 3.6 Attachments: SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, patch-3011-EntityProcessorBase-iterator.patch, patch-3011-EntityProcessorBase-iterator.patch current DIH design is not thread safe. see last comments at SOLR-2382 and SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly it's a SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3011. -- Resolution: Fixed Assignee: James Dyer Committed branch_3x (only): r1303949 Thank you Mikhail! I realize this took a lot of patience and unforgiving work on your part. DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5 Reporter: Mikhail Khludnev Assignee: James Dyer Priority: Minor Fix For: 3.6 Attachments: SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, patch-3011-EntityProcessorBase-iterator.patch, patch-3011-EntityProcessorBase-iterator.patch current DIH design is not thread safe. see last comments at SOLR-2382 and SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly it's a SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2961) DIH with threads and TikaEntityProcessor JDBC ISsue
[ https://issues.apache.org/jira/browse/SOLR-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235842#comment-13235842 ] James Dyer commented on SOLR-2961: -- {quote} Mikhail Khludnev commented on SOLR-3011: bq. Is SOLR-2961 just for Tika? yep. it seems so. Why do you ask, we don't need to support it further? {quote} I don't think we have to support _threads_ with everything. (This is one reason why I want to remove threads on Trunk. Its going to be very difficult to support every use-case.) On the other hand, if you or someone else puts up a good patch in the very near-term I will try to get it into 3.6. DIH with threads and TikaEntityProcessor JDBC ISsue --- Key: SOLR-2961 URL: https://issues.apache.org/jira/browse/SOLR-2961 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.4, 3.5 Environment: Windows Server 2008, Apache Tomcat 6, Oracle 11g, ojdbc 11.2.0.1 Reporter: David Webb Labels: dih, tika Attachments: SOLR-2961.patch, data-config.xml I have a DIH Configuration that works great when I dont specify threads=X in the root entity. As soon as I give a value for threads, I get the following error messages in the stacktrace. Please advise. SEVERE: JdbcDataSource was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! 
Dec 10, 2011 1:18:33 PM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
SEVERE: Ignoring Error when closing connection
java.sql.SQLRecoverableException: IO Error: Socket closed
    at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:511)
    at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:3931)
    at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:401)
    at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:392)
    at org.apache.solr.handler.dataimport.JdbcDataSource.finalize(JdbcDataSource.java:380)
    at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
    at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
    at java.lang.ref.Finalizer.access$100(Unknown Source)
    at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
Caused by: java.net.SocketException: Socket closed
    at java.net.SocketOutputStream.socketWrite(Unknown Source)
    at java.net.SocketOutputStream.write(Unknown Source)
    at oracle.net.ns.DataPacket.send(DataPacket.java:199)
    at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:211)
    at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:227)
    at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
    at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
    at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
    at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
    at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
    at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122)
    at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
    at oracle.jdbc.driver.T4C7Ocommoncall.doOLOGOFF(T4C7Ocommoncall.java:61)
    at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:498)
    ... 8 more
Dec 10, 2011 1:18:34 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
SEVERE: Exception in entity : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: f2
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:99)
    at org.apache.solr.handler.dataimport.ThreadedContext.getDataSource(ThreadedContext.java:66)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:101)
    at org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRow(ThreadedEntityProcessorWrapper.java:84)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:446)
    at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:399) at
Re: Proposal - a high performance Key-Value store based on Lucene APIs/concepts
Mark, can you share more on what K-V (NoSQL) stores have you've been benchmarking and what have been the results? Mongo, Cassandra, Krati, Bdb a Java version of BitCask, Lucene, MySQL I was interested in benchmarking the single-server stores rather than a distributed setup because your choice of store could be plugged into the likes of Voldemort for scale out. The design is similar to the Bitcask paper but keeps only hashes of keys in ram not the full key. My implementation was the only store that didn't degrade noticeably as you get into 10s of millions of keys in the store. Did you try all the well known ones? http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis -- J On Thu, Mar 22, 2012 at 10:42 AM, mark harwood markharw...@yahoo.co.uk wrote: I've been spending quite a bit of time recently benchmarking various Key-Value stores for a demanding project and been largely disappointed with results However, I have developed a promising implementation based on these concepts: http://www.slideshare.net/MarkHarwood/lucene-kvstore The code needs some packaging before I can release it but the slide deck should give a good overview of the design. Is this something that it is likely to be of interest as a contrib module here? I appreciate this is a departure from the regular search focus but it builds on some common ground in Lucene core and may have some applications here. Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
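[Editorial note: the slide deck itself is not reproduced here. The space-saving idea Mark describes, keeping only hashes of keys in RAM mapped to on-disk record offsets, can be sketched as the toy index below. All names, the hash choice, and the collision policy are assumptions for illustration, not Mark's actual design.]

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a Bitcask-style index that keeps only a 64-bit hash of each
// key in RAM, mapped to the record's file offset. Because hashes can collide,
// a real store must read the record at the offset and compare the full key.
public class HashKeyIndex {
    private final Map<Long, Long> hashToOffset = new HashMap<>();

    static long hash64(String key) {
        // FNV-1a 64-bit: cheap and well distributed enough for a demo.
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    public void put(String key, long fileOffset) {
        hashToOffset.put(hash64(key), fileOffset);
    }

    // Returns a candidate offset, or null if no entry; the caller must still
    // verify that the key stored at that offset really equals `key`.
    public Long candidateOffset(String key) {
        return hashToOffset.get(hash64(key));
    }
}
```

The RAM cost per entry is then two longs plus map overhead, independent of key length, which is one plausible reason such a design would not degrade as the key count reaches the tens of millions.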
Re: Proposal - a high performance Key-Value store based on Lucene APIs/concepts
On Thu, Mar 22, 2012 at 7:29 PM, Mark Harwood markharw...@yahoo.co.uk wrote: Mark, can you share more on what K-V (NoSQL) stores have you've been benchmarking and what have been the results? Mongo, Cassandra, Krati, Bdb a Java version of BitCask, Lucene, MySQL I was interested in benchmarking the single-server stores rather than a distributed setup because your choice of store could be plugged into the likes of Voldemort for scale out. The design is similar to the Bitcask paper but keeps only hashes of keys in ram not the full key. My implementation was the only store that didn't degrade noticeably as you get into 10s of millions of keys in the store. Random question: Do you basically end up with something very similar to LevelDB that many people were talking about a few weeks ago?
[jira] [Assigned] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen reassigned LUCENE-3901: -- Assignee: Christian Moen Add katakana stem filter to better deal with certain katakana spelling variants --- Key: LUCENE-3901 URL: https://issues.apache.org/jira/browse/LUCENE-3901 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Assignee: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3901.patch, LUCENE-3901.patch Many Japanese katakana words end in a long sound that is sometimes optional. For example, パーティー and パーティ are both perfectly valid for party. Similarly we have センター and センタ that are variants of center as well as サーバー and サーバ for server. I'm proposing that we add a katakana stemmer that removes this long sound if the terms are longer than a configurable length. It's also possible to add the variant as a synonym, but I think stemming is preferred from a ranking point of view. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3901) Add katakana stem filter to better deal with certain katakana spelling variants
[ https://issues.apache.org/jira/browse/LUCENE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235877#comment-13235877 ] Christian Moen commented on LUCENE-3901: Thanks a lot, Robert. I'll do some more testing and hopefully I can commit this to {{trunk}} and {{branch_3x}} tomorrow. Add katakana stem filter to better deal with certain katakana spelling variants --- Key: LUCENE-3901 URL: https://issues.apache.org/jira/browse/LUCENE-3901 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3901.patch, LUCENE-3901.patch Many Japanese katakana words end in a long sound that is sometimes optional. For example, パーティー and パーティ are both perfectly valid for party. Similarly we have センター and センタ that are variants of center as well as サーバー and サーバ for server. I'm proposing that we add a katakana stemmer that removes this long sound if the terms are longer than a configurable length. It's also possible to add the variant as a synonym, but I think stemming is preferred from a ranking point of view. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
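[Editorial note: the proposed rule is simple enough to sketch: drop a trailing prolonged-sound mark (ー, U+30FC) when the katakana term is longer than a configurable minimum. The toy version below is not the LUCENE-3901 patch; the threshold value and method names are illustrative assumptions.]

```java
public class KatakanaStemDemo {
    private static final char PROLONGED_SOUND_MARK = '\u30FC'; // ー

    // Drop a trailing long-sound mark when the term is longer than minLength;
    // short terms are left alone so that e.g. single-mark tokens survive.
    static String stem(String term, int minLength) {
        if (term.length() > minLength
                && term.charAt(term.length() - 1) == PROLONGED_SOUND_MARK) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    public static void main(String[] args) {
        // With an illustrative threshold of 3, both spellings of "party"
        // normalize to the same term.
        System.out.println(stem("パーティー", 3)); // パーティ
        System.out.println(stem("サーバー", 3));   // サーバ
    }
}
```

This is also why stemming is attractive versus synonyms here: both variants collapse to one indexed term, so ranking statistics are not split across spellings.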
trunk javadoc failures?
I think there must be something wonky with the javadoc classpath (or whatever it's called in javadoc) on trunk when using the java 6 javadoc. I'm seeing solr/contrib/uima complain a lot about packages/files not existing when using ant javadoc (either at the top level or just in solr). is anyone else seeing this?... [javadoc] Constructing Javadoc information... [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist [javadoc] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer; [javadoc] ^ [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist [javadoc] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer; [javadoc] ^ [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: package org.apache.lucene.analysis.uima.ae does not exist [javadoc] import org.apache.lucene.analysis.uima.ae.AEProvider; [javadoc] ^ [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: package org.apache.lucene.analysis.uima.ae does not exist [javadoc] import org.apache.lucene.analysis.uima.ae.AEProviderFactory; [javadoc] ^ [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: cannot find symbol [javadoc] symbol : class AEProvider [javadoc] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor [javadoc] private AEProvider aeProvider; [javadoc] ^ [javadoc] Standard Doclet version 1.6.0_24 [javadoc] Building tree for all the packages and classes... 
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer [javadoc] Generating /home/hossman/lucene/dev/solr/build/docs/api/org/apache/solr/uima/processor/exception//FieldMappingException.html... [javadoc] Copying file /home/hossman/lucene/dev/solr/core/src/java/doc-files/tutorial.html to directory /home/hossman/lucene/dev/solr/build/docs/api/doc-files... [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer [javadoc] Generating /home/hossman/lucene/dev/solr/build/docs/api/org/apache/solr/util//package-summary.html... [javadoc] Copying file /home/hossman/lucene/dev/solr/core/src/java/org/apache/solr/util/doc-files/min-should-match.html to directory /home/hossman/lucene/dev/solr/build/docs/api/org/apache/solr/util/doc-files... [javadoc] Generating /home/hossman/lucene/dev/solr/build/docs/api/serialized-form.html... [javadoc] Copying file /home/hossman/lucene/dev/lucene/tools/prettify/stylesheet+prettify.css to file /home/hossman/lucene/dev/solr/build/docs/api/stylesheet+prettify.css... 
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer [javadoc] Building index for all the packages and classes... [javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found:
RE: trunk javadoc failures?
This is, as far as I remember, a bug in the build scripts. Building Javadocs from inside a contrib seems to be broken...

- Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Thursday, March 22, 2012 8:09 PM
To: Lucene Dev
Subject: trunk javadoc failures?

I think there must be something wonky with the javadoc classpath (or whatever it's called in javadoc) on trunk when using the Java 6 javadoc. I'm seeing solr/contrib/uima complain a lot about packages/files not existing when using "ant javadoc" (either at the top level or just in solr). Is anyone else seeing this?...

[javadoc] Constructing Javadoc information...
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist
[javadoc] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
[javadoc] ^
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist
[javadoc] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
[javadoc] ^
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: package org.apache.lucene.analysis.uima.ae does not exist
[javadoc] import org.apache.lucene.analysis.uima.ae.AEProvider;
[javadoc] ^
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: package org.apache.lucene.analysis.uima.ae does not exist
[javadoc] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
[javadoc] ^
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: cannot find symbol
[javadoc] symbol : class AEProvider
[javadoc] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor
[javadoc] private AEProvider aeProvider;
[javadoc] ^
[javadoc] Standard Doclet version 1.6.0_24
[javadoc] Building tree for all the packages and classes...
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
[javadoc] Generating /home/hossman/lucene/dev/solr/build/docs/api/org/apache/solr/uima/processor/exception//FieldMappingException.html...
[javadoc] Copying file /home/hossman/lucene/dev/solr/core/src/java/doc-files/tutorial.html to directory /home/hossman/lucene/dev/solr/build/docs/api/doc-files...
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
[javadoc] Generating /home/hossman/lucene/dev/solr/build/docs/api/org/apache/solr/util//package-summary.html...
[javadoc] Copying file /home/hossman/lucene/dev/solr/core/src/java/org/apache/solr/util/doc-files/min-should-match.html to directory /home/hossman/lucene/dev/solr/build/docs/api/org/apache/solr/util/doc-files...
[javadoc] Generating /home/hossman/lucene/dev/solr/build/docs/api/serialized-form.html...
[javadoc] Copying file /home/hossman/lucene/dev/lucene/tools/prettify/stylesheet+prettify.css to file /home/hossman/lucene/dev/solr/build/docs/api/stylesheet+prettify.css...
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMAAnnotationsTokenizer
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found: UIMATypeAwareAnnotationsTokenizer
[javadoc] /home/hossman/lucene/dev/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:30: warning - Tag @link: reference not found:
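[Editor's note: the "package org.apache.lucene.analysis.uima does not exist" and "cannot find symbol" errors in the log above are the classic symptom of the Ant javadoc task being run without a cross-module dependency on its classpath. A minimal sketch of what such a contrib-level target looks like, and where the missing entry would go, is below; the target name, property names, and jar pattern are hypothetical and not taken from the actual Lucene/Solr build files:]

```xml
<!-- Hypothetical sketch of a contrib javadoc target. If the
     <classpath> below omits the jar providing
     org.apache.lucene.analysis.uima, javadoc emits exactly the
     "package ... does not exist" / "cannot find symbol" errors
     and the "@link reference not found" warnings seen above. -->
<target name="javadoc" depends="compile">
  <javadoc destdir="${build.dir}/docs/api"
           access="protected"
           failonerror="false">
    <classpath>
      <!-- the contrib's own compiled classes -->
      <pathelement location="${build.dir}/classes/java"/>
      <!-- the cross-module dependency that must be present
           (pattern is illustrative only) -->
      <fileset dir="${lucene.build.dir}"
               includes="**/lucene-analyzers-uima-*.jar"/>
    </classpath>
    <packageset dir="src/java"/>
  </javadoc>
</target>
```

When such a target is invoked from the top level, the dependency properties usually resolve; invoked from inside the contrib directory they may not, which matches Uwe's observation that building Javadocs from inside a contrib is broken.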
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12843 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12843/ All tests passed Build Log (for compile errors): [...truncated 22106 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org