[jira] [Created] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
Shawn Heisey created SOLR-4762:
-------------------------------

Summary: Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
Key: SOLR-4762
URL: https://issues.apache.org/jira/browse/SOLR-4762
Project: Solr
Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
Fix For: 5.0, 4.4

When a customer tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error
java.lang.NoSuchMethodError: replaceEach
  at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
  at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
  at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
  at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
  at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
  at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
  at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
  at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
  at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
  at weblogic.security.service.SecurityManager.runAs(Unknown Source)
  at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
  at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
  at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
  at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
  at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}
The solution to this problem appears to be adding the following to weblogic.xml in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has the container-descriptor tag, I'm hoping this is a benign change.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
[ https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Heisey updated SOLR-4762:
-------------------------------

Description:
When a user tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error java.lang.NoSuchMethodError: replaceEach at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:821) at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227) at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125) at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292) at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496) at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321) at weblogic.security.service.SecurityManager.runAs(Unknown Source) at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180) at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086) at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}

The solution to this problem appears to be adding the following to weblogic.xml in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has the container-descriptor tag, I'm hoping this is a benign change.

was:
When a customer tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error java.lang.NoSuchMethodError: replaceEach at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:821) at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227) at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125) at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292) at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496) at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321) at weblogic.security.service.SecurityManager.runAs(Unknown Source) at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180) at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086) at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}

The solution to this problem appears to be adding the following to weblogic.xml in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has the container-descriptor tag, I'm hoping this is a benign change.
[jira] [Commented] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
[ https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641490#comment-13641490 ]

Uwe Schindler commented on SOLR-4762:
-------------------------------------

The problem is that Solr uses a newer version of commons-lang.jar than the one already on WebLogic's classpath. The mentioned setting tells WebLogic to load the commons-lang classes from WEB-INF with preference.
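Uwe's diagnosis above is a classloading conflict: two copies of commons-lang are visible, and without `prefer-web-inf-classes` WebLogic's older copy wins. A quick, generic way to confirm which copy actually won is to ask the JVM where a class was loaded from. This is a diagnostic sketch, not part of the issue or of Solr; the `WhichJar` class name is invented, and in the Solr case one would pass `org.apache.commons.lang.StringUtils.class` rather than the sketch's own class.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from, or null for
    // bootstrap-classpath classes (e.g. java.lang.String), which have no
    // code source.
    static URL locationOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }

    public static void main(String[] args) {
        // A bootstrap class reports no code source:
        System.out.println("String loaded from: " + locationOf(String.class));
        // An application class reports the jar/directory it came from,
        // which is exactly what you want to check for commons-lang:
        System.out.println("WhichJar loaded from: " + locationOf(WhichJar.class));
    }
}
```

If the printed location for StringUtils points at a WebLogic system jar rather than Solr's WEB-INF/lib, the descriptor change above has not taken effect.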
[jira] [Commented] (SOLR-4358) SolrJ, by preventing multi-part post, loses key information about file name that Tika needs
[ https://issues.apache.org/jira/browse/SOLR-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641504#comment-13641504 ]

Karl Wright commented on SOLR-4358:
-----------------------------------

Has the fix for this ticket been pulled into the latest RC for Solr 4.3?

SolrJ, by preventing multi-part post, loses key information about file name that Tika needs
---
Key: SOLR-4358
URL: https://issues.apache.org/jira/browse/SOLR-4358
Project: Solr
Issue Type: Bug
Components: clients - java
Affects Versions: 4.0
Reporter: Karl Wright
Assignee: Ryan McKinley
Attachments: additional_changes.diff, SOLR-4358.patch, SOLR-4358.patch, SOLR-4358.patch

SolrJ accepts a ContentStream, which has a name field. Within HttpSolrServer.java, if SolrJ decides to use multipart posts, this filename is transmitted as part of the form boundary information. However, if SolrJ chooses not to use a multipart post, the filename information is lost. This information is used by SolrCell (Tika) to make decisions about content extraction, so it is very important that it makes it into Solr one way or another. Either SolrJ should set appropriate equivalent headers to send the filename automatically, or it should force multipart posts when this information is present.
Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing
this looks pretty serious! any chance we can get this index?

On Thu, Apr 25, 2013 at 7:59 AM, Apache Jenkins Server <jenk...@builds.apache.org> wrote:

Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/245/

1 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
  at __randomizedtesting.SeedInfo.seed([357616123B0638E9:AEAF02097AFD2E82]:0)
  at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:221)
  at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:209)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:141)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:62)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at
[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram
[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641567#comment-13641567 ]

Adrien Grand commented on LUCENE-4955:
--------------------------------------

Given that offsets can't go backwards and that tokens in the same position must have the same start offset, I think the only way to get NGramTokenFilter out of TestRandomChains' exclusion list (LUCENE-4641) is to fix position increments (this issue), change the order tokens are emitted in (LUCENE-3920), and stop modifying offsets. I know some people rely on the current behavior, but I think it's more important to get this filter out of TestRandomChains' exclusions, since it causes highlighting bugs and makes the term vectors files unnecessarily large.

NGramTokenFilter increments positions for each gram
---
Key: LUCENE-4955
URL: https://issues.apache.org/jira/browse/LUCENE-4955
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
Fix For: 5.0, 4.4
Attachments: highlighter-test.patch, LUCENE-4955.patch

NGramTokenFilter increments positions for each gram rather than for the actual token, which can lead to rather funny problems, especially with highlighting. Whether this filter should be used for highlighting is a different story, but today it seems to be common practice in many situations to highlight sub-term matches. I have a highlighting test that uses ngrams failing with a StringIndexOutOfBoundsException, since tokens are sorted by position, which causes offsets to be mixed up due to the ngram token filter.
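For illustration only (plain Java, not Lucene's TokenStream API): the fix under discussion means all grams produced from one source token should share that token's position, i.e. only the first gram carries a position increment of 1 and the rest get 0. A minimal sketch of that emission rule, with invented `Gram`/`ngrams` names:

```java
import java.util.ArrayList;
import java.util.List;

public class GramPositions {
    // One emitted gram with its position increment.
    static class Gram {
        final String text;
        final int posInc;
        Gram(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    // Proposed behavior: the first gram of each source token advances the
    // position by 1; subsequent grams of the same token get increment 0,
    // so all grams of one token occupy a single position.
    static List<Gram> ngrams(List<String> tokens, int min, int max) {
        List<Gram> out = new ArrayList<>();
        for (String tok : tokens) {
            boolean first = true;
            for (int len = min; len <= max; len++) {
                for (int start = 0; start + len <= tok.length(); start++) {
                    out.add(new Gram(tok.substring(start, start + len), first ? 1 : 0));
                    first = false;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // For token "abc" with 2..3-grams this emits: ab +1, bc +0, abc +0.
        for (Gram g : ngrams(List.of("abc"), 2, 3)) {
            System.out.println(g.text + " +" + g.posInc);
        }
    }
}
```

The buggy behavior described in the issue corresponds to emitting every gram with increment 1, which spreads one token over several positions and confuses position-sorted highlighters.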
[jira] [Updated] (SOLR-4735) Improve Solr metrics reporting
[ https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated SOLR-4735:
--------------------------------

Attachment: SOLR-4735.patch

New patch, moving everything to a single registry per core, and adding a graphite reporter in contrib/. JMX naming still isn't working right, and it needs some tests, but I think this is a decent way forward. More eyes welcome...

Improve Solr metrics reporting
---
Key: SOLR-4735
URL: https://issues.apache.org/jira/browse/SOLR-4735
Project: Solr
Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
Attachments: SOLR-4735.patch, SOLR-4735.patch

Following on from a discussion on the mailing list: http://search-lucene.com/m/IO0EI1qdyJF1/codahalesubj=Solr+metrics+in+Codahale+metrics+and+Graphite+

It would be good to make Solr play more nicely with existing devops monitoring systems, such as Graphite or Ganglia. Stats monitoring at the moment is poll-only, either via JMX or through the admin stats page. I'd like to refactor things a bit to make this more pluggable. This patch is a start. It adds a new interface, InstrumentedBean, which extends SolrInfoMBean to return a [Metrics|http://metrics.codahale.com/manual/core/] MetricRegistry, and a couple of MetricReporters (which basically just duplicate the JMX and admin page reporting that's there at the moment, but which should be more extensible). The patch includes a change to RequestHandlerBase showing how this could work. The idea would be to eventually replace the getStatistics() call on SolrInfoMBean with this instead. The next step would be to allow more MetricReporters to be defined in solrconfig.xml. The Metrics library comes with ganglia and graphite reporting modules, and we can add contrib plugins for both of those.

There's some more general cleanup that could be done around SolrInfoMBean (we've got two plugin handlers at /mbeans and /plugins that basically do the same thing, and the beans themselves have some weirdly inconsistent data on them - getVersion() returns different things for different impls, and getSource() seems pretty useless), but maybe that's for another issue.
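As a rough, self-contained illustration of the pluggable design being proposed (one metric registry per core, with interchangeable reporters polling it), here is a sketch in plain Java. The `Registry` and `Reporter` names are invented for the example; the actual patch builds on the Codahale Metrics MetricRegistry and its reporter modules rather than this toy code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class MetricsSketch {
    // One registry of named counters; the proposal is one such registry per core.
    static class Registry {
        final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
        AtomicLong counter(String name) {
            return counters.computeIfAbsent(name, k -> new AtomicLong());
        }
    }

    // Reporters are pluggable: JMX, the admin stats page, Graphite, Ganglia...
    // all poll the same registry instead of each component exposing its own stats.
    interface Reporter {
        void report(Registry registry);
    }

    public static void main(String[] args) {
        Registry core = new Registry();
        core.counter("requests").incrementAndGet();
        core.counter("requests").incrementAndGet();

        // A trivial console reporter; a Graphite reporter would push the same
        // name/value pairs over a socket on a schedule.
        Reporter console = r ->
            r.counters.forEach((name, v) -> System.out.println(name + "=" + v.get()));
        console.report(core);
    }
}
```

The point of the design is that instrumented components only touch the registry; where the numbers go is decided entirely by which reporters are configured (e.g. in solrconfig.xml).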
[jira] [Created] (SOLR-4763) Performance issue when using group.facet=true
Alexander Koval created SOLR-4763:
----------------------------------

Summary: Performance issue when using group.facet=true
Key: SOLR-4763
URL: https://issues.apache.org/jira/browse/SOLR-4763
Project: Solr
Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Koval

I do not know whether this is a bug or not, but calculating facets with {{group.facet=true}} is too slow. I have a query that:

{code}
matches: 730597,
ngroups: 24024,
{code}

1. All queries with {{group.facet=true}}:
{code}
QTime: 5171
facet: { time: 4716
{code}

2. Without {{group.facet}}:
* First query:
{code}
QTime: 3284
facet: { time: 3104
{code}
* Next queries:
{code}
QTime: 230,
facet: { time: 76
{code}

So I think without {{group.facet}} Solr uses cache to calculate facets. Is it possible to improve the performance of facets when {{group.facet=true}}?
[jira] [Updated] (SOLR-4763) Performance issue when using group.facet=true
[ https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Koval updated SOLR-4763:
----------------------------------

Description:
I do not know whether this is a bug or not, but calculating facets with {{group.facet=true}} is too slow. I have a query that: {code} matches: 730597, ngroups: 24024, {code} 1. All queries with {{group.facet=true}}: {code} QTime: 5171 facet: { time: 4716 {code} 2. Without {{group.facet}}: * First query: {code} QTime: 3284 facet: { time: 3104 {code} * Next queries: {code} QTime: 230, facet: { time: 76 {code} So I think with {{group.facet=true}} Solr doesn't use cache to calculate facets. Is it possible to improve the performance of facets when {{group.facet=true}}?

was:
I do not know whether this is a bug or not, but calculating facets with {{group.facet=true}} is too slow. I have a query that: {code} matches: 730597, ngroups: 24024, {code} 1. All queries with {{group.facet=true}}: {code} QTime: 5171 facet: { time: 4716 {code} 2. Without {{group.facet}}: * First query: {code} QTime: 3284 facet: { time: 3104 {code} * Next queries: {code} QTime: 230, facet: { time: 76 {code} So I think without {{group.facet}} Solr uses cache to calculate facets. Is it possible to improve the performance of facets when {{group.facet=true}}?
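For context, the comparison in this report boils down to issuing the same grouped, faceted query with and without {{group.facet=true}}. The sketch below just assembles the two request query strings; the `product_id` and `category` field names are hypothetical, while the parameter names (`group`, `group.field`, `facet`, `facet.field`, `group.facet`) are standard Solr ones:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupFacetQuery {
    // Joins parameters into a URL query string, encoding only the values.
    static String queryString(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.field", "product_id");  // hypothetical grouping field
        p.put("facet", "true");
        p.put("facet.field", "category");    // hypothetical facet field
        p.put("group.facet", "true");        // the slow variant from the report
        System.out.println("/select?" + queryString(p));

        p.remove("group.facet");             // the fast (cached) variant
        System.out.println("/select?" + queryString(p));
    }
}
```

With {{group.facet=true}}, facet counts are computed per group (counting groups rather than documents), which as of the report cannot reuse Solr's facet caches; that is the gap the timings above show.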
[jira] [Commented] (SOLR-4758) Zk bootstrapping does not work with the new solr.xml format and core discovery by directory structure.
[ https://issues.apache.org/jira/browse/SOLR-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641660#comment-13641660 ]

Erick Erickson commented on SOLR-4758:
--------------------------------------

I wrestled with this and deferred it until later. There are several code paths in several places of the form 'if (corecontainer==null) {} else {}'. As near as I can tell this is ONLY ever an issue in the test harness. I detest having code that is only necessary for the tests scattered about the real code, but I haven't had any time to try to fix the test harness, which is what I _think_ the real solution is. FWIW.

Zk bootstrapping does not work with the new solr.xml format and core discovery by directory structure.
---
Key: SOLR-4758
URL: https://issues.apache.org/jira/browse/SOLR-4758
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 5.0, 4.4
Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing
OK I pulled it down ... it looks like this:

-rw-r--r-- 1 501 mike     87 Apr 25 06:50 _0_dv.cfe
-rw-r--r-- 1 501 mike    208 Apr 25 06:50 _0_dv.cfs
-rw-r--r-- 1 501 mike    931 Apr 25 06:50 _0.fdt
-rw-r--r-- 1 501 mike     45 Apr 25 06:50 _0.fdx
-rw-r--r-- 1 501 mike    734 Apr 25 06:50 _0.fnm
-rw-r--r-- 1 501 mike     80 Apr 25 06:50 _0_Lucene41_0.doc
-rw-r--r-- 1 501 mike    193 Apr 25 06:50 _0_Lucene41_0.pos
-rw-r--r-- 1 501 mike   1548 Apr 25 06:50 _0_Lucene41_0.tim
-rw-r--r-- 1 501 mike    197 Apr 25 06:50 _0_Lucene41_0.tip
-rw-r--r-- 1 501 mike    113 Apr 25 06:50 _0_nrm.cfe
-rw-r--r-- 1 501 mike    229 Apr 25 06:50 _0_nrm.cfs
-rw-r--r-- 1 501 mike    377 Apr 25 06:50 _0.si
-rw-r--r-- 1 501 mike     42 Apr 25 06:50 _0.tvd
-rw-r--r-- 1 501 mike   1490 Apr 25 06:50 _0.tvf
-rw-r--r-- 1 501 mike     65 Apr 25 06:50 _0.tvx
-rw-r--r-- 1 501 mike     37 Apr 25 06:50 _1_1.del
-rw-r--r-- 1 501 mike     87 Apr 25 06:50 _1_dv.cfe
-rw-r--r-- 1 501 mike   4296 Apr 25 06:50 _1_dv.cfs
-rw-r--r-- 1 501 mike 365007 Apr 25 06:50 _1.fdt
-rw-r--r-- 1 501 mike     55 Apr 25 06:50 _1.fdx
-rw-r--r-- 1 501 mike   1225 Apr 25 06:50 _1.fnm
-rw-r--r-- 1 501 mike   2739 Apr 25 06:50 _1_Lucene41_0.doc
-rw-r--r-- 1 501 mike 159869 Apr 25 06:50 _1_Lucene41_0.pos
-rw-r--r-- 1 501 mike 173644 Apr 25 06:50 _1_Lucene41_0.tim
-rw-r--r-- 1 501 mike   3902 Apr 25 06:50 _1_Lucene41_0.tip
-rw-r--r-- 1 501 mike    192 Apr 25 06:50 _1_nrm.cfe
-rw-r--r-- 1 501 mike    461 Apr 25 06:50 _1_nrm.cfs
-rw-r--r-- 1 501 mike    377 Apr 25 06:50 _1.si
-rw-r--r-- 1 501 mike    286 Apr 25 06:50 _1.tvd
-rw-r--r-- 1 501 mike 560455 Apr 25 06:50 _1.tvf
-rw-r--r-- 1 501 mike    849 Apr 25 06:50 _1.tvx
-rw-r--r-- 1 501 mike     45 Apr 25 06:50 _2_1.del
-rw-r--r-- 1 501 mike     87 Apr 25 06:50 _2_dv.cfe
-rw-r--r-- 1 501 mike   7648 Apr 25 06:50 _2_dv.cfs
-rw-r--r-- 1 501 mike  79656 Apr 25 06:50 _2.fdt
-rw-r--r-- 1 501 mike     58 Apr 25 06:50 _2.fdx
-rw-r--r-- 1 501 mike   2550 Apr 25 06:50 _2.fnm
-rw-r--r-- 1 501 mike   5478 Apr 25 06:50 _2_Lucene41_0.doc
-rw-r--r-- 1 501 mike     34 Apr 25 06:50 _2_Lucene41_0.pay
-rw-r--r-- 1 501 mike  16885 Apr 25 06:50 _2_Lucene41_0.pos
-rw-r--r-- 1 501 mike  94246 Apr 25 06:50 _2_Lucene41_0.tim
-rw-r--r-- 1 501 mike   2225 Apr 25 06:50 _2_Lucene41_0.tip
-rw-r--r-- 1 501 mike    464 Apr 25 06:50 _2_nrm.cfe
-rw-r--r-- 1 501 mike   2193 Apr 25 06:50 _2_nrm.cfs
-rw-r--r-- 1 501 mike    395 Apr 25 06:50 _2.si
-rw-r--r-- 1 501 mike    652 Apr 25 06:50 _2.tvd
-rw-r--r-- 1 501 mike 134321 Apr 25 06:50 _2.tvf
-rw-r--r-- 1 501 mike   1937 Apr 25 06:50 _2.tvx
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdt
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdx
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvd
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvf
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvx
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 segments_1
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 write.lock

Which looks to be exactly the case in LUCENE-4738, where the crash happened during the first commit. In this case we (intentionally) make no effort to be smart about this and happily declare the index corrupt... So the good news is this test now discovers the issue (it did not before) ... but we need to fix this test to make an exception for the first commit ... I'll do that.

Mike McCandless
http://blog.mikemccandless.com

On Thu, Apr 25, 2013 at 4:18 AM, Simon Willnauer <simon.willna...@gmail.com> wrote:

this looks pretty serious! any chance we can get this index?

On Thu, Apr 25, 2013 at 7:59 AM, Apache Jenkins Server <jenk...@builds.apache.org> wrote:

Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/245/

1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message: CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
  at __randomizedtesting.SeedInfo.seed([357616123B0638E9:AEAF02097AFD2E82]:0)
  at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:221)
  at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:209)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:141)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:62)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at
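The "crash during the first commit" case Mike describes is recognizable from the directory alone: a zero-byte segments_1 and no later segments_N, meaning no commit ever completed. The check the test needs could be sketched roughly like this; note this is a hypothetical helper using only the JDK, not Lucene's actual test code, and `FirstCommitCrashCheck` is an invented name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper (not Lucene's test code): decide whether an index
// directory looks like a JVM crash during the *first* commit, i.e. a
// zero-byte segments_1 and no later segments_N, so CheckIndex would
// (per LUCENE-4738, intentionally) call the index corrupt.
public class FirstCommitCrashCheck {
    public static boolean crashedDuringFirstCommit(Path indexDir) throws IOException {
        Path segments1 = indexDir.resolve("segments_1");
        if (!Files.exists(segments1)) {
            return false; // no first commit was ever attempted
        }
        try (Stream<Path> files = Files.list(indexDir)) {
            boolean laterCommit = files.anyMatch(p -> {
                String name = p.getFileName().toString();
                return name.startsWith("segments_") && !name.equals("segments_1");
            });
            if (laterCommit) {
                return false; // some commit succeeded; not the first-commit case
            }
        }
        // A zero-byte segments_1 means the first commit never finished writing.
        return Files.size(segments1) == 0L;
    }
}
```

A test harness could then skip the CheckIndex step whenever this returns true, which is the "exception for the first commit" Mike mentions.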
Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing
mike, how can you pull this index? Do we have this on the wiki where to go etc?
Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing
I just log into lucene.zones.apache.org and go to the directory (the full path is in the Jenkins failure)

Mike McCandless
http://blog.mikemccandless.com
Re: [VOTE] Lucene Solr 4.3.0 RC3
+1, smoke tester happy here. Shai

On Thu, Apr 25, 2013 at 12:18 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/ +1 to releasing the artifacts with the following SHA1 signatures as Lucene/Solr 4.3.0...

3e1ec78f7b5bad2723dcf2f963d933758046afb9 *lucene-4.3.0-src.tgz
26843d53c86a9937d700f13f1d686adaca718244 *lucene-4.3.0.tgz
72b526a5aa21c7499954978a74e14ceac3a607ea *lucene-4.3.0.zip
9fd7abc7e478dbc5474658460da58ec360d6b1e4 *solr-4.3.0-src.tgz
5dca6da9f30830dc20163623b0a4f63749777f24 *solr-4.3.0.tgz
ba6c86209614e3fe8cddeb3193bb8a09299ea457 *solr-4.3.0.zip

FWIW: During my testing I did encounter one new bug: SOLR-4754, but since it has a workaround (and I have no idea yet what the underlying problem is to even try for a quick fix) I don't think it should block the release. -Hoss

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities
[ https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641692#comment-13641692 ]

Roman commented on SOLR-2356:

Why is this issue marked as minor? Data import can be sped up 5-10 times on most machines. It seems pretty important.

indexing using DataImportHandler does not use entire CPU capacities

Key: SOLR-2356
URL: https://issues.apache.org/jira/browse/SOLR-2356
Project: Solr
Issue Type: Improvement
Components: update
Affects Versions: 4.0-ALPHA
Environment: Intel Xeon processor (4 cores), Debian Linux Lenny, OpenJDK 64-bit server v1.6.0
Reporter: colby
Priority: Minor
Labels: test
Original Estimate: 168h
Remaining Estimate: 168h

When I use a DataImportHandler to index a large number of documents (~35M), CPU usage doesn't go over 100% (i.e. just one core). When I configure 4 threads for the entity tag, the CPU usage is split to 25% per core but never reaches 400% (i.e. 100% of the 4 cores). I use Solr embedded with the Jetty server. Is there a way to tune this feature in order to use all cores and improve indexing performance? Because for the moment, an extra script (PHP) gives better indexing performance than DIH. thanks

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
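The symptom in the report above — one thread pegging a single core while the others idle — is the classic single-threaded import loop. A general fix pattern (this is a generic sketch of CPU-bound work fanned out over a fixed pool, not DIH's internal API; `ParallelPrepare` and the toUpperCase stand-in for per-document analysis work are invented for illustration) looks like:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic sketch (not DIH internals): spread CPU-bound per-document
// preparation across all cores with a fixed thread pool, instead of a
// single-threaded loop that can never exceed 100% of one core.
public class ParallelPrepare {
    public static List<String> prepareAll(List<String> rawDocs) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String doc : rawDocs) {
                // toUpperCase() is a stand-in for real analysis/transform work
                tasks.add(() -> doc.toUpperCase());
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                out.add(f.get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

With real per-document work in place of the stand-in, all cores are busy during the prepare phase; the single writer then only has to consume finished documents.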
[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram
[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641704#comment-13641704 ]

Robert Muir commented on LUCENE-4955:

+1 Adrien. these analysis components should either be fixed or removed. We can speed up the process now by changing IndexWriter to reject this kind of bogus data. We shouldn't be putting broken data into e.g. term vectors. That should encourage the fixing process.

NGramTokenFilter increments positions for each gram

Key: LUCENE-4955
URL: https://issues.apache.org/jira/browse/LUCENE-4955
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
Fix For: 5.0, 4.4
Attachments: highlighter-test.patch, LUCENE-4955.patch

NGramTokenFilter increments positions for each gram rather than for the actual token, which can lead to rather funny problems, especially with highlighting. Whether this filter should be used for highlighting is a different story, but today this seems to be a common practice in many situations to highlight sub-term matches. I have a test for highlighting that uses ngrams failing with a StringIndexOutOfBoundsException, since tokens are sorted by position, which causes offsets to be mixed up due to the ngram token filter.
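The intended behavior the issue asks for is that all grams of one source token share that token's position: position increment 1 for the first gram, 0 for the rest. A plain-Java illustration of that contract (this is not the real NGramTokenFilter, just a sketch emitting "gram/posInc" pairs with an invented helper name):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration (not Lucene's NGramTokenFilter): emit every gram
// of a token at the SAME position, i.e. position increment 1 for the first
// gram and 0 for all others, which is what the LUCENE-4955 fix is after.
public class SamePositionNGrams {
    /** Returns "gram/posInc" pairs for all grams of length min..max. */
    public static List<String> grams(String token, int min, int max) {
        List<String> out = new ArrayList<>();
        boolean first = true;
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                // only the first gram advances the position
                out.add(token.substring(i, i + n) + "/" + (first ? 1 : 0));
                first = false;
            }
        }
        return out;
    }
}
```

For "abc" with min=1, max=2 this yields a/1, b/0, c/0, ab/0, bc/0 — one position for the whole token, so downstream consumers like the highlighter see offsets in a sane order.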
[jira] [Updated] (LUCENE-4955) NGramTokenFilter increments positions for each gram
[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4955:

Attachment: highlighter-test.patch

bq. We can speed up the process now by changing IndexWriter to reject this kinda bogus shit. We shouldnt be putting broken data into e.g. term vectors. That should encourage the fixing process.

+1 I updated the highlighter test and added analysis-common as a test dependency such that this can be run with ant.
[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram
[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641706#comment-13641706 ]

Adrien Grand commented on LUCENE-4955:

+1 I'll work on fixing NGramTokenizer and NGramTokenFilter.
[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram
[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641709#comment-13641709 ]

Robert Muir commented on LUCENE-4955:

I don't think we should add analysis-common as a test dependency to the highlighter. I worked pretty hard to clean all this up with e.g. MockTokenizer so we didn't have dependency hell. It also keeps our tests clean.
[jira] [Created] (LUCENE-4957) Stop IndexWriter from writing broken term vector offset data in 5.0
Robert Muir created LUCENE-4957:

Summary: Stop IndexWriter from writing broken term vector offset data in 5.0
Key: LUCENE-4957
URL: https://issues.apache.org/jira/browse/LUCENE-4957
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

Today we allow this in (some analyzers are broken), and only reject it if someone is indexing offsets into the postings lists. But we should ban this also when term vectors are enabled. It's time to stop writing this broken data and let broken analyzers be broken.
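The invariant this issue asks IndexWriter to enforce for term vectors (as it already does for postings offsets) can be stated concretely: per token, offsets are non-negative, endOffset >= startOffset, and startOffset never moves backwards relative to the previous token. A hedged sketch of that validation (hypothetical helper, not Lucene code):

```java
// Sketch of the offset invariant LUCENE-4957 asks IndexWriter to enforce
// for term vectors as well as postings (hypothetical helper, not Lucene
// code): non-negative offsets, endOffset >= startOffset, and startOffset
// never going backwards across consecutive tokens.
public class OffsetValidator {
    /** offsets[i][0] = startOffset, offsets[i][1] = endOffset of token i. */
    public static boolean valid(int[][] offsets) {
        int lastStart = -1;
        for (int[] o : offsets) {
            int start = o[0], end = o[1];
            if (start < 0 || end < start) {
                return false; // malformed single token
            }
            if (start < lastStart) {
                return false; // went backwards: the "broken analyzer" case
            }
            lastStart = start;
        }
        return true;
    }
}
```

An analyzer emitting grams out of position order, like the one in LUCENE-4955, fails this check, which is exactly the data the issue wants rejected at index time rather than silently written.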
[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram
[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641710#comment-13641710 ]

Simon Willnauer commented on LUCENE-4955:

robert I agree. I added this as a separate patch to make sure that whatever we commit here we can at least test that the ngram filter doesn't throw an IOOB anymore. I just wanted to make it easier to run the test.
[jira] [Commented] (LUCENE-4957) Stop IndexWriter from writing broken term vector offset data in 5.0
[ https://issues.apache.org/jira/browse/LUCENE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641723#comment-13641723 ]

Uwe Schindler commented on LUCENE-4957:

+1
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641739#comment-13641739 ]

Zack Zullick commented on LUCENE-2899:

Some information for those wanting to try this, after fighting it for a day: the latest patch posted, LUCENE-2899-RJN.patch for 4.1, does not have Em's OpenNLPFilter.java and OpenNLPTokenizer.java fixes applied. So after applying the patch, make sure to replace those classes with Em's versions, or the bug that causes the NLP system to only be utilized on the first request will still be present. I was also able to successfully apply this patch to 4.2.1 with minor modification (mostly to the build/ivy xml files).

Add OpenNLP Analysis capabilities as a module

Key: LUCENE-2899
URL: https://issues.apache.org/jira/browse/LUCENE-2899
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 4.3
Attachments: LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899-RJN.patch, OpenNLPFilter.java, OpenNLPTokenizer.java, opennlp_trunk.patch

Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does:
* Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
* NamedEntity recognition as a TokenFilter

We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position. I'd propose it go under: modules/analysis/opennlp
SOLR-2894 - am I the only one who gets this behaviour?
Hi, I tested the latest patch for SOLR-2894 a couple of weeks ago, and while it worked fine for string fields, I got no output if one of the facet.pivot fields is a date field. (SOLR-2894 is about implementing distributed pivot faceting.) https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13627641&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13627641 Am I the only one who gets this behavior? If so, I'll look into my test environment again. I'm more than happy to test any new patch for this issue, as I have a test environment set up which runs multiple scenarios with pivot faceting and date fields in a SolrCloud with two machines :-) I have looked through the code changes in the latest patch as well, but since I do not know the Solr code base I didn't see anything obvious. But I can help with testing if anyone wants any testing done. Best, Stein J. Gran
[jira] [Updated] (SOLR-4759) Cleanup Velocity Templates
[ https://issues.apache.org/jira/browse/SOLR-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Bennett updated SOLR-4759:

Attachment: velocity-SOLR-4759.zip

Because the patch includes both file renaming and content changes, the patch command gives errors. Per Erik H, this is a binary version of the changes (includes .svn dirs). Meant to be extracted from solr/example/solr/collection1/conf/

Cleanup Velocity Templates

Key: SOLR-4759
URL: https://issues.apache.org/jira/browse/SOLR-4759
Project: Solr
Issue Type: Bug
Affects Versions: 4.2
Reporter: Mark Bennett
Attachments: SOLR-4759.patch, velocity-SOLR-4759.zip

Cleanup to Velocity templates shipped under solr/example/solr/collection1/conf/velocity:
* Add README.txt file with complete file list
* Add comments to all files
* Add indenting where feasible, fixed indenting in other places. I don't believe I've broken anything that required precise indenting.
* Make file naming consistent. We had this_that, thisThat and this-that; changed all to this_that, though also considered this-that.
* Modularize some files
* Included a hit_plain.vm example, though not active by default.
* Rewrote city/lon/lat selector to work from a hash, though it doesn't change the behavior.
* CSS changes, primarily to make top tabs actually look like tabs (primitive CSS, but at least conveys the idea)

As far as I know this doesn't change any behavior of the system, nor does it fix any existing bugs. Although I might do bug fixing in a later patch, I wanted to keep this as a pure code readability patch.
Fwd: Contributing the Korean Analyzer
Forwarding to the dev list:

Begin forwarded message:

From: 이수명 smlee0...@gmail.com
Subject: Re: Contributing the Korean Analyzer
Date: April 24, 2013 10:00:18 PM EDT
To: Steve Rowe sar...@gmail.com

Hello Steve, Yes, I'm the only author of the code. It took me 2 years to finish the Korean analyzer and dictionaries in the first development. I posted the source code and the binary in 2008 on the online community of an internet portal (http://cafe.naver.com/korlucene) for Korean people. I received bug reports through the online community and kept upgrading it for over 4 years. I also posted the source code on SourceForge in 2009, as you have already seen. I have created a JIRA issue (LUCENE-4956) and attached the file that I contribute. If you uncompress the file, you will find two source code directories (src and morph). The morph directory includes the dictionaries and the Korean morphological analyzer. Best regards. Soomyung Lee

2013/4/25 Steve Rowe sar...@gmail.com: Hi Soomyung, I agree with Christian, this sounds fantastic! First, we need to know a couple things: 1. Are you the only author of the code? We need to get agreement from all contributors. (When I browse CVS on the SourceForge site, the only author I see is smlee0818, which I assume is you.) 2. Do you need permission from your employer to make this donation? If so, we'll need your employer to submit a Corporate CLA (Contributor License Agreement)[1] before we can accept the donation. To get started, the first step is creating a Lucene JIRA issue here: https://issues.apache.org/jira/browse/LUCENE - you'll need to create an ASF JIRA account first if you don't already have one: click the Log In link at the top right of the page, then click the Sign up link where it says Not a member? Sign up for an account.
Once you've created a JIRA issue, you should make a compressed tarball of everything you want to contribute - as far as I can tell, this is everything in the lucenekorean sourceforge project in CVS under modules kr.dictionary, kr.analysis.4x, and kr.morph - and then attach it to the JIRA issue, with the MD5 hash for the tarball in the comment that you provide when you attach the tarball to the issue. Once you've created the JIRA issue and attached your contribution, we can make progress on further steps that need to be taken: you should submit an individual CLA[2] and a code grant[3], and I (in my role as Lucene PMC chair) will be managing the IP clearance process[4][5]. See http://wiki.apache.org/lucene-java/HowToContribute for more information about contributing. I look forward to working with you on this - thank you for contributing! Steve

[1] http://www.apache.org/licenses/cla-corporate.txt
[2] http://www.apache.org/licenses/icla.txt
[3] http://www.apache.org/licenses/software-grant.txt
[4] http://incubator.apache.org/ip-clearance/index.html
[5] http://incubator.apache.org/ip-clearance/ip-clearance-template.html

On Apr 24, 2013, at 7:00 AM, Christian Moen c...@atilika.com wrote: Hello Soomyung, Thanks a lot for this. This is very good news. Let's await the PMC Chair's suggestion on next steps. See LUCENE-3305 to get an idea of how the process went for Japanese. If the process goes well, I'm happy to see how I can set aside some time after Lucene Revolution to work on integrating this. Best regards, Christian Moen アティリカ株式会社 http://www.atilika.com

On Apr 24, 2013, at 7:40 PM, 이수명 smlee0...@gmail.com wrote: Hello Christian. Thanks for your reply. I'm happy to hear about a code grant process. To make the dictionaries, I collected the words themselves and word features from books and the internet, and I organized all of the information that I collected to make the Korean morphological analyzer. Therefore the dictionaries are ones that I made.
I think it is enough to attach a file (License Notice) that describes where the dictionaries originate from and the kind of licensing (Apache License 2.0). If that is not enough, please leave me a message and give me some guidance. Thanks. Soomyung Lee 2013/4/24 Christian Moen c...@atilika.com Hello SooMyung, Thanks a lot! It will be great to get Korean supported out-of-the-box in Lucene/Solr. In terms of process, I'll leave this to Steve Rowe, PMC Chair, to comment on, but a code grant process sounds likely. I see that the code itself is under the Apache License 2.0, but could you elaborate on where the dictionaries originate from and what kind of licensing terms are applicable? Many thanks, Christian Moen On Apr 24, 2013, at 2:05 PM, smlee0...@gmail.com wrote: Hello, I've developed the Korean Analyzer and distributed it since 2008. Many people who use Lucene with Korean use it. I posted it to the sourceforge
[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer
[ https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641876#comment-13641876 ] Mark Miller commented on SOLR-4761: --- +1, patch looks good! bq. it won't actually kick in until after the first reopen. I think we may want to just ditch our lazy creation of the indexwriter and create it upfront. I don't think it saves too much to not create it. add option to plug in mergedsegmentwarmer - Key: SOLR-4761 URL: https://issues.apache.org/jira/browse/SOLR-4761 Project: Solr Issue Type: New Feature Reporter: Robert Muir Attachments: SOLR-4761.patch This is pretty expert, but can be useful in some cases. We can also provide a simple minimalist implementation that just ensures datastructures are primed so the first queries aren't e.g. causing norms to be read from disk etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
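A merged-segment warmer's job — priming lazily-loaded per-segment structures on the merge thread, before the new segment becomes searchable — can be sketched with a toy JDK-only model. All names below are invented for illustration; this is not Lucene's actual IndexWriter.IndexReaderWarmer API or the attached patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WarmerSketch {
    // Minimal stand-in for a merged-segment warmer: given a freshly merged
    // "segment", touch its lazily-loaded structures before queries can see it.
    interface SegmentWarmer { void warm(Segment s); }

    static class Segment {
        private Map<String, long[]> norms;          // loaded lazily, "from disk"
        boolean loaded() { return norms != null; }
        Map<String, long[]> norms() {               // first access is the slow one
            if (norms == null) {
                norms = new ConcurrentHashMap<>();
                norms.put("body", new long[]{1, 2, 3});
            }
            return norms;
        }
    }

    public static void main(String[] args) {
        SegmentWarmer warmer = s -> s.norms();      // prime norms up front
        Segment merged = new Segment();
        warmer.warm(merged);                        // runs on the merge thread
        // The first query now finds the data structures already hot.
        System.out.println("warmed=" + merged.loaded());
    }
}
```

The point of the "simple minimalist implementation" mentioned above is exactly the lambda here: enumerate the lazily-initialized structures once so the first real query does not pay the I/O cost.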
[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true
[ https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641883#comment-13641883 ] Otis Gospodnetic commented on SOLR-4763: If you don't know whether it's a bug or not, it's best to bring it up on the mailing list first, so devs don't have to manage invalid JIRA issues and so you can get a better discussion (and help) going. I'm not sure if this is a bug or not. Maybe [~martijn.v.groningen] will know. Performance issue when using group.facet=true - Key: SOLR-4763 URL: https://issues.apache.org/jira/browse/SOLR-4763 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Alexander Koval I do not know whether this is a bug or not, but calculating facets with {{group.facet=true}} is too slow. I have a query that: {code} matches: 730597, ngroups: 24024, {code} 1. All queries with {{group.facet=true}}: {code} QTime: 5171 facet: { time: 4716 {code} 2. Without {{group.facet}}: * First query: {code} QTime: 3284 facet: { time: 3104 {code} * Next queries: {code} QTime: 230, facet: { time: 76 {code} So I think with {{group.facet=true}} Solr doesn't use the cache to calculate facets. Is it possible to improve the performance of facets when {{group.facet=true}}? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4764) When using NRT, just init the reader from IndexWriter
Robert Muir created SOLR-4764: - Summary: When using NRT, just init the reader from IndexWriter Key: SOLR-4764 URL: https://issues.apache.org/jira/browse/SOLR-4764 Project: Solr Issue Type: Improvement Reporter: Robert Muir Spinoff from SOLR-4761 Solr first opens a DirectoryReader from the directory, then later will pass this to IW openIfChanged. I noticed this when I was confused that mergedsegmentwarmer doesn't appear to work at first until after you've reopened... I'm not totally sure what the current behavior causes (does IW's pool reuse segments from this passed-in external reader, or is this causing some horrible doubling-up/inefficiency, etc.?). To some extent I think we should change it even if it's actually performant: I think it's confusing. I think ideally we'd change IndexReaderFactory's method to take a writer instead of a directory so that custom DirectoryReaders can still work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
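The doubling-up concern can be illustrated with a toy JDK-only model of reader pooling. Every name here is invented for illustration — this is not Lucene's DirectoryReader/IndexWriter API — but it shows the distinction the issue is about: readers obtained from the writer share the writer's pooled per-segment state, while a reader opened independently from the directory carries its own copy.

```java
import java.util.HashMap;
import java.util.Map;

public class NrtOpenSketch {
    // Toy model: a writer that pools per-segment state, and two ways
    // to obtain a reader from the index.
    static class Writer {
        final Map<String, Object> pooledSegments = new HashMap<>();
        Reader openReader() {                      // "open a reader from the writer"
            pooledSegments.putIfAbsent("seg_0", new Object());
            return new Reader(pooledSegments.get("seg_0"));
        }
    }
    static class Reader {
        final Object segment;                      // per-segment state this reader uses
        Reader(Object segment) { this.segment = segment; }
    }
    static Reader openFromDirectory() {            // "open a reader from the directory"
        return new Reader(new Object());           // loads its own copy of the segment
    }

    public static void main(String[] args) {
        Writer w = new Writer();
        Reader external = openFromDirectory();     // what Solr does first today
        Reader nrt = w.openReader();               // what the issue proposes
        Reader nrt2 = w.openReader();
        // Readers from the writer share pooled state; the external one does not,
        // so segment state may end up loaded twice.
        System.out.println("pooled shared: " + (nrt.segment == nrt2.segment));
        System.out.println("external shared: " + (external.segment == nrt.segment));
    }
}
```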
[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true
[ https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641922#comment-13641922 ] Alexander Koval commented on SOLR-4763: --- I'm sorry for that. I found 2 discussions in the mailing list: http://lucene.472066.n3.nabble.com/Grouping-performance-problem-td3995245.html http://lucene.472066.n3.nabble.com/group-facet-true-performances-td4021639.html A solution was not found. I think that with this issue it is not possible to use the {{group.facet=true}} option in production. Performance issue when using group.facet=true - Key: SOLR-4763 URL: https://issues.apache.org/jira/browse/SOLR-4763 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Alexander Koval I do not know whether this is a bug or not, but calculating facets with {{group.facet=true}} is too slow. I have a query that: {code} matches: 730597, ngroups: 24024, {code} 1. All queries with {{group.facet=true}}: {code} QTime: 5171 facet: { time: 4716 {code} 2. Without {{group.facet}}: * First query: {code} QTime: 3284 facet: { time: 3104 {code} * Next queries: {code} QTime: 230, facet: { time: 76 {code} So I think with {{group.facet=true}} Solr doesn't use the cache to calculate facets. Is it possible to improve the performance of facets when {{group.facet=true}}? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4655) The Overseer should assign node names by default.
[ https://issues.apache.org/jira/browse/SOLR-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4655: -- Attachment: SOLR-4655.patch To trunk. The Overseer should assign node names by default. - Key: SOLR-4655 URL: https://issues.apache.org/jira/browse/SOLR-4655 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.3, 5.0 Attachments: SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch Currently we make a unique node name by using the host address as part of the name. This means that if you want a node with a new address to take over, the node name is misleading. It's best if you set custom names for each node before starting your cluster. This is cumbersome though, and cannot currently be done with the collections API. Instead, the overseer could assign a more generic name such as nodeN by default. Then you can easily swap in another node with no pre planning and no confusion in the name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities
[ https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641944#comment-13641944 ] Shawn Heisey commented on SOLR-2356: Roman, patches are welcome. If you know how to fix it, get the source code and go for it, then upload the patch. The issue is more than two years old, so if it were an easy fix, the people that really know DIH would have fixed it already. You can use the SolrJ library to write a multi-threaded application to import data. If the design is solid, it could ultimately become the basis for a new DIH. It used to be possible to configure multiple threads in the DIH config, but that was removed in 4.x because it was unstable. Also, it didn't really help, as the issue reporter found. It will probably take a complete redesign to fix this issue, and DIH is a contrib module, not part of the main Solr code. That is why this is marked minor. indexing using DataImportHandler does not use entire CPU capacities --- Key: SOLR-2356 URL: https://issues.apache.org/jira/browse/SOLR-2356 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.0-ALPHA Environment: intel xeon processor (4 cores), Debian Linux Lenny, OpenJDK 64bits server v1.6.0 Reporter: colby Priority: Minor Labels: test Original Estimate: 168h Remaining Estimate: 168h When I use a DataImportHandler to index a large number of documents (~35M), cpu usage doesn't go over than 100% cpu (i.e. just one core). When I configure 4 threads for the entity tag, the cpu usage is splitted to 25% per core but never use 400% of cpu (i.e 100% of the 4 cores) I use solr embedded with jetty server. Is there a way to tune this feature in order to use all cores and improve indexing performances ? Because for the moment, an extra script (PHP) gives better indexing performances than DIH. thanks -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
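Shawn's suggestion above — write a multi-threaded importer with the SolrJ library instead of relying on DIH — can be sketched with a JDK-only skeleton. The indexBatch method below is a hypothetical stand-in for a real SolrJ call (for example, adding a batch of SolrInputDocuments to a SolrServer); everything else uses only java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelImportSketch {
    static final AtomicInteger indexed = new AtomicInteger();

    // Hypothetical stand-in for a SolrJ call such as server.add(batch).
    static void indexBatch(List<Integer> batch) {
        indexed.addAndGet(batch.size());
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        // Split 10,000 fake documents into batches of 500, indexed in parallel
        // so all cores stay busy rather than one import thread.
        for (int start = 0; start < 10_000; start += 500) {
            final int s = start;
            futures.add(pool.submit(() -> {
                List<Integer> batch = new ArrayList<>();
                for (int i = s; i < s + 500; i++) batch.add(i);
                indexBatch(batch);
            }));
        }
        for (Future<?> f : futures) f.get();   // propagate any indexing errors
        pool.shutdown();
        System.out.println("indexed=" + indexed.get());
    }
}
```

In a real importer the worker would also read from the source database, so the batch size and thread count would be tuned against both the database and Solr.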
[jira] [Commented] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
[ https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641959#comment-13641959 ] Shawn Heisey commented on SOLR-4762: [~thetaphi] that is the conclusion I came to as well. I guess the question is whether preferring application classes will cause unintended side effects. That solution worked for some people, though none of the accounts that I came across were using Solr. Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach --- Key: SOLR-4762 URL: https://issues.apache.org/jira/browse/SOLR-4762 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Shawn Heisey Assignee: Shawn Heisey Fix For: 5.0, 4.4 When a user tried to deploy on weblogic 10.3, they got this exception: {noformat} Error 500--Internal Server Error java.lang.NoSuchMethodError: replaceEach at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:821) at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227) at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125) at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292) at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496) at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321) at weblogic.security.service.SecurityManager.runAs(Unknown 
Source) at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180) at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086) at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) at weblogic.work.ExecuteThread.run(ExecuteThread.java:173) {noformat} The solution to this problem appears to be adding the following to weblogic.xml in WEB-INF: {noformat} <container-descriptor> <prefer-web-inf-classes>true</prefer-web-inf-classes> </container-descriptor> {noformat} Since Solr's WEB-INF directory already contains this file and it already has the container-descriptor tag, I'm hoping this is a benign change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641960#comment-13641960 ] soomyung commented on LUCENE-4956: -- Thanks for your help and your great concern, Christian! I visited your website. I noticed that you are not Japanese, yet you developed a Japanese Morphological Analyzer. How is that possible? I'm amazed by your work. the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Labels: newbie Attachments: kr.analyzer.4x.tar The Korean language has specific characteristics. When developing a search service with Lucene/Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, the Korean analyzer is the best choice. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
[ https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-4762: --- Attachment: SOLR-4762.patch Patch that might fix the issue. I will run tests and wait for feedback before committing. Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach --- Key: SOLR-4762 URL: https://issues.apache.org/jira/browse/SOLR-4762 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Shawn Heisey Assignee: Shawn Heisey Fix For: 5.0, 4.4 Attachments: SOLR-4762.patch When a user tried to deploy on weblogic 10.3, they got this exception: {noformat} Error 500--Internal Server Error java.lang.NoSuchMethodError: replaceEach at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:821) at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227) at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125) at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292) at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496) at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321) at weblogic.security.service.SecurityManager.runAs(Unknown Source) at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180) at 
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086) at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) at weblogic.work.ExecuteThread.run(ExecuteThread.java:173) {noformat} The solution to this problem appears to be adding the following to weblogic.xml in WEB-INF: {noformat} <container-descriptor> <prefer-web-inf-classes>true</prefer-web-inf-classes> </container-descriptor> {noformat} Since Solr's WEB-INF directory already contains this file and it already has the container-descriptor tag, I'm hoping this is a benign change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene Solr 4.3.0 RC3
+1 On Tue, Apr 23, 2013 at 5:20 PM, Simon Willnauer simon.willna...@gmail.comwrote: Here is a new RC candidate... http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/ here is my +1 thanks for voting... simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Regards, Shalin Shekhar Mangar.
Re: [VOTE] Lucene Solr 4.3.0 RC3
+1 - Mark On Apr 23, 2013, at 7:50 AM, Simon Willnauer simon.willna...@gmail.com wrote: Here is a new RC candidate... http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/ here is my +1 thanks for voting... simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641968#comment-13641968 ] Lance Norskog commented on LUCENE-2899: --- Maciej- This is a good point. This package needs changes in a lot of places and it might be easier to package it the way you say. Zack- The churn in the APIs is a major problem in Lucene code management. The original patch worked in the 4.x branch and trunk when it was posted. What Em fixed is in an area which is very basic to Lucene. The API changed with no notice and no change in versions or method names. Everyone- It's great that this has gained some interest. Please create a new master patch with whatever changes are needed for the current code base. Lucene grand masters- Please don't say "hey kids, write plugins, they're cool!" and then make subtle incompatible changes in APIs. Add OpenNLP Analysis capabilities as a module - Key: LUCENE-2899 URL: https://issues.apache.org/jira/browse/LUCENE-2899 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 4.3 Attachments: LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899-RJN.patch, OpenNLPFilter.java, OpenNLPTokenizer.java, opennlp_trunk.patch Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does: * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens) * NamedEntity recognition as a TokenFilter We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position. 
I'd propose it go under: modules/analysis/opennlp -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.
Mark Miller created SOLR-4765: - Summary: The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error. Key: SOLR-4765 URL: https://issues.apache.org/jira/browse/SOLR-4765 Project: Solr Issue Type: Bug Components: Tests Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4765: -- Issue Type: Test (was: Bug) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error. --- Key: SOLR-4765 URL: https://issues.apache.org/jira/browse/SOLR-4765 Project: Solr Issue Type: Test Components: Tests Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 35158 - Failure!
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/35158/ 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestFilteredSearch Error Message: 6 threads leaked from SUITE scope at org.apache.lucene.search.TestFilteredSearch:
1) Thread[id=317, name=LuceneTestCase-39-thread-3, state=WAITING, group=TGRP-TestFilteredSearch] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
2) Thread[id=315, name=LuceneTestCase-39-thread-1, state=WAITING, group=TGRP-TestFilteredSearch] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
3) Thread[id=318, name=LuceneTestCase-39-thread-4, state=WAITING, group=TGRP-TestFilteredSearch] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
4) Thread[id=320, name=LuceneTestCase-39-thread-6, state=WAITING, group=TGRP-TestFilteredSearch] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
5) Thread[id=316, name=LuceneTestCase-39-thread-2, state=WAITING, group=TGRP-TestFilteredSearch] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
6) Thread[id=319, name=LuceneTestCase-39-thread-5, state=WAITING, group=TGRP-TestFilteredSearch] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 6 threads leaked from SUITE scope at
[jira] [Commented] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642001#comment-13642001 ] Commit Tag Bot commented on SOLR-4765: -- [trunk commit] markrmiller http://svn.apache.org/viewvc?view=revisionrevision=1475869 SOLR-4765: The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error. The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error. --- Key: SOLR-4765 URL: https://issues.apache.org/jira/browse/SOLR-4765 Project: Solr Issue Type: Test Components: Tests Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4958) unnecessary assert on docid
John Wang created LUCENE-4958: - Summary: unnecessary assert on docid Key: LUCENE-4958 URL: https://issues.apache.org/jira/browse/LUCENE-4958 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: John Wang In DocFieldProcessor, on line 353, there is this assert: assert docValuesConsumerAndDocID.docID < docState.docID; Is this assert necessary? I don't see that this guarantee is needed anywhere in the indexing pipeline. Can we remove this? We have implemented a custom indexing chain that rewrites docState.docID in reverse order and it is working well, but we have to do ugly workarounds in our tests to avoid this assert. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642023#comment-13642023 ] Commit Tag Bot commented on SOLR-4765: -- [branch_4x commit] markrmiller http://svn.apache.org/viewvc?view=revisionrevision=1475879 SOLR-4765: The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error. The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error. --- Key: SOLR-4765 URL: https://issues.apache.org/jira/browse/SOLR-4765 Project: Solr Issue Type: Test Components: Tests Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4959) Incorrect return value from SimpleNaiveBayesClassifier.assignClass
Alexey Kutin created LUCENE-4959: Summary: Incorrect return value from SimpleNaiveBayesClassifier.assignClass Key: LUCENE-4959 URL: https://issues.apache.org/jira/browse/LUCENE-4959 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.2.1, 5.0 Reporter: Alexey Kutin The local copy of BytesRef referenced by foundClass is affected by subsequent TermsEnum.iterator.next() calls as the shared BytesRef.bytes changes. If a term "test" gives a good match and the next term in the terms collection is "classification" with a lower match score, then the returned result will be "clas" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
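The aliasing bug described above can be reproduced with a JDK-only stand-in for a terms enumerator that reuses one backing array per call. ReusingEnum below is invented for illustration; in Lucene the usual fix is to take a deep copy of the BytesRef (e.g. via BytesRef.deepCopyOf) before advancing the enum.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SharedBufferSketch {
    // Mimics a TermsEnum that reuses one backing array for every term it returns.
    static class ReusingEnum {
        private final byte[] buf = new byte[16];
        byte[] next(String term) {
            byte[] b = term.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(b, 0, buf, 0, b.length);
            return buf;                 // caller sees the SAME array each time
        }
    }

    static String[] run() {
        ReusingEnum te = new ReusingEnum();
        byte[] best = te.next("test");               // best match so far; aliased, length 4
        byte[] bestCopy = Arrays.copyOf(best, 4);    // deep copy survives later calls
        te.next("classification");                   // lower score, but clobbers the buffer
        return new String[] {
            new String(best, 0, 4, StandardCharsets.UTF_8),      // aliased view
            new String(bestCopy, 0, 4, StandardCharsets.UTF_8)   // copied view
        };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println("aliased=" + r[0] + " copied=" + r[1]);
    }
}
```

Reading the aliased reference after advancing yields the first four bytes of "classification" — exactly the "clas" result the issue reports — while the copied bytes still read "test".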
[jira] [Created] (LUCENE-4960) Require minimum ivy version
Shawn Heisey created LUCENE-4960: Summary: Require minimum ivy version Key: LUCENE-4960 URL: https://issues.apache.org/jira/browse/LUCENE-4960 Project: Lucene - Core Issue Type: Bug Components: general/build Affects Versions: 4.2.1 Reporter: Shawn Heisey Priority: Minor Fix For: 5.0, 4.4 Someone on solr-user ran into a problem while trying to run 'ant idea' so they could work on Solr in their IDE. [~steve_rowe] indicated that this is probably due to IVY-1194, requiring an ivy jar upgrade. The build system should check for a minimum ivy version, just like it does with ant. The absolute minimum we require appears to be 2.2.0, but do we want to make it 2.3.0 due to IVY-1388? I'm not sure how to go about checking the ivy version. Checking the ant version is easy because it's ant itself that does the checking. There might be other component versions that should be checked too.
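Whatever hook the ant build ends up using to obtain the ivy version, the comparison itself has to be numeric per segment, since plain string comparison would rank "2.10.0" below "2.2.0". A hedged Java sketch of that helper (names are hypothetical, not part of the build):

```java
// Hypothetical helper: compare dotted version strings numerically, segment by
// segment, treating missing segments as zero ("2.2" == "2.2.0").
public class VersionCheck {
    static int compare(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }
    static boolean atLeast(String found, String minimum) {
        return compare(found, minimum) >= 0;
    }
    public static void main(String[] args) {
        System.out.println(atLeast("2.3.0", "2.2.0"));  // true
        System.out.println(atLeast("2.10.0", "2.2.0")); // true (string compare would say false)
        System.out.println(atLeast("2.1.0", "2.2.0"));  // false -> fail the build with an upgrade hint
    }
}
```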
[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer
[ https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642056#comment-13642056 ] Michael McCandless commented on SOLR-4761: -- +1, I like SimpleMergedSegmentWarmer. Maybe we should put that class in lucene core? It seems generically useful and most users won't know the APIs to enum fields / touch the data structures... add option to plug in mergedsegmentwarmer - Key: SOLR-4761 URL: https://issues.apache.org/jira/browse/SOLR-4761 Project: Solr Issue Type: New Feature Reporter: Robert Muir Attachments: SOLR-4761.patch This is pretty expert, but can be useful in some cases. We can also provide a simple minimalist implementation that just ensures datastructures are primed so the first queries aren't e.g. causing norms to be read from disk etc.
[jira] [Commented] (SOLR-4764) When using NRT, just init the reader from IndexWriter
[ https://issues.apache.org/jira/browse/SOLR-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642072#comment-13642072 ] Michael McCandless commented on SOLR-4764: -- +1, this is very costly, because the first NRT open will open an entirely new set of SegmentReaders (not sharing anything from the non-NRT reader passed in to openIfChanged). When using NRT, just init the reader from IndexWriter - Key: SOLR-4764 URL: https://issues.apache.org/jira/browse/SOLR-4764 Project: Solr Issue Type: Improvement Reporter: Robert Muir Spinoff from SOLR-4761 Solr first opens a DirectoryReader from the directory, then later will pass this to IW openIfChanged. I noticed this when i was confused that mergedsegmentwarmer doesn't appear to work at first until after you've reopened... I'm not totally sure what the current behavior causes (does IW's pool reuse segments from this passed-in external reader, or is this causing some horrible doubling-up/inefficient stuff etc?). To some extent i think we should change it even if its actually performant: I think its confusing. I think ideally we'd change IndexReaderFactory's method to take writer instead of directory so that custom DirectoryReaders can still work.
[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index
[ https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642087#comment-13642087 ] Commit Tag Bot commented on LUCENE-4738: [trunk commit] mikemccand http://svn.apache.org/viewvc?view=revision&revision=1475905 LUCENE-4738: only CheckIndex when the last commit is segments_1 Killed JVM when first commit was running will generate a corrupted index Key: LUCENE-4738 URL: https://issues.apache.org/jira/browse/LUCENE-4738 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64 Java: java version 1.7.0_05 Lucene: lucene-core-4.0.0 Reporter: Billow Gao Assignee: Michael McCandless Fix For: 5.0, 4.3 Attachments: LUCENE-4738.patch, LUCENE-4738.patch, LUCENE-4738.patch, LUCENE-4738_test.patch 1. Start a NEW IndexWriterBuilder on an empty folder, add some documents to the index 2. Call commit 3. When the segments_1 file with 0 byte was created, kill the JVM We will end with a corrupted index with an empty segments_1. We only have issue with the first commit crash. Also, if you tried to open an IndexSearcher on a new index. And the first commit on the index was not finished yet. Then you will see exception like: === org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@C:\tmp\testdir lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: [write.lock, _0.fdt, _0.fdx] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65) === So when a new index was created, we should first create an empty index. We should not wait for the commit/close call to create the segment file. If we had an empty index there. 
It won't leave a corrupted index when there is a power issue on the first commit. And a concurrent IndexSearcher can access the index (no match is better than an exception).
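The failure mode in this report is the classic torn-write problem: a file that readers treat as authoritative becomes visible before its contents are complete. One common pattern for avoiding it (a sketch only, not Lucene's actual commit code) is to write under a temporary name and atomically rename into place, so readers either see no commit file or a complete one:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: publish a metadata file atomically. A crash mid-write leaves only the
// temp file behind, never a truncated "segments_1" that readers would treat as
// a corrupt commit.
public class AtomicPublish {
    static void publish(Path dir, String name, byte[] contents) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path dest = dir.resolve(name);
        Files.write(tmp, contents); // crash here -> dest never existed, index state intact
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }

    // Returns true if the published file exists with the full contents.
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("idx");
            publish(dir, "segments_1", "commit data".getBytes());
            return new String(Files.readAllBytes(dir.resolve("segments_1"))).equals("commit data");
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```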
[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index
[ https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642088#comment-13642088 ] Commit Tag Bot commented on LUCENE-4738: [branch_4x commit] mikemccand http://svn.apache.org/viewvc?view=revision&revision=1475906 LUCENE-4738: only CheckIndex when the last commit is segments_1 Killed JVM when first commit was running will generate a corrupted index Key: LUCENE-4738 URL: https://issues.apache.org/jira/browse/LUCENE-4738 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64 Java: java version 1.7.0_05 Lucene: lucene-core-4.0.0 Reporter: Billow Gao Assignee: Michael McCandless Fix For: 5.0, 4.3 Attachments: LUCENE-4738.patch, LUCENE-4738.patch, LUCENE-4738.patch, LUCENE-4738_test.patch 1. Start a NEW IndexWriterBuilder on an empty folder, add some documents to the index 2. Call commit 3. When the segments_1 file with 0 byte was created, kill the JVM We will end with a corrupted index with an empty segments_1. We only have issue with the first commit crash. Also, if you tried to open an IndexSearcher on a new index. And the first commit on the index was not finished yet. Then you will see exception like: === org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@C:\tmp\testdir lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: [write.lock, _0.fdt, _0.fdx] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65) === So when a new index was created, we should first create an empty index. We should not wait for the commit/close call to create the segment file. If we had an empty index there. 
It won't leave a corrupted index when there is a power issue on the first commit. And a concurrent IndexSearcher can access the index (no match is better than an exception).
[jira] [Commented] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
[ https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642093#comment-13642093 ] Shawn Heisey commented on SOLR-4762: Tests and precommit pass. I'm hoping someone can tell me whether this actually works for affected weblogic versions. Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach --- Key: SOLR-4762 URL: https://issues.apache.org/jira/browse/SOLR-4762 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Shawn Heisey Assignee: Shawn Heisey Fix For: 5.0, 4.4 Attachments: SOLR-4762.patch When a user tried to deploy on weblogic 10.3, they got this exception: {noformat} Error 500--Internal Server Error java.lang.NoSuchMethodError: replaceEach at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:821) at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227) at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125) at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292) at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142) at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43) at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496) at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321) at weblogic.security.service.SecurityManager.runAs(Unknown Source) at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180) at 
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086) at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) at weblogic.work.ExecuteThread.run(ExecuteThread.java:173) {noformat} The solution to this problem appears to be adding the following to weblogic.xml in WEB-INF: {noformat} <container-descriptor> <prefer-web-inf-classes>true</prefer-web-inf-classes> </container-descriptor> {noformat} Since Solr's WEB-INF directory already contains this file and it already has the container-descriptor tag, I'm hoping this is a benign change.
[jira] [Commented] (LUCENE-4958) unnecessary assert on docid
[ https://issues.apache.org/jira/browse/LUCENE-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642117#comment-13642117 ] Michael McCandless commented on LUCENE-4958: I think this assert is gone in 4.2? unnecessary assert on docid --- Key: LUCENE-4958 URL: https://issues.apache.org/jira/browse/LUCENE-4958 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: John Wang In DocFieldProcessor, on line 353, there is this assert: assert docValuesConsumerAndDocID.docID < docState.docID; Is this assert necessary? I don't see in the indexing pipeline that this guarantee is needed. Can we remove this? We have implemented a custom indexing chain that rewrites docState.docID in reverse order and it is working well. But we have to do ugly workarounds in our test to avoid this assert.
[jira] [Updated] (SOLR-4761) add option to plug in mergedsegmentwarmer
[ https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-4761: -- Attachment: SOLR-4761.patch updated patch. I also put this guy in the test rotation. add option to plug in mergedsegmentwarmer - Key: SOLR-4761 URL: https://issues.apache.org/jira/browse/SOLR-4761 Project: Solr Issue Type: New Feature Reporter: Robert Muir Attachments: SOLR-4761.patch, SOLR-4761.patch This is pretty expert, but can be useful in some cases. We can also provide a simple minimalist implementation that just ensures datastructures are primed so the first queries aren't e.g. causing norms to be read from disk etc.
[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer
[ https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642124#comment-13642124 ] Michael McCandless commented on SOLR-4761: -- +1, looks great! Thanks Rob. add option to plug in mergedsegmentwarmer - Key: SOLR-4761 URL: https://issues.apache.org/jira/browse/SOLR-4761 Project: Solr Issue Type: New Feature Reporter: Robert Muir Attachments: SOLR-4761.patch, SOLR-4761.patch This is pretty expert, but can be useful in some cases. We can also provide a simple minimalist implementation that just ensures datastructures are primed so the first queries aren't e.g. causing norms to be read from disk etc.
[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting
[ https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642139#comment-13642139 ] Ryan McKinley commented on SOLR-4735: - This looks like it creates a new registry for every core (am I reading that wrong?) If so, I think sharing one registry would be best. Can the registry be in the CoreContainer rather than the core? I guess that would involve some cleanup when a core is unloaded, but it would let us share a single registry across cores and other apps (the case I am actually concerned with) Improve Solr metrics reporting -- Key: SOLR-4735 URL: https://issues.apache.org/jira/browse/SOLR-4735 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-4735.patch, SOLR-4735.patch Following on from a discussion on the mailing list: http://search-lucene.com/m/IO0EI1qdyJF1/codahalesubj=Solr+metrics+in+Codahale+metrics+and+Graphite+ It would be good to make Solr play more nicely with existing devops monitoring systems, such as Graphite or Ganglia. Stats monitoring at the moment is poll-only, either via JMX or through the admin stats page. I'd like to refactor things a bit to make this more pluggable. This patch is a start. It adds a new interface, InstrumentedBean, which extends SolrInfoMBean to return a [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a couple of MetricReporters (which basically just duplicate the JMX and admin page reporting that's there at the moment, but which should be more extensible). The patch includes a change to RequestHandlerBase showing how this could work. The idea would be to eventually replace the getStatistics() call on SolrInfoMBean with this instead. The next step would be to allow more MetricReporters to be defined in solrconfig.xml. The Metrics library comes with ganglia and graphite reporting modules, and we can add contrib plugins for both of those. 
There's some more general cleanup that could be done around SolrInfoMBean (we've got two plugin handlers at /mbeans and /plugins that basically do the same thing, and the beans themselves have some weirdly inconsistent data on them - getVersion() returns different things for different impls, and getSource() seems pretty useless), but maybe that's for another issue.
[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting
[ https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642148#comment-13642148 ] Ryan McKinley commented on SOLR-4735: - ideally CoreContainer could have a function like: {code:java} MetricsRegistry createMetricsRegistry( ?? config ) { return new MetricsRegistry(); } {code} This would let other applications slip in their own registry -- that already has reporting hooked up! Improve Solr metrics reporting -- Key: SOLR-4735 URL: https://issues.apache.org/jira/browse/SOLR-4735 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-4735.patch, SOLR-4735.patch Following on from a discussion on the mailing list: http://search-lucene.com/m/IO0EI1qdyJF1/codahalesubj=Solr+metrics+in+Codahale+metrics+and+Graphite+ It would be good to make Solr play more nicely with existing devops monitoring systems, such as Graphite or Ganglia. Stats monitoring at the moment is poll-only, either via JMX or through the admin stats page. I'd like to refactor things a bit to make this more pluggable. This patch is a start. It adds a new interface, InstrumentedBean, which extends SolrInfoMBean to return a [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a couple of MetricReporters (which basically just duplicate the JMX and admin page reporting that's there at the moment, but which should be more extensible). The patch includes a change to RequestHandlerBase showing how this could work. The idea would be to eventually replace the getStatistics() call on SolrInfoMBean with this instead. The next step would be to allow more MetricReporters to be defined in solrconfig.xml. The Metrics library comes with ganglia and graphite reporting modules, and we can add contrib plugins for both of those. 
There's some more general cleanup that could be done around SolrInfoMBean (we've got two plugin handlers at /mbeans and /plugins that basically do the same thing, and the beans themselves have some weirdly inconsistent data on them - getVersion() returns different things for different impls, and getSource() seems pretty useless), but maybe that's for another issue.
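The overridable-factory idea proposed for CoreContainer above can be sketched in plain Java (all names hypothetical, not Solr's or Codahale's API): the container lazily creates its registry through a protected hook, so an embedding application can subclass and return a pre-wired instance that every core then reports into:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Toy registry: named counters, safe for concurrent cores.
class MetricsRegistry {
    final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();
    void inc(String name) { counters.computeIfAbsent(name, k -> new LongAdder()).increment(); }
    long get(String name) { LongAdder a = counters.get(name); return a == null ? 0 : a.sum(); }
}

// The container owns one registry, created lazily through an overridable hook.
class Container {
    private MetricsRegistry registry;
    // Embedding apps override this to slip in a registry with reporting hooked up.
    protected MetricsRegistry createMetricsRegistry() { return new MetricsRegistry(); }
    synchronized MetricsRegistry registry() {
        if (registry == null) registry = createMetricsRegistry();
        return registry;
    }
}

public class FactoryDemo {
    static long demo() {
        MetricsRegistry shared = new MetricsRegistry(); // app-owned, pre-wired elsewhere
        Container c = new Container() {
            @Override protected MetricsRegistry createMetricsRegistry() { return shared; }
        };
        c.registry().inc("core1.requests"); // both "cores" land in the one shared registry
        c.registry().inc("core2.requests");
        return shared.get("core1.requests") + shared.get("core2.requests");
    }
    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```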
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642161#comment-13642161 ] Steve Rowe commented on LUCENE-4947: bq. Just updating the thread to notify everyone that I've just e-mailed the ICA and code grant documents (and their GPG-related files) to secret...@apache.org. I monitor commits to the ICLA and code grants record files, and neither the ICLA nor the code grant document has been recorded yet. I'll post on this issue once the code grant has been recorded. [~klawson88], did you send the code grant to legal-arch...@apache.org in addition to sending it to secret...@apache.org? This is mentioned as a requirement in step 3 of the process section in [http://incubator.apache.org/ip-clearance/ip-clearance-template.html]. Java implementation (and improvement) of Levenshtein associated lexicon automata -- Key: LUCENE-4947 URL: https://issues.apache.org/jira/browse/LUCENE-4947 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1 Reporter: Kevin Lawson Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip I was encouraged by Mike McCandless to open an issue concerning this after I contacted him privately about it. Thanks Mike! I'd like to submit my Java implementation of the Levenshtein Automaton as a homogenous replacement for the current heterogenous, multi-component implementation in Lucene. Benefits of upgrading include - Reduced code complexity - Better performance from components that were previously implemented in Python - Support for on-the-fly dictionary-automaton manipulation (if you wish to use my dictionary-automaton implementation) The code for all the components is well structured, easy to follow, and extensively commented. It has also been fully tested for correct functionality and performance. 
The levenshtein automaton implementation (along with the required MDAG reference) can be found in my LevenshteinAutomaton Java library here: https://github.com/klawson88/LevenshteinAutomaton. The minimalistic directed acyclic graph (MDAG) which the automaton code uses to store and step through word sets can be found here: https://github.com/klawson88/MDAG *Transpositions aren't currently implemented. I hope the comment filled, editing-friendly code combined with the fact that the section in the Mihov paper detailing transpositions is only 2 pages makes adding the functionality trivial. *As a result of support for on-the-fly manipulation, the MDAG (dictionary-automaton) creation process incurs a slight speed penalty. In order to have the best of both worlds, I'd recommend the addition of a constructor which only takes sorted input. The complete, easy to follow pseudo-code for the simple procedure can be found in the first article I linked under the references section in the MDAG repository)
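For context on what the automaton buys you: the baseline is the O(m*n) dynamic-programming edit distance, computed against every dictionary word, which a Levenshtein automaton avoids by accepting all words within a given distance in a single traversal of the lexicon automaton. The standard two-row DP, for reference:

```java
// Classic two-row dynamic-programming Levenshtein distance (insert, delete,
// substitute each cost 1). This per-word cost is what the automaton amortizes away.
public class EditDistance {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from empty prefix
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insert
                                            prev[j] + 1),      // delete
                                   prev[j - 1] + cost);        // substitute/match
            }
            int[] t = prev; prev = curr; curr = t; // roll the rows
        }
        return prev[b.length()];
    }
    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```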
[jira] [Assigned] (LUCENE-4959) Incorrect return value from SimpleNaiveBayesClassifier.assignClass
[ https://issues.apache.org/jira/browse/LUCENE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand reassigned LUCENE-4959: Assignee: Adrien Grand Incorrect return value from SimpleNaiveBayesClassifier.assignClass --- Key: LUCENE-4959 URL: https://issues.apache.org/jira/browse/LUCENE-4959 Project: Lucene - Core Issue Type: Bug Affects Versions: 5.0, 4.2.1 Reporter: Alexey Kutin Assignee: Adrien Grand Labels: classification The local copy of BytesRef referenced by foundClass is affected by subsequent TermsEnum.iterator.next() calls as the shared BytesRef.bytes changes. If a term test gives a good match and a next term in the terms collection is classification with a lower match score then the return result will be clas
[jira] [Commented] (LUCENE-4957) Stop IndexWriter from writing broken term vector offset data in 5.0
[ https://issues.apache.org/jira/browse/LUCENE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642201#comment-13642201 ] Adrien Grand commented on LUCENE-4957: -- +1 Stop IndexWriter from writing broken term vector offset data in 5.0 --- Key: LUCENE-4957 URL: https://issues.apache.org/jira/browse/LUCENE-4957 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Today we allow this in (some analyzers are broken), and only reject them if someone is indexing offsets into the postings lists. But we should ban this also when term vectors are enabled. Its time to stop writing this broken data and let broken analyzers be broken.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642218#comment-13642218 ] Dawid Weiss commented on LUCENE-4956: - That's because Christian has ninja superpowers. http://goo.gl/5EPMr the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Labels: newbie Attachments: kr.analyzer.4x.tar Korean language has specific characteristics. When developing a search service with lucene solr in korean, there are some problems in searching and indexing. The korean analyzer solved the problems with a korean morphological analyzer. It consists of a korean morphological analyzer, dictionaries, a korean tokenizer and a korean filter. The korean analyzer is made for lucene and solr. If you develop a search service with lucene in korean, it is the best idea to choose the korean analyzer.
[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities
[ https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642219#comment-13642219 ] Jan Høydahl commented on SOLR-2356: --- In my opinion, DIH should be completely redesigned as a standalone webapp. It is a major design flaw that it is a RequestHandler within a Solr Core/collection. As a standalone web app it could easily be deployed on its own, talk to multiple collections and be parallelized. indexing using DataImportHandler does not use entire CPU capacities --- Key: SOLR-2356 URL: https://issues.apache.org/jira/browse/SOLR-2356 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.0-ALPHA Environment: intel xeon processor (4 cores), Debian Linux Lenny, OpenJDK 64bits server v1.6.0 Reporter: colby Priority: Minor Labels: test Original Estimate: 168h Remaining Estimate: 168h When I use a DataImportHandler to index a large number of documents (~35M), cpu usage doesn't go over 100% cpu (i.e. just one core). When I configure 4 threads for the entity tag, the cpu usage is split to 25% per core but never uses 400% of cpu (i.e. 100% of the 4 cores) I use solr embedded with jetty server. Is there a way to tune this feature in order to use all cores and improve indexing performances? Because for the moment, an extra script (PHP) gives better indexing performances than DIH. thanks
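The parallelization point above can be sketched generically (hypothetical code, not DIH's internals): fan documents out to a worker pool sized to the core count, so ingestion is no longer pinned to one import thread:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: submit each fetched document to a fixed thread pool so the CPU-bound
// analysis/indexing work runs on all cores, not just the import thread.
public class ParallelIndexer {
    static int indexAll(List<String> docs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger indexed = new AtomicInteger();
        for (String doc : docs) {
            pool.execute(() -> {
                doc.chars().sum();          // stand-in for analysis + inverted-index work
                indexed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexed.get();
    }
    public static void main(String[] args) {
        System.out.println(indexAll(List.of("doc1", "doc2", "doc3", "doc4"), 4)); // 4
    }
}
```

In a real importer the per-document work is analysis and update-handler calls, and the pool size would default to the available core count.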
Re: [VOTE] Lucene Solr 4.3.0 RC3
-1 It seems SLF4j packaging is busted? I thought I remembered slf4j jars were removed from the war, in favor of putting them in the classpath. But I see slf4j jars in the maven war file, but not in the tgz war file. On Thu, Apr 25, 2013 at 10:19 AM, Mark Miller markrmil...@gmail.com wrote: +1 - Mark On Apr 23, 2013, at 7:50 AM, Simon Willnauer simon.willna...@gmail.com wrote: Here is a new RC candidate... http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/ here is my +1 thanks for voting... simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4953) readerClosedListener is not invoked for ParallelCompositeReader's leaves
[ https://issues.apache.org/jira/browse/LUCENE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4953: -- Attachment: LUCENE-4953.patch Patch that adds the DONT_TOUCH_SUBREADERS mode. I will now check the tests by enforcing the always wrapping with PCR, so bugs can be detected. readerClosedListener is not invoked for ParallelCompositeReader's leaves Key: LUCENE-4953 URL: https://issues.apache.org/jira/browse/LUCENE-4953 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 5.0, 4.4 Attachments: LUCENE-4953.patch, LUCENE-4953.patch, LUCENE-4953.patch, LUCENE-4953.patch There was a test failure last night: {noformat} 1 tests failed. REGRESSION: org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testBasic Error Message: testBasic(org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest): Insane FieldCache usage(s) found expected:<0> but was:<2> Stack Trace: java.lang.AssertionError: testBasic(org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest): Insane FieldCache usage(s) found expected:<0> but was:<2> at __randomizedtesting.SeedInfo.seed([1F9C2A2AD23A8E02:B466373F0DE6082C]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:592) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:55) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:722) Build Log: [...truncated 6904 lines...] [junit4:junit4] Suite: org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest [junit4:junit4] 2 *** BEGIN
[jira] [Commented] (LUCENE-4953) readerClosedListener is not invoked for ParallelCompositeReader's leaves
[ https://issues.apache.org/jira/browse/LUCENE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642302#comment-13642302 ] Uwe Schindler commented on LUCENE-4953: --- I checked the other tests by hardcoding maybeWrapReader to always wrap with ParallelCompositeReader at the end. No other failures. I will commit this tomorrow. readerClosedListener is not invoked for ParallelCompositeReader's leaves Key: LUCENE-4953 URL: https://issues.apache.org/jira/browse/LUCENE-4953 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 5.0, 4.4 Attachments: LUCENE-4953.patch, LUCENE-4953.patch, LUCENE-4953.patch, LUCENE-4953.patch
[jira] [Updated] (LUCENE-4955) NGramTokenFilter increments positions for each gram
[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4955: - Attachment: LUCENE-4955.patch I tried to iterate on Simon's patch: * NGramTokenFilter doesn't modify offsets and emits all n-grams of a single term at the same position * NGramTokenizer uses a sliding window. * NGramTokenizer and NGramTokenFilter removed from TestRandomChains exclusions. It was very hard to add the compatibility version support to NGramTokenizer, so there are now two distinct classes and the factory picks the right one depending on the Lucene match version. Simon's highlighting test now fails because the highlighted content is different, but not because of a broken token stream. NGramTokenFilter increments positions for each gram --- Key: LUCENE-4955 URL: https://issues.apache.org/jira/browse/LUCENE-4955 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.3 Reporter: Simon Willnauer Fix For: 5.0, 4.4 Attachments: highlighter-test.patch, highlighter-test.patch, LUCENE-4955.patch, LUCENE-4955.patch NGramTokenFilter increments positions for each gram rather than for the actual token, which can lead to rather funny problems, especially with highlighting. Whether this filter should be used for highlighting is a different story, but today this seems to be a common practice in many situations to highlight sub-term matches. I have a highlighting test that uses ngrams and fails with a StringIndexOutOfBoundsException, since tokens are sorted by position, which causes offsets to be mixed up due to the ngram token filter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
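The position handling the patch describes (every n-gram of one input token keeps that token's position, so only the first gram of a token gets a position increment of 1) can be modeled outside Lucene. This is a hedged Python sketch of the idea, not the actual NGramTokenFilter API, and real Lucene may emit grams in a different order:

```python
def ngrams_with_positions(tokens, min_gram=1, max_gram=2):
    """Return (gram, position_increment) pairs for a list of input tokens.

    All grams of a single token share that token's position: increment 1
    for the first gram of each token, 0 for the rest. This keeps positions
    aligned with the original tokens, which is what highlighting needs.
    """
    out = []
    for term in tokens:
        first = True
        for n in range(min_gram, max_gram + 1):
            for i in range(len(term) - n + 1):
                out.append((term[i:i + n], 1 if first else 0))
                first = False
    return out
```

With this model, two input tokens consume exactly two positions no matter how many grams they produce, instead of one position per gram as in the buggy behavior.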
[jira] [Updated] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated LUCENE-949: --- Attachment: LUCENE-949.patch Refactored a bit and added a few more tests. AnalyzingQueryParser can't work with leading wildcards. --- Key: LUCENE-949 URL: https://issues.apache.org/jira/browse/LUCENE-949 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 2.2 Reporter: Stefan Klein Attachments: LUCENE-949.patch, LUCENE-949.patch The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:
{noformat}
protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
  org.apache.lucene.analysis.Token t;
  int countTokens = 0;
  while (true) {
    try {
      t = source.next();
    } catch (IOException e) {
      t = null;
    }
    if (t == null) {
      break;
    }
    if (!"".equals(t.termText())) {
      try {
        tlist.set(countTokens++, t.termText());
      } catch (IndexOutOfBoundsException ioobe) {
        countTokens = -1;
      }
    }
  }
  try {
    source.close();
  } catch (IOException e) {
    // ignore
  }
  if (countTokens != tlist.size()) {
{noformat}
[jira] [Updated] (SOLR-4705) HttpShardHandler null point exception
[ https://issues.apache.org/jira/browse/SOLR-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-4705: --- Attachment: SOLR-4705.patch Thanks for the patch Raintung! I've updated it to include test cases of all the various shards param possibilities to future-proof us against similar bugs down the line. i'm still running a bunch of iterations to verify the test itself isn't flawed, and then i'll commit HttpShardHandler null point exception - Key: SOLR-4705 URL: https://issues.apache.org/jira/browse/SOLR-4705 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.2, 4.2.1 Reporter: Raintung Li Priority: Minor Attachments: patch-4705.txt, SOLR-4705.patch Call search URL: select?q=test&shards=ip/solr/ checkDistributed method throws null pointer exception.
[jira] [Updated] (SOLR-4705) HttpShardHandler null point exception
[ https://issues.apache.org/jira/browse/SOLR-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-4705: --- Fix Version/s: 4.4 Assignee: Hoss Man HttpShardHandler null point exception - Key: SOLR-4705 URL: https://issues.apache.org/jira/browse/SOLR-4705 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.2, 4.2.1 Reporter: Raintung Li Assignee: Hoss Man Priority: Minor Fix For: 4.4 Attachments: patch-4705.txt, SOLR-4705.patch Call search URL: select?q=test&shards=ip/solr/ checkDistributed method throws null pointer exception.
[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer
[ https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642519#comment-13642519 ] Commit Tag Bot commented on SOLR-4761: -- [trunk commit] rmuir http://svn.apache.org/viewvc?view=revision&revision=1476026 SOLR-4761: add option to plug in mergedSegmentWarmer add option to plug in mergedsegmentwarmer - Key: SOLR-4761 URL: https://issues.apache.org/jira/browse/SOLR-4761 Project: Solr Issue Type: New Feature Reporter: Robert Muir Attachments: SOLR-4761.patch, SOLR-4761.patch This is pretty expert, but can be useful in some cases. We can also provide a simple minimalist implementation that just ensures datastructures are primed so the first queries aren't e.g. causing norms to be read from disk etc.
[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer
[ https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642527#comment-13642527 ] Commit Tag Bot commented on SOLR-4761: -- [branch_4x commit] rmuir http://svn.apache.org/viewvc?view=revision&revision=1476030 SOLR-4761: add option to plug in mergedSegmentWarmer add option to plug in mergedsegmentwarmer - Key: SOLR-4761 URL: https://issues.apache.org/jira/browse/SOLR-4761 Project: Solr Issue Type: New Feature Reporter: Robert Muir Attachments: SOLR-4761.patch, SOLR-4761.patch This is pretty expert, but can be useful in some cases. We can also provide a simple minimalist implementation that just ensures datastructures are primed so the first queries aren't e.g. causing norms to be read from disk etc.
[jira] [Resolved] (SOLR-4761) add option to plug in mergedsegmentwarmer
[ https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-4761. --- Resolution: Fixed Fix Version/s: 4.4 5.0 add option to plug in mergedsegmentwarmer - Key: SOLR-4761 URL: https://issues.apache.org/jira/browse/SOLR-4761 Project: Solr Issue Type: New Feature Reporter: Robert Muir Fix For: 5.0, 4.4 Attachments: SOLR-4761.patch, SOLR-4761.patch This is pretty expert, but can be useful in some cases. We can also provide a simple minimalist implementation that just ensures datastructures are primed so the first queries aren't e.g. causing norms to be read from disk etc.
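For context, the feature resolved above amounts to a solrconfig.xml hook. The snippet below is a sketch based on the issue description, not quoted from the commit; verify the exact element name and available warmer classes against your Solr version:

```xml
<indexConfig>
  <!-- Assumed element name: plugs a merged-segment warmer into IndexWriter
       so new merged segments are primed before queries hit them. -->
  <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
</indexConfig>
```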
[jira] [Created] (SOLR-4766) smoketester to check war files have the same contents
Robert Muir created SOLR-4766: - Summary: smoketester to check war files have the same contents Key: SOLR-4766 URL: https://issues.apache.org/jira/browse/SOLR-4766 Project: Solr Issue Type: Test Components: Build Affects Versions: 4.3 Reporter: Robert Muir Fix For: 4.3 As Ryan points out on the [VOTE] Lucene Solr 4.3.0 RC3 thread, somehow the .war file in the binary packaging has different contents than the maven one (in particular, one contains logging jars, the other does not).
[jira] [Commented] (SOLR-4766) smoketester to check war files have the same contents
[ https://issues.apache.org/jira/browse/SOLR-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642540#comment-13642540 ] Robert Muir commented on SOLR-4766: --- My initial idea is, where we check jars and wars in checkIdenticalMavenArtifacts, to compare their zip TOCs and ensure they have the same sets of files. smoketester to check war files have the same contents - Key: SOLR-4766 URL: https://issues.apache.org/jira/browse/SOLR-4766 Project: Solr Issue Type: Test Components: Build Affects Versions: 4.3 Reporter: Robert Muir Fix For: 4.3 As Ryan points out on [VOTE] Lucene Solr 4.3.0 RC3 thread, somehow the .war file in the binary packaging has different contents than the maven one (in particular, one contains logging jars, the other does not).
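The smoketester is a Python script, and the zip-TOC comparison described in the comment can be sketched with the standard-library zipfile module. The helper name is hypothetical, not the actual smoketester code:

```python
import zipfile

def toc_diff(war_a, war_b):
    """Compare the tables of contents of two zip/war archives.

    Returns (only_in_a, only_in_b): the sets of entry names present in one
    archive but missing from the other. Both empty means the archives
    contain the same set of files (contents are not byte-compared here).
    """
    with zipfile.ZipFile(war_a) as a, zipfile.ZipFile(war_b) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
    return names_a - names_b, names_b - names_a
```

A smoke check could then fail the release when either returned set is non-empty, printing the stray entries (e.g. the logging jars mentioned in the issue).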
[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities
[ https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642576#comment-13642576 ] Shalin Shekhar Mangar commented on SOLR-2356: - bq. In my opinion, DIH should be completely redesigned as a standalone webapp. It is a major design flaw that it is a RequestHandler within a Solr Core/collection. Actually, DIH started as a standalone webapp inside AOL. We changed it because we didn't want to duplicate the schema in two places and also because we wanted to have it available by default in Solr installations. Another web app means you need to procure hardware, plan capacity/failover, create firewall holes etc. bq. As a standalone web app it could easily be deployed on its own, talk to multiple collections and be parallelized. Talking to multiple collections was never a goal for DIH -- I'm not sure what value it will bring. The multi-threading support in DIH can use a lot of improvement for sure. indexing using DataImportHandler does not use entire CPU capacities --- Key: SOLR-2356 URL: https://issues.apache.org/jira/browse/SOLR-2356 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.0-ALPHA Environment: intel xeon processor (4 cores), Debian Linux Lenny, OpenJDK 64bits server v1.6.0 Reporter: colby Priority: Minor Labels: test Original Estimate: 168h Remaining Estimate: 168h When I use a DataImportHandler to index a large number of documents (~35M), cpu usage doesn't go over 100% (i.e. just one core). When I configure 4 threads for the entity tag, the cpu usage is split to 25% per core but never uses 400% of cpu (i.e. 100% of the 4 cores). I use solr embedded with jetty server. Is there a way to tune this feature in order to use all cores and improve indexing performance? Because for the moment, an extra script (PHP) gives better indexing performance than DIH. thanks