[jira] [Created] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach

2013-04-25 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-4762:
--

 Summary: Deploying on weblogic: java.lang.NoSuchMethodError: 
replaceEach
 Key: SOLR-4762
 URL: https://issues.apache.org/jira/browse/SOLR-4762
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
 Fix For: 5.0, 4.4


When a customer tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error

java.lang.NoSuchMethodError: replaceEach
at
org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
at
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
at
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(Unknown Source)
at
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
at
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}

The solution to this problem appears to be adding the following to weblogic.xml 
in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has 
the container-descriptor tag, I'm hoping this is a benign change.
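
For anyone applying this by hand, a minimal complete WEB-INF/weblogic.xml carrying the 
setting might look like the following sketch (the descriptor namespace is an assumption; 
use whatever matches your WebLogic release):

{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app">
  <container-descriptor>
    <prefer-web-inf-classes>true</prefer-web-inf-classes>
  </container-descriptor>
</weblogic-web-app>
{noformat}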





[jira] [Updated] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach

2013-04-25 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-4762:
---

Description: 
When a user tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error

java.lang.NoSuchMethodError: replaceEach
at
org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
at
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
at
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(Unknown Source)
at
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
at
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}

The solution to this problem appears to be adding the following to weblogic.xml 
in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has 
the container-descriptor tag, I'm hoping this is a benign change.


  was:
When a customer tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error

java.lang.NoSuchMethodError: replaceEach
at
org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
at
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
at
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(Unknown Source)
at
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
at
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}

The solution to this problem appears to be adding the following to weblogic.xml 
in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has 
the container-descriptor tag, I'm hoping this is a benign change.



 Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
 ---

 Key: SOLR-4762
 URL: https://issues.apache.org/jira/browse/SOLR-4762
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
 Fix For: 5.0, 4.4


 When a user tried to deploy on weblogic 10.3, they got this exception:
 {noformat}
 Error 500--Internal Server Error
 java.lang.NoSuchMethodError: replaceEach
 at
 

[jira] [Updated] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach

2013-04-25 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-4762:
---

Description: 
When a user tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error

java.lang.NoSuchMethodError: replaceEach
at org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
at 
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at 
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at 
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
at 
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(Unknown Source)
at 
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
at 
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at 
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}

The solution to this problem appears to be adding the following to weblogic.xml 
in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has 
the container-descriptor tag, I'm hoping this is a benign change.


  was:
When a user tried to deploy on weblogic 10.3, they got this exception:

{noformat}
Error 500--Internal Server Error

java.lang.NoSuchMethodError: replaceEach
at
org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
at
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
at
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(Unknown Source)
at
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
at
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
{noformat}

The solution to this problem appears to be adding the following to weblogic.xml 
in WEB-INF:

{noformat}
<container-descriptor>
  <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>
{noformat}

Since Solr's WEB-INF directory already contains this file and it already has 
the container-descriptor tag, I'm hoping this is a benign change.



 Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
 ---

 Key: SOLR-4762
 URL: https://issues.apache.org/jira/browse/SOLR-4762
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
 Fix For: 5.0, 4.4


 When a user tried to deploy on weblogic 10.3, they got this exception:
 {noformat}
 Error 500--Internal Server Error
 java.lang.NoSuchMethodError: replaceEach
 at 
 

[jira] [Commented] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach

2013-04-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641490#comment-13641490
 ] 

Uwe Schindler commented on SOLR-4762:
-

The problem is that Solr uses a newer version of commons-lang.jar than the one 
already on WebLogic's classpath. The mentioned setting changes WebLogic to 
load the commons-lang classes from WEB-INF with preference.
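
To make the failure mode concrete: the servlet is compiled against the newer 
commons-lang, but at runtime WebLogic's parent classloader supplies an older jar, so 
the call site dies with NoSuchMethodError. A minimal sketch (the class and the strings 
are made up, not taken from LoadAdminUiServlet):

{code}
import org.apache.commons.lang.StringUtils;

// Hypothetical demo, not Solr code.
public class ReplaceEachDemo {
  public static void main(String[] args) {
    // StringUtils.replaceEach() exists in commons-lang >= 2.4, which is
    // what Solr ships in WEB-INF/lib. If an older commons-lang from the
    // container's classpath wins, this line throws
    // java.lang.NoSuchMethodError even though the code compiled fine.
    String out = StringUtils.replaceEach(
        "host:port",
        new String[] { "host", "port" },
        new String[] { "localhost", "8983" });
    System.out.println(out); // prints: localhost:8983
  }
}
{code}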

 Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
 ---

 Key: SOLR-4762
 URL: https://issues.apache.org/jira/browse/SOLR-4762
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
 Fix For: 5.0, 4.4


 When a user tried to deploy on weblogic 10.3, they got this exception:
 {noformat}
 Error 500--Internal Server Error
 java.lang.NoSuchMethodError: replaceEach
 at 
 org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
 at 
 weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
 at 
 weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
 at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
 at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
 at 
 weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
 at weblogic.security.service.SecurityManager.runAs(Unknown Source)
 at 
 weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
 at 
 weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
 at 
 weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
 at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
 at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
 {noformat}
 The solution to this problem appears to be adding the following to 
 weblogic.xml in WEB-INF:
 {noformat}
 <container-descriptor>
   <prefer-web-inf-classes>true</prefer-web-inf-classes>
 </container-descriptor>
 {noformat}
 Since Solr's WEB-INF directory already contains this file and it already has 
 the container-descriptor tag, I'm hoping this is a benign change.




[jira] [Commented] (SOLR-4358) SolrJ, by preventing multi-part post, loses key information about file name that Tika needs

2013-04-25 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641504#comment-13641504
 ] 

Karl Wright commented on SOLR-4358:
---

Has this ticket's fix been pulled into the latest RC for Solr 4.3?


 SolrJ, by preventing multi-part post, loses key information about file name 
 that Tika needs
 ---

 Key: SOLR-4358
 URL: https://issues.apache.org/jira/browse/SOLR-4358
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 4.0
Reporter: Karl Wright
Assignee: Ryan McKinley
 Attachments: additional_changes.diff, SOLR-4358.patch, 
 SOLR-4358.patch, SOLR-4358.patch


 SolrJ accepts a ContentStream, which has a name field.  Within 
 HttpSolrServer.java, if SolrJ makes the decision to use multipart posts, this 
 filename is transmitted as part of the form boundary information.  However, 
 if SolrJ chooses not to use multipart post, the filename information is lost.
 This information is used by SolrCell (Tika) to make decisions about content 
 extraction, so it is very important that it makes it into Solr in one way or 
 another.  Either SolrJ should set appropriate equivalent headers to send the 
 filename automatically, or it should force multipart posts when this 
 information is present.
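
For context, a sketch of the SolrJ-to-SolrCell path where the stream name matters (the 
server URL, file name, and literal.id param are illustrative, not taken from this report):

{code}
import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractUpload {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/extract");
    // addFile() wraps the file in a named ContentStream. With a multipart
    // POST the name reaches Tika via the form boundary; on the
    // non-multipart path it is currently dropped.
    req.addFile(new File("report.pdf"), "application/pdf");
    req.setParam("literal.id", "doc1");
    server.request(req);
  }
}
{code}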




Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing

2013-04-25 Thread Simon Willnauer
this looks pretty serious! any chance we can get this index?

On Thu, Apr 25, 2013 at 7:59 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/245/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

 Error Message:
 CheckIndex failed

 Stack Trace:
 java.lang.RuntimeException: CheckIndex failed
 at 
 __randomizedtesting.SeedInfo.seed([357616123B0638E9:AEAF02097AFD2E82]:0)
 at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:221)
 at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:209)
 at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:141)
 at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
 at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
 at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:62)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 

[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram

2013-04-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641567#comment-13641567
 ] 

Adrien Grand commented on LUCENE-4955:
--

Given that offsets can't go backwards and that tokens in the same position must 
have the same start offset, I think the only way to get NGramTokenFilter out of 
TestRandomChains' exclusion list (LUCENE-4641) is to fix position increments 
(this issue), change the order tokens are emitted in (LUCENE-3920), and stop 
modifying offsets. I know some people rely on the current behavior, but I think 
it's more important to get this filter out of TestRandomChains' exclusions, 
since it causes highlighting bugs and makes the term vectors files 
unnecessarily larger.
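
For anyone who wants to see the current behavior, a quick sketch against the 4.3 
analysis API (class name invented) that prints one position increment per gram:

{code}
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.Version;

public class NGramPositions {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new NGramTokenFilter(
        new WhitespaceTokenizer(Version.LUCENE_43, new StringReader("abc")),
        1, 2);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posInc =
        ts.addAttribute(PositionIncrementAttribute.class);
    OffsetAttribute off = ts.addAttribute(OffsetAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // Each gram of the single token "abc" reports posInc=1, i.e. every
      // gram claims a new position, which is the bug this issue is about.
      System.out.println(term + " posInc=" + posInc.getPositionIncrement()
          + " offsets=" + off.startOffset() + "-" + off.endOffset());
    }
    ts.end();
    ts.close();
  }
}
{code}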

 NGramTokenFilter increments positions for each gram
 ---

 Key: LUCENE-4955
 URL: https://issues.apache.org/jira/browse/LUCENE-4955
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
 Fix For: 5.0, 4.4

 Attachments: highlighter-test.patch, LUCENE-4955.patch


 NGramTokenFilter increments positions for each gram rather than for the 
 actual token, which can lead to rather funny problems, especially with 
 highlighting. Whether this filter should be used for highlighting is a 
 different story, but today it seems to be common practice in many situations 
 to highlight sub-term matches.
 I have a highlighting test that uses ngrams and fails with a 
 StringIndexOutOfBoundsException, since tokens are sorted by position, which 
 causes offsets to be mixed up by the ngram token filter.




[jira] [Updated] (SOLR-4735) Improve Solr metrics reporting

2013-04-25 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-4735:


Attachment: SOLR-4735.patch

New patch, moving everything to a single registry per-core, and adding a 
graphite reporter in contrib/.

JMX naming still isn't working right, and it needs some tests, but I think this 
is a decent way forward.  More eyes welcome...

 Improve Solr metrics reporting
 --

 Key: SOLR-4735
 URL: https://issues.apache.org/jira/browse/SOLR-4735
 Project: Solr
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: SOLR-4735.patch, SOLR-4735.patch


 Following on from a discussion on the mailing list:
 http://search-lucene.com/m/IO0EI1qdyJF1/codahalesubj=Solr+metrics+in+Codahale+metrics+and+Graphite+
 It would be good to make Solr play more nicely with existing devops 
 monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
 moment is poll-only, either via JMX or through the admin stats page.  I'd 
 like to refactor things a bit to make this more pluggable.
 This patch is a start.  It adds a new interface, InstrumentedBean, which 
 extends SolrInfoMBean to return a 
 [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
 couple of MetricReporters (which basically just duplicate the JMX and admin 
 page reporting that's there at the moment, but which should be more 
 extensible).  The patch includes a change to RequestHandlerBase showing how 
 this could work.  The idea would be to eventually replace the getStatistics() 
 call on SolrInfoMBean with this instead.
 The next step would be to allow more MetricReporters to be defined in 
 solrconfig.xml.  The Metrics library comes with ganglia and graphite 
 reporting modules, and we can add contrib plugins for both of those.
 There's some more general cleanup that could be done around SolrInfoMBean 
 (we've got two plugin handlers at /mbeans and /plugins that basically do the 
 same thing, and the beans themselves have some weirdly inconsistent data on 
 them - getVersion() returns different things for different impls, and 
 getSource() seems pretty useless), but maybe that's for another issue.
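
For readers who don't know the library, a rough sketch of the registry-plus-reporter 
pattern with the Metrics 3.x API (metric names and the Graphite host are placeholders):

{code}
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

public class MetricsSketch {
  public static void main(String[] args) throws Exception {
    // One registry per core, as in the patch description.
    MetricRegistry registry = new MetricRegistry();
    Timer requestTimes = registry.timer("requestHandler.select.requestTimes");

    // A pluggable reporter pushes the registry's contents somewhere,
    // here to a Graphite server once a minute.
    Graphite graphite =
        new Graphite(new InetSocketAddress("graphite.example.com", 2003));
    GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
        .prefixedWith("solr.core1")
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build(graphite);
    reporter.start(1, TimeUnit.MINUTES);

    // Roughly what an instrumented request handler would do per request:
    Timer.Context ctx = requestTimes.time();
    try {
      Thread.sleep(42); // stand-in for actual request handling
    } finally {
      ctx.stop();
    }
  }
}
{code}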




[jira] [Created] (SOLR-4763) Performance issue when using group.facet=true

2013-04-25 Thread Alexander Koval (JIRA)
Alexander Koval created SOLR-4763:
-

 Summary: Performance issue when using group.facet=true
 Key: SOLR-4763
 URL: https://issues.apache.org/jira/browse/SOLR-4763
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Koval


I do not know whether this is a bug or not, but calculating facets with 
{{group.facet=true}} is too slow.

I have a query that:
{code}
matches: 730597,
ngroups: 24024,
{code}

1. All queries with {{group.facet=true}}:
{code}
QTime: 5171
facet: {
time: 4716
{code}

2. Without {{group.facet}}:
* First query:
{code}
QTime: 3284
facet: {
time: 3104
{code}

* Next queries:
{code}
QTime: 230,
facet: {
time: 76
{code}

So I think without {{group.facet}} Solr uses a cache to calculate facets.

Is it possible to improve performance of facets when {{group.facet=true}}?
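
For reference, the two cases above correspond to requests of roughly this shape 
(field names are hypothetical):

{code}
# with group.facet=true: counts are per distinct group, the slow path above
/select?q=*:*&group=true&group.field=productId&facet=true&facet.field=category&group.facet=true

# without it: per-document counts; repeat runs are fast, consistent with a
# cached structure (e.g. the field value cache) being reused
/select?q=*:*&group=true&group.field=productId&facet=true&facet.field=category
{code}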




[jira] [Updated] (SOLR-4763) Performance issue when using group.facet=true

2013-04-25 Thread Alexander Koval (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Koval updated SOLR-4763:
--

Description: 
I do not know whether this is a bug or not, but calculating facets with 
{{group.facet=true}} is too slow.

I have a query that:
{code}
matches: 730597,
ngroups: 24024,
{code}

1. All queries with {{group.facet=true}}:
{code}
QTime: 5171
facet: {
time: 4716
{code}

2. Without {{group.facet}}:
* First query:
{code}
QTime: 3284
facet: {
time: 3104
{code}

* Next queries:
{code}
QTime: 230,
facet: {
time: 76
{code}

So I think with {{group.facet=true}} Solr doesn't use a cache to calculate facets.

Is it possible to improve performance of facets when {{group.facet=true}}?

  was:
I do not know whether this is a bug or not, but calculating facets with 
{{group.facet=true}} is too slow.

I have a query that:
{code}
matches: 730597,
ngroups: 24024,
{code}

1. All queries with {{group.facet=true}}:
{code}
QTime: 5171
facet: {
time: 4716
{code}

2. Without {{group.facet}}:
* First query:
{code}
QTime: 3284
facet: {
time: 3104
{code}

* Next queries:
{code}
QTime: 230,
facet: {
time: 76
{code}

So I think without {{group.facet}} Solr uses a cache to calculate facets.

Is it possible to improve performance of facets when {{group.facet=true}}?


 Performance issue when using group.facet=true
 -

 Key: SOLR-4763
 URL: https://issues.apache.org/jira/browse/SOLR-4763
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Koval

 I do not know whether this is a bug or not, but calculating facets with 
 {{group.facet=true}} is too slow.
 I have a query that:
 {code}
 matches: 730597,
 ngroups: 24024,
 {code}
 1. All queries with {{group.facet=true}}:
 {code}
 QTime: 5171
 facet: {
 time: 4716
 {code}
 2. Without {{group.facet}}:
 * First query:
 {code}
 QTime: 3284
 facet: {
 time: 3104
 {code}
 * Next queries:
 {code}
 QTime: 230,
 facet: {
 time: 76
 {code}
 So I think with {{group.facet=true}} Solr doesn't use a cache to calculate 
 facets.
 Is it possible to improve performance of facets when {{group.facet=true}}?




[jira] [Commented] (SOLR-4758) Zk bootstrapping does not work with the new solr.xml format and core discovery by directory structure.

2013-04-25 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641660#comment-13641660
 ] 

Erick Erickson commented on SOLR-4758:
--

I wrestled with this and deferred it until later. There are several code paths 
in several places that are of the form 'if (corecontainer==null){} else {}'. As 
near as I can tell this is ONLY ever an issue in the test harness. I detest 
having code that is only necessary for the tests scattered about the real code, 
but haven't had any time to try to fix the test harness, which is what I 
_think_ the real solution is.

FWIW

 Zk bootstrapping does not work with the new solr.xml format and core 
 discovery by directory structure.
 --

 Key: SOLR-4758
 URL: https://issues.apache.org/jira/browse/SOLR-4758
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.0, 4.4







Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing

2013-04-25 Thread Michael McCandless
OK I pulled it down ... it looks like this:

-rw-r--r-- 1 501 mike     87 Apr 25 06:50 _0_dv.cfe
-rw-r--r-- 1 501 mike    208 Apr 25 06:50 _0_dv.cfs
-rw-r--r-- 1 501 mike    931 Apr 25 06:50 _0.fdt
-rw-r--r-- 1 501 mike     45 Apr 25 06:50 _0.fdx
-rw-r--r-- 1 501 mike    734 Apr 25 06:50 _0.fnm
-rw-r--r-- 1 501 mike     80 Apr 25 06:50 _0_Lucene41_0.doc
-rw-r--r-- 1 501 mike    193 Apr 25 06:50 _0_Lucene41_0.pos
-rw-r--r-- 1 501 mike   1548 Apr 25 06:50 _0_Lucene41_0.tim
-rw-r--r-- 1 501 mike    197 Apr 25 06:50 _0_Lucene41_0.tip
-rw-r--r-- 1 501 mike    113 Apr 25 06:50 _0_nrm.cfe
-rw-r--r-- 1 501 mike    229 Apr 25 06:50 _0_nrm.cfs
-rw-r--r-- 1 501 mike    377 Apr 25 06:50 _0.si
-rw-r--r-- 1 501 mike     42 Apr 25 06:50 _0.tvd
-rw-r--r-- 1 501 mike   1490 Apr 25 06:50 _0.tvf
-rw-r--r-- 1 501 mike     65 Apr 25 06:50 _0.tvx
-rw-r--r-- 1 501 mike     37 Apr 25 06:50 _1_1.del
-rw-r--r-- 1 501 mike     87 Apr 25 06:50 _1_dv.cfe
-rw-r--r-- 1 501 mike   4296 Apr 25 06:50 _1_dv.cfs
-rw-r--r-- 1 501 mike 365007 Apr 25 06:50 _1.fdt
-rw-r--r-- 1 501 mike     55 Apr 25 06:50 _1.fdx
-rw-r--r-- 1 501 mike   1225 Apr 25 06:50 _1.fnm
-rw-r--r-- 1 501 mike   2739 Apr 25 06:50 _1_Lucene41_0.doc
-rw-r--r-- 1 501 mike 159869 Apr 25 06:50 _1_Lucene41_0.pos
-rw-r--r-- 1 501 mike 173644 Apr 25 06:50 _1_Lucene41_0.tim
-rw-r--r-- 1 501 mike   3902 Apr 25 06:50 _1_Lucene41_0.tip
-rw-r--r-- 1 501 mike    192 Apr 25 06:50 _1_nrm.cfe
-rw-r--r-- 1 501 mike    461 Apr 25 06:50 _1_nrm.cfs
-rw-r--r-- 1 501 mike    377 Apr 25 06:50 _1.si
-rw-r--r-- 1 501 mike    286 Apr 25 06:50 _1.tvd
-rw-r--r-- 1 501 mike 560455 Apr 25 06:50 _1.tvf
-rw-r--r-- 1 501 mike    849 Apr 25 06:50 _1.tvx
-rw-r--r-- 1 501 mike     45 Apr 25 06:50 _2_1.del
-rw-r--r-- 1 501 mike     87 Apr 25 06:50 _2_dv.cfe
-rw-r--r-- 1 501 mike   7648 Apr 25 06:50 _2_dv.cfs
-rw-r--r-- 1 501 mike  79656 Apr 25 06:50 _2.fdt
-rw-r--r-- 1 501 mike     58 Apr 25 06:50 _2.fdx
-rw-r--r-- 1 501 mike   2550 Apr 25 06:50 _2.fnm
-rw-r--r-- 1 501 mike   5478 Apr 25 06:50 _2_Lucene41_0.doc
-rw-r--r-- 1 501 mike     34 Apr 25 06:50 _2_Lucene41_0.pay
-rw-r--r-- 1 501 mike  16885 Apr 25 06:50 _2_Lucene41_0.pos
-rw-r--r-- 1 501 mike  94246 Apr 25 06:50 _2_Lucene41_0.tim
-rw-r--r-- 1 501 mike   2225 Apr 25 06:50 _2_Lucene41_0.tip
-rw-r--r-- 1 501 mike    464 Apr 25 06:50 _2_nrm.cfe
-rw-r--r-- 1 501 mike   2193 Apr 25 06:50 _2_nrm.cfs
-rw-r--r-- 1 501 mike    395 Apr 25 06:50 _2.si
-rw-r--r-- 1 501 mike    652 Apr 25 06:50 _2.tvd
-rw-r--r-- 1 501 mike 134321 Apr 25 06:50 _2.tvf
-rw-r--r-- 1 501 mike   1937 Apr 25 06:50 _2.tvx
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdt
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdx
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvd
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvf
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvx
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 segments_1
-rw-r--r-- 1 501 mike      0 Apr 25 06:50 write.lock

Which looks to be exactly the case in LUCENE-4738, where the crash happened
during the first commit.  In this case we (intentionally) make no effort to
be smart about this and happily declare the index corrupt...

So the good news is this test now discovers the issue (it did not before)
... but we need to fix this test to make an exception for the first commit
... I'll do that.
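
(For anyone wanting to poke at the copied index by hand, the check that failed
can also be run standalone, something like

  java -cp /path/to/lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index

with both paths adjusted to your build and to wherever you copied the directory.)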


Mike McCandless

http://blog.mikemccandless.com


On Thu, Apr 25, 2013 at 4:18 AM, Simon Willnauer
simon.willna...@gmail.comwrote:

 this looks pretty serious! any chance we can get this index?

 On Thu, Apr 25, 2013 at 7:59 AM, Apache Jenkins Server
 jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/245/
 
  1 tests failed.
  REGRESSION:
  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads
 
  Error Message:
  CheckIndex failed
 
  Stack Trace:
  java.lang.RuntimeException: CheckIndex failed
  at
 __randomizedtesting.SeedInfo.seed([357616123B0638E9:AEAF02097AFD2E82]:0)
  at
 org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:221)
  at
 org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:209)
  at
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:141)
  at
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at
 org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:62)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at
 

Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing

2013-04-25 Thread Simon Willnauer
mike,

how can you pull this index? Do we have this on the wiki somewhere (where to go, etc.)?

On Thu, Apr 25, 2013 at 1:23 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 OK I pulled it down ... it looks like this:

 -rw-r--r-- 1 501 mike     87 Apr 25 06:50 _0_dv.cfe
 -rw-r--r-- 1 501 mike    208 Apr 25 06:50 _0_dv.cfs
 -rw-r--r-- 1 501 mike    931 Apr 25 06:50 _0.fdt
 -rw-r--r-- 1 501 mike     45 Apr 25 06:50 _0.fdx
 -rw-r--r-- 1 501 mike    734 Apr 25 06:50 _0.fnm
 -rw-r--r-- 1 501 mike     80 Apr 25 06:50 _0_Lucene41_0.doc
 -rw-r--r-- 1 501 mike    193 Apr 25 06:50 _0_Lucene41_0.pos
 -rw-r--r-- 1 501 mike   1548 Apr 25 06:50 _0_Lucene41_0.tim
 -rw-r--r-- 1 501 mike    197 Apr 25 06:50 _0_Lucene41_0.tip
 -rw-r--r-- 1 501 mike    113 Apr 25 06:50 _0_nrm.cfe
 -rw-r--r-- 1 501 mike    229 Apr 25 06:50 _0_nrm.cfs
 -rw-r--r-- 1 501 mike    377 Apr 25 06:50 _0.si
 -rw-r--r-- 1 501 mike     42 Apr 25 06:50 _0.tvd
 -rw-r--r-- 1 501 mike   1490 Apr 25 06:50 _0.tvf
 -rw-r--r-- 1 501 mike     65 Apr 25 06:50 _0.tvx
 -rw-r--r-- 1 501 mike     37 Apr 25 06:50 _1_1.del
 -rw-r--r-- 1 501 mike     87 Apr 25 06:50 _1_dv.cfe
 -rw-r--r-- 1 501 mike   4296 Apr 25 06:50 _1_dv.cfs
 -rw-r--r-- 1 501 mike 365007 Apr 25 06:50 _1.fdt
 -rw-r--r-- 1 501 mike     55 Apr 25 06:50 _1.fdx
 -rw-r--r-- 1 501 mike   1225 Apr 25 06:50 _1.fnm
 -rw-r--r-- 1 501 mike   2739 Apr 25 06:50 _1_Lucene41_0.doc
 -rw-r--r-- 1 501 mike 159869 Apr 25 06:50 _1_Lucene41_0.pos
 -rw-r--r-- 1 501 mike 173644 Apr 25 06:50 _1_Lucene41_0.tim
 -rw-r--r-- 1 501 mike   3902 Apr 25 06:50 _1_Lucene41_0.tip
 -rw-r--r-- 1 501 mike    192 Apr 25 06:50 _1_nrm.cfe
 -rw-r--r-- 1 501 mike    461 Apr 25 06:50 _1_nrm.cfs
 -rw-r--r-- 1 501 mike    377 Apr 25 06:50 _1.si
 -rw-r--r-- 1 501 mike    286 Apr 25 06:50 _1.tvd
 -rw-r--r-- 1 501 mike 560455 Apr 25 06:50 _1.tvf
 -rw-r--r-- 1 501 mike    849 Apr 25 06:50 _1.tvx
 -rw-r--r-- 1 501 mike     45 Apr 25 06:50 _2_1.del
 -rw-r--r-- 1 501 mike     87 Apr 25 06:50 _2_dv.cfe
 -rw-r--r-- 1 501 mike   7648 Apr 25 06:50 _2_dv.cfs
 -rw-r--r-- 1 501 mike  79656 Apr 25 06:50 _2.fdt
 -rw-r--r-- 1 501 mike     58 Apr 25 06:50 _2.fdx
 -rw-r--r-- 1 501 mike   2550 Apr 25 06:50 _2.fnm
 -rw-r--r-- 1 501 mike   5478 Apr 25 06:50 _2_Lucene41_0.doc
 -rw-r--r-- 1 501 mike     34 Apr 25 06:50 _2_Lucene41_0.pay
 -rw-r--r-- 1 501 mike  16885 Apr 25 06:50 _2_Lucene41_0.pos
 -rw-r--r-- 1 501 mike  94246 Apr 25 06:50 _2_Lucene41_0.tim
 -rw-r--r-- 1 501 mike   2225 Apr 25 06:50 _2_Lucene41_0.tip
 -rw-r--r-- 1 501 mike    464 Apr 25 06:50 _2_nrm.cfe
 -rw-r--r-- 1 501 mike   2193 Apr 25 06:50 _2_nrm.cfs
 -rw-r--r-- 1 501 mike    395 Apr 25 06:50 _2.si
 -rw-r--r-- 1 501 mike    652 Apr 25 06:50 _2.tvd
 -rw-r--r-- 1 501 mike 134321 Apr 25 06:50 _2.tvf
 -rw-r--r-- 1 501 mike   1937 Apr 25 06:50 _2.tvx
 -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdt
 -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdx
 -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvd
 -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvf
 -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvx
 -rw-r--r-- 1 501 mike      0 Apr 25 06:50 segments_1
 -rw-r--r-- 1 501 mike      0 Apr 25 06:50 write.lock

 Which looks to be exactly the case in LUCENE-4738, where the crash happened
 during the first commit.  In this case we (intentionally) make no effort to
 be smart about this and happily declare the index corrupt...

 So the good news is this test now discovers the issue (it did not before)
 ... but we need to fix this test to make an exception for the first commit
 ... I'll do that.


 Mike McCandless

 http://blog.mikemccandless.com


 On Thu, Apr 25, 2013 at 4:18 AM, Simon Willnauer simon.willna...@gmail.com
 wrote:

 this looks pretty serious! any chance we can get this index?

 On Thu, Apr 25, 2013 at 7:59 AM, Apache Jenkins Server
 jenk...@builds.apache.org wrote:
  Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/245/
 
  1 tests failed.
  REGRESSION:
  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads
 
  Error Message:
  CheckIndex failed
 
  Stack Trace:
  java.lang.RuntimeException: CheckIndex failed
  at
  __randomizedtesting.SeedInfo.seed([357616123B0638E9:AEAF02097AFD2E82]:0)
  at
  org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:221)
  at
  org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:209)
  at
  org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:141)
  at
  org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at
  org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
  at
  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:62)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 

Re: [JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 245 - Still Failing

2013-04-25 Thread Michael McCandless
I just log into lucene.zones.apache.org and go to the directory (the full
path is in the Jenkins failure)

Mike McCandless

http://blog.mikemccandless.com


On Thu, Apr 25, 2013 at 7:30 AM, Simon Willnauer
simon.willna...@gmail.comwrote:

 mike,

 how can you pull this index? Do we have this on the wiki where to go etc?

 On Thu, Apr 25, 2013 at 1:23 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
  OK I pulled it down ... it looks like this:
 
  -rw-r--r-- 1 501 mike     87 Apr 25 06:50 _0_dv.cfe
  -rw-r--r-- 1 501 mike    208 Apr 25 06:50 _0_dv.cfs
  -rw-r--r-- 1 501 mike    931 Apr 25 06:50 _0.fdt
  -rw-r--r-- 1 501 mike     45 Apr 25 06:50 _0.fdx
  -rw-r--r-- 1 501 mike    734 Apr 25 06:50 _0.fnm
  -rw-r--r-- 1 501 mike     80 Apr 25 06:50 _0_Lucene41_0.doc
  -rw-r--r-- 1 501 mike    193 Apr 25 06:50 _0_Lucene41_0.pos
  -rw-r--r-- 1 501 mike   1548 Apr 25 06:50 _0_Lucene41_0.tim
  -rw-r--r-- 1 501 mike    197 Apr 25 06:50 _0_Lucene41_0.tip
  -rw-r--r-- 1 501 mike    113 Apr 25 06:50 _0_nrm.cfe
  -rw-r--r-- 1 501 mike    229 Apr 25 06:50 _0_nrm.cfs
  -rw-r--r-- 1 501 mike    377 Apr 25 06:50 _0.si
  -rw-r--r-- 1 501 mike     42 Apr 25 06:50 _0.tvd
  -rw-r--r-- 1 501 mike   1490 Apr 25 06:50 _0.tvf
  -rw-r--r-- 1 501 mike     65 Apr 25 06:50 _0.tvx
  -rw-r--r-- 1 501 mike     37 Apr 25 06:50 _1_1.del
  -rw-r--r-- 1 501 mike     87 Apr 25 06:50 _1_dv.cfe
  -rw-r--r-- 1 501 mike   4296 Apr 25 06:50 _1_dv.cfs
  -rw-r--r-- 1 501 mike 365007 Apr 25 06:50 _1.fdt
  -rw-r--r-- 1 501 mike     55 Apr 25 06:50 _1.fdx
  -rw-r--r-- 1 501 mike   1225 Apr 25 06:50 _1.fnm
  -rw-r--r-- 1 501 mike   2739 Apr 25 06:50 _1_Lucene41_0.doc
  -rw-r--r-- 1 501 mike 159869 Apr 25 06:50 _1_Lucene41_0.pos
  -rw-r--r-- 1 501 mike 173644 Apr 25 06:50 _1_Lucene41_0.tim
  -rw-r--r-- 1 501 mike   3902 Apr 25 06:50 _1_Lucene41_0.tip
  -rw-r--r-- 1 501 mike    192 Apr 25 06:50 _1_nrm.cfe
  -rw-r--r-- 1 501 mike    461 Apr 25 06:50 _1_nrm.cfs
  -rw-r--r-- 1 501 mike    377 Apr 25 06:50 _1.si
  -rw-r--r-- 1 501 mike    286 Apr 25 06:50 _1.tvd
  -rw-r--r-- 1 501 mike 560455 Apr 25 06:50 _1.tvf
  -rw-r--r-- 1 501 mike    849 Apr 25 06:50 _1.tvx
  -rw-r--r-- 1 501 mike     45 Apr 25 06:50 _2_1.del
  -rw-r--r-- 1 501 mike     87 Apr 25 06:50 _2_dv.cfe
  -rw-r--r-- 1 501 mike   7648 Apr 25 06:50 _2_dv.cfs
  -rw-r--r-- 1 501 mike  79656 Apr 25 06:50 _2.fdt
  -rw-r--r-- 1 501 mike     58 Apr 25 06:50 _2.fdx
  -rw-r--r-- 1 501 mike   2550 Apr 25 06:50 _2.fnm
  -rw-r--r-- 1 501 mike   5478 Apr 25 06:50 _2_Lucene41_0.doc
  -rw-r--r-- 1 501 mike     34 Apr 25 06:50 _2_Lucene41_0.pay
  -rw-r--r-- 1 501 mike  16885 Apr 25 06:50 _2_Lucene41_0.pos
  -rw-r--r-- 1 501 mike  94246 Apr 25 06:50 _2_Lucene41_0.tim
  -rw-r--r-- 1 501 mike   2225 Apr 25 06:50 _2_Lucene41_0.tip
  -rw-r--r-- 1 501 mike    464 Apr 25 06:50 _2_nrm.cfe
  -rw-r--r-- 1 501 mike   2193 Apr 25 06:50 _2_nrm.cfs
  -rw-r--r-- 1 501 mike    395 Apr 25 06:50 _2.si
  -rw-r--r-- 1 501 mike    652 Apr 25 06:50 _2.tvd
  -rw-r--r-- 1 501 mike 134321 Apr 25 06:50 _2.tvf
  -rw-r--r-- 1 501 mike   1937 Apr 25 06:50 _2.tvx
  -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdt
  -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.fdx
  -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvd
  -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvf
  -rw-r--r-- 1 501 mike      0 Apr 25 06:50 _3.tvx
  -rw-r--r-- 1 501 mike      0 Apr 25 06:50 segments_1
  -rw-r--r-- 1 501 mike      0 Apr 25 06:50 write.lock
 
  Which looks to be exactly the case in LUCENE-4738, where the crash happened
  during the first commit.  In this case we (intentionally) make no effort
  to be smart about this and happily declare the index corrupt...
 
  So the good news is this test now discovers the issue (it did not before)
  ... but we need to fix this test to make an exception for the first
 commit
  ... I'll do that.
 
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Thu, Apr 25, 2013 at 4:18 AM, Simon Willnauer 
 simon.willna...@gmail.com
  wrote:
 
  this looks pretty serious! any chance we can get this index?
 
  On Thu, Apr 25, 2013 at 7:59 AM, Apache Jenkins Server
  jenk...@builds.apache.org wrote:
   Build:
 https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/245/
  
   1 tests failed.
   REGRESSION:
   org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads
  
   Error Message:
   CheckIndex failed
  
   Stack Trace:
   java.lang.RuntimeException: CheckIndex failed
   at
  
 __randomizedtesting.SeedInfo.seed([357616123B0638E9:AEAF02097AFD2E82]:0)
   at
   org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:221)
   at
   org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:209)
   at
  
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:141)
   at
  
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:147)
   at
  
 

Re: [VOTE] Lucene Solr 4.3.0 RC3

2013-04-25 Thread Shai Erera
+1 smoke tester happy here.

Shai


On Thu, Apr 25, 2013 at 12:18 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 :
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/

 +1 to releasing the artifacts with the following SHA1 signatures as
 Lucene/Solr 4.3.0...

 3e1ec78f7b5bad2723dcf2f963d933758046afb9 *lucene-4.3.0-src.tgz
 26843d53c86a9937d700f13f1d686adaca718244 *lucene-4.3.0.tgz
 72b526a5aa21c7499954978a74e14ceac3a607ea *lucene-4.3.0.zip
 9fd7abc7e478dbc5474658460da58ec360d6b1e4 *solr-4.3.0-src.tgz
 5dca6da9f30830dc20163623b0a4f63749777f24 *solr-4.3.0.tgz
 ba6c86209614e3fe8cddeb3193bb8a09299ea457 *solr-4.3.0.zip


 FWIW: During my testing I did encounter one new bug: SOLR-4754, but since
 it has a workaround (and I have no idea yet what the underlying problem
 is to even try for a quick fix) I don't think it should block the release.


 -Hoss





[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities

2013-04-25 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641692#comment-13641692
 ] 

Roman commented on SOLR-2356:
-

Why is this issue marked as minor? Data import can be sped up 5-10 times 
on most machines. It seems pretty important.

 indexing using DataImportHandler does not use entire CPU capacities
 ---

 Key: SOLR-2356
 URL: https://issues.apache.org/jira/browse/SOLR-2356
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0-ALPHA
 Environment: intel xeon processor (4 cores), Debian Linux Lenny, 
 OpenJDK 64bits server v1.6.0
Reporter: colby
Priority: Minor
  Labels: test
   Original Estimate: 168h
  Remaining Estimate: 168h

 When I use a DataImportHandler to index a large number of documents (~35M), 
 CPU usage doesn't go over 100% (i.e. just one core).
 When I configure 4 threads for the entity tag, the CPU usage is split to 
 25% per core but never uses 400% of CPU (i.e. 100% of all 4 cores).
 I use Solr embedded with the Jetty server.
 Is there a way to tune this feature in order to use all cores and improve 
 indexing performance? Because for the moment, an extra script (PHP) gives 
 better indexing performance than DIH.
 thanks
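
For context, the setup being described is roughly this shape of DIH config (driver, 
query, and field names are invented; threads is the attribute the reporter mentions, 
which existed on the entity in the DIH versions being discussed and was later removed):

{code}
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db"
              user="solr" password="..."/>
  <document>
    <!-- threads="4" asks DIH to run this entity's import with 4 threads;
         per this report it still pegs only ~100% CPU in total. -->
    <entity name="doc" threads="4"
            query="SELECT id, title FROM docs">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
{code}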




[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram

2013-04-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641704#comment-13641704
 ] 

Robert Muir commented on LUCENE-4955:
-

+1 Adrien. These analysis components should either be fixed or removed.

We can speed up the process now by changing IndexWriter to reject this kinda 
bogus shit. We shouldn't be putting broken data into e.g. term vectors. That 
should encourage the fixing process.

 NGramTokenFilter increments positions for each gram
 ---

 Key: LUCENE-4955
 URL: https://issues.apache.org/jira/browse/LUCENE-4955
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
 Fix For: 5.0, 4.4

 Attachments: highlighter-test.patch, LUCENE-4955.patch


 NGramTokenFilter increments positions for each gram rather than for the 
 actual token, which can lead to rather funny problems, especially with 
 highlighting. Whether this filter should be used for highlighting is a 
 different story, but today it seems to be common practice in many situations 
 to highlight sub-term matches.
 I have a highlighting test that uses ngrams and fails with a 
 StringIndexOutOfBoundsException, since tokens are sorted by position, which 
 causes offsets to be mixed up by the ngram token filter.




[jira] [Updated] (LUCENE-4955) NGramTokenFilter increments positions for each gram

2013-04-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4955:


Attachment: highlighter-test.patch

bq. We can speed up the process now by changing IndexWriter to reject this 
kinda bogus shit. We shouldn't be putting broken data into e.g. term vectors. 
That should encourage the fixing process.

+1

I updated the highlighter test and added analysis-common as a test dependency 
such that this can be run with ant.

 NGramTokenFilter increments positions for each gram
 ---

 Key: LUCENE-4955
 URL: https://issues.apache.org/jira/browse/LUCENE-4955
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
 Fix For: 5.0, 4.4

 Attachments: highlighter-test.patch, highlighter-test.patch, 
 LUCENE-4955.patch


 NGramTokenFilter increments positions for each gram rather than for the 
 actual token, which can lead to rather funny problems, especially with 
 highlighting. Whether this filter should be used for highlighting is a 
 different story, but today it seems to be common practice in many situations 
 to highlight sub-term matches.
 I have a highlighting test that uses ngrams and fails with a 
 StringIndexOutOfBoundsException, since tokens are sorted by position, which 
 causes offsets to be mixed up by the ngram token filter.




[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram

2013-04-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641706#comment-13641706
 ] 

Adrien Grand commented on LUCENE-4955:
--

+1

I'll work on fixing NGramTokenizer and NGramTokenFilter.

 NGramTokenFilter increments positions for each gram
 ---

 Key: LUCENE-4955
 URL: https://issues.apache.org/jira/browse/LUCENE-4955
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
 Fix For: 5.0, 4.4

 Attachments: highlighter-test.patch, highlighter-test.patch, 
 LUCENE-4955.patch


 NGramTokenFilter increments positions for each gram rather than for the 
 actual token, which can lead to rather funny problems, especially with 
 highlighting. Whether this filter should be used for highlighting is a 
 different story, but today it seems to be common practice in many situations 
 to highlight sub-term matches.
 I have a highlighting test that uses ngrams and fails with a 
 StringIndexOutOfBoundsException, since tokens are sorted by position, which 
 causes offsets to be mixed up by the ngram token filter.




[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram

2013-04-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641709#comment-13641709
 ] 

Robert Muir commented on LUCENE-4955:
-

I don't think we should add analysis-common as a test dependency to the 
highlighter. I worked pretty hard to clean all this up with e.g. MockTokenizer 
so we didn't have dependency hell. It also keeps our tests clean.

 NGramTokenFilter increments positions for each gram
 ---

 Key: LUCENE-4955
 URL: https://issues.apache.org/jira/browse/LUCENE-4955
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
 Fix For: 5.0, 4.4

 Attachments: highlighter-test.patch, highlighter-test.patch, 
 LUCENE-4955.patch


 NGramTokenFilter increments positions for each gram rather than for the 
 actual token, which can lead to rather funny problems, especially with 
 highlighting. Whether this filter should be used for highlighting is a 
 different story, but today it seems to be common practice in many situations 
 to highlight sub-term matches.
 I have a highlighting test that uses ngrams and fails with a 
 StringIndexOutOfBoundsException, since tokens are sorted by position, which 
 causes offsets to be mixed up by the ngram token filter.




[jira] [Created] (LUCENE-4957) Stop IndexWriter from writing broken term vector offset data in 5.0

2013-04-25 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4957:
---

 Summary: Stop IndexWriter from writing broken term vector offset 
data in 5.0
 Key: LUCENE-4957
 URL: https://issues.apache.org/jira/browse/LUCENE-4957
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


Today we let this in (some analyzers are broken), and only reject it if 
someone is indexing offsets into the postings lists.

But we should also ban this when term vectors are enabled. It's time to stop 
writing this broken data and let broken analyzers be broken.
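
For illustration, "broken" here means offsets that go backwards from one token
to the next. A hypothetical TokenStream like the sketch below produces exactly
the shape that is already rejected when offsets are indexed into postings:

{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

/** Emits two tokens whose offsets go backwards -- the "broken" shape. */
final class BackwardsOffsetsStream extends TokenStream {
  private final CharTermAttribute term = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offset = addAttribute(OffsetAttribute.class);
  private int state = 0;

  @Override
  public boolean incrementToken() {
    clearAttributes();
    if (state == 0) {
      term.append("foo");
      offset.setOffset(10, 13);  // first token starts at 10
    } else if (state == 1) {
      term.append("bar");
      offset.setOffset(5, 8);    // second token goes backwards!
    } else {
      return false;
    }
    state++;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    state = 0;
  }
}
{code}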

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4955) NGramTokenFilter increments positions for each gram

2013-04-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641710#comment-13641710
 ] 

Simon Willnauer commented on LUCENE-4955:
-

Robert, I agree. I added this as a separate patch to make sure that whatever we commit 
here, we can at least test that the ngram filter doesn't throw an IOOB anymore. 
I just wanted to make it easier to run the test.

 NGramTokenFilter increments positions for each gram
 ---

 Key: LUCENE-4955
 URL: https://issues.apache.org/jira/browse/LUCENE-4955
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
 Fix For: 5.0, 4.4

 Attachments: highlighter-test.patch, highlighter-test.patch, 
 LUCENE-4955.patch


 NGramTokenFilter increments positions for each gram rather than for the actual 
 token, which can lead to rather funny problems, especially with highlighting. 
 Whether this filter should be used for highlighting is a different story, but today 
 this seems to be a common practice in many situations to highlight sub-term 
 matches.
 I have a test for highlighting that uses ngrams and fails with a StringIOOB, 
 since tokens are sorted by position, which causes offsets to be mixed up due 
 to the ngram token filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4957) Stop IndexWriter from writing broken term vector offset data in 5.0

2013-04-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641723#comment-13641723
 ] 

Uwe Schindler commented on LUCENE-4957:
---

+1

 Stop IndexWriter from writing broken term vector offset data in 5.0
 ---

 Key: LUCENE-4957
 URL: https://issues.apache.org/jira/browse/LUCENE-4957
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

  Today we let this in (some analyzers are broken), and only reject it if 
  someone is indexing offsets into the postings lists.
  But we should also ban this when term vectors are enabled. It's time to stop 
  writing this broken data and let broken analyzers be broken.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2013-04-25 Thread Zack Zullick (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641739#comment-13641739
 ] 

Zack Zullick commented on LUCENE-2899:
--

Some information for those wanting to try this after fighting it for a day: the 
latest patch posted, LUCENE-2899-RJN.patch for 4.1, does not have Em's 
OpenNLPFilter.java and OpenNLPTokenizer.java fixes applied. So after applying 
the patch, make sure to replace those classes with Em's versions, or the bug that 
causes the NLP system to only be utilized on the first request will still be 
present. I was also able to successfully apply this patch to 4.2.1 with minor 
modifications (mostly to the build/ivy XML files).

 Add OpenNLP Analysis capabilities as a module
 -

 Key: LUCENE-2899
 URL: https://issues.apache.org/jira/browse/LUCENE-2899
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
 LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
 LUCENE-2899-RJN.patch, OpenNLPFilter.java, OpenNLPTokenizer.java, 
 opennlp_trunk.patch


 Now that OpenNLP is an ASF project and has a nice license, it would be nice 
 to have a submodule (under analysis) that exposed capabilities for it. Drew 
 Farris, Tom Morton and I have code that does:
 * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
 would have to change slightly to buffer tokens)
 * NamedEntity recognition as a TokenFilter
 We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
 either payloads (PartOfSpeechAttribute?) on a token or at the same position.
 I'd propose it go under:
 modules/analysis/opennlp

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



SOLR-2894 - am I the only one who gets this behaviour?

2013-04-25 Thread Stein Gran
Hi,

I tested the latest patch for SOLR-2894 a couple of weeks ago, and while it
worked fine for string fields, I got no output if one of the facet.pivot
fields is a date field. (SOLR-2894 is about implementing distributed pivot
faceting.)

https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13627641&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13627641

Am I the only one who gets this behavior? If so, I'll look into my test
environment again.

I'm more than happy to test any new patch to this issue, as I have a test
environment set up which runs multiple scenarios with pivot faceting and
date fields in a SolrCloud with two machines :-)

I have looked through the code changes in the latest patch as well, but
since I do not know the Solr code base, I didn't see anything obvious.  But
I can help if anyone wants any testing done.

Best,
Stein J. Gran


[jira] [Updated] (SOLR-4759) Cleanup Velocity Templates

2013-04-25 Thread Mark Bennett (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Bennett updated SOLR-4759:
---

Attachment: velocity-SOLR-4759.zip

Because the patch includes both file renames and content changes, the patch 
command gives errors.  Per Erik H, this is a binary version of the changes 
(includes .svn dirs).  It is meant to be extracted from 
solr/example/solr/collection1/conf/.

 Cleanup Velocity Templates
 --

 Key: SOLR-4759
 URL: https://issues.apache.org/jira/browse/SOLR-4759
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Mark Bennett
 Attachments: SOLR-4759.patch, velocity-SOLR-4759.zip


 Cleanup of the Velocity templates shipped under 
 solr/example/solr/collection1/conf/velocity:
 * Add a README.txt file with a complete file list
 * Add comments to all files
 * Add indenting where feasible; fix indenting in other places.  I don't 
 believe I've broken anything that required precise indenting.
 * Make file naming consistent.  We had this_that, thisThat and this-that; 
   changed all to this_that, though this-that was also considered.
 * Modularize some files
 * Include a hit_plain.vm example, though it is not active by default.
 * Rewrite the city/lon/lat selector to work from a hash, though this doesn't change 
 the behavior.
 * CSS changes, primarily to make the top tabs actually look like tabs 
 (primitive CSS, but at least it conveys the idea)
 As far as I know this doesn't change any behavior of the system, nor does it 
 fix any existing bugs.  Although I might do bug fixing in a later patch, I 
 wanted to keep this as a pure code-readability patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Fwd: Contributing the Korean Analyzer

2013-04-25 Thread Steve Rowe
Forwarding to the dev list:

Begin forwarded message:

 From: 이수명 smlee0...@gmail.com
 Subject: Re: Contributing the Korean Analyzer
 Date: April 24, 2013 10:00:18 PM EDT
 To: Steve Rowe sar...@gmail.com
 
 Hello Steve,
 
 Yes, I'm the only author of the code.
 
 It took me 2 years to finish the Korean analyzer and dictionaries in the 
 first development.
 I posted the source code and the binary in 2008 on an online community on an 
 internet portal (http://cafe.naver.com/korlucene) for Korean users.
 I received bug reports through the online community and kept upgrading it 
 for over 4 years.
 I also posted the source code on SourceForge in 2009, as you have already 
 seen.
 
 I have created a JIRA issue (LUCENE-4956) and attached the file that I am 
 contributing.
 If you uncompress the file, you can find two source code directories (src 
 and morph).
 The morph directory includes the dictionaries and the Korean 
 morphological analyzer.
 
 Best regards.
 
 Soomyung Lee
 
 2013/4/25 Steve Rowe sar...@gmail.com
 Hi Soomyung,
 
 I agree with Christian, this sounds fantastic!
 
 First, we need to know a couple things:
 
 1. Are you the only author of the code?  We need to get agreement from all 
 contributors.  (When I browse CVS on the SourceForge site, the only author I 
 see is smlee0818, which I assume is you.)
 
 2. Do you need permission from your employer to make this donation?  If so, 
 we'll need your employer to submit a Corporate CLA (Contributor License 
 Agreement)[1] before we can accept the donation.
 
 To get started, the first step is creating a Lucene JIRA issue here: 
 https://issues.apache.org/jira/browse/LUCENE - you'll need to create an ASF 
 JIRA account first if you don't already have one: click the "Log In" link at 
 the top right of the page, then click the "Sign up" link where it says "Not a 
 member? Sign up for an account."
 
 Once you've created a JIRA issue, you should make a compressed tarball of 
 everything you want to contribute - as far as I can tell, this is everything 
 in the lucenekorean sourceforge project in CVS under modules 
 kr.dictionary, kr.analysis.4x, and kr.morph - and then attach it to the 
 JIRA issue, with the MD5 hash for the tarball in the comment that you provide 
 when you attach the tarball to the issue.
 
 Once you've created the JIRA issue and attached your contribution, we can 
 make progress on further steps that need to be taken: you should submit an 
 individual CLA[2] and a code grant[3], and I (in my role as Lucene PMC chair) 
 will be managing the IP clearance process[4][5].
 
 See http://wiki.apache.org/lucene-java/HowToContribute for more information 
 about contributing.
 
 I look forward to working with you on this - thank you for contributing!
 
 Steve
 
 [1] http://www.apache.org/licenses/cla-corporate.txt
 [2] http://www.apache.org/licenses/icla.txt
 [3] http://www.apache.org/licenses/software-grant.txt
 [4] http://incubator.apache.org/ip-clearance/index.html
 [5] http://incubator.apache.org/ip-clearance/ip-clearance-template.html
 
 On Apr 24, 2013, at 7:00 AM, Christian Moen c...@atilika.com wrote:
 
  Hello Soomyung,
 
  Thanks a lot for this.  This is very good news.
 
  Let's await the PMC Chair's suggestion on next steps.  See LUCENE-3305 to 
  get an idea how the process was for Japanese.
 
  If the process goes well, I'm happy to see how I can set aside some time 
  after Lucene Revolution to work on integrating this.
 
  Best regards,
 
  Christian Moen
  アティリカ株式会社
  http://www.atilika.com
 
  On Apr 24, 2013, at 7:40 PM, 이수명 smlee0...@gmail.com wrote:
 
  Hello Christian.
 
  Thanks for your reply.
  I'm happy to hear about a code grant process.
 
  To make the dictionaries, I collected the words themselves and word features from 
  books and the internet,
  and I organized all of the information that I collected to make the Korean 
  morphological analyzer.
  Therefore the dictionaries are my own work.
 
  I think it is enough to attach a file (License Notice) that describes 
  where the dictionaries originate from and the kind of licensing (Apache 
  License 2.0).
 
  If that is not enough, please leave me a message and give me some guidance.
 
  thanks.
 
  Soomyung Lee
 
 
  2013/4/24 Christian Moen c...@atilika.com
  Hello SooMyung,
 
  Thanks a lot!  It will be great to get Korean supported out-of-the-box in 
  Lucene/Solr.
 
  In terms of process, I'll leave this to Steve Rowe, PMC Chair, to comment 
  on, but a code grant process sounds likely.
 
  I'm seeing that the code itself has an Apache License 2.0, but could you 
  elaborate on where the dictionaries originate from and what kind of 
  licensing terms are applicable?
 
  Many thanks,
 
 
  Christian Moen
 
  On Apr 24, 2013, at 2:05 PM, smlee0...@gmail.com wrote:
 
  Hello,
 
  I've developed the Korean Analyzer and have distributed it since 2008.
  Many people who use Lucene with Korean use it.
 
  I posted it to the sourceforge 
  

[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer

2013-04-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641876#comment-13641876
 ] 

Mark Miller commented on SOLR-4761:
---

+1, patch looks good!

bq.  it won't actually kick in until after the first reopen. 

I think we may want to just ditch our lazy creation of the indexwriter and 
create it upfront. I don't think it saves too much to not create it.

 add option to plug in mergedsegmentwarmer
 -

 Key: SOLR-4761
 URL: https://issues.apache.org/jira/browse/SOLR-4761
 Project: Solr
  Issue Type: New Feature
Reporter: Robert Muir
 Attachments: SOLR-4761.patch


 This is pretty expert, but can be useful in some cases. 
 We can also provide a simple minimalist implementation that just ensures 
 data structures are primed so the first queries don't, e.g., cause norms to 
 be read from disk, etc.
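
A minimal sketch of what such a warmer could look like, against the Lucene 4.x
IndexWriter.IndexReaderWarmer hook (this is illustrative, not the attached patch):

{code}
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Terms;

public class MinimalSegmentWarmer extends IndexWriter.IndexReaderWarmer {
  @Override
  public void warm(AtomicReader reader) throws IOException {
    for (FieldInfo info : reader.getFieldInfos()) {
      if (info.hasNorms()) {
        reader.getNormValues(info.name);  // primes norms
      }
      Terms terms = reader.terms(info.name);
      if (terms != null) {
        terms.size();                     // touches the terms dictionary
      }
    }
  }
}
// Installed via IndexWriterConfig.setMergedSegmentWarmer(new MinimalSegmentWarmer())
{code}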

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true

2013-04-25 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641883#comment-13641883
 ] 

Otis Gospodnetic commented on SOLR-4763:


If you don't know whether it's a bug or not, it's best to bring it up on the mailing 
list first, so devs don't have to manage invalid JIRA issues and so you can get 
a better discussion (and help) going.  I'm not sure if this is a bug or not.  
Maybe [~martijn.v.groningen] will know.

 Performance issue when using group.facet=true
 -

 Key: SOLR-4763
 URL: https://issues.apache.org/jira/browse/SOLR-4763
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Koval

 I do not know whether this is a bug or not, but calculating facets with 
 {{group.facet=true}} is too slow.
 I have a query that returns:
 {code}
 matches: 730597,
 ngroups: 24024,
 {code}
 1. All queries with {{group.facet=true}}:
 {code}
 QTime: 5171
 facet: {
 time: 4716
 {code}
 2. Without {{group.facet}}:
 * First query:
 {code}
 QTime: 3284
 facet: {
 time: 3104
 {code}
 * Next queries:
 {code}
 QTime: 230,
 facet: {
 time: 76
 {code}
 So I think with {{group.facet=true}} Solr doesn't use cache to calculate 
 facets.
 Is it possible to improve performance of facets when {{group.facet=true}}?
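
For reference, a hypothetical SolrJ sketch of the query shape being measured;
the field names (product_id, category) are invented for illustration:

{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupFacetQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.set("group", true);
    q.set("group.field", "product_id");
    q.set("group.facet", true);  // the slow path this issue is about
    q.setFacet(true);
    q.addFacetField("category");
    QueryResponse rsp = server.query(q);
    System.out.println("QTime: " + rsp.getQTime());
  }
}
{code}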

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4764) When using NRT, just init the reader from IndexWriter

2013-04-25 Thread Robert Muir (JIRA)
Robert Muir created SOLR-4764:
-

 Summary: When using NRT, just init the reader from IndexWriter
 Key: SOLR-4764
 URL: https://issues.apache.org/jira/browse/SOLR-4764
 Project: Solr
  Issue Type: Improvement
Reporter: Robert Muir


Spinoff from SOLR-4761

Solr first opens a DirectoryReader from the directory, then later will pass 
this to IW openIfChanged.

I noticed this when I was confused that mergedsegmentwarmer doesn't appear to 
work at first until after you've reopened...

I'm not totally sure what the current behavior causes (does IW's pool reuse 
segments from this passed-in external reader, or is this causing some 
horrible doubling-up/inefficient stuff, etc.?). To some extent I think we should 
change it even if it's actually performant: I think it's confusing.

I think ideally we'd change IndexReaderFactory's method to take writer instead 
of directory so that custom DirectoryReaders can still work.
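
A sketch of the difference being described, using Lucene 4.x APIs (the path and
analyzer are placeholders):

{code}
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class NrtOpenSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index"));
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
        Version.LUCENE_43, new StandardAnalyzer(Version.LUCENE_43)));

    // Roughly what Solr does today: a non-NRT reader from the directory,
    // later handed to openIfChanged against the writer.
    DirectoryReader fromDir = DirectoryReader.open(dir);
    DirectoryReader nrt = DirectoryReader.openIfChanged(fromDir, writer, true);
    // (openIfChanged returns null when nothing has changed.)

    // What this issue proposes: init the first reader from the writer, so the
    // merged-segment warmer and the writer's reader pool apply from the start.
    DirectoryReader fromWriter = DirectoryReader.open(writer, true);
  }
}
{code}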


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true

2013-04-25 Thread Alexander Koval (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641922#comment-13641922
 ] 

Alexander Koval commented on SOLR-4763:
---

I'm sorry about that. I found 2 discussions on the mailing list:
http://lucene.472066.n3.nabble.com/Grouping-performance-problem-td3995245.html
http://lucene.472066.n3.nabble.com/group-facet-true-performances-td4021639.html

A solution has not been found.

I think that with this issue it is not possible to use the {{group.facet=true}} option 
in production.

 Performance issue when using group.facet=true
 -

 Key: SOLR-4763
 URL: https://issues.apache.org/jira/browse/SOLR-4763
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Koval

 I do not know whether this is a bug or not, but calculating facets with 
 {{group.facet=true}} is too slow.
 I have a query that returns:
 {code}
 matches: 730597,
 ngroups: 24024,
 {code}
 1. All queries with {{group.facet=true}}:
 {code}
 QTime: 5171
 facet: {
 time: 4716
 {code}
 2. Without {{group.facet}}:
 * First query:
 {code}
 QTime: 3284
 facet: {
 time: 3104
 {code}
 * Next queries:
 {code}
 QTime: 230,
 facet: {
 time: 76
 {code}
 So I think with {{group.facet=true}} Solr doesn't use cache to calculate 
 facets.
 Is it possible to improve performance of facets when {{group.facet=true}}?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4655) The Overseer should assign node names by default.

2013-04-25 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4655:
--

Attachment: SOLR-4655.patch

To trunk.

 The Overseer should assign node names by default.
 -

 Key: SOLR-4655
 URL: https://issues.apache.org/jira/browse/SOLR-4655
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0

 Attachments: SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, 
 SOLR-4655.patch, SOLR-4655.patch


 Currently we make a unique node name by using the host address as part of the 
 name. This means that if you want a node with a new address to take over, the 
 node name is misleading. It's best if you set custom names for each node 
 before starting your cluster. This is cumbersome, though, and cannot currently 
 be done with the collections API. Instead, the overseer could assign a more 
 generic name such as nodeN by default. Then you can easily swap in another 
 node with no pre-planning and no confusion in the name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities

2013-04-25 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641944#comment-13641944
 ] 

Shawn Heisey commented on SOLR-2356:


Roman, patches are welcome.  If you know how to fix it, get the source code and 
go for it, then upload the patch.  The issue is more than two years old, so if 
it were an easy fix, the people who really know DIH would have fixed it 
already.  You can use the SolrJ library to write a multi-threaded application 
to import data.  If the design is solid, it could ultimately become the basis 
for a new DIH.

It used to be possible to configure multiple threads in the DIH config, but 
that was removed in 4.x because it was unstable.  Also, it didn't really help, 
as the issue reporter found.  It will probably take a complete redesign to fix 
this issue, and DIH is a contrib module, not part of the main Solr code.  That 
is why this is marked minor.
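
A hedged sketch of the SolrJ approach suggested above (the URL, field names and
document count are hypothetical); ConcurrentUpdateSolrServer queues documents
and sends them on multiple background threads:

{code}
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelImporter {
  public static void main(String[] args) throws Exception {
    // 1000-document queue, 4 sender threads.
    ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
        "http://localhost:8983/solr/collection1", 1000, 4);
    for (int i = 0; i < 1000000; i++) {  // stand-in for reading a database
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("name", "document " + i);
      server.add(doc);                   // queued; sent by background threads
    }
    server.blockUntilFinished();
    server.commit();
    server.shutdown();
  }
}
{code}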


 indexing using DataImportHandler does not use entire CPU capacities
 ---

 Key: SOLR-2356
 URL: https://issues.apache.org/jira/browse/SOLR-2356
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0-ALPHA
 Environment: intel xeon processor (4 cores), Debian Linux Lenny, 
 OpenJDK 64bits server v1.6.0
Reporter: colby
Priority: Minor
  Labels: test
   Original Estimate: 168h
  Remaining Estimate: 168h

 When I use a DataImportHandler to index a large number of documents (~35M), 
 CPU usage doesn't go over 100% (i.e. just one core).
 When I configure 4 threads for the entity tag, the CPU usage is split to 
 25% per core but never uses 400% of CPU (i.e. 100% of the 4 cores).
 I use Solr embedded with the Jetty server.
 Is there a way to tune this feature in order to use all cores and improve 
 indexing performance?
 Because for the moment, an external script (PHP) gives better indexing 
 performance than DIH.
 thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach

2013-04-25 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641959#comment-13641959
 ] 

Shawn Heisey commented on SOLR-4762:


[~thetaphi] that is the conclusion I came to as well.  I guess the question is 
whether preferring application classes will cause unintended side effects.  
That solution worked for some people, though none of the accounts that I came 
across were using Solr.


 Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
 ---

 Key: SOLR-4762
 URL: https://issues.apache.org/jira/browse/SOLR-4762
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
 Fix For: 5.0, 4.4


 When a user tried to deploy on weblogic 10.3, they got this exception:
 {noformat}
 Error 500--Internal Server Error
 java.lang.NoSuchMethodError: replaceEach
 at 
 org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
 at 
 weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
 at 
 weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
 at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
 at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
 at 
 weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
 at weblogic.security.service.SecurityManager.runAs(Unknown Source)
 at 
 weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
 at 
 weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
 at 
 weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
 at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
 at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
 {noformat}
 The solution to this problem appears to be adding the following to 
 weblogic.xml in WEB-INF:
 {noformat}
  <container-descriptor>
    <prefer-web-inf-classes>true</prefer-web-inf-classes>
  </container-descriptor>
 {noformat}
 Since Solr's WEB-INF directory already contains this file and it already has 
 the container-descriptor tag, I'm hoping this is a benign change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-25 Thread soomyung (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641960#comment-13641960
 ] 

soomyung commented on LUCENE-4956:
--

Thanks for your help and your great concern, Christian!

I visited your website. I noticed that you are not Japanese, and yet you developed 
a Japanese morphological analyzer.

How is that possible? I'm amazed by your work.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 The Korean language has specific characteristics. When developing a search service 
 with Lucene & Solr in Korean, there are some problems in searching and 
 indexing. The Korean analyzer solves those problems with a Korean morphological 
 analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
 Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
 and Solr. If you develop a search service with Lucene in Korean, it is the 
 best idea to choose the Korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach

2013-04-25 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-4762:
---

Attachment: SOLR-4762.patch

Patch that might fix the issue.  I will run tests and wait for feedback before 
committing.

 Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
 ---

 Key: SOLR-4762
 URL: https://issues.apache.org/jira/browse/SOLR-4762
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
 Fix For: 5.0, 4.4

 Attachments: SOLR-4762.patch


 When a user tried to deploy on weblogic 10.3, they got this exception:
 {noformat}
 Error 500--Internal Server Error
 java.lang.NoSuchMethodError: replaceEach
 at 
 org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
 at 
 weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
 at 
 weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
 at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
 at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
 at 
 weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
 at weblogic.security.service.SecurityManager.runAs(Unknown Source)
 at 
 weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
 at 
 weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
 at 
 weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
 at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
 at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
 {noformat}
 The solution to this problem appears to be adding the following to 
 weblogic.xml in WEB-INF:
 {noformat}
  <container-descriptor>
    <prefer-web-inf-classes>true</prefer-web-inf-classes>
  </container-descriptor>
 {noformat}
 Since Solr's WEB-INF directory already contains this file and it already has 
 the container-descriptor tag, I'm hoping this is a benign change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene Solr 4.3.0 RC3

2013-04-25 Thread Shalin Shekhar Mangar
+1


On Tue, Apr 23, 2013 at 5:20 PM, Simon Willnauer
simon.willna...@gmail.com wrote:

 Here is a new RC candidate...


 http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/

 here is my +1

 thanks for voting...

 simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Regards,
Shalin Shekhar Mangar.


Re: [VOTE] Lucene Solr 4.3.0 RC3

2013-04-25 Thread Mark Miller
+1

- Mark

On Apr 23, 2013, at 7:50 AM, Simon Willnauer simon.willna...@gmail.com wrote:

 Here is a new RC candidate...
 
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/
 
 here is my +1
 
 thanks for voting...
 
 simon
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2013-04-25 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641968#comment-13641968
 ] 

Lance Norskog commented on LUCENE-2899:
---

Maciej - This is a good point. This package needs changes in a lot of places, and 
it might be easier to package it the way you say.

Zack - The churn in the APIs is a major problem in Lucene code management. 
The original patch worked in the 4.x branch and trunk when it was posted. What 
Em fixed is in an area which is very, very basic to Lucene. The API changed with 
no notice and no change in versions or method names.

Everyone - It's great that this has gained some interest. Please create a new 
master patch with whatever changes are needed for the current code base.

Lucene grand masters - Please don't say "hey kids, write plugins, they're cool!" 
and then make subtle incompatible changes in APIs.

 Add OpenNLP Analysis capabilities as a module
 -

 Key: LUCENE-2899
 URL: https://issues.apache.org/jira/browse/LUCENE-2899
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.3

 Attachments: LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
 LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
 LUCENE-2899-RJN.patch, OpenNLPFilter.java, OpenNLPTokenizer.java, 
 opennlp_trunk.patch


 Now that OpenNLP is an ASF project and has a nice license, it would be nice 
 to have a submodule (under analysis) that exposed capabilities for it. Drew 
 Farris, Tom Morton and I have code that does:
 * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
 would have to change slightly to buffer tokens)
 * NamedEntity recognition as a TokenFilter
 We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
 either payloads (PartOfSpeechAttribute?) on a token or at the same position.
 I'd propose it go under:
 modules/analysis/opennlp

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.

2013-04-25 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4765:
-

 Summary: The new Collections API test 
deleteCollectionWithDownNodes fails often with a server 500 error.
 Key: SOLR-4765
 URL: https://issues.apache.org/jira/browse/SOLR-4765
 Project: Solr
  Issue Type: Bug
  Components: Tests
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.

2013-04-25 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4765:
--

Issue Type: Test  (was: Bug)

 The new Collections API test deleteCollectionWithDownNodes fails often with a 
 server 500 error.
 ---

 Key: SOLR-4765
 URL: https://issues.apache.org/jira/browse/SOLR-4765
 Project: Solr
  Issue Type: Test
  Components: Tests
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 35158 - Failure!

2013-04-25 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/35158/

3 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestFilteredSearch

Error Message:
6 threads leaked from SUITE scope at org.apache.lucene.search.TestFilteredSearch:
   1) Thread[id=317, name=LuceneTestCase-39-thread-3, state=WAITING, group=TGRP-TestFilteredSearch]
   2) Thread[id=315, name=LuceneTestCase-39-thread-1, state=WAITING, group=TGRP-TestFilteredSearch]
   3) Thread[id=318, name=LuceneTestCase-39-thread-4, state=WAITING, group=TGRP-TestFilteredSearch]
   4) Thread[id=320, name=LuceneTestCase-39-thread-6, state=WAITING, group=TGRP-TestFilteredSearch]
   5) Thread[id=316, name=LuceneTestCase-39-thread-2, state=WAITING, group=TGRP-TestFilteredSearch]
   6) Thread[id=319, name=LuceneTestCase-39-thread-5, state=WAITING, group=TGRP-TestFilteredSearch]
All six threads report the same stack, parked in the thread pool waiting for work:
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 6 threads leaked from SUITE 
scope at 

[jira] [Commented] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.

2013-04-25 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642001#comment-13642001
 ] 

Commit Tag Bot commented on SOLR-4765:
--

[trunk commit] markrmiller
http://svn.apache.org/viewvc?view=revision&revision=1475869

SOLR-4765: The new Collections API test deleteCollectionWithDownNodes fails 
often with a server 500 error.

 The new Collections API test deleteCollectionWithDownNodes fails often with a 
 server 500 error.
 ---

 Key: SOLR-4765
 URL: https://issues.apache.org/jira/browse/SOLR-4765
 Project: Solr
  Issue Type: Test
  Components: Tests
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4958) unnecessary assert on docid

2013-04-25 Thread John Wang (JIRA)
John Wang created LUCENE-4958:
-

 Summary: unnecessary assert on docid
 Key: LUCENE-4958
 URL: https://issues.apache.org/jira/browse/LUCENE-4958
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.1
Reporter: John Wang


In DocFieldProcessor, on line 353, there is this assert:

  assert docValuesConsumerAndDocID.docID < docState.docID;

Is this assert necessary? I don't see where in the indexing pipeline this 
guarantee is needed. Can we remove it?

We have implemented a custom indexing chain that rewrites docState.docID in 
reverse order, and it is working well. But we have to do ugly workarounds in our 
tests to avoid this assert.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.

2013-04-25 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642023#comment-13642023
 ] 

Commit Tag Bot commented on SOLR-4765:
--

[branch_4x commit] markrmiller
http://svn.apache.org/viewvc?view=revision&revision=1475879

SOLR-4765: The new Collections API test deleteCollectionWithDownNodes fails 
often with a server 500 error.

 The new Collections API test deleteCollectionWithDownNodes fails often with a 
 server 500 error.
 ---

 Key: SOLR-4765
 URL: https://issues.apache.org/jira/browse/SOLR-4765
 Project: Solr
  Issue Type: Test
  Components: Tests
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4959) Incorrect return value from SimpleNaiveBayesClassifier.assignClass

2013-04-25 Thread Alexey Kutin (JIRA)
Alexey Kutin created LUCENE-4959:


 Summary: Incorrect return value from 
SimpleNaiveBayesClassifier.assignClass 
 Key: LUCENE-4959
 URL: https://issues.apache.org/jira/browse/LUCENE-4959
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.2.1, 5.0
Reporter: Alexey Kutin


The local copy of the BytesRef referenced by foundClass is affected by subsequent 
TermsEnum.iterator.next() calls, as the shared BytesRef.bytes changes.

If a term test gives a good match and the next term in the terms collection is a 
classification with a lower match score, then the return result will be clas
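
The pitfall looks roughly like this (an illustrative sketch, not the
classifier's actual code; score() is a hypothetical stand-in for the per-term
scoring):

{code}
import java.io.IOException;

import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class DeepCopyPitfall {
  static BytesRef bestTerm(TermsEnum termsEnum) throws IOException {
    BytesRef best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
      double s = score(term);
      if (s > bestScore) {
        bestScore = s;
        // BUG: 'best = term;' would keep a reference to bytes the enum
        // reuses, so 'best' silently mutates on the next next() call.
        best = BytesRef.deepCopyOf(term);  // take a private copy instead
      }
    }
    return best;
  }

  static double score(BytesRef term) {
    return term.length;  // placeholder scoring
  }
}
{code}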

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4960) Require minimum ivy version

2013-04-25 Thread Shawn Heisey (JIRA)
Shawn Heisey created LUCENE-4960:


 Summary: Require minimum ivy version
 Key: LUCENE-4960
 URL: https://issues.apache.org/jira/browse/LUCENE-4960
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Affects Versions: 4.2.1
Reporter: Shawn Heisey
Priority: Minor
 Fix For: 5.0, 4.4


Someone on solr-user ran into a problem while trying to run 'ant idea' so they 
could work on Solr in their IDE.  [~steve_rowe] indicated that this is probably 
due to IVY-1194, requiring an ivy jar upgrade.

The build system should check for a minimum ivy version, just like it does with 
ant.  The absolute minimum we require appears to be 2.2.0, but do we want to 
make it 2.3.0 due to IVY-1388?

I'm not sure how to go about checking the ivy version.  Checking the ant 
version is easy because it's ant itself that does the checking.

There might be other component versions that should be checked too.
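
One possible shape for such a check, assuming org.apache.ivy.Ivy is on the
build classpath (a sketch, not a tested build change; it also assumes plain
numeric x.y.z version strings):

{code}
import org.apache.ivy.Ivy;

public class CheckIvyVersion {
  public static void main(String[] args) {
    String found = Ivy.getIvyVersion();  // version baked into the ivy jar
    String minimum = "2.3.0";            // the floor discussed above
    if (found == null || compare(found, minimum) < 0) {
      throw new IllegalStateException(
          "ivy " + minimum + " or later is required, found " + found);
    }
  }

  // Compares dotted numeric versions segment by segment ("2.2.0" < "2.3.0").
  static int compare(String a, String b) {
    String[] as = a.split("\\.");
    String[] bs = b.split("\\.");
    for (int i = 0; i < Math.max(as.length, bs.length); i++) {
      int ai = i < as.length ? Integer.parseInt(as[i]) : 0;
      int bi = i < bs.length ? Integer.parseInt(bs[i]) : 0;
      if (ai != bi) {
        return ai < bi ? -1 : 1;
      }
    }
    return 0;
  }
}
{code}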


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer

2013-04-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642056#comment-13642056
 ] 

Michael McCandless commented on SOLR-4761:
--

+1, I like SimpleMergedSegmentWarmer.  Maybe we should put that class in lucene 
core?  It seems generically useful, and most users won't know the APIs to 
enumerate fields / touch the data structures...

 add option to plug in mergedsegmentwarmer
 -

 Key: SOLR-4761
 URL: https://issues.apache.org/jira/browse/SOLR-4761
 Project: Solr
  Issue Type: New Feature
Reporter: Robert Muir
 Attachments: SOLR-4761.patch


 This is pretty expert, but can be useful in some cases. 
 We can also provide a simple minimalist implementation that just ensures 
  data structures are primed so the first queries don't, e.g., cause norms to 
  be read from disk, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4764) When using NRT, just init the reader from IndexWriter

2013-04-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642072#comment-13642072
 ] 

Michael McCandless commented on SOLR-4764:
--

+1, this is very costly, because the first NRT open will open an entirely new 
set of SegmentReaders (not sharing anything from the non-NRT reader passed in 
to openIfChanged).


 When using NRT, just init the reader from IndexWriter
 -

 Key: SOLR-4764
 URL: https://issues.apache.org/jira/browse/SOLR-4764
 Project: Solr
  Issue Type: Improvement
Reporter: Robert Muir

 Spinoff from SOLR-4761
 Solr first opens a DirectoryReader from the directory, then later will pass 
 this to IW openIfChanged.
 I noticed this when I was confused that mergedsegmentwarmer doesn't appear to 
 work at first until after you've reopened...
 I'm not totally sure what the current behavior causes (does IW's pool reuse 
 segments from this passed-in external reader, or is this causing some 
 horrible doubling-up/inefficient stuff, etc.?). To some extent I think we 
 should change it even if it's actually performant: I think it's confusing.
 I think ideally we'd change IndexReaderFactory's method to take writer 
 instead of directory so that custom DirectoryReaders can still work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index

2013-04-25 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642087#comment-13642087
 ] 

Commit Tag Bot commented on LUCENE-4738:


[trunk commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1475905

LUCENE-4738: only CheckIndex when the last commit is > segments_1

 Killed JVM when first commit was running will generate a corrupted index
 

 Key: LUCENE-4738
 URL: https://issues.apache.org/jira/browse/LUCENE-4738
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64
 Java: java version 1.7.0_05
 Lucene: lucene-core-4.0.0 
Reporter: Billow Gao
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4738.patch, LUCENE-4738.patch, LUCENE-4738.patch, 
 LUCENE-4738_test.patch


 1. Start a NEW IndexWriterBuilder on an empty folder and 
add some documents to the index.
 2. Call commit.
 3. When the segments_1 file has been created with 0 bytes, kill the JVM.
 We will end up with a corrupted index with an empty segments_1.
 We only have this issue when the first commit crashes.
 Also, if you try to open an IndexSearcher on a new index before the first 
 commit on the index has finished, you will see an exception like:
 ===
 org.apache.lucene.index.IndexNotFoundException: no segments* file found in 
 org.apache.lucene.store.MMapDirectory@C:\tmp\testdir 
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: 
 [write.lock, _0.fdt, _0.fdx]
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
 ===
 So when a new index is created, we should first create an empty index; we 
 should not wait for the commit/close call to create the segments file.
 If we had an empty index there, it wouldn't leave a corrupted index when there 
 is a power failure during the first commit. 
 And a concurrent IndexSearcher could access the index (no match is better 
 than an exception).
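
A hedged sketch of the workaround the report suggests: commit once right after
creating the writer, so a valid (empty) commit point exists before any real
commit runs (Lucene 4.0-era APIs; the path and analyzer are placeholders):

{code}
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class EarlyCommitSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/tmp/testdir"));
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
        Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
    writer.commit();  // writes segments_1 for the empty index up front
    // ... add documents; a crash during the next commit now leaves a valid
    // (empty) commit point behind, and concurrent readers can already open.
    writer.close();
  }
}
{code}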

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index

2013-04-25 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642088#comment-13642088
 ] 

Commit Tag Bot commented on LUCENE-4738:


[branch_4x commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1475906

LUCENE-4738: only CheckIndex when the last commit is > segments_1

 Killed JVM when first commit was running will generate a corrupted index
 

 Key: LUCENE-4738
 URL: https://issues.apache.org/jira/browse/LUCENE-4738
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64
 Java: java version 1.7.0_05
 Lucene: lucene-core-4.0.0 
Reporter: Billow Gao
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4738.patch, LUCENE-4738.patch, LUCENE-4738.patch, 
 LUCENE-4738_test.patch


 1. Start a NEW IndexWriterBuilder on an empty folder and 
add some documents to the index.
 2. Call commit.
 3. When the segments_1 file has been created with 0 bytes, kill the JVM.
 We will end up with a corrupted index with an empty segments_1.
 We only have this issue when the first commit crashes.
 Also, if you try to open an IndexSearcher on a new index before the first 
 commit on the index has finished, you will see an exception like:
 ===
 org.apache.lucene.index.IndexNotFoundException: no segments* file found in 
 org.apache.lucene.store.MMapDirectory@C:\tmp\testdir 
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: 
 [write.lock, _0.fdt, _0.fdx]
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
 ===
 So when a new index is created, we should first create an empty index; we 
 should not wait for the commit/close call to create the segments file.
 If we had an empty index there, it wouldn't leave a corrupted index when there 
 is a power failure during the first commit. 
 And a concurrent IndexSearcher could access the index (no match is better 
 than an exception).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4762) Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach

2013-04-25 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642093#comment-13642093
 ] 

Shawn Heisey commented on SOLR-4762:


Tests and precommit pass.  I'm hoping someone can tell me whether this actually 
works for affected weblogic versions.

 Deploying on weblogic: java.lang.NoSuchMethodError: replaceEach
 ---

 Key: SOLR-4762
 URL: https://issues.apache.org/jira/browse/SOLR-4762
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Shawn Heisey
Assignee: Shawn Heisey
 Fix For: 5.0, 4.4

 Attachments: SOLR-4762.patch


 When a user tried to deploy on weblogic 10.3, they got this exception:
 {noformat}
 Error 500--Internal Server Error
 java.lang.NoSuchMethodError: replaceEach
 at 
 org.apache.solr.servlet.LoadAdminUiServlet.doGet(LoadAdminUiServlet.java:70)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:821)
 at 
 weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
 at 
 weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
 at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
 at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:27)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:142)
 at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
 at 
 weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
 at 
 weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
 at weblogic.security.service.SecurityManager.runAs(Unknown Source)
 at 
 weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
 at 
 weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
 at 
 weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
 at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
 at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
 {noformat}
 The solution to this problem appears to be adding the following to 
 weblogic.xml in WEB-INF:
 {noformat}
 <container-descriptor>
   <prefer-web-inf-classes>true</prefer-web-inf-classes>
 </container-descriptor>
 {noformat}
 Since Solr's WEB-INF directory already contains this file and it already has 
 the container-descriptor tag, I'm hoping this is a benign change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4958) unnecessary assert on docid

2013-04-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642117#comment-13642117
 ] 

Michael McCandless commented on LUCENE-4958:


I think this assert is gone in 4.2?

 unnecessary assert on docid
 ---

 Key: LUCENE-4958
 URL: https://issues.apache.org/jira/browse/LUCENE-4958
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.1
Reporter: John Wang

 In DocFieldProcessor, on line 353, there is this assert:
   assert docValuesConsumerAndDocID.docID < docState.docID;
 Is this assert necessary? I don't see in the indexing pipeline that this 
 guarantee is needed. Can we remove this?
 We have implemented a custom indexing chain that rewrites docState.docID in 
 reverse order and it is working well. But we have to do ugly workarounds in 
 our tests to avoid this assert.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4761) add option to plug in mergedsegmentwarmer

2013-04-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-4761:
--

Attachment: SOLR-4761.patch

updated patch. I also put this guy in the test rotation.

 add option to plug in mergedsegmentwarmer
 -

 Key: SOLR-4761
 URL: https://issues.apache.org/jira/browse/SOLR-4761
 Project: Solr
  Issue Type: New Feature
Reporter: Robert Muir
 Attachments: SOLR-4761.patch, SOLR-4761.patch


 This is pretty expert, but can be useful in some cases. 
 We can also provide a simple minimalist implementation that just ensures 
 datastructures are primed so the first queries aren't e.g. causing norms to 
 be read from disk etc.
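 For reference, a minimal warmer of the kind described might look like the 
 sketch below (Lucene 4.x IndexWriter.IndexReaderWarmer API; the class name 
 and the norms-only warming are assumptions, not the patch's contents). It 
 would be plugged in via IndexWriterConfig.setMergedSegmentWarmer, which is 
 presumably what the new solrconfig option exposes:
 {code:java}
 import java.io.IOException;

 import org.apache.lucene.index.AtomicReader;
 import org.apache.lucene.index.IndexWriter;

 public class NormsPrimingWarmer extends IndexWriter.IndexReaderWarmer {
   @Override
   public void warm(AtomicReader reader) throws IOException {
     // touch each field's norms on the freshly merged segment so the
     // first real query against it doesn't pay the disk reads
     for (String field : reader.fields()) { // Fields is Iterable<String>
       reader.getNormValues(field);
     }
   }
 }
 {code}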

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer

2013-04-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642124#comment-13642124
 ] 

Michael McCandless commented on SOLR-4761:
--

+1, looks great!  Thanks Rob.

 add option to plug in mergedsegmentwarmer
 -

 Key: SOLR-4761
 URL: https://issues.apache.org/jira/browse/SOLR-4761
 Project: Solr
  Issue Type: New Feature
Reporter: Robert Muir
 Attachments: SOLR-4761.patch, SOLR-4761.patch


 This is pretty expert, but can be useful in some cases. 
 We can also provide a simple minimalist implementation that just ensures 
 datastructures are primed so the first queries aren't e.g. causing norms to 
 be read from disk etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2013-04-25 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642139#comment-13642139
 ] 

Ryan McKinley commented on SOLR-4735:
-

This looks like it creates a new registry for every core (am I reading that 
wrong?)  If so, I think sharing one registry would be best.

Can the registry be in the CoreContainer rather than the core?

I guess that would involve some cleanup when a core is unloaded, but it would 
let us share a single registry across cores and other apps (the case I am 
actually concerned with)

 Improve Solr metrics reporting
 --

 Key: SOLR-4735
 URL: https://issues.apache.org/jira/browse/SOLR-4735
 Project: Solr
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: SOLR-4735.patch, SOLR-4735.patch


 Following on from a discussion on the mailing list:
 http://search-lucene.com/m/IO0EI1qdyJF1/codahale&subj=Solr+metrics+in+Codahale+metrics+and+Graphite+
 It would be good to make Solr play more nicely with existing devops 
 monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
 moment is poll-only, either via JMX or through the admin stats page.  I'd 
 like to refactor things a bit to make this more pluggable.
 This patch is a start.  It adds a new interface, InstrumentedBean, which 
 extends SolrInfoMBean to return a 
 [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
 couple of MetricReporters (which basically just duplicate the JMX and admin 
 page reporting that's there at the moment, but which should be more 
 extensible).  The patch includes a change to RequestHandlerBase showing how 
 this could work.  The idea would be to eventually replace the getStatistics() 
 call on SolrInfoMBean with this instead.
 The next step would be to allow more MetricReporters to be defined in 
 solrconfig.xml.  The Metrics library comes with ganglia and graphite 
 reporting modules, and we can add contrib plugins for both of those.
 There's some more general cleanup that could be done around SolrInfoMBean 
 (we've got two plugin handlers at /mbeans and /plugins that basically do the 
 same thing, and the beans themselves have some weirdly inconsistent data on 
 them - getVersion() returns different things for different impls, and 
 getSource() seems pretty useless), but maybe that's for another issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2013-04-25 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642148#comment-13642148
 ] 

Ryan McKinley commented on SOLR-4735:
-

ideally CoreContainer could have a function like:
{code:java}
  MetricsRegistry createMetricsRegistry( ?? config ) {
    return new MetricsRegistry();
  }
{code}

This would let other applications slip in their own registry -- that already 
has reporting hooked up!
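
On the application side the wiring would look roughly like this (a sketch 
against the Codahale Metrics 3.x API, where the class is named MetricRegistry; 
ConsoleReporter is just a stand-in, the Graphite/Ganglia reporters hook up the 
same way):
{code:java}
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;

public class SharedRegistryExample {
  public static void main(String[] args) throws Exception {
    // the registry an embedding application would hand to Solr
    MetricRegistry registry = new MetricRegistry();

    // reporting is hooked up before Solr ever sees the registry
    ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
    reporter.start(1, TimeUnit.MINUTES);

    registry.meter("example.requests").mark(); // stand-in for Solr's metrics
    Thread.sleep(65000L);                      // let one report fire
    reporter.stop();
  }
}
{code}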

 Improve Solr metrics reporting
 --

 Key: SOLR-4735
 URL: https://issues.apache.org/jira/browse/SOLR-4735
 Project: Solr
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: SOLR-4735.patch, SOLR-4735.patch


 Following on from a discussion on the mailing list:
 http://search-lucene.com/m/IO0EI1qdyJF1/codahale&subj=Solr+metrics+in+Codahale+metrics+and+Graphite+
 It would be good to make Solr play more nicely with existing devops 
 monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
 moment is poll-only, either via JMX or through the admin stats page.  I'd 
 like to refactor things a bit to make this more pluggable.
 This patch is a start.  It adds a new interface, InstrumentedBean, which 
 extends SolrInfoMBean to return a 
 [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
 couple of MetricReporters (which basically just duplicate the JMX and admin 
 page reporting that's there at the moment, but which should be more 
 extensible).  The patch includes a change to RequestHandlerBase showing how 
 this could work.  The idea would be to eventually replace the getStatistics() 
 call on SolrInfoMBean with this instead.
 The next step would be to allow more MetricReporters to be defined in 
 solrconfig.xml.  The Metrics library comes with ganglia and graphite 
 reporting modules, and we can add contrib plugins for both of those.
 There's some more general cleanup that could be done around SolrInfoMBean 
 (we've got two plugin handlers at /mbeans and /plugins that basically do the 
 same thing, and the beans themselves have some weirdly inconsistent data on 
 them - getVersion() returns different things for different impls, and 
 getSource() seems pretty useless), but maybe that's for another issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein & associated lexicon automata

2013-04-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642161#comment-13642161
 ] 

Steve Rowe commented on LUCENE-4947:


bq. Just updating the thread to notify everyone that I've just e-mailed the ICA 
and code grant documents (and their GPG-related files) to secret...@apache.org.

I monitor commits to the ICLA and code grants record files, and neither the 
ICLA nor the code grant document has been recorded yet.  I'll post on this 
issue once the code grant has been recorded.

[~klawson88], did you send the code grant to legal-arch...@apache.org in 
addition to sending it to secret...@apache.org?  This is mentioned as a 
requirement in step 3 of the process section in 
[http://incubator.apache.org/ip-clearance/ip-clearance-template.html].

 Java implementation (and improvement) of Levenshtein & associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson
 Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip


 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment-filled, 
 editing-friendly code, combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages, makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, I'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy-to-follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4959) Incorrect return value from SimpleNaiveBayesClassifier.assignClass

2013-04-25 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand reassigned LUCENE-4959:


Assignee: Adrien Grand

 Incorrect return value from SimpleNaiveBayesClassifier.assignClass 
 ---

 Key: LUCENE-4959
 URL: https://issues.apache.org/jira/browse/LUCENE-4959
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.0, 4.2.1
Reporter: Alexey Kutin
Assignee: Adrien Grand
  Labels: classification

 The local copy of BytesRef referenced by foundClass is affected by subsequent 
 TermsEnum.iterator.next() calls as the shared BytesRef.bytes changes. 
 If a term "test" gives a good match and the next term in the terms collection 
 is "classification" with a lower match score, then the returned result will be 
 "clas".
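 The pitfall, sketched (the method and the term-frequency stand-in for the 
 real score are illustrative, not the classifier's actual code): 
 TermsEnum.next() reuses the BytesRef it returns, so a winner held across 
 iterations must be deep-copied.
 {code:java}
 import java.io.IOException;

 import org.apache.lucene.index.TermsEnum;
 import org.apache.lucene.util.BytesRef;

 public class SharedBytesRefPitfall {
   static BytesRef bestTerm(TermsEnum te) throws IOException {
     BytesRef best = null;
     long bestFreq = -1;
     BytesRef term;
     while ((term = te.next()) != null) {
       long freq = te.totalTermFreq(); // stand-in for the real score
       if (freq > bestFreq) {
         // deep-copy: holding the returned instance directly would let the
         // next() call rewrite our winner in place, e.g. "test" -> "clas"
         best = BytesRef.deepCopyOf(term);
         bestFreq = freq;
       }
     }
     return best;
   }
 }
 {code}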

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4957) Stop IndexWriter from writing broken term vector offset data in 5.0

2013-04-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642201#comment-13642201
 ] 

Adrien Grand commented on LUCENE-4957:
--

+1

 Stop IndexWriter from writing broken term vector offset data in 5.0
 ---

 Key: LUCENE-4957
 URL: https://issues.apache.org/jira/browse/LUCENE-4957
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 Today we allow this in (some analyzers are broken), and only reject them if 
 someone is indexing offsets into the postings lists.
 But we should ban this also when term vectors are enabled. Its time to stop 
 writing this broken data and let broken analyzers be broken.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-04-25 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642218#comment-13642218
 ] 

Dawid Weiss commented on LUCENE-4956:
-

That's because Christian has ninja superpowers.
http://goo.gl/5EPMr

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 The Korean language has specific characteristics. When developing a search 
 service with Lucene & Solr in Korean, there are some problems in searching and 
 indexing. The Korean analyzer solves these problems with a Korean 
 morphological analyzer. It consists of a Korean morphological analyzer, 
 dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is 
 made for Lucene and Solr. If you develop a search service with Lucene in 
 Korean, it is the best idea to choose the Korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities

2013-04-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642219#comment-13642219
 ] 

Jan Høydahl commented on SOLR-2356:
---

In my opinion, DIH should be completely redesigned as a standalone webapp. It 
is a major design flaw that it is a RequestHandler within a Solr 
Core/collection.

As a standalone web app it could easily be deployed on its own, talk to 
multiple collections and be parallelized.

 indexing using DataImportHandler does not use entire CPU capacities
 ---

 Key: SOLR-2356
 URL: https://issues.apache.org/jira/browse/SOLR-2356
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0-ALPHA
 Environment: intel xeon processor (4 cores), Debian Linux Lenny, 
 OpenJDK 64bits server v1.6.0
Reporter: colby
Priority: Minor
  Labels: test
   Original Estimate: 168h
  Remaining Estimate: 168h

 When I use a DataImportHandler to index a large number of documents (~35M), 
 CPU usage doesn't go over 100% (i.e. just one core).
 When I configure 4 threads for the entity tag, the CPU usage is split to 
 25% per core but never uses 400% of CPU (i.e. 100% of the 4 cores).
 I use Solr embedded with the Jetty server.
 Is there a way to tune this feature in order to use all cores and improve 
 indexing performance?
 Because for the moment, an extra script (PHP) gives better indexing 
 performance than DIH.
 thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene Solr 4.3.0 RC3

2013-04-25 Thread Ryan Ernst
-1

It seems SLF4j packaging is busted?  I thought I remembered slf4j jars were
removed from the war, in favor of putting them in the classpath.  But I see
slf4j jars in the maven war file, but not in the tgz war file.


On Thu, Apr 25, 2013 at 10:19 AM, Mark Miller markrmil...@gmail.com wrote:

 +1

 - Mark

 On Apr 23, 2013, at 7:50 AM, Simon Willnauer simon.willna...@gmail.com
 wrote:

  Here is a new RC candidate...
 
 
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/
 
  here is my +1
 
  thanks for voting...
 
  simon
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Updated] (LUCENE-4953) readerClosedListener is not invoked for ParallelCompositeReader's leaves

2013-04-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4953:
--

Attachment: LUCENE-4953.patch

Patch that adds the DONT_TOUCH_SUBREADERS mode.

I will now check the tests by enforcing that readers are always wrapped with 
PCR, so bugs can be detected.

 readerClosedListener is not invoked for ParallelCompositeReader's leaves
 

 Key: LUCENE-4953
 URL: https://issues.apache.org/jira/browse/LUCENE-4953
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 5.0, 4.4

 Attachments: LUCENE-4953.patch, LUCENE-4953.patch, LUCENE-4953.patch, 
 LUCENE-4953.patch


 There was a test failure last night:
 {noformat}
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testBasic
 Error Message:
 testBasic(org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest): 
 Insane FieldCache usage(s) found expected:<0> but was:<2>
 Stack Trace:
 java.lang.AssertionError: 
 testBasic(org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest): 
 Insane FieldCache usage(s) found expected:<0> but was:<2>
 at 
 __randomizedtesting.SeedInfo.seed([1F9C2A2AD23A8E02:B466373F0DE6082C]:0)
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at 
 org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:592)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:55)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at java.lang.Thread.run(Thread.java:722)
 Build Log:
 [...truncated 6904 lines...]
 [junit4:junit4] Suite: 
 org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest
 [junit4:junit4]   2 *** BEGIN 
 

[jira] [Commented] (LUCENE-4953) readerClosedListener is not invoked for ParallelCompositeReader's leaves

2013-04-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642302#comment-13642302
 ] 

Uwe Schindler commented on LUCENE-4953:
---

I checked the other tests by hardcoding maybeWrapReader to always wrap with 
ParallelCompositeReader at the end. No other failures.

I will commit this tomorrow.
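
For context, the listener mechanism at stake is sketched below (the listener 
body is illustrative, not the actual FieldCache purge code). FieldCache 
entries are evicted through these callbacks, so when ParallelCompositeReader 
fails to fire them for its leaves, stale entries accumulate and the sanity 
rule reports insane usage:
{code:java}
import org.apache.lucene.index.IndexReader;

public class ClosedListenerExample {
  public static void register(IndexReader reader) {
    reader.addReaderClosedListener(new IndexReader.ReaderClosedListener() {
      @Override
      public void onClose(IndexReader r) {
        // a real listener would evict cache entries keyed on r here
        System.out.println("reader closed: " + r);
      }
    });
  }
}
{code}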

 readerClosedListener is not invoked for ParallelCompositeReader's leaves
 

 Key: LUCENE-4953
 URL: https://issues.apache.org/jira/browse/LUCENE-4953
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 5.0, 4.4

 Attachments: LUCENE-4953.patch, LUCENE-4953.patch, LUCENE-4953.patch, 
 LUCENE-4953.patch


 There was a test failure last night:
 {noformat}
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testBasic
 Error Message:
 testBasic(org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest): 
 Insane FieldCache usage(s) found expected:<0> but was:<2>
 Stack Trace:
 java.lang.AssertionError: 
 testBasic(org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest): 
 Insane FieldCache usage(s) found expected:<0> but was:<2>
 at 
 __randomizedtesting.SeedInfo.seed([1F9C2A2AD23A8E02:B466373F0DE6082C]:0)
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at 
 org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:592)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:55)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at java.lang.Thread.run(Thread.java:722)
 Build Log:
 [...truncated 6904 lines...]
 [junit4:junit4] Suite: 
 org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest
 [junit4:junit4]   2 *** BEGIN 
 

[jira] [Updated] (LUCENE-4955) NGramTokenFilter increments positions for each gram

2013-04-25 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4955:
-

Attachment: LUCENE-4955.patch

I tried to iterate on Simon's patch:

 * NGramTokenFilter doesn't modify offsets and emits all n-grams of a single 
term at the same position

 * NGramTokenizer uses a sliding window.

 * NGramTokenizer and NGramTokenFilter removed from TestRandomChains exclusions.

It was very hard to add the compatibility version support to NGramTokenizer so 
there are now two distinct classes and the factory picks the right one 
depending on the Lucene match version.

Simon's highlighting test now fails because the highlighted content is 
different, but not because of a broken token stream.
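
A quick way to see the position increments in question (a sketch against the 
4.x analysis API; the demo class is made up): under the old filter every gram 
reports increment 1, while with this patch all grams of one term share its 
position.
{code:java}
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.Version;

public class NGramPositionsDemo {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new NGramTokenFilter(
        new WhitespaceTokenizer(Version.LUCENE_43, new StringReader("abc")),
        1, 2); // 1- and 2-grams of each token
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posInc =
        ts.addAttribute(PositionIncrementAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term + " +" + posInc.getPositionIncrement());
    }
    ts.end();
    ts.close();
  }
}
{code}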

 NGramTokenFilter increments positions for each gram
 ---

 Key: LUCENE-4955
 URL: https://issues.apache.org/jira/browse/LUCENE-4955
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Simon Willnauer
 Fix For: 5.0, 4.4

 Attachments: highlighter-test.patch, highlighter-test.patch, 
 LUCENE-4955.patch, LUCENE-4955.patch


 NGramTokenFilter increments positions for each gram rather for the actual 
 token which can lead to rather funny problems especially with highlighting. 
 if this filter should be used for highlighting is a different story but today 
 this seems to be a common practice in many situations to highlight sub-term 
 matches.
 I have a test for highlighting that uses ngram failing with a StringIOOB 
 since tokens are sorted by position which causes offsets to be mixed up due 
 to ngram token filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.

2013-04-25 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated LUCENE-949:
---

Attachment: LUCENE-949.patch

Refactored a bit and added a few more tests.

 AnalyzingQueryParser can't work with leading wildcards.
 ---

 Key: LUCENE-949
 URL: https://issues.apache.org/jira/browse/LUCENE-949
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 2.2
Reporter: Stefan Klein
 Attachments: LUCENE-949.patch, LUCENE-949.patch


 The getWildcardQuery method in AnalyzingQueryParser.java needs the following 
 changes to accept leading wildcards:
   protected Query getWildcardQuery(String field, String termStr) throws ParseException
   {
       String useTermStr = termStr;
       String leadingWildcard = null;
       if ("*".equals(field))
       {
           if ("*".equals(useTermStr))
               return new MatchAllDocsQuery();
       }
       boolean hasLeadingWildcard = (useTermStr.startsWith("*") ||
               useTermStr.startsWith("?")) ? true : false;
       if (!getAllowLeadingWildcard() && hasLeadingWildcard)
           throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
       if (getLowercaseExpandedTerms())
       {
           useTermStr = useTermStr.toLowerCase();
       }
       if (hasLeadingWildcard)
       {
           leadingWildcard = useTermStr.substring(0, 1);
           useTermStr = useTermStr.substring(1);
       }
       List tlist = new ArrayList();
       List wlist = new ArrayList();
       /*
        * somewhat a hack: find/store wildcard chars in order to put them back
        * after analyzing
        */
       boolean isWithinToken = (!useTermStr.startsWith("?") &&
               !useTermStr.startsWith("*"));
       isWithinToken = true;
       StringBuffer tmpBuffer = new StringBuffer();
       char[] chars = useTermStr.toCharArray();
       for (int i = 0; i < useTermStr.length(); i++)
       {
           if (chars[i] == '?' || chars[i] == '*')
           {
               if (isWithinToken)
               {
                   tlist.add(tmpBuffer.toString());
                   tmpBuffer.setLength(0);
               }
               isWithinToken = false;
           }
           else
           {
               if (!isWithinToken)
               {
                   wlist.add(tmpBuffer.toString());
                   tmpBuffer.setLength(0);
               }
               isWithinToken = true;
           }
           tmpBuffer.append(chars[i]);
       }
       if (isWithinToken)
       {
           tlist.add(tmpBuffer.toString());
       }
       else
       {
           wlist.add(tmpBuffer.toString());
       }
       // get Analyzer from superclass and tokenize the term
       TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
       org.apache.lucene.analysis.Token t;
       int countTokens = 0;
       while (true)
       {
           try
           {
               t = source.next();
           }
           catch (IOException e)
           {
               t = null;
           }
           if (t == null)
           {
               break;
           }
           if (!"".equals(t.termText()))
           {
               try
               {
                   tlist.set(countTokens++, t.termText());
               }
               catch (IndexOutOfBoundsException ioobe)
               {
                   countTokens = -1;
               }
           }
       }
       try
       {
           source.close();
       }
       catch (IOException e)
       {
           // ignore
       }
       if (countTokens != tlist.size())
       {

[jira] [Updated] (SOLR-4705) HttpShardHandler null point exception

2013-04-25 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4705:
---

Attachment: SOLR-4705.patch

Thanks for the patch Raintung!

I've updated it to include test cases of all the various shards param 
possibilities to future-proof us against similar bugs down the line.

I'm still running a bunch of iterations to verify the test itself isn't flawed, 
and then I'll commit.

 HttpShardHandler null point exception
 -

 Key: SOLR-4705
 URL: https://issues.apache.org/jira/browse/SOLR-4705
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.2, 4.2.1
Reporter: Raintung Li
Priority: Minor
 Attachments: patch-4705.txt, SOLR-4705.patch


 Call search URL:
 select?q=test&shards=ip/solr/
 The checkDistributed method throws a null pointer exception. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4705) HttpShardHandler null point exception

2013-04-25 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4705:
---

Fix Version/s: 4.4
 Assignee: Hoss Man

 HttpShardHandler null point exception
 -

 Key: SOLR-4705
 URL: https://issues.apache.org/jira/browse/SOLR-4705
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.2, 4.2.1
Reporter: Raintung Li
Assignee: Hoss Man
Priority: Minor
 Fix For: 4.4

 Attachments: patch-4705.txt, SOLR-4705.patch


 Call search URL:
 select?q=test&shards=ip/solr/
 The checkDistributed method throws a null pointer exception. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer

2013-04-25 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642519#comment-13642519
 ] 

Commit Tag Bot commented on SOLR-4761:
--

[trunk commit] rmuir
http://svn.apache.org/viewvc?view=revision&revision=1476026

SOLR-4761: add option to plug in mergedSegmentWarmer

 add option to plug in mergedsegmentwarmer
 -

 Key: SOLR-4761
 URL: https://issues.apache.org/jira/browse/SOLR-4761
 Project: Solr
  Issue Type: New Feature
Reporter: Robert Muir
 Attachments: SOLR-4761.patch, SOLR-4761.patch


 This is pretty expert, but can be useful in some cases. 
 We can also provide a simple minimalist implementation that just ensures 
 datastructures are primed so the first queries aren't e.g. causing norms to 
 be read from disk etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4761) add option to plug in mergedsegmentwarmer

2013-04-25 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642527#comment-13642527
 ] 

Commit Tag Bot commented on SOLR-4761:
--

[branch_4x commit] rmuir
http://svn.apache.org/viewvc?view=revision&revision=1476030

SOLR-4761: add option to plug in mergedSegmentWarmer

 add option to plug in mergedsegmentwarmer
 -

 Key: SOLR-4761
 URL: https://issues.apache.org/jira/browse/SOLR-4761
 Project: Solr
  Issue Type: New Feature
Reporter: Robert Muir
 Attachments: SOLR-4761.patch, SOLR-4761.patch


 This is pretty expert, but can be useful in some cases. 
 We can also provide a simple minimalist implementation that just ensures 
 datastructures are primed so the first queries aren't e.g. causing norms to 
 be read from disk etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4761) add option to plug in mergedsegmentwarmer

2013-04-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-4761.
---

   Resolution: Fixed
Fix Version/s: 4.4
   5.0

 add option to plug in mergedsegmentwarmer
 -

 Key: SOLR-4761
 URL: https://issues.apache.org/jira/browse/SOLR-4761
 Project: Solr
  Issue Type: New Feature
Reporter: Robert Muir
 Fix For: 5.0, 4.4

 Attachments: SOLR-4761.patch, SOLR-4761.patch


 This is pretty expert, but can be useful in some cases. 
 We can also provide a simple minimalist implementation that just ensures 
 datastructures are primed so the first queries aren't e.g. causing norms to 
 be read from disk etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4766) smoketester to check war files have the same contents

2013-04-25 Thread Robert Muir (JIRA)
Robert Muir created SOLR-4766:
-

 Summary: smoketester to check war files have the same contents
 Key: SOLR-4766
 URL: https://issues.apache.org/jira/browse/SOLR-4766
 Project: Solr
  Issue Type: Test
  Components: Build
Affects Versions: 4.3
Reporter: Robert Muir
 Fix For: 4.3


As Ryan points out on the "[VOTE] Lucene Solr 4.3.0 RC3" thread, somehow the 
.war file in the binary packaging has different contents than the maven one 
(in particular, one contains logging jars, the other does not).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4766) smoketester to check war files have the same contents

2013-04-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642540#comment-13642540
 ] 

Robert Muir commented on SOLR-4766:
---

My initial idea: where we check jars and wars in checkIdenticalMavenArtifacts, 
compare their zip TOCs and ensure they have the same sets of files.
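
As a rough sketch of that check (illustrative Java, not the smoke tester 
itself, which is a Python script): read each archive's table of contents and 
fail when the entry sets differ.
{code:java}
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class WarTocCheck {
  static Set<String> toc(String path) throws Exception {
    Set<String> names = new TreeSet<String>();
    ZipFile zf = new ZipFile(path);
    try {
      Enumeration<? extends ZipEntry> en = zf.entries();
      while (en.hasMoreElements()) {
        names.add(en.nextElement().getName());
      }
    } finally {
      zf.close();
    }
    return names;
  }

  public static void main(String[] args) throws Exception {
    Set<String> binary = toc(args[0]); // war from the binary tgz
    Set<String> maven = toc(args[1]);  // war from the maven artifacts
    if (!binary.equals(maven)) {
      throw new RuntimeException("war TOCs differ: " + binary + " / " + maven);
    }
  }
}
{code}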

 smoketester to check war files have the same contents
 -

 Key: SOLR-4766
 URL: https://issues.apache.org/jira/browse/SOLR-4766
 Project: Solr
  Issue Type: Test
  Components: Build
Affects Versions: 4.3
Reporter: Robert Muir
 Fix For: 4.3


 As Ryan points out on the "[VOTE] Lucene Solr 4.3.0 RC3" thread, somehow the 
 .war file in the binary packaging has different contents than the maven one 
 (in particular, one contains logging jars, the other does not).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2356) indexing using DataImportHandler does not use entire CPU capacities

2013-04-25 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642576#comment-13642576
 ] 

Shalin Shekhar Mangar commented on SOLR-2356:
-

bq. In my opinion, DIH should be completely redesigned as a standalone webapp. 
It is a major design flaw that it is a RequestHandler within a Solr 
Core/collection.

Actually, DIH started as a standalone webapp inside AOL. We changed it because 
we didn't want to duplicate the schema in two places and also because we wanted 
to have it available by default in Solr installations. Another web app means 
you need to procure hardware, plan capacity/failover, create firewall holes, etc.

bq. As a standalone web app it could easily be deplyed on its own, talk to 
multiple collections and be parallellized.

Talking to multiple collections was never a goal for DIH -- I'm not sure what 
value it will bring. The multi-threading support in DIH can use a lot of 
improvement for sure.

 indexing using DataImportHandler does not use entire CPU capacities
 ---

 Key: SOLR-2356
 URL: https://issues.apache.org/jira/browse/SOLR-2356
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0-ALPHA
 Environment: intel xeon processor (4 cores), Debian Linux Lenny, 
 OpenJDK 64bits server v1.6.0
Reporter: colby
Priority: Minor
  Labels: test
   Original Estimate: 168h
  Remaining Estimate: 168h

 When I use a DataImportHandler to index a large number of documents (~35M), 
 CPU usage doesn't go over 100% (i.e. just one core).
 When I configure 4 threads for the entity tag, the CPU usage is split to 
 25% per core but never uses 400% of CPU (i.e. 100% of the 4 cores).
 I use Solr embedded with the Jetty server.
 Is there a way to tune this feature in order to use all cores and improve 
 indexing performance?
 Because for the moment, an extra script (PHP) gives better indexing 
 performance than DIH.
 thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org