[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286033#comment-14286033 ] Hoss Man commented on SOLR-6991: TIKA-93 introduced the TesseractOCRParser, and TIKA-1476 enabled it as a default parser. that combination means that the first time Tika is used in Solr, the TesseractOCRParser will be checked to see if the system has Tesseract installed to know if that parser should be consulted -- and when that happens, ExternalParser.check is used which calls Runtime.exec and blows up in turkish locale. possible resolutions i can think of: * change how we init Tika to prevent this parser from ever being used (override the list of autodetected parsers?) * change how we include tika jars/defaults to prevent this parser from ever being used (override the default tesseract properties file in the jar somehow maybe?) * rollback to tika 1.6 * punt and advise turkish users to run their jvm in en_US ? Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
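As a rough illustration of the first option (overriding the list of auto-detected parsers), something along these lines could filter the TesseractOCRParser out before it is ever asked for its supported types. This is a sketch only, not Solr's actual ExtractingRequestHandler wiring, and the getAllComponentParsers() / AutoDetectParser(Parser...) calls should be double-checked against the Tika 1.7 API:

{noformat}
import java.util.ArrayList;
import java.util.List;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.DefaultParser;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRParser;

public class NoOcrParserFactory {
  // Builds an AutoDetectParser from the service-loaded default parsers, minus the
  // TesseractOCRParser, so its hasTesseract() check (Runtime.exec) never runs.
  static Parser createParserWithoutTesseract() {
    List<Parser> parsers = new ArrayList<>();
    for (Parser p : new DefaultParser().getAllComponentParsers()) {
      if (!(p instanceof TesseractOCRParser)) {
        parsers.add(p);
      }
    }
    return new AutoDetectParser(parsers.toArray(new Parser[parsers.size()]));
  }
}
{noformat}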
Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1943 - Failure!
: It's our old friend SOLR-6387 ! : : This time manifesting itself via calls down into Tika. new comments posted in SOLR-6991 where Tika was upgraded -- definitely Tika 1.7 that introduced this new parser that causes this problem. One followup clarification... : way we never tickled it before ... but more perplexing is why i can't : reproduce any similar errors on trunk (or 5x) using ant test : -Dtests.slow=true -Dtests.locale=tr_TR in solr/contrib/extraction ? ...i'm getting senile, and forgot the most anoying part of SOLR-6387: the posix_spawn problem doesn't manifest on linux, because the JDK code only tries VFORK and FORK, not POSIX_SPAWN : : Date: Wed, 21 Jan 2015 06:47:13 + (UTC) : : From: Policeman Jenkins Server jenk...@thetaphi.de : : Reply-To: dev@lucene.apache.org : : To: rm...@apache.org, ans...@apache.org, sha...@apache.org, no...@apache.org, : : gcha...@apache.org, ehatc...@apache.org, tflo...@apache.org, : : sar...@gmail.com, dev@lucene.apache.org : : Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1943 - : : Failure! : : : : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1943/ : : Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC : : : : 1 tests failed. : : FAILED: org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath : : : : Error Message: : : posix_spawn is not a supported process launch mechanism on this platform. : : : : Stack Trace: : : java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. : : at __randomizedtesting.SeedInfo.seed([58A6FBEB77E81527:26F735786F5F7761]:0) : : at java.lang.UNIXProcess$1.run(UNIXProcess.java:105) : : at java.lang.UNIXProcess$1.run(UNIXProcess.java:94) : : at java.security.AccessController.doPrivileged(Native Method) : : at java.lang.UNIXProcess.clinit(UNIXProcess.java:92) : : at java.lang.ProcessImpl.start(ProcessImpl.java:130) : : at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) : : at java.lang.Runtime.exec(Runtime.java:620) : : at java.lang.Runtime.exec(Runtime.java:485) : : at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344) : : at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117) : : at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90) : : at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) : : at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95) : : at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229) : : at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) : : at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209) : : at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) : : at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) : : at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) : : at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) : : at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) : : at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) : : at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:353) : : at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocalFromHandler(ExtractingRequestHandlerTest.java:703) : : at 
org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:710) : : at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath(ExtractingRequestHandlerTest.java:474) : : at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) : : at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) : : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) : : at java.lang.reflect.Method.invoke(Method.java:483) : : at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) : : at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) : : at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) : : at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) : : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) : : at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) : : at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) : : at
[jira] [Commented] (LUCENE-6191) Spatial 2D faceting (heatmaps)
[ https://issues.apache.org/jira/browse/LUCENE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285893#comment-14285893 ] Nicholas Knize commented on LUCENE-6191: The error factor would certainly be a function of spatial resolution but not the same since you're dealing with expected vs. observed counts (an RMS over the result set vs. %spatialErr). It'd be worth exploring for a later enhancement (maybe an option to include descriptive stats as part of the faceting operation for spatial analysis use-cases) but not critical for the initial capability. I'm just not a fan of creating analysis results without communicating some kind of accuracy. It leads to data misrepresentation. I do like what you have going on here. I'll experiment with it when I get some time and see if I can't help get some low-overhead accuracy results. Spatial 2D faceting (heatmaps) -- Key: LUCENE-6191 URL: https://issues.apache.org/jira/browse/LUCENE-6191 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.1 Attachments: LUCENE-6191__Spatial_heatmap.patch Lucene spatial's PrefixTree (grid) based strategies index data in a way highly amenable to faceting on grid cells to compute a so-called _heatmap_. The underlying code in this patch uses the PrefixTreeFacetCounter utility class which was recently refactored out of faceting for NumberRangePrefixTree (LUCENE-5735). At a low level, the terms (== grid cells) are navigated per-segment, forward only with TermsEnum.seek, so it's pretty quick and furthermore requires no extra caches nor docvalues. Ideally you should use QuadPrefixTree (or Flex once it comes out) to maximize the number of grid levels, which in turn maximizes the fidelity of choices when you ask for a grid covering a region. Conveniently, the provided capability returns the data in a 2-D grid of counts, so the caller needn't know a thing about how the data is encoded in the prefix tree. Well almost... at this point they need to provide a grid level, but I'll soon provide a means of deriving the grid level based on a min/max cell count. I recommend QuadPrefixTree with geo=false so that you can provide a square world-bounds (360x360 degrees), which means square grid cells which are more desirable to display than rectangular cells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
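For readers following along, here is roughly what invoking the heatmap faceting could look like once the patch is in. This is a sketch only: the class and method names (HeatmapFacetCounter.calcFacets, Heatmap.getCount) are assumed from the attached patch and should be verified against whatever version is actually committed, and it assumes the index was built with the same strategy/field:

{noformat}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.spatial.prefix.HeatmapFacetCounter;
import org.apache.lucene.spatial.prefix.PrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.tree.QuadPrefixTree;
import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.context.SpatialContextFactory;
import com.spatial4j.core.shape.Shape;
import com.spatial4j.core.shape.impl.RectangleImpl;

public class HeatmapSketch {
  // Counts grid cells at one level over the given region; the result is a plain
  // columns x rows grid, so the caller needs no knowledge of the prefix-tree encoding.
  static void printHeatmap(IndexReader reader, Shape region, int facetLevel) throws IOException {
    SpatialContextFactory factory = new SpatialContextFactory();
    factory.geo = false;                                                  // geo=false as recommended...
    factory.worldBounds = new RectangleImpl(-180, 180, -180, 180, null);  // ...with square world bounds
    SpatialContext ctx = factory.newSpatialContext();
    PrefixTreeStrategy strategy =
        new RecursivePrefixTreeStrategy(new QuadPrefixTree(ctx, 26), "geo");
    HeatmapFacetCounter.Heatmap heatmap = HeatmapFacetCounter.calcFacets(
        strategy, reader.getContext(), null /* no doc filter */, region, facetLevel, 100000);
    for (int y = 0; y < heatmap.rows; y++) {
      StringBuilder row = new StringBuilder();
      for (int x = 0; x < heatmap.columns; x++) {
        row.append(heatmap.getCount(x, y)).append(' ');
      }
      System.out.println(row);
    }
  }
}
{noformat}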
[jira] [Commented] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285925#comment-14285925 ] Jessica Cheng Mallet commented on SOLR-6521: The patch is locking the entire cache for all loading, which might not be an ideal solution for a cluster with many, many collections. Guava's implementation of LocalCache would only lock and wait on Segments, which increases the concurrency level (which is tunable). CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client-side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
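For reference, the Guava pattern being described looks roughly like the following minimal sketch keyed by collection name. It is not the actual CloudSolrServer code: DocCollection merely stands in for the cached state, and fetchClusterStateFromZk is a hypothetical loader.

{noformat}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import org.apache.solr.common.cloud.DocCollection;

public class CollectionStateCache {
  // Per-key loading: the first thread asking for a collection loads it while other
  // threads asking for the *same* key wait; different keys only contend at the
  // segment level, which is controlled by concurrencyLevel.
  private final LoadingCache<String, DocCollection> cache = CacheBuilder.newBuilder()
      .concurrencyLevel(16)                          // tunable number of lock segments
      .expireAfterAccess(60, TimeUnit.SECONDS)
      .build(new CacheLoader<String, DocCollection>() {
        @Override
        public DocCollection load(String collection) {
          return fetchClusterStateFromZk(collection);  // hypothetical ZooKeeper read
        }
      });

  DocCollection getState(String collection) {
    return cache.getUnchecked(collection);
  }

  private DocCollection fetchClusterStateFromZk(String collection) {
    // placeholder for the actual ZooKeeper fetch
    throw new UnsupportedOperationException();
  }
}
{noformat}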
[jira] [Assigned] (SOLR-6928) solr.cmd stop works only in english
[ https://issues.apache.org/jira/browse/SOLR-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter reassigned SOLR-6928: Assignee: Timothy Potter solr.cmd stop works only in english --- Key: SOLR-6928 URL: https://issues.apache.org/jira/browse/SOLR-6928 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: german windows 7 Reporter: john.work Assignee: Timothy Potter Priority: Minor in solr.cmd the stop doesn't work while executing 'netstat -nao ^| find /i "listening" ^| find ":%SOLR_PORT%"', so "listening" is not found. e.g. in german cmd.exe the netstat -nao prints the following output: {noformat} Proto  Lokale Adresse  Remoteadresse  Status  PID  TCP  0.0.0.0:80  0.0.0.0:0  ABHÖREN  4 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2521 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2521/ 4 tests failed. FAILED: org.apache.solr.cloud.HttpPartitionTest.testDistribSearch Error Message: org.apache.http.NoHttpResponseException: The target server failed to respond Stack Trace: org.apache.solr.client.solrj.SolrServerException: org.apache.http.NoHttpResponseException: The target server failed to respond at __randomizedtesting.SeedInfo.seed([77AD558AFE563DA6:F64BDB9289095D9A]:0) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:871) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:736) at org.apache.solr.cloud.HttpPartitionTest.sendDoc(HttpPartitionTest.java:480) at org.apache.solr.cloud.HttpPartitionTest.testRf2(HttpPartitionTest.java:201) at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:114) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:878) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
Re: [JENKINS] Lucene-Solr-5.x-Linux (64bit/ibm-j9-jdk7) - Build # 11491 - Failure!
this looks like problems noted in SOLR-6915 wih the kerberose stuff and the IBM JDK. miller? gregory? ... were you guys going to disable these tests on IBM JDKs or not? It's one thing to say a certain feature only works with Oracle JVMs, but it's going to suck if 5.0 goes out and we know that the Solr tests will reliable fail 100% of the time on IBM JDKs : Date: Wed, 21 Jan 2015 05:30:55 + (UTC) : From: Policeman Jenkins Server jenk...@thetaphi.de : Reply-To: dev@lucene.apache.org : To: sar...@gmail.com, dev@lucene.apache.org : Subject: [JENKINS] Lucene-Solr-5.x-Linux (64bit/ibm-j9-jdk7) - Build # 11491 - : Failure! : : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11491/ : Java: 64bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} : : 1 tests failed. : FAILED: org.apache.solr.cloud.SaslZkACLProviderTest.testSaslZkACLProvider : : Error Message: : Could not get the port for ZooKeeper server : : Stack Trace: : java.lang.RuntimeException: Could not get the port for ZooKeeper server : at org.apache.solr.cloud.ZkTestServer.run(ZkTestServer.java:482) : at org.apache.solr.cloud.SaslZkACLProviderTest$SaslZkTestServer.run(SaslZkACLProviderTest.java:206) : at org.apache.solr.cloud.SaslZkACLProviderTest.setUp(SaslZkACLProviderTest.java:74) : at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) : at java.lang.reflect.Method.invoke(Method.java:619) : at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) : at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:861) : at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) : at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) : at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) : at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) : at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) : at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) : at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) : at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) : at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) : at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) : at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) : at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) : at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) : at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) : at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) : at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) : at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) : at
Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1943 - Failure!
It's our old friend SOLR-6387 ! This time manifesting itself via calls down into Tika. My best guess is that something changed in the recently upgraded version of Tika in Solr so that we now tickle this ExternalParser code path in a way we never tickled it before ... but more perplexing is why i can't reproduce any similar errors on trunk (or 5x) using ant test -Dtests.slow=true -Dtests.locale=tr_TR in solr/contrib/extraction ? Does Tika already do some special platform/locale detection internally that is bypassing this error for most people, and only manifesting on MacOSX? can a mac user try to reproduce this? : Date: Wed, 21 Jan 2015 06:47:13 + (UTC) : From: Policeman Jenkins Server jenk...@thetaphi.de : Reply-To: dev@lucene.apache.org : To: rm...@apache.org, ans...@apache.org, sha...@apache.org, no...@apache.org, : gcha...@apache.org, ehatc...@apache.org, tflo...@apache.org, : sar...@gmail.com, dev@lucene.apache.org : Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1943 - : Failure! : : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1943/ : Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC : : 1 tests failed. : FAILED: org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath : : Error Message: : posix_spawn is not a supported process launch mechanism on this platform. : : Stack Trace: : java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. : at __randomizedtesting.SeedInfo.seed([58A6FBEB77E81527:26F735786F5F7761]:0) : at java.lang.UNIXProcess$1.run(UNIXProcess.java:105) : at java.lang.UNIXProcess$1.run(UNIXProcess.java:94) : at java.security.AccessController.doPrivileged(Native Method) : at java.lang.UNIXProcess.clinit(UNIXProcess.java:92) : at java.lang.ProcessImpl.start(ProcessImpl.java:130) : at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) : at java.lang.Runtime.exec(Runtime.java:620) : at java.lang.Runtime.exec(Runtime.java:485) : at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344) : at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117) : at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90) : at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) : at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95) : at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229) : at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) : at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209) : at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) : at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) : at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) : at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) : at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) : at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) : at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:353) : at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocalFromHandler(ExtractingRequestHandlerTest.java:703) : at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:710) : at 
org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath(ExtractingRequestHandlerTest.java:474) : at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) : at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) : at java.lang.reflect.Method.invoke(Method.java:483) : at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) : at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) : at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) : at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) : at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) : at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) : at
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286059#comment-14286059 ] Uwe Schindler commented on SOLR-6991: - This is in fact the problem with spawning external processes. This is not new, also TIKA 1.6 had parsers that spawned processes. I just think we never hit this because this one is different: The parser spawns a process while initializing (to inspect the system). The other spawning parsers are only executed as needed. ExternalParser has existed for a long time in TIKA. I would not roll back to TIKA 1.5 because the new TIKA is much better than this one (regarding bugs). In fact we should maybe disable these tests with the well-known assume (trunk, 5.x, 5.0). In fact, I would suggest adding a note to the ref guide, so people know what this means. This is unfortunately a bug in the JVM, so this is not really our or TIKA's fault. In fact, as written in my blog post about locale issues: Most Turkish system administrators don't run servers with the Turkish locale :-) It's just too broken with lots of software. Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
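For anyone wondering, the "well-known assume" is the locale guard already used by the map-reduce/morphlines tests -- roughly the following, placed in the test's setUp or @BeforeClass (the exact message and placement vary per test):

{noformat}
import java.util.Locale;
import org.junit.Assume;

public class TurkishLocaleAssume {
  // Skips (rather than fails) the test when the default locale is Turkish,
  // where the JVM's fork/exec path is broken (see SOLR-6387).
  static void assumeNotTurkishLocale() {
    Assume.assumeFalse(
        "This test fails with Turkish default locale (see SOLR-6387)",
        new Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage()));
  }
}
{noformat}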
[jira] [Updated] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6521: - Attachment: SOLR-6521.patch CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch, SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client-side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286062#comment-14286062 ] Jessica Cheng Mallet commented on SOLR-6521: bq. I agree that the concurrency can be dramatically improved. Using Guava may not be an option because it is not yet a dependency of SolrJ. The other option would be to make the cache pluggable through an API. So, if you have Guava or something else in your package you can plug it in through an API. That'd be awesome! CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client-side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286065#comment-14286065 ] Uwe Schindler commented on SOLR-6991: - In fact you can select parsers using a config file / Set<String>. But this makes updating horrible, because we have to revisit the list on each TIKA update... Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6521: - Attachment: (was: SOLR-6521.patch) CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch, SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client-side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6521: - Attachment: SOLR-6521.patch hi [~mewmewball] This patch increases the parallelism and makes it tunable. CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch, SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client-side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-6991: Attachment: SOLR-6991-forkfix.patch This disables the test... Just copypasted from map-reduce/morphlines/ In fact this is not TIKA's issue and not new, a lot of stuff around Hadoop in Solr fails with Turkish! Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285904#comment-14285904 ] ASF subversion and git services commented on LUCENE-6192: - Commit 1653606 from [~mikemccand] in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1653606 ] LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 21 |tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long - int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
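To make the failure mode concrete: the problem class Rob spotted is a long value being narrowed to int, which silently wraps negative once a single term's position data passes ~2.1 GB in the .pos file. A standalone illustration of the arithmetic (not the actual LuceneXXSkipWriter code):

{noformat}
public class SkipPointerOverflow {
  public static void main(String[] args) {
    long posFilePointer = 2_300_000_000L;       // > Integer.MAX_VALUE (~2.1 GB) into the .pos file
    long lastPointer    = 1_000_000L;

    // Buggy pattern: computing/storing the delta through an int silently wraps.
    int truncatedDelta = (int) (posFilePointer - lastPointer);
    long correctDelta  = posFilePointer - lastPointer;

    System.out.println("as int:  " + truncatedDelta);   // negative garbage -> corrupt skip entry
    System.out.println("as long: " + correctDelta);     // what should have been written
  }
}
{noformat}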
Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1943 - Failure!
Reproduces for me on OS X 10.10, Oracle JDK 1.8.0_20: = [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ExtractingRequestHandlerTest -Dtests.method=testExtraction -Dtests.seed=98ABBA97C7FD5F1C -Dtests.slow=true -Dtests.locale=tr_TR -Dtests.timezone=America/Nome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.49s | ExtractingRequestHandlerTest.testExtraction [junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. [junit4]at __randomizedtesting.SeedInfo.seed([98ABBA97C7FD5F1C:21D8CEE9BBD58FE9]:0) [junit4]at java.lang.UNIXProcess$1.run(UNIXProcess.java:105) [junit4]at java.lang.UNIXProcess$1.run(UNIXProcess.java:94) [junit4]at java.security.AccessController.doPrivileged(Native Method) [junit4]at java.lang.UNIXProcess.clinit(UNIXProcess.java:92) [junit4]at java.lang.ProcessImpl.start(ProcessImpl.java:130) [junit4]at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) [junit4]at java.lang.Runtime.exec(Runtime.java:620) [junit4]at java.lang.Runtime.exec(Runtime.java:485) [junit4]at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344) [junit4]at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117) [junit4]at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90) [junit4]at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) [junit4]at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95) [junit4]at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229) [junit4]at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) [junit4]at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209) [junit4]at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) [junit4]at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [junit4]at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) [junit4]at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) [junit4]at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) [junit4]at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) [junit4]at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:353) [junit4]at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocalFromHandler(ExtractingRequestHandlerTest.java:703) [junit4]at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:710) [junit4]at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testExtraction(ExtractingRequestHandlerTest.java:59) [junit4]at java.lang.Thread.run(Thread.java:745) = I’ll try reverting to just before the Tika upgrade and see if it still happens. Steve On Jan 21, 2015, at 1:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote: It's our old friend SOLR-6387 ! This time manifesting itself via calls down into Tika. My best guess is that something changed in the recently upgraded version of Tika in Solr so that we now tickle this ExternalParser code path in a way we never tickled it before ... but more perplexing is why i can't reproduce any similar errors on trunk (or 5x) using ant test -Dtests.slow=true -Dtests.locale=tr_TR in solr/contrib/extraction ? 
Does Tika already do some special platform/locale detection internally that is bypassing this error for most people, and only manifesting on MacOSX? can a mac user try to reproduce this? : Date: Wed, 21 Jan 2015 06:47:13 + (UTC) : From: Policeman Jenkins Server jenk...@thetaphi.de : Reply-To: dev@lucene.apache.org : To: rm...@apache.org, ans...@apache.org, sha...@apache.org, no...@apache.org, : gcha...@apache.org, ehatc...@apache.org, tflo...@apache.org, : sar...@gmail.com, dev@lucene.apache.org : Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1943 - : Failure! : : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1943/ : Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC : : 1 tests failed. : FAILED: org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath : : Error Message: : posix_spawn is not a supported process launch mechanism on this platform. : : Stack Trace: : java.lang.Error:
[jira] [Commented] (SOLR-6993) install_solr_service.sh won't install on RHEL / CentOS
[ https://issues.apache.org/jira/browse/SOLR-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285886#comment-14285886 ] ASF subversion and git services commented on SOLR-6993: --- Commit 1653601 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1653601 ] SOLR-6993: install_solr_service.sh won't install on RHEL / CentOS install_solr_service.sh won't install on RHEL / CentOS -- Key: SOLR-6993 URL: https://issues.apache.org/jira/browse/SOLR-6993 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 5.0, Trunk Environment: RHEL 6.5 / CentOS 6.5 Reporter: David Anderson Assignee: Timothy Potter Priority: Blocker Fix For: 5.0, Trunk Attachments: SOLR-6993.patch There's a bug that will prevent install_solr_service.sh from working on RHEL / CentOS 6.5. It works on Ubuntu 14. Appears to be some obscure difference in bash expression evaluation behavior. Lines 87 and 89: SOLR_DIR=${SOLR_INSTALL_FILE:0:-4} blows up with this error: ./install_solr_service.sh: line 87: -4: substring expression < 0 This results in the archive not being extracted, and the rest of the script won't work. I tested a simple change: SOLR_DIR=${SOLR_INSTALL_FILE%.tgz} and verified it works on both RHEL 6.5 and Ubuntu 14. Patch is attached. I set this to Major thinking that not being able to install on CentOS is worth fixing prior to release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6993) install_solr_service.sh won't install on RHEL / CentOS
[ https://issues.apache.org/jira/browse/SOLR-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter resolved SOLR-6993. -- Resolution: Fixed install_solr_service.sh won't install on RHEL / CentOS -- Key: SOLR-6993 URL: https://issues.apache.org/jira/browse/SOLR-6993 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 5.0, Trunk Environment: RHEL 6.5 / CentOS 6.5 Reporter: David Anderson Assignee: Timothy Potter Priority: Blocker Fix For: 5.0, Trunk Attachments: SOLR-6993.patch There's a bug that will prevent install_solr_service.sh from working on RHEL / CentOS 6.5. It works on Ubuntu 14. Appears to be some obscure difference in bash expression evaluation behavior. line 87 and 89:SOLR_DIR=${SOLR_INSTALL_FILE:0:-4} blows up with this error: ./install_solr_service.sh: line 87: -4: substring expression 0 this results in the archive not being extracted and rest of the script won't work. I tested a simple change: SOLR_DIR=${SOLR_INSTALL_FILE%.tgz} and verified it works on both RHEL 6.5 and Ubuntu 14 Patch is attached. I set this to Major thinking that not being able to install on CentOS is worth fixing prior to release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6845) Add buildOnStartup option for suggesters
[ https://issues.apache.org/jira/browse/SOLR-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-6845: Fix Version/s: 5.1 Trunk Add buildOnStartup option for suggesters Key: SOLR-6845 URL: https://issues.apache.org/jira/browse/SOLR-6845 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Tomás Fernández Löbbe Fix For: Trunk, 5.1 Attachments: SOLR-6845.patch, SOLR-6845.patch, SOLR-6845.patch SOLR-6679 was filed to track the investigation into the following problem... {panel} The stock solrconfig provides a bad experience with a large index... start up Solr and it will spin at 100% CPU for minutes, unresponsive, while it apparently builds a suggester index. ... This is what I did: 1) indexed 10M very small docs (only takes a few minutes). 2) shut down Solr 3) start up Solr and watch it be unresponsive for over 4 minutes! I didn't even use any of the fields specified in the suggester config and I never called the suggest request handler. {panel} ..but ultimately focused on removing/disabling the suggester from the sample configs. Opening this new issue to focus on actually trying to identify the root problem fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6134) fix typos
[ https://issues.apache.org/jira/browse/LUCENE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285950#comment-14285950 ] ASF subversion and git services commented on LUCENE-6134: - Commit 1653615 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1653615 ] LUCENE-6134: fix typos in lucene/CHANGES.txt and solr/CHANGES.txt (merged trunk r1653612) fix typos - Key: LUCENE-6134 URL: https://issues.apache.org/jira/browse/LUCENE-6134 Project: Lucene - Core Issue Type: Task Reporter: Steve Rowe Assignee: Steve Rowe Priority: Trivial Attachments: LUCENE-6134-CHANGES.txt-s.patch, LUCENE-6134-its-its.patch, LUCENE-6134-necessary-whether-initializ-specified.patch I found a bunch of typos, will fix under this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286101#comment-14286101 ] Uwe Schindler commented on SOLR-6991: - FYI: SolrCellMorphlineTest is already disabled by the same assume, so this is the only broken one. Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6134) fix typos
[ https://issues.apache.org/jira/browse/LUCENE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285939#comment-14285939 ] ASF subversion and git services commented on LUCENE-6134: - Commit 1653612 from [~sar...@syr.edu] in branch 'dev/trunk' [ https://svn.apache.org/r1653612 ] LUCENE-6134: fix typos in lucene/CHANGES.txt and solr/CHANGES.txt fix typos - Key: LUCENE-6134 URL: https://issues.apache.org/jira/browse/LUCENE-6134 Project: Lucene - Core Issue Type: Task Reporter: Steve Rowe Assignee: Steve Rowe Priority: Trivial Attachments: LUCENE-6134-CHANGES.txt-s.patch, LUCENE-6134-its-its.patch, LUCENE-6134-necessary-whether-initializ-specified.patch I found a bunch of typos, will fix under this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.8.0) - Build # 1902 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/1902/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC 2 tests failed. FAILED: org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.testDistribSearch Error Message: There were too many update fails - we expect it can happen, but shouldn't easily Stack Trace: java.lang.AssertionError: There were too many update fails - we expect it can happen, but shouldn't easily at __randomizedtesting.SeedInfo.seed([9FCAAA6FE82229C2:1E2C24779F7D49FE]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertFalse(Assert.java:68) at org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.doTest(ChaosMonkeyNothingIsSafeTest.java:224) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:878) at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286020#comment-14286020 ] Steve Rowe commented on SOLR-6991: -- I can reproduce this on OS X 10.10 using Oracle JDK 1.8.0_20. When I revert back to r1652741 (just before the first commit under this issue), all solr-cell tests pass using the following (same thing that fails 100% for me with current trunk): {noformat} ant clean cd solr/contrib/extraction ant test -Dtests.slow=true -Dtests.locale=tr_TR {noformat} Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Set path to JRE / JDK in code
Hi, It is primarily my wrapped library that this applies to, so that it can be easily installed. Yes, on Windows the JRE is found using the PATH. I haven't been able to locate where this is done, as I would prefer to use a direct variable or a separate environment variable for this to keep it boxed off from the system variables. I have however found a way now that works fine, based on updating the PATH as the first thing in the __init__.py file of the wrapped library by prepending: import os; os.environ["PATH"] = r"mypath\jre\bin\server" + os.pathsep + os.environ["PATH"]; the installer takes care of updating the path to the right place later on. Many thanks /Petrus On Tue, Jan 20, 2015 at 7:14 PM, Andi Vajda va...@apache.org wrote: On Tue, 20 Jan 2015, Petrus Hyvönen wrote: Hi, I'm trying to package a wrapped library together with a non-system-wide Java JDK so that it can be easily installed. Can I somehow direct which JDK to use besides using JCC_JDK and putting the JRE in the PATH (I'm currently under Windows)? The JCC_JDK could be patched in setup.py, but as for the PATH JRE that is accessed while running the wrapped library, I don't understand where it is accessed, or how to patch this? So you're asking how to control where to pick up the JRE DLLs (on Windows) at runtime? If I remember correctly, on Windows you just set the Path environment variable, no? For example it would be good to have this in the config.py file if possible? If you're sure config.py is run _before_ any JRE DLL is loaded, you might be able to change the Path from there too. Andi.. Any thoughts or someone who's done this already? Regards /Petrus -- _ Petrus Hyvönen, Uppsala, Sweden Mobile Phone/SMS:+46 73 803 19 00 -- _ Petrus Hyvönen, Uppsala, Sweden Mobile Phone/SMS:+46 73 803 19 00
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286111#comment-14286111 ] Hoss Man commented on SOLR-6991: bq. In fact this is not TIKA's issue and not new, a lot of stuff around Hadoop in Solr fails with Turkish! ...my point is: it's new to Solr. in all other cases where POSIX_SPAWN impacts Solr, we either: * deal with it in the solr code, so we give a meaningful error to the user explaining the problem (ie: SystemInfoHandler) * it's in an optional feature that *NEVER* worked with turkish -- ie: the hadoop / morphlines contribs, which from the first version they were available in Solr would not work with the turkish locale ...in this case, we're talking about an _existing_ solr feature, that has previously worked fine if you run older Solr with turkish, and now when upgrading to 5.0 you're going to get a weird error message. if there's nothing better we can do to keep the ExtractingRequestHandler working for users who upgrade (even if they run with turkish) then i'm fine with assumes in the tests and notes in the docs ... i was just hoping you'd have a better idea. in particular: I'm still wondering if we can leverage the classpath in a way to override the default TesseractOCRConfig.properties file in the tika-parsers jar with our own version that prevents tesseract from being used. (i agree it's not worth switching to explicitly whitelisting the parsers in Solr code, but is there an easy way to blacklist this parser and/or other parsers we know are problematic?) Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
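[Editorial note] As a rough illustration of the "override the list of auto-detected parsers" option floated in the comment above (a sketch only, not the committed Solr fix; which parser classes Solr Cell would actually need is an assumption here), Tika's AutoDetectParser can be constructed from an explicit parser list so that TesseractOCRParser is never consulted and its ExternalParser.check()/Runtime.exec() call never runs:
{code}
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.parser.txt.TXTParser;

public class NoOcrParserFactory {
  /** Builds an AutoDetectParser whose parser list simply omits TesseractOCRParser. */
  public static AutoDetectParser create() {
    Parser[] parsers = new Parser[] {
        new HtmlParser(),
        new PDFParser(),
        new TXTParser()
        // ... whatever else Solr Cell needs, but no TesseractOCRParser
    };
    return new AutoDetectParser(parsers);
  }
}
{code}
The downside, as the comment notes, is that an explicit whitelist has to be kept in sync with Tika's defaults, which is why a classpath override of TesseractOCRConfig.properties may be the less invasive route.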
[jira] [Commented] (SOLR-6960) Config reporting handler is missing initParams defaults
[ https://issues.apache.org/jira/browse/SOLR-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285895#comment-14285895 ] Alexandre Rafalovitch commented on SOLR-6960: - This should really be a blocker for 5.0 as it affects the default example collections. Without this, we cannot claim to actually have Config Report Handler as a new feature. Config reporting handler is missing initParams defaults --- Key: SOLR-6960 URL: https://issues.apache.org/jira/browse/SOLR-6960 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Alexandre Rafalovitch Fix For: 5.0 *curl http://localhost:8983/solr/techproducts/config/requestHandler* produces (fragments): {quote} /update:{ name:/update, class:org.apache.solr.handler.UpdateRequestHandler, defaults:{}}, /update/json/docs:{ name:/update/json/docs, class:org.apache.solr.handler.UpdateRequestHandler, defaults:{ update.contentType:application/json, json.command:false}}, {quote} Where are the defaults from initParams? {quote} <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse"> <lst name="defaults"> <str name="df">text</str> </lst> </initParams> <initParams path="/update/json/docs"> <lst name="defaults"> <str name="srcField">_src_</str> <str name="mapUniqueKeyOnly">true</str> </lst> </initParams> {quote} Obviously, a test is missing as well to catch this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1943 - Failure!
Is this a Tika 1.7 upgrade related failure? On Tue, Jan 20, 2015 at 10:47 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1943/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath Error Message: posix_spawn is not a supported process launch mechanism on this platform. Stack Trace: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. at __randomizedtesting.SeedInfo.seed([58A6FBEB77E81527:26F735786F5F7761]:0) at java.lang.UNIXProcess$1.run(UNIXProcess.java:105) at java.lang.UNIXProcess$1.run(UNIXProcess.java:94) at java.security.AccessController.doPrivileged(Native Method) at java.lang.UNIXProcess.clinit(UNIXProcess.java:92) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) at java.lang.Runtime.exec(Runtime.java:620) at java.lang.Runtime.exec(Runtime.java:485) at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344) at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117) at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90) at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95) at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229) at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:353) at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocalFromHandler(ExtractingRequestHandlerTest.java:703) at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:710) at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath(ExtractingRequestHandlerTest.java:474) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at
[JENKINS] Lucene-Solr-5.x-Linux (32bit/jdk1.7.0_72) - Build # 11495 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11495/ Java: 32bit/jdk1.7.0_72 -server -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.ReplicationFactorTest.testDistribSearch Error Message: Error from server at http://127.0.0.1:57852/sus/za/repfacttest_c8n_2x2: The target server failed to respond Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:57852/sus/za/repfacttest_c8n_2x2: The target server failed to respond at __randomizedtesting.SeedInfo.seed([C9F1855E170E9BAE:48170B466051FB92]:0) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210) at org.apache.solr.cloud.ReplicationFactorTest.sendNonDirectUpdateRequestReplica(ReplicationFactorTest.java:195) at org.apache.solr.cloud.ReplicationFactorTest.testRf2NotUsingDirectUpdates(ReplicationFactorTest.java:165) at org.apache.solr.cloud.ReplicationFactorTest.doTest(ReplicationFactorTest.java:129) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:878) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
[jira] [Resolved] (SOLR-6845) Add buildOnStartup option for suggesters
[ https://issues.apache.org/jira/browse/SOLR-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe resolved SOLR-6845. - Resolution: Fixed Add buildOnStartup option for suggesters Key: SOLR-6845 URL: https://issues.apache.org/jira/browse/SOLR-6845 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Tomás Fernández Löbbe Fix For: Trunk, 5.1 Attachments: SOLR-6845.patch, SOLR-6845.patch, SOLR-6845.patch SOLR-6679 was filed to track the investigation into the following problem... {panel} The stock solrconfig provides a bad experience with a large index... start up Solr and it will spin at 100% CPU for minutes, unresponsive, while it apparently builds a suggester index. ... This is what I did: 1) indexed 10M very small docs (only takes a few minutes). 2) shut down Solr 3) start up Solr and watch it be unresponsive for over 4 minutes! I didn't even use any of the fields specified in the suggester config and I never called the suggest request handler. {panel} ...but ultimately focused on removing/disabling the suggester from the sample configs. Opening this new issue to focus on actually trying to identify the root problem and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6993) install_solr_service.sh won't install on RHEL / CentOS
[ https://issues.apache.org/jira/browse/SOLR-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285890#comment-14285890 ] ASF subversion and git services commented on SOLR-6993: --- Commit 1653603 from [~thelabdude] in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1653603 ] SOLR-6993: install_solr_service.sh won't install on RHEL / CentOS install_solr_service.sh won't install on RHEL / CentOS -- Key: SOLR-6993 URL: https://issues.apache.org/jira/browse/SOLR-6993 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 5.0, Trunk Environment: RHEL 6.5 / CentOS 6.5 Reporter: David Anderson Assignee: Timothy Potter Priority: Blocker Fix For: 5.0, Trunk Attachments: SOLR-6993.patch There's a bug that will prevent install_solr_service.sh from working on RHEL / CentOS 6.5. It works on Ubuntu 14. Appears to be some obscure difference in bash expression evaluation behavior. Lines 87 and 89: SOLR_DIR=${SOLR_INSTALL_FILE:0:-4} blows up with this error: ./install_solr_service.sh: line 87: -4: substring expression < 0 This results in the archive not being extracted, and the rest of the script won't work. I tested a simple change: SOLR_DIR=${SOLR_INSTALL_FILE%.tgz} and verified it works on both RHEL 6.5 and Ubuntu 14. Patch is attached. I set this to Major thinking that not being able to install on CentOS is worth fixing prior to release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6134) fix typos
[ https://issues.apache.org/jira/browse/LUCENE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285958#comment-14285958 ] ASF subversion and git services commented on LUCENE-6134: - Commit 1653616 from [~sar...@syr.edu] in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1653616 ] LUCENE-6134: fix typos in lucene/CHANGES.txt and solr/CHANGES.txt (merged branch_5x r1653615) fix typos - Key: LUCENE-6134 URL: https://issues.apache.org/jira/browse/LUCENE-6134 Project: Lucene - Core Issue Type: Task Reporter: Steve Rowe Assignee: Steve Rowe Priority: Trivial Attachments: LUCENE-6134-CHANGES.txt-s.patch, LUCENE-6134-its-its.patch, LUCENE-6134-necessary-whether-initializ-specified.patch I found a bunch of typos, will fix under this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285985#comment-14285985 ] Noble Paul commented on SOLR-6521: -- bq. The patch is locking the entire cache for all loading, which might not be an ideal solution for a cluster with many collections I understand that. In reality, different collections expire at different times, so everyone waiting on the lock would be a rare thing. The common use case is that one collection expires and every thread tries to refresh it simultaneously. I agree that the concurrency can be dramatically improved. Using Guava may not be an option because it is not yet a dependency of SolrJ. The other option would be to make the cache pluggable through an API. So, if you have Guava or something else in your package, you can plug it in through that API. CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client-side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
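[Editorial note] For illustration only, here is a minimal per-key loading sketch along the lines discussed above, assuming no Guava dependency. The class and method names (CollectionStateCache, invalidate) are hypothetical and are not SolrJ API; ConcurrentHashMap.computeIfAbsent gives the same "only one thread loads a given key while the others wait" behavior the Guava LoadingCache provided in the load test. Note computeIfAbsent needs Java 8; on Java 7 the same effect requires explicit per-key locks.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a pluggable, per-key-locking state cache; not SolrJ code.
public class CollectionStateCache<K, V> {
  private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> loader; // e.g. collection name -> state fetched from ZooKeeper

  public CollectionStateCache(Function<K, V> loader) {
    this.loader = loader;
  }

  public V get(K key) {
    // Only one thread per key executes the loader; other callers for the same
    // key block on that computation instead of all hammering ZooKeeper.
    return cache.computeIfAbsent(key, loader);
  }

  public void invalidate(K key) {
    cache.remove(key); // expiry removes the entry, so the next get() reloads it
  }
}
{code}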
[jira] [Updated] (SOLR-7005) facet.heatmap for spatial heatmap faceting on RPT
[ https://issues.apache.org/jira/browse/SOLR-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-7005: --- Attachment: heatmap_64x32.png heatmap_512x256.png There are some performance #'s on LUCENE-6191. I experimented with generating a PNG to carry the data in a compressed manner, since this data can get large. I'm abusing the image to carry the same detail in the counts, and that means 4 bytes per pixel. Counts > 16M touch the high byte of a 4-byte int, which is where the alpha channel is, which will progressively lighten the image. _The image is not at all optimized for human viewing that is pleasant on the eyes_, except for the bit flip of the high (alpha channel) byte; otherwise you would see nothing until the counts exceed this figure. That said, it's crude and you can get a sense of it. _If people have input on how to cheaply and easily tweak the value to look nicer, I'm interested._ Since a client app may consume this PNG if it wants this compressed format and render it the way it wants to, there should be a straight-forward algorithm to derive the count from the ARGB (alpha, red, green, blue) int. The attached PNG is 512x256 (131,072 cells mind you!) of the 8.5M geonames data set. On a 16-segment index with no search filters, it took 882ms to compute the underlying heatmap, and 218ms to build the PNG and write it to disk. The write-to-disk hack is temporary to easily view the image by opening it from the file system. You can expect there will be more time in consuming this image from Solr's javabin/XML/JSON + base64 wrapper (whatever you choose). Now a 512x256 image is so detailed that it arguably isn't a heatmap but another way to go about rendering individual points. A more coarse, say, 64x32 image would be more true to the heatmap label, and obviously much faster to generate -- like 100ms + only ~2ms to generate the PNG. facet.heatmap for spatial heatmap faceting on RPT - Key: SOLR-7005 URL: https://issues.apache.org/jira/browse/SOLR-7005 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.1 Attachments: heatmap_512x256.png, heatmap_64x32.png This is a new feature that uses the new spatial Heatmap / 2D PrefixTree cell counter in Lucene spatial LUCENE-6191. This is a form of faceting, and as-such I think it should live in the facet parameter namespace. Here's what the parameters are: * facet=true * facet.heatmap=fieldname * facet.heatmap.bbox=\[-180 -90 TO 180 90] * facet.heatmap.gridLevel=6 * facet.heatmap.distErrPct=0.10 Like other faceting features, the fieldName can have local-params to exclude filter queries or specify an output key. The bbox is optional; you get the whole world or you can specify a box or actually any shape that WKT supports (you get the bounding box of whatever you put). Ultimately, this feature needs to know the grid level, which together with the input shape will yield a certain number of cells. You can specify gridLevel exactly, or don't and instead provide distErrPct which is computed like it is for the RPT field type as seen in the schema. 0.10 yielded ~4k cells but it'll vary. There's also a facet.heatmap.maxCells safety net defaulting to 100k. Exceed this and you get an error. The output is (JSON): {noformat} {gridLevel=6,columns=64,rows=64,minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0,counts=[[0, 0, 2, 1, ],[1, 1, 3, 2, ...],...]} {noformat} counts is null if all would be 0. Perhaps individual row arrays should likewise be null... 
I welcome feedback. I'm toying with an output format option in which you can specify a base-64'ed grayscale PNG. Obviously this should support sharded / distributed environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
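[Editorial note] For a client that wants to consume such a PNG, here is a hedged sketch of the decode side. It assumes the encoding described in the comment above, namely that the raw count sits in the pixel's 32-bit ARGB value with the whole high (alpha) byte flipped; the exact transform may differ in whatever finally gets committed, so treat this as an assumption rather than the final format.
{code}
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class HeatmapPngReader {
  /** Reads the per-cell counts back out of a heatmap PNG (encoding assumed, see above). */
  public static int[][] readCounts(File png) throws IOException {
    BufferedImage img = ImageIO.read(png);
    int[][] counts = new int[img.getHeight()][img.getWidth()];
    for (int y = 0; y < img.getHeight(); y++) {
      for (int x = 0; x < img.getWidth(); x++) {
        int argb = img.getRGB(x, y);      // 4 bytes: alpha, red, green, blue
        counts[y][x] = argb ^ 0xFF000000; // undo the assumed flip of the alpha byte
      }
    }
    return counts;
  }
}
{code}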
[jira] [Commented] (SOLR-6928) solr.cmd stop works only in english
[ https://issues.apache.org/jira/browse/SOLR-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285945#comment-14285945 ] Timothy Potter commented on SOLR-6928: -- Awesome suggestion, Jan! Testing your idea now and will get it committed for 5. solr.cmd stop works only in english --- Key: SOLR-6928 URL: https://issues.apache.org/jira/browse/SOLR-6928 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: german windows 7 Reporter: john.work Assignee: Timothy Potter Priority: Minor in solr.cmd the stop doesn't work while executing 'netstat -nao ^| find /i "listening" ^| find ":%SOLR_PORT%"', so "listening" is not found. e.g. in German cmd.exe, netstat -nao prints the following output: {noformat} Proto  Lokale Adresse  Remoteadresse  Status  PID  TCP  0.0.0.0:80  0.0.0.0:0  ABHÖREN  4 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6134) fix typos
[ https://issues.apache.org/jira/browse/LUCENE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-6134: --- Attachment: LUCENE-6134-CHANGES.txt-s.patch Patch fixing typos in {{lucene/CHANGES.txt}} and {{solr/CHANGES.txt}}. Committing shortly. fix typos - Key: LUCENE-6134 URL: https://issues.apache.org/jira/browse/LUCENE-6134 Project: Lucene - Core Issue Type: Task Reporter: Steve Rowe Assignee: Steve Rowe Priority: Trivial Attachments: LUCENE-6134-CHANGES.txt-s.patch, LUCENE-6134-its-its.patch, LUCENE-6134-necessary-whether-initializ-specified.patch I found a bunch of typos, will fix under this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7012) add an ant target to package a plugin into a jar
Noble Paul created SOLR-7012: Summary: add an ant target to package a plugin into a jar Key: SOLR-7012 URL: https://issues.apache.org/jira/browse/SOLR-7012 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Now it is extremely hard to create a plugin because the user does not know about the exact dependencies and their poms. We will add a target to solr/build.xml called plugin-jar; invoke it as follows: {code} ant -Dplugin.package=my.package -Djar.location=/tmp/my.jar plugin-jar {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6845) Add buildOnStartup option for suggesters
[ https://issues.apache.org/jira/browse/SOLR-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-6845: Assignee: Tomás Fernández Löbbe Summary: Add buildOnStartup option for suggesters (was: figure out why suggester causes slow startup - even when not used) Changed the summary to reflect the actual change done. Add buildOnStartup option for suggesters Key: SOLR-6845 URL: https://issues.apache.org/jira/browse/SOLR-6845 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Tomás Fernández Löbbe Attachments: SOLR-6845.patch, SOLR-6845.patch, SOLR-6845.patch SOLR-6679 was filed to track the investigation into the following problem... {panel} The stock solrconfig provides a bad experience with a large index... start up Solr and it will spin at 100% CPU for minutes, unresponsive, while it apparently builds a suggester index. ... This is what I did: 1) indexed 10M very small docs (only takes a few minutes). 2) shut down Solr 3) start up Solr and watch it be unresponsive for over 4 minutes! I didn't even use any of the fields specified in the suggester config and I never called the suggest request handler. {panel} ...but ultimately focused on removing/disabling the suggester from the sample configs. Opening this new issue to focus on actually trying to identify the root problem and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6928) solr.cmd stop works only in english
[ https://issues.apache.org/jira/browse/SOLR-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-6928: - Attachment: SOLR-6928.patch Here's a patch that builds upon Jan's, but required checking for PID==0 because it was still finding something that wasn't listening: Proto  Local Address  Foreign Address  State  PID  TCP  127.0.0.1:49204  127.0.0.1:8983  TIME_WAIT  0 According to the docs, PID 0 is for a pseudo-idle process, so the script could ignore those and keep looping to find the actual listening process. This patch works well on English Windows ... I don't have access to a German Windows box, can someone test please? solr.cmd stop works only in english --- Key: SOLR-6928 URL: https://issues.apache.org/jira/browse/SOLR-6928 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: german windows 7 Reporter: john.work Assignee: Timothy Potter Priority: Minor Attachments: SOLR-6928.patch in solr.cmd the stop doesn't work while executing 'netstat -nao ^| find /i "listening" ^| find ":%SOLR_PORT%"', so "listening" is not found. e.g. in German cmd.exe, netstat -nao prints the following output: {noformat} Proto  Lokale Adresse  Remoteadresse  Status  PID  TCP  0.0.0.0:80  0.0.0.0:0  ABHÖREN  4 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe reopened SOLR-6991: -- Reopening to address this Mac OS X failure in solr-cell: {noformat} Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1943/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath ... [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ExtractingRequestHandlerTest -Dtests.method=testXPath -Dtests.seed=58A6FBEB77E81527 -Dtests.slow=true -Dtests.locale=tr_TR -Dtests.timezone=Etc/GMT+3 -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] ERROR 2.57s | ExtractingRequestHandlerTest.testXPath [junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. [junit4] at __randomizedtesting.SeedInfo.seed([58A6FBEB77E81527:26F735786F5F7761]:0) [junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:105) [junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:94) [junit4] at java.security.AccessController.doPrivileged(Native Method) [junit4] at java.lang.UNIXProcess.clinit(UNIXProcess.java:92) [junit4] at java.lang.ProcessImpl.start(ProcessImpl.java:130) [junit4] at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) [junit4] at java.lang.Runtime.exec(Runtime.java:620) [junit4] at java.lang.Runtime.exec(Runtime.java:485) [junit4] at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344) [junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117) [junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90) [junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) [junit4] at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95) [junit4] at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229) [junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) [junit4] at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209) [junit4] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) [junit4] at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [junit4] at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) [junit4] at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) [junit4] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) [junit4] at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) [junit4] at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:353) [junit4] at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocalFromHandler(ExtractingRequestHandlerTest.java:703) [junit4] at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:710) [junit4] at org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testXPath(ExtractingRequestHandlerTest.java:474) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 
Attachments: SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-6192. Resolution: Fixed Resolving ... Tom can you post back here the results of testing with this fix? Thanks. Hopefully this is the bug you were hitting! Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 2>&1 | tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long -> int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required > 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
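[Editorial note] To make the failure mode concrete (this is purely an illustration, not Lucene code): a positions-file pointer delta for one enormous term can exceed Integer.MAX_VALUE, and casting that long to an int before writing it as a VInt silently turns it into garbage, which is exactly the corruption CheckIndex then trips over. The fix replaces the int-sized reads and writes of these file-pointer deltas with vLongs (the reader side is visible in the svn diff later in this digest).
{code}
public class SkipPointerOverflowDemo {
  public static void main(String[] args) {
    long posPointerDelta = 2_200_000_000L; // ~2.1 GB of positions written for a single term
    int written = (int) posPointerDelta;   // what a writeVInt((int) delta) call would record
    System.out.println(written);           // prints -2094967296: the skip pointer is now bogus
  }
}
{code}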
[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285835#comment-14285835 ] ASF subversion and git services commented on LUCENE-6192: - Commit 1653585 from [~mikemccand] in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1653585 ] LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 2>&1 | tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long -> int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required > 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6191) Spatial 2D faceting (heatmaps)
[ https://issues.apache.org/jira/browse/LUCENE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285863#comment-14285863 ] David Smiley commented on LUCENE-6191: -- delta stats: * *Segments: 16 (no deleted)* * docs: 8,526,175 (slightly less than before, not sure why). * QuadTree precision: 22 (better than 10m) bounds: -180 to 180, -180 to 180 (360x360 square) * Disk index size: 2.35GB * heatmap input range: -180 to 180, -89.999 to 89.999 (slightly inset so heatmap doesn't include a row just 90 and just 90) 512x256 (131,072 cells) heatmap : 882ms 64x32 (2048 cells) heatmap: 120ms Spatial 2D faceting (heatmaps) -- Key: LUCENE-6191 URL: https://issues.apache.org/jira/browse/LUCENE-6191 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.1 Attachments: LUCENE-6191__Spatial_heatmap.patch Lucene spatial's PrefixTree (grid) based strategies index data in a way highly amenable to faceting on grids cells to compute a so-called _heatmap_. The underlying code in this patch uses the PrefixTreeFacetCounter utility class which was recently refactored out of faceting for NumberRangePrefixTree LUCENE-5735. At a low level, the terms (== grid cells) are navigated per-segment, forward only with TermsEnum.seek, so it's pretty quick and furthermore requires no extra caches no docvalues. Ideally you should use QuadPrefixTree (or Flex once it comes out) to maximize the number grid levels which in turn maximizes the fidelity of choices when you ask for a grid covering a region. Conveniently, the provided capability returns the data in a 2-D grid of counts, so the caller needn't know a thing about how the data is encoded in the prefix tree. Well almost... at this point they need to provide a grid level, but I'll soon provide a means of deriving the grid level based on a min/max cell count. I recommend QuadPrefixTree with geo=false so that you can provide a square world-bounds (360x360 degrees), which means square grid cells which are more desirable to display than rectangular cells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
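[Editorial note] For readers wondering what calling this looks like, here is a hypothetical usage sketch pieced together from the issue description; the class, method, and field names (HeatmapFacetCounter.calcFacets, Heatmap.counts, and so on) are assumptions about the attached patch and may not match what finally lands in 5.1.
{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReaderContext;
import org.apache.lucene.spatial.prefix.HeatmapFacetCounter;
import org.apache.lucene.spatial.prefix.PrefixTreeStrategy;
import com.spatial4j.core.shape.Shape;

public class HeatmapSketch {
  static int[] gridCounts(PrefixTreeStrategy strategy, IndexReaderContext ctx,
                          Shape region, int gridLevel) throws IOException {
    // null = no extra filter over the index; 100_000 = safety cap on the number of cells
    HeatmapFacetCounter.Heatmap heatmap =
        HeatmapFacetCounter.calcFacets(strategy, ctx, null, region, gridLevel, 100_000);
    // a columns x rows grid of counts, row-major; the caller needn't know the prefix-tree encoding
    return heatmap.counts;
  }
}
{code}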
[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285862#comment-14285862 ] ASF subversion and git services commented on LUCENE-6192: - Commit 1653594 from [~mikemccand] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1653594 ] LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 2>&1 | tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long -> int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required > 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 740 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/740/ 4 tests failed. FAILED: org.apache.solr.cloud.HttpPartitionTest.testDistribSearch Error Message: org.apache.http.NoHttpResponseException: The target server failed to respond Stack Trace: org.apache.solr.client.solrj.SolrServerException: org.apache.http.NoHttpResponseException: The target server failed to respond at __randomizedtesting.SeedInfo.seed([A40ECF0BF65E3F19:25E8411381015F25]:0) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:871) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:736) at org.apache.solr.cloud.HttpPartitionTest.sendDoc(HttpPartitionTest.java:480) at org.apache.solr.cloud.HttpPartitionTest.testRf2(HttpPartitionTest.java:201) at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:114) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:878) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at
[jira] [Commented] (LUCENE-6191) Spatial 2D faceting (heatmaps)
[ https://issues.apache.org/jira/browse/LUCENE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285809#comment-14285809 ] David Smiley commented on LUCENE-6191: -- I have some performance numbers taken while working on SOLR-7005. I took a geonames data set of 8,552,952 docs and I indexed the latitude longitude into a quad prefixTree with maximum resolution of a meter and with geo=false and -180 to 180, -90 to 90 world bounds of standard geodetic degree boundaries. That's a screw-up on my part; I forgot to use 360x360 to get square grid boxes instead of rectangular ones. But that's not pertinent. The index size is 2.6GB which is kind of large. Increasing the maximum resolution to above a meter will decrease the index size a lot. This reminds me of how beneficial the forthcoming flex prefixTree will be, but I digress. This data is all points. Base stats: * Machine: my SSD based recent MacBook Pro, Java 8 * Lucene/Solr: trunk as of last night * Docs: 8,552,952 * Segments: 1 * Disk index size: 2.6GB * QuadTree: ** precision: 26 (better than a meter) 512x512 heatmap, (_note: this is a whopping 262,144 cells_): 248ms (PNG to be attached to SOLR-7005 soon). Now filtered with an additional query down to 165 docs: 105ms (I figure this fast number is due to a particular optimization in the prefix tree facet counter for highly discriminating filters). 64x64 heatmap (4,096 cells): 105ms Filtered to 165 docs: 21ms I took one measurement when the index was un-optimized at 38 segments, including 10K deleted docs (512x512 query all): 1800ms roughly. I should try this again after I re-index with the square grid cells I want. Spatial 2D faceting (heatmaps) -- Key: LUCENE-6191 URL: https://issues.apache.org/jira/browse/LUCENE-6191 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.1 Attachments: LUCENE-6191__Spatial_heatmap.patch Lucene spatial's PrefixTree (grid) based strategies index data in a way highly amenable to faceting on grids cells to compute a so-called _heatmap_. The underlying code in this patch uses the PrefixTreeFacetCounter utility class which was recently refactored out of faceting for NumberRangePrefixTree LUCENE-5735. At a low level, the terms (== grid cells) are navigated per-segment, forward only with TermsEnum.seek, so it's pretty quick and furthermore requires no extra caches no docvalues. Ideally you should use QuadPrefixTree (or Flex once it comes out) to maximize the number grid levels which in turn maximizes the fidelity of choices when you ask for a grid covering a region. Conveniently, the provided capability returns the data in a 2-D grid of counts, so the caller needn't know a thing about how the data is encoded in the prefix tree. Well almost... at this point they need to provide a grid level, but I'll soon provide a means of deriving the grid level based on a min/max cell count. I recommend QuadPrefixTree with geo=false so that you can provide a square world-bounds (360x360 degrees), which means square grid cells which are more desirable to display than rectangular cells. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1653580 - in /lucene/dev/branches/branch_5x: ./ lucene/ lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene41/ lucene/core/ lucene/core/src/java/org/apache/lucene/codecs/l
I think we should also fix Lucene41SkipWriter (src/test) ? On Wed, Jan 21, 2015 at 11:24 AM, mikemcc...@apache.org wrote: Author: mikemccand Date: Wed Jan 21 16:24:08 2015 New Revision: 1653580 URL: http://svn.apache.org/r1653580 Log: LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Modified: lucene/dev/branches/branch_5x/ (props changed) lucene/dev/branches/branch_5x/lucene/ (props changed) lucene/dev/branches/branch_5x/lucene/CHANGES.txt (props changed) lucene/dev/branches/branch_5x/lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene41/Lucene41SkipReader.java lucene/dev/branches/branch_5x/lucene/core/ (props changed) lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipReader.java lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipWriter.java Modified: lucene/dev/branches/branch_5x/lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene41/Lucene41SkipReader.java URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene41/Lucene41SkipReader.java?rev=1653580r1=1653579r2=1653580view=diff == --- lucene/dev/branches/branch_5x/lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene41/Lucene41SkipReader.java (original) +++ lucene/dev/branches/branch_5x/lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene41/Lucene41SkipReader.java Wed Jan 21 16:24:08 2015 @@ -173,13 +173,13 @@ final class Lucene41SkipReader extends M // if (DEBUG) { // System.out.println( delta= + delta); // } -docPointer[level] += skipStream.readVInt(); +docPointer[level] += skipStream.readVLong(); // if (DEBUG) { // System.out.println( docFP= + docPointer[level]); // } if (posPointer != null) { - posPointer[level] += skipStream.readVInt(); + posPointer[level] += skipStream.readVLong(); // if (DEBUG) { // System.out.println( posFP= + posPointer[level]); // } @@ -193,7 +193,7 @@ final class Lucene41SkipReader extends M } if (payPointer != null) { -payPointer[level] += skipStream.readVInt(); +payPointer[level] += skipStream.readVLong(); } } return delta; Modified: lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipReader.java URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipReader.java?rev=1653580r1=1653579r2=1653580view=diff == --- lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipReader.java (original) +++ lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipReader.java Wed Jan 21 16:24:08 2015 @@ -179,10 +179,10 @@ final class Lucene50SkipReader extends M @Override protected int readSkipData(int level, IndexInput skipStream) throws IOException { int delta = skipStream.readVInt(); -docPointer[level] += skipStream.readVInt(); +docPointer[level] += skipStream.readVLong(); if (posPointer != null) { - posPointer[level] += skipStream.readVInt(); + posPointer[level] += skipStream.readVLong(); posBufferUpto[level] = skipStream.readVInt(); if (payloadByteUpto != null) { @@ -190,7 +190,7 @@ final class Lucene50SkipReader extends M } if (payPointer != null) { -payPointer[level] += skipStream.readVInt(); +payPointer[level] += skipStream.readVLong(); } } return delta; Modified: 
lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipWriter.java URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipWriter.java?rev=1653580r1=1653579r2=1653580view=diff == --- lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipWriter.java (original) +++ lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/codecs/lucene50/Lucene50SkipWriter.java Wed Jan 21 16:24:08 2015 @@ -147,12 +147,12 @@ final class Lucene50SkipWriter extends M skipBuffer.writeVInt(delta); lastSkipDoc[level] = curDoc; -skipBuffer.writeVInt((int) (curDocPointer - lastSkipDocPointer[level])); +skipBuffer.writeVLong(curDocPointer - lastSkipDocPointer[level]); lastSkipDocPointer[level] = curDocPointer; if (fieldHasPositions) {
[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285842#comment-14285842 ] ASF subversion and git services commented on LUCENE-6192: - Commit 1653588 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1653588 ] LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 21 |tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long - int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
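Rob's observation about the long-to-int casts is easy to see in isolation: once a single term's cumulative file-pointer delta passes Integer.MAX_VALUE, the cast wraps negative and the value handed to writeVInt no longer matches the real delta. A minimal stand-alone illustration (not Lucene code):
{code}
// Why an (int) cast on a long file-pointer delta corrupts skip entries once a
// term's postings data grows past ~2.1 GB: the cast silently wraps negative.
static void overflowDemo() {
  long lastSkipPointer = 0L;
  long curPointer = 2_500_000_000L;                   // > Integer.MAX_VALUE (2,147,483,647)
  int wrapped = (int) (curPointer - lastSkipPointer); // what a writeVInt call would be handed
  System.out.println(wrapped);                        // -1794967296: a garbage delta
  System.out.println(curPointer - lastSkipPointer);   // 2500000000: what writeVLong preserves
}
{code}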
[jira] [Commented] (SOLR-6976) Remove all methods and classes deprecated in 4.x from trunk and 5.x
[ https://issues.apache.org/jira/browse/SOLR-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285797#comment-14285797 ] ASF subversion and git services commented on SOLR-6976: --- Commit 1653566 from [~romseygeek] in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1653566 ] SOLR-6976: Remove methods and classes deprecated in 4.x Remove all methods and classes deprecated in 4.x from trunk and 5.x --- Key: SOLR-6976 URL: https://issues.apache.org/jira/browse/SOLR-6976 Project: Solr Issue Type: Task Reporter: Alan Woodward Priority: Blocker Fix For: 5.0, Trunk Attachments: SOLR-6976.patch, SOLR-6976.patch, SOLR-6976.patch, SOLR-sharkeys.patch We have a bunch of methods, classes, enums, etc which are marked as deprecated in Solr code in the 4.x branch. Some of them have been marked as such since the 1.4 release. Before we get 5.0 out, these should all be removed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6976) Remove all methods and classes deprecated in 4.x from trunk and 5.x
[ https://issues.apache.org/jira/browse/SOLR-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved SOLR-6976. - Resolution: Fixed Assignee: Alan Woodward Remove all methods and classes deprecated in 4.x from trunk and 5.x --- Key: SOLR-6976 URL: https://issues.apache.org/jira/browse/SOLR-6976 Project: Solr Issue Type: Task Reporter: Alan Woodward Assignee: Alan Woodward Priority: Blocker Fix For: 5.0, Trunk Attachments: SOLR-6976.patch, SOLR-6976.patch, SOLR-6976.patch, SOLR-sharkeys.patch We have a bunch of methods, classes, enums, etc which are marked as deprecated in Solr code in the 4.x branch. Some of them have been marked as such since the 1.4 release. Before we get 5.0 out, these should all be removed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285818#comment-14285818 ] ASF subversion and git services commented on LUCENE-6192: - Commit 1653577 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1653577 ] LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 21 |tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long - int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285857#comment-14285857 ] ASF subversion and git services commented on LUCENE-6192: - Commit 1653593 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1653593 ] LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 21 |tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long - int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285828#comment-14285828 ] ASF subversion and git services commented on LUCENE-6192: - Commit 1653580 from [~mikemccand] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1653580 ] LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices Long overflow in LuceneXXSkipWriter can corrupt skip data - Key: LUCENE-6192 URL: https://issues.apache.org/jira/browse/LUCENE-6192 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk, 4.x Attachments: LUCENE-6192.patch I've been iterating with Tom on this corruption that CheckIndex detects in his rather large index (720 GB in a single segment): {noformat} java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex //shards/4/core-1/data/test_index -verbose 21 |tee -a shard4_reoptimizedNewJava Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825} 1 of 1: name=_8m8 docCount=1130856 version=4.10.2 codec=Lucene410 compound=false numFiles=10 size (MB)=719,967.32 diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation} no deletions test: open reader.OK test: check integrity.OK test: check live docs.OK test: fields..OK [80 fields] test: field norms.OK [23 fields] test: terms, freq, prox...ERROR: java.lang.AssertionError: -96 java.lang.AssertionError: -96 at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955) at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100) at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) test: stored fields...OK [67472796 total field count; avg 59.665 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] FAILED WARNING: fixIndex() would remove reference to this segment; full exception: java.lang.RuntimeException: Term Index test failed at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096) WARNING: 1 broken segments (containing 1130856 documents) detected WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified {noformat} And Rob spotted long - int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required 2.1 GB to write its positions into .pos. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7013) Unclear error message with solr script when lacking jar executable
Derek Wood created SOLR-7013: Summary: Unclear error message with solr script when lacking jar executable Key: SOLR-7013 URL: https://issues.apache.org/jira/browse/SOLR-7013 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: Fedora 21 Reporter: Derek Wood Fedora 21 doesn't ship the jar executable with the default jdk package, so the attempt to extract webapp/solr.war in the solr script can fail without a clear error message. The attached patch adds this error message and includes support for the unzip utility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand
[ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286240#comment-14286240 ] Joel Bernstein commented on SOLR-6581: -- The hint in the code is still upper case TOP_FC. This was meant to be lower case. I'll open another issue for this and have it accept both cases. 5.0 will go out with the upper case syntax though so I'll update the documentation. Efficient DocValues support and numeric collapse field implementations for Collapse and Expand -- Key: SOLR-6581 URL: https://issues.apache.org/jira/browse/SOLR-6581 Project: Solr Issue Type: Bug Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 5.0, Trunk Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent are optimized to work with a top level FieldCache. Top level FieldCaches have a very fast docID to top-level ordinal lookup. Fast access to the top-level ordinals allows for very high performance field collapsing on high cardinality fields. LUCENE-5666 unified the DocValues and FieldCache api's so that the top level FieldCache is no longer in regular use. Instead all top level caches are accessed through MultiDocValues. This ticket does the following: 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the default approach when collapsing on String fields 2) Provides an option to use a top level FieldCache if the performance of MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a new hint parameter. If the hint parameter is set to top_fc then the top-level FieldCache would be used for both Collapse and Expand. Example syntax: {code} fq={!collapse field=x hint=TOP_FC} {code} 3) Adds numeric collapse field implementations. 4) Resolves issue SOLR-6066 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
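As a sketch of what accepting both cases could look like, a hypothetical snippet (not the actual CollapsingQParserPlugin code; the parameter name comes from the hint local-param described in this issue):
{code}
import org.apache.solr.common.params.SolrParams;

// Hypothetical: treat the hint local-param case-insensitively so that both
// hint=TOP_FC and hint=top_fc select the top-level FieldCache code path.
boolean useTopLevelFieldCache(SolrParams localParams) {
  return "top_fc".equalsIgnoreCase(localParams.get("hint"));
}
{code}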
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286314#comment-14286314 ] Uwe Schindler commented on SOLR-6991: - The last comment was just an idea, but doesn't work. The problem here is that initialization of the parser fails, so it will always call TesseractOCRParser.getSupportedTypes()... Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7013) Unclear error message with solr script when lacking jar executable
[ https://issues.apache.org/jira/browse/SOLR-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-7013: --- Priority: Blocker (was: Major) Fix Version/s: 5.0 Unclear error message with solr script when lacking jar executable -- Key: SOLR-7013 URL: https://issues.apache.org/jira/browse/SOLR-7013 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: Fedora 21 Reporter: Derek Wood Priority: Blocker Fix For: 5.0 Attachments: solr.patch Fedora 21 doesn't ship the jar executable with the default jdk package, so the attempt to extract webapp/solr.war in the solr script can fail without a clear error message. The attached patch adds this error message and includes support for the unzip utility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7005) facet.heatmap for spatial heatmap faceting on RPT
[ https://issues.apache.org/jira/browse/SOLR-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-7005: --- Attachment: SOLR-7005_heatmap.patch Thanks for the encouragement Shalin, and Erik on #lucene-dev, and others via email who have gotten wind of this. Here's the first-draft patch. It is still based on being its own SearchComponent, and it doesn't yet support distributed-search -- those issues should be addressed next. I added support for the distErr parameter to facilitate computing the grid level in the same fashion as used by Lucene spatial to ultimately derive a grid level for a given shape (a rect/box in this case). In fact it re-uses utility methods in Lucene spatial to compute the grid level given the world boundary, distErr (if provided) and distErrPct (if provided). The units of distErr are the same as the distanceUnits attribute on the field type (a new Solr 5 thing). So if the unit is a kilometer and distErr is 100, then the grid cells returned are at least as precise as 100 kilometers (which BTW is a little less than a spherical degree for Earth, which is 111.2km). The 512x256 heatmap I uploaded was generated by specifying distErr=111.2. A client could compute a distErr if they instead know how many minimum cells they want in the heatmap. I may bake that formula in and provide a minCells param. For distributed-search, I'm thinking the internal shard requests will use PNG since it's compressed, and then the user can get whatever format they asked for. I only want to write the aggregation logic once, not per-format :-) As a part of this work I found it useful to add SpatialUtils.parseRectangle, which parses the {{[lowerLeftPoint TO upperRightPoint]}} format. In another issue I want to re-use this to provide a more Solr-friendly way of indexing a rectangle (e.g. for BBoxField or RPT) or for specifying worldBounds on the field type. Even though I don't have distributed-search implemented yet, the test extends BaseDistributedSearchTestCase anyway. I dislike the idea of writing two tests that test the same thing (one distributed, one not) when the infrastructure should make it indifferent, since it's transparent to the input and output I'm testing. Unfortunately, assertQ and friends are hard-coded to use TestHarness, which is in turn hard-coded to use an embedded Solr instance. And unfortunately, BaseDistributedSearchTestCase doesn't let me test 0 shards (hey, I haven't implemented that feature yet!). The patch tweaks BaseDistributedSearchTestCase slightly to let me do this. facet.heatmap for spatial heatmap faceting on RPT - Key: SOLR-7005 URL: https://issues.apache.org/jira/browse/SOLR-7005 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.1 Attachments: SOLR-7005_heatmap.patch, heatmap_512x256.png, heatmap_64x32.png This is a new feature that uses the new spatial Heatmap / 2D PrefixTree cell counter in Lucene spatial LUCENE-6191. This is a form of faceting, and as such I think it should live in the facet parameter namespace. Here's what the parameters are: * facet=true * facet.heatmap=fieldname * facet.heatmap.bbox=\[-180 -90 TO 180 90] * facet.heatmap.gridLevel=6 * facet.heatmap.distErrPct=0.10 Like other faceting features, the fieldName can have local-params to exclude filter queries or specify an output key. The bbox is optional; you get the whole world or you can specify a box or actually any shape that WKT supports (you get the bounding box of whatever you put).
Ultimately, this feature needs to know the grid level, which together with the input shape will yield a certain number of cells. You can specify gridLevel exactly, or don't and instead provide distErrPct which is computed like it is for the RPT field type as seen in the schema. 0.10 yielded ~4k cells but it'll vary. There's also a facet.heatmap.maxCells safety net defaulting to 100k. Exceed this and you get an error. The output is (JSON): {noformat} {gridLevel=6,columns=64,rows=64,minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0,counts=[[0, 0, 2, 1, ],[1, 1, 3, 2, ...],...]} {noformat} counts is null if all would be 0. Perhaps individual row arrays should likewise be null... I welcome feedback. I'm toying with an output format option in which you can specify a base-64'ed grayscale PNG. Obviously this should support sharded / distributed environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
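For readers wondering what a request looks like from SolrJ, a hedged sketch using only the parameters listed above; "geo" is a hypothetical RPT field name, and the facet response is assumed to carry the gridLevel/columns/rows/counts structure shown in the description.
{code}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch only: builds the facet.heatmap request described in this issue.
QueryResponse queryHeatmap(SolrClient client) throws Exception {
  SolrQuery q = new SolrQuery("*:*");
  q.setFacet(true);
  q.set("facet.heatmap", "geo");                       // hypothetical spatial RPT field
  q.set("facet.heatmap.bbox", "[-180 -90 TO 180 90]");
  q.set("facet.heatmap.distErrPct", "0.10");           // or set facet.heatmap.gridLevel directly
  return client.query(q);                              // counts come back as the 2-D grid above
}
{code}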
[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand
[ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-6581: - Description: The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent are optimized to work with a top level FieldCache. Top level FieldCaches have a very fast docID to top-level ordinal lookup. Fast access to the top-level ordinals allows for very high performance field collapsing on high cardinality fields. LUCENE-5666 unified the DocValues and FieldCache api's so that the top level FieldCache is no longer in regular use. Instead all top level caches are accessed through MultiDocValues. This ticket does the following: 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the default approach when collapsing on String fields 2) Provides an option to use a top level FieldCache if the performance of MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a new hint parameter. If the hint parameter is set to top_fc then the top-level FieldCache would be used for both Collapse and Expand. Example syntax: {code} fq={!collapse field=x hint=TOP_FC} {code} 3) Adds numeric collapse field implementations. 4) Resolves issue SOLR-6066 was: The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent are optimized to work with a top level FieldCache. Top level FieldCaches have a very fast docID to top-level ordinal lookup. Fast access to the top-level ordinals allows for very high performance field collapsing on high cardinality fields. LUCENE-5666 unified the DocValues and FieldCache api's so that the top level FieldCache is no longer in regular use. Instead all top level caches are accessed through MultiDocValues. This ticket does the following: 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the default approach when collapsing on String fields 2) Provides an option to use a top level FieldCache if the performance of MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a new hint parameter. If the hint parameter is set to top_fc then the top-level FieldCache would be used for both Collapse and Expand. Example syntax: {code} fq={!collapse field=x hint=top_fc} {code} 3) Adds numeric collapse field implementations. 4) Resolves issue SOLR-6066 Efficient DocValues support and numeric collapse field implementations for Collapse and Expand -- Key: SOLR-6581 URL: https://issues.apache.org/jira/browse/SOLR-6581 Project: Solr Issue Type: Bug Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 5.0, Trunk Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent are optimized to work with a top level FieldCache. Top level FieldCaches have a very fast docID to top-level ordinal lookup. Fast access to the top-level ordinals allows for very high performance field collapsing on high cardinality fields. LUCENE-5666 unified the DocValues and FieldCache api's so that the top level FieldCache is no longer in regular use. Instead all top level caches are accessed through MultiDocValues. 
This ticket does the following: 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the default approach when collapsing on String fields 2) Provides an option to use a top level FieldCache if the performance of MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a new hint parameter. If the hint parameter is set to top_fc then the top-level FieldCache would be used for both Collapse and Expand. Example syntax: {code} fq={!collapse field=x hint=TOP_FC} {code} 3) Adds numeric collapse field implementations. 4) Resolves issue SOLR-6066 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2522 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2522/ 4 tests failed. FAILED: org.apache.solr.cloud.HttpPartitionTest.testDistribSearch Error Message: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:12771/c8n_1x2_shard1_replica2 Stack Trace: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:12771/c8n_1x2_shard1_replica2 at __randomizedtesting.SeedInfo.seed([F5B2AF58A025A0EF:74542140D77AC0D3]:0) at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:581) at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:890) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:793) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:736) at org.apache.solr.cloud.HttpPartitionTest.sendDoc(HttpPartitionTest.java:480) at org.apache.solr.cloud.HttpPartitionTest.testRf2(HttpPartitionTest.java:201) at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:114) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:878) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (SOLR-7005) facet.heatmap for spatial heatmap faceting on RPT
[ https://issues.apache.org/jira/browse/SOLR-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286232#comment-14286232 ] David Smiley commented on SOLR-7005: Oh, facet.heatmap.format=png (or ints, ints being the default) facet.heatmap for spatial heatmap faceting on RPT - Key: SOLR-7005 URL: https://issues.apache.org/jira/browse/SOLR-7005 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.1 Attachments: SOLR-7005_heatmap.patch, heatmap_512x256.png, heatmap_64x32.png This is a new feature that uses the new spatial Heatmap / 2D PrefixTree cell counter in Lucene spatial LUCENE-6191. This is a form of faceting, and as-such I think it should live in the facet parameter namespace. Here's what the parameters are: * facet=true * facet.heatmap=fieldname * facet.heatmap.bbox=\[-180 -90 TO 180 90] * facet.heatmap.gridLevel=6 * facet.heatmap.distErrPct=0.10 Like other faceting features, the fieldName can have local-params to exclude filter queries or specify an output key. The bbox is optional; you get the whole world or you can specify a box or actually any shape that WKT supports (you get the bounding box of whatever you put). Ultimately, this feature needs to know the grid level, which together with the input shape will yield a certain number of cells. You can specify gridLevel exactly, or don't and instead provide distErrPct which is computed like it is for the RPT field type as seen in the schema. 0.10 yielded ~4k cells but it'll vary. There's also a facet.heatmap.maxCells safety net defaulting to 100k. Exceed this and you get an error. The output is (JSON): {noformat} {gridLevel=6,columns=64,rows=64,minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0,counts=[[0, 0, 2, 1, ],[1, 1, 3, 2, ...],...]} {noformat} counts is null if all would be 0. Perhaps individual row arrays should likewise be null... I welcome feedback. I'm toying with an output format option in which you can specify a base-64'ed grayscale PNG. Obviously this should support sharded / distributed environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6188) Remove HTML verification from checkJavaDocs.py
[ https://issues.apache.org/jira/browse/LUCENE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286282#comment-14286282 ] Erick Erickson commented on LUCENE-6188: Thanks! Back from 2 days onsite so I can pay some attention now. Remove HTML verification from checkJavaDocs.py -- Key: LUCENE-6188 URL: https://issues.apache.org/jira/browse/LUCENE-6188 Project: Lucene - Core Issue Type: Improvement Components: general/javadocs Reporter: Ramkumar Aiyengar Assignee: Erick Erickson Priority: Minor Attachments: LUCENE-6188.patch, LUCENE-6188.patch Currently, the broken HTML verification in {{checkJavaDocs.py}} has issues in some cases (see SOLR-6902). On looking further to fix it with the {{html.parser}} package instead, noticed that there is broken HTML verification already present (using {{html.parser}}!)in {{checkJavadocLinks.py}} anyway which takes care of validation, and probably {{jTidy}} does it as well, going by the output (haven't verified it). Given this, the validation in {{checkJavaDocs.py}} doesn't seem to add any further value, so here's a patch to just nuke it instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.0-Linux (32bit/ibm-j9-jdk7) - Build # 26 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.0-Linux/26/ Java: 32bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 45 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest Error Message: Could not get the port for ZooKeeper server Stack Trace: java.lang.RuntimeException: Could not get the port for ZooKeeper server at __randomizedtesting.SeedInfo.seed([89F35FDD4F4E4D35]:0) at org.apache.solr.cloud.ZkTestServer.run(ZkTestServer.java:482) at org.apache.solr.cloud.AbstractZkTestCase.azt_beforeClass(AbstractZkTestCase.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:767) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:853) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest Error Message: Stack Trace: java.lang.NullPointerException at __randomizedtesting.SeedInfo.seed([89F35FDD4F4E4D35]:0) at org.apache.zookeeper.server.NIOServerCnxnFactory.getLocalPort(NIOServerCnxnFactory.java:134) at org.apache.solr.cloud.ZkTestServer$ZKServerMain.shutdown(ZkTestServer.java:334) at org.apache.solr.cloud.ZkTestServer.shutdown(ZkTestServer.java:492) at org.apache.solr.cloud.AbstractZkTestCase.azt_afterClass(AbstractZkTestCase.java:158) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:790) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at
[jira] [Commented] (LUCENE-6161) Applying deletes is sometimes dog slow
[ https://issues.apache.org/jira/browse/LUCENE-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286421#comment-14286421 ] Robert Muir commented on LUCENE-6161: - Just a few minor thoughts: Some of the iteration is more awkward now, it might be nice to open a followup to clean this up. delGen is awkward to see being held in PrefixCodedTerms, and we have an iterator api that ... is neither termsenum or iterable but another one instead. I wonder if we could have the same logic, but using a more natural one. if it would just make the code even more awkward, then screw it :) We should fix the issue though for now I think. Applying deletes is sometimes dog slow -- Key: LUCENE-6161 URL: https://issues.apache.org/jira/browse/LUCENE-6161 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6161.patch, LUCENE-6161.patch, LUCENE-6161.patch, LUCENE-6161.patch, LUCENE-6161.patch I hit this while testing various use cases for LUCENE-6119 (adding auto-throttle to ConcurrentMergeScheduler). When I tested always call updateDocument (each add buffers a delete term), with many indexing threads, opening an NRT reader once per second (forcing all deleted terms to be applied), I see that BufferedUpdatesStream.applyDeletes sometimes seems to take a lng time, e.g.: {noformat} BD 0 [2015-01-04 09:31:12.597; Lucene Merge Thread #69]: applyDeletes took 339 msec for 10 segments, 117 deleted docs, 607333 visited terms BD 0 [2015-01-04 09:31:18.148; Thread-4]: applyDeletes took 5533 msec for 62 segments, 10989 deleted docs, 8517225 visited terms BD 0 [2015-01-04 09:31:21.463; Lucene Merge Thread #71]: applyDeletes took 1065 msec for 10 segments, 470 deleted docs, 1825649 visited terms BD 0 [2015-01-04 09:31:26.301; Thread-5]: applyDeletes took 4835 msec for 61 segments, 14676 deleted docs, 9649860 visited terms BD 0 [2015-01-04 09:31:35.572; Thread-11]: applyDeletes took 6073 msec for 72 segments, 13835 deleted docs, 11865319 visited terms BD 0 [2015-01-04 09:31:37.604; Lucene Merge Thread #75]: applyDeletes took 251 msec for 10 segments, 58 deleted docs, 240721 visited terms BD 0 [2015-01-04 09:31:44.641; Thread-11]: applyDeletes took 5956 msec for 64 segments, 15109 deleted docs, 10599034 visited terms BD 0 [2015-01-04 09:31:47.814; Lucene Merge Thread #77]: applyDeletes took 396 msec for 10 segments, 137 deleted docs, 719914 visit {noformat} What this means is even though I want an NRT reader every second, often I don't get one for up to ~7 or more seconds. This is on an SSD, machine has 48 GB RAM, heap size is only 2 GB. 12 indexing threads. As hideously complex as this code is, I think there are some inefficiencies, but fixing them could be hard / make code even hairier ... Also, this code is mega-locked: holds IW's lock, holds BD's lock. It blocks things like merges kicking off or finishing... E.g., we pull the MergedIterator many times on the same set of sub-iterators. Maybe we can create the sorted terms up front and reuse that? Maybe we should go term stride (one term visits all N segments) not segment stride (visit each segment, iterating all deleted terms for it). Just iterating the terms to be deleted takes a sizable part of the time, and we now do that once for every segment in the index. 
Also, the isUnique bit in LUCENE-6005 should help here, since if we know the field is unique, we can stop seekExact once we found a segment that has the deleted term, we can maybe pass false for removeDuplicates to MergedIterator... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
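To make the stride discussion concrete, an illustrative sketch of the two loop orders being contrasted; names like SegmentState, sortedDeleteTerms and applyDelete are stand-ins, not the real BufferedUpdatesStream code.
{code}
// Segment stride (roughly today's shape): every segment re-walks all buffered
// delete terms, so the sorted-terms iteration cost is paid once per segment.
for (SegmentState segment : segments) {
  for (Term term : sortedDeleteTerms) {
    applyDelete(segment, term);
  }
}

// Term stride (the alternative floated above): sort the delete terms once, then
// let each term visit all N segments, paying the term-iteration cost only once.
for (Term term : sortedDeleteTerms) {
  for (SegmentState segment : segments) {
    applyDelete(segment, term);
  }
}
{code}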
[jira] [Updated] (SOLR-7013) Unclear error message with solr script when lacking jar executable
[ https://issues.apache.org/jira/browse/SOLR-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Derek Wood updated SOLR-7013: - Attachment: solr.patch Unclear error message with solr script when lacking jar executable -- Key: SOLR-7013 URL: https://issues.apache.org/jira/browse/SOLR-7013 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: Fedora 21 Reporter: Derek Wood Attachments: solr.patch Fedora 21 doesn't ship the jar executable with the default jdk package, so the attempt to extract webapp/solr.war in the solr script can fail without a clear error message. The attached patch adds this error message and includes support for the unzip utility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6845) Add buildOnStartup option for suggesters
[ https://issues.apache.org/jira/browse/SOLR-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6845: Attachment: tests-failures.txt
I just saw a local failure on trunk on org.apache.solr.handler.component.SuggestComponentTest.testDefaultBuildOnStartupStoredDict. The logs are attached and the stack trace is:
{code}
2> 786070 T7047 oas.SolrTestCaseJ4.assertQ ERROR REQUEST FAILED: xpath=//lst[@name='suggest']/lst[@name='suggest_doc_default_startup']/lst[@name='example']/int[@name='numFound'][.='0']
2> xml response was: <?xml version="1.0" encoding="UTF-8"?>
2> <response>
2> <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst><lst name="suggest"><lst name="suggest_doc_default_startup"><lst name="example"><int name="numFound">2</int><arr name="suggestions"><lst><str name="term">example inputdata</str><long name="weight">45</long><str name="payload"/></lst><lst><str name="term">example data</str><long name="weight">40</long><str name="payload"/></lst></arr></lst></lst></lst>
2> </response>
2>
2> request was:qt=/suggest&suggest.q=example&suggest.count=2&suggest.dictionary=suggest_doc_default_startup&wt=xml
2> 786071 T7047 oasc.SolrException.log ERROR REQUEST FAILED: qt=/suggest&suggest.q=example&suggest.count=2&suggest.dictionary=suggest_doc_default_startup&wt=xml:java.lang.RuntimeException: REQUEST FAILED: xpath=//lst[@name='suggest']/lst[@name='suggest_doc_default_startup']/lst[@name='example']/int[@name='numFound'][.='0']
2> xml response was: <?xml version="1.0" encoding="UTF-8"?>
2> <response>
2> <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst><lst name="suggest"><lst name="suggest_doc_default_startup"><lst name="example"><int name="numFound">2</int><arr name="suggestions"><lst><str name="term">example inputdata</str><long name="weight">45</long><str name="payload"/></lst><lst><str name="term">example data</str><long name="weight">40</long><str name="payload"/></lst></arr></lst></lst></lst>
2> </response>
2>
2> request was:qt=/suggest&suggest.q=example&suggest.count=2&suggest.dictionary=suggest_doc_default_startup&wt=xml
2> at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:741)
2> at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:715)
2> at org.apache.solr.handler.component.SuggestComponentTest.testDefaultBuildOnStartupStoredDict(SuggestComponentTest.java:257)
2> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{code}
{code}
ant test -Dtestcase=SuggestComponentTest -Dtests.method=testDefaultBuildOnStartupStoredDict -Dtests.seed=1AE9946D9D16B26E -Dtests.slow=true -Dtests.locale=en -Dtests.timezone=Asia/Istanbul -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
{code}
I tried a few times but couldn't reproduce it.
Add buildOnStartup option for suggesters Key: SOLR-6845 URL: https://issues.apache.org/jira/browse/SOLR-6845 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Tomás Fernández Löbbe Fix For: Trunk, 5.1 Attachments: SOLR-6845.patch, SOLR-6845.patch, SOLR-6845.patch, tests-failures.txt SOLR-6679 was filed to track the investigation into the following problem... {panel} The stock solrconfig provides a bad experience with a large index... start up Solr and it will spin at 100% CPU for minutes, unresponsive, while it apparently builds a suggester index. ... This is what I did: 1) indexed 10M very small docs (only takes a few minutes). 2) shut down Solr 3) start up Solr and watch it be unresponsive for over 4 minutes! I didn't even use any of the fields specified in the suggester config and I never called the suggest request handler. {panel} ..but ultimately focused on removing/disabling the suggester from the sample configs.
Opening this new issue to focus on actually trying to identify the root problem & fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6188) Remove HTML verification from checkJavaDocs.py
[ https://issues.apache.org/jira/browse/LUCENE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286294#comment-14286294 ] Robert Muir commented on LUCENE-6188: - {quote} If it's not adding value anymore (e.g. we recently turned on faster javadocs checking via javac's doclint options), I agree we should remove it: it's slow and hackity and un-understandable. {quote} The doclint stuff added (TRUNK ONLY) is blazing fast and nice, but there is a good amount of work before its checking html, i see these steps: * actually turn on html verification in doclint. this can't be done until a lot of problems are fixed. When they are fixed we can enable html: {noformat}-Xdoclint:all/protected -Xdoclint:-html -Xdoclint:-missing{noformat} * figure out how to check overview.html and package.html. I suspect they are currently not being checked (but maybe im wrong). Maybe we can ask the openjdk developers about it. Then jtidy could be removed completely. python linting is still needed until we can properly enable missing and cutover build logic to that. Then i think its check-missing could be removed. As far as the python broken links checker, im not sure if there is a replacement. Ideally we are just using doclint for all checks in the future. Remove HTML verification from checkJavaDocs.py -- Key: LUCENE-6188 URL: https://issues.apache.org/jira/browse/LUCENE-6188 Project: Lucene - Core Issue Type: Improvement Components: general/javadocs Reporter: Ramkumar Aiyengar Assignee: Erick Erickson Priority: Minor Attachments: LUCENE-6188.patch, LUCENE-6188.patch Currently, the broken HTML verification in {{checkJavaDocs.py}} has issues in some cases (see SOLR-6902). On looking further to fix it with the {{html.parser}} package instead, noticed that there is broken HTML verification already present (using {{html.parser}}!)in {{checkJavadocLinks.py}} anyway which takes care of validation, and probably {{jTidy}} does it as well, going by the output (haven't verified it). Given this, the validation in {{checkJavaDocs.py}} doesn't seem to add any further value, so here's a patch to just nuke it instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6954) Considering changing SolrClient#shutdown to SolrClient#close.
[ https://issues.apache.org/jira/browse/SOLR-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-6954: Attachment: SOLR-6954.patch Patch making SolrClient implement Closeable, and making shutdown() a deprecated concrete method that delegates to close(). Also cuts over all tests to use close() (and try-with-resources where possible). Considering changing SolrClient#shutdown to SolrClient#close. - Key: SOLR-6954 URL: https://issues.apache.org/jira/browse/SOLR-6954 Project: Solr Issue Type: Improvement Reporter: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-6954.patch SolrClient#shutdown is not as odd as SolrServer#shutdown, but as we want users to release these objects, close is more standard and if we implement Closeable, tools help point out leaks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
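In outline, the change described in the patch looks something like the following; this is a minimal sketch assuming the existing SolrClient class, not the patch itself, and the method bodies and exception handling are simplified:
{code}
import java.io.Closeable;
import java.io.IOException;

public abstract class SolrClient implements Closeable {

  /** @deprecated use {@link #close()} instead. */
  @Deprecated
  public void shutdown() {
    try {
      close();                        // old entry point delegates to the new one
    } catch (IOException e) {
      throw new RuntimeException(e);  // shutdown() historically declared no checked exception
    }
  }

  @Override
  public abstract void close() throws IOException;  // concrete clients release their resources here
}
{code}
Callers can then release clients with try-with-resources, e.g. {{try (SolrClient client = ...) { ... }}}, which is what the tests were cut over to, and Closeable-aware tooling can flag leaks.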
[jira] [Updated] (SOLR-7014) Collapse identical catch branches in try-catch statements
[ https://issues.apache.org/jira/browse/SOLR-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-7014: Attachment: SOLR-7014.patch This takes care of all solr classes. I'll attach another one which does the same for lucene. Collapse identical catch branches in try-catch statements - Key: SOLR-7014 URL: https://issues.apache.org/jira/browse/SOLR-7014 Project: Solr Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 Attachments: SOLR-7014.patch We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
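As a generic before/after illustration of the kind of change this issue makes (not code from the patch; the method and exception names here are placeholders):
{code}
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;

class MultiCatchExample {
  void before() {
    try {
      doWork();
    } catch (IOException e) {          // two branches with...
      handle(e);
    } catch (SolrServerException e) {  // ...identical bodies
      handle(e);
    }
  }

  void after() {
    try {
      doWork();
    } catch (IOException | SolrServerException e) {  // collapsed into one Java 7 multi-catch branch
      handle(e);
    }
  }

  void doWork() throws IOException, SolrServerException { }
  void handle(Exception e) { }
}
{code}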
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286272#comment-14286272 ] Uwe Schindler commented on SOLR-6991: - Hi, I checked the code. The problem is: You cannot disable by config (because it always tries to execute the command thats part of the default config file). If the config file is not there, then it runs TESSERACT without any path. The only way to work around is: - Disable the whole parser (f*ck, because then we need to maintain our own parser list internally). There is no way to tell TIKA to exclude some parsers (something like AutodetectParser#disableParser(name/class/whatever) - Use a hack with reflection to make TesseractOCRParser#TESSERACT_PRESENT return false for any path... Just replace the static map by one that returns false for any key (LOL) and ignores any put() Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
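The reflection hack sketched in the comment would look roughly like this; the field name comes from the comment above, but its exact type, modifiers and the parser's lookup logic are assumptions, and this is not the fix that was committed:
{code}
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import org.apache.tika.parser.ocr.TesseractOCRParser;

final class DisableTesseractHack {

  /** Swap the parser's static cache for a map that reports "no tesseract" for every path. */
  static void install() throws ReflectiveOperationException {
    Map<String, Boolean> alwaysAbsent = new HashMap<String, Boolean>() {
      @Override public Boolean get(Object key) { return Boolean.FALSE; }                 // every lookup says "not present"
      @Override public boolean containsKey(Object key) { return true; }                  // so the parser never probes the binary
      @Override public Boolean put(String key, Boolean value) { return Boolean.FALSE; }  // ignore any put()
    };
    Field cache = TesseractOCRParser.class.getDeclaredField("TESSERACT_PRESENT");
    cache.setAccessible(true);
    cache.set(null, alwaysAbsent);  // static field; this fails if the field is final or named differently
  }
}
{code}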
[jira] [Created] (SOLR-7014) Collapse identical catch branches in try-catch statements
Shalin Shekhar Mangar created SOLR-7014: --- Summary: Collapse identical catch branches in try-catch statements Key: SOLR-7014 URL: https://issues.apache.org/jira/browse/SOLR-7014 Project: Solr Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286307#comment-14286307 ] Uwe Schindler commented on SOLR-6991: - One trick could work: TIKA prefers always external parsers loaded by SPI. The trick here would be to add a /META-INF/services/... file that lists a subclass of the Tesseract parser that just always returns no supported media types. TIKA would use our subclass in preference to the one shipped. By that we could disable the parser. I have not checked this, but this would be another hack (that I don't like, too). Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
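To make the SPI idea concrete, such a subclass might look like the sketch below; as the comment says, whether Tika actually prefers the SPI-registered class over its stock TesseractOCRParser has not been verified:
{code}
import java.util.Collections;
import java.util.Set;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.ocr.TesseractOCRParser;

/**
 * Would be listed in META-INF/services/org.apache.tika.parser.Parser so that
 * Tika's SPI loading picks it up. Advertising no media types means the parser
 * is never consulted, so ExternalParser.check()/Runtime.exec() is never reached.
 */
public class NoTypesTesseractOCRParser extends TesseractOCRParser {
  @Override
  public Set<MediaType> getSupportedTypes(ParseContext context) {
    return Collections.emptySet();
  }
}
{code}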
[jira] [Commented] (SOLR-7014) Collapse identical catch branches in try-catch statements
[ https://issues.apache.org/jira/browse/SOLR-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286324#comment-14286324 ] ASF subversion and git services commented on SOLR-7014: --- Commit 1653665 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1653665 ] SOLR-7014: Collapse identical catch branches in try-catch statements Collapse identical catch branches in try-catch statements - Key: SOLR-7014 URL: https://issues.apache.org/jira/browse/SOLR-7014 Project: Solr Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 Attachments: SOLR-7014.patch We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6845) Add buildOnStartup option for suggesters
[ https://issues.apache.org/jira/browse/SOLR-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286323#comment-14286323 ] Tomás Fernández Löbbe commented on SOLR-6845: - I'll take a look Add buildOnStartup option for suggesters Key: SOLR-6845 URL: https://issues.apache.org/jira/browse/SOLR-6845 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Tomás Fernández Löbbe Fix For: Trunk, 5.1 Attachments: SOLR-6845.patch, SOLR-6845.patch, SOLR-6845.patch, tests-failures.txt SOLR-6679 was filed to track the investigation into the following problem... {panel} The stock solrconfig provides a bad experience with a large index... start up Solr and it will spin at 100% CPU for minutes, unresponsive, while it apparently builds a suggester index. ... This is what I did: 1) indexed 10M very small docs (only takes a few minutes). 2) shut down Solr 3) start up Solr and watch it be unresponsive for over 4 minutes! I didn't even use any of the fields specified in the suggester config and I never called the suggest request handler. {panel} ..but ultimately focused on removing/disabling the suggester from the sample configs. Opening this new issue to focus on actually trying to identify the root problem fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-5.x-Linux (64bit/ibm-j9-jdk7) - Build # 11493 - Failure!
J9 bug. Mike McCandless http://blog.mikemccandless.com On Wed, Jan 21, 2015 at 5:53 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11493/ Java: 64bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 1 tests failed. FAILED: org.apache.lucene.codecs.lucene49.TestLucene49NormsFormat.testByteRange Error Message: Stack Trace: java.lang.NullPointerException at __randomizedtesting.SeedInfo.seed([1EFEBBCD258C8490:D78182FF451AC405]:0) at org.apache.lucene.codecs.lucene49.Lucene49NormsConsumer$NormMap.add(Lucene49NormsConsumer.java:206) at org.apache.lucene.codecs.lucene49.Lucene49NormsConsumer.addNormsField(Lucene49NormsConsumer.java:95) at org.apache.lucene.index.NormValuesWriter.flush(NormValuesWriter.java:72) at org.apache.lucene.index.DefaultIndexingChain.writeNorms(DefaultIndexingChain.java:204) at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:92) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:419) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:503) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:615) at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2733) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2888) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2855) at org.apache.lucene.index.RandomIndexWriter.commit(RandomIndexWriter.java:257) at org.apache.lucene.index.BaseNormsFormatTestCase.doTestNormsVersusStoredFields(BaseNormsFormatTestCase.java:261) at org.apache.lucene.index.BaseNormsFormatTestCase.testByteRange(BaseNormsFormatTestCase.java:54) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286477#comment-14286477 ] Steve Rowe commented on SOLR-6991: -- bq. don't we need similar assumes in dataimporthandler-extras tests that use TikaEntityProcessor? (i'm not sure why those wouldn't fail with turkish now as well) I ran {{ant test -Dtests.slow=true -Dtests.locale=tr_TR}} in {{solr/contrib/dataimporthandler-extras/}}, and got the following failure: {noformat} [junit4] Suite: org.apache.solr.handler.dataimport.TestTikaEntityProcessor [junit4] 2 Creating dataDir: /Users/sarowe/svn/lucene/dev/trunk2/solr/build/contrib/solr-dataimporthandler-extras/test/J0/temp/solr.handler.dataimport.TestTikaEntityProcessor 9123B7DE098A1C98-001/init-core-data-001 [junit4] 2 log4j:WARN No appenders could be found for logger (org.apache.solr.SolrTestCaseJ4). [junit4] 2 log4j:WARN Please initialize the log4j system properly. [junit4] 2 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestTikaEntityProcessor -Dtests.method=testTikaHTMLMapperIdentity -Dtests.seed=9123B7DE098A1C98 -Dtests.slow=true -Dtests.locale=tr_TR -Dtests.timezone=America/Toronto -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] ERROR 0.93s J0 | TestTikaEntityProcessor.testTikaHTMLMapperIdentity [junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. [junit4]at __randomizedtesting.SeedInfo.seed([9123B7DE098A1C98:C15C334FC0BEE965]:0) [junit4]at java.lang.UNIXProcess$1.run(UNIXProcess.java:105) [junit4]at java.lang.UNIXProcess$1.run(UNIXProcess.java:94) [junit4]at java.security.AccessController.doPrivileged(Native Method) [junit4]at java.lang.UNIXProcess.clinit(UNIXProcess.java:92) [junit4]at java.lang.ProcessImpl.start(ProcessImpl.java:130) [junit4]at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) [junit4]at java.lang.Runtime.exec(Runtime.java:620) [junit4]at java.lang.Runtime.exec(Runtime.java:485) [junit4]at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344) [junit4]at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117) [junit4]at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90) [junit4]at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) [junit4]at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95) [junit4]at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229) [junit4]at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81) [junit4]at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209) [junit4]at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244) [junit4]at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [junit4]at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:141) [junit4]at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) [junit4]at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476) [junit4]at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) [junit4]at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) [junit4]at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232) 
[junit4]at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416) [junit4]at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480) [junit4]at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:189) [junit4]at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) [junit4]at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006) [junit4]at org.apache.solr.util.TestHarness.query(TestHarness.java:331) [junit4]at org.apache.solr.handler.dataimport.AbstractDataImportHandlerTestCase.runFullImport(AbstractDataImportHandlerTestCase.java:86) [junit4]at org.apache.solr.handler.dataimport.TestTikaEntityProcessor.testTikaHTMLMapperIdentity(TestTikaEntityProcessor.java:99) [junit4]
[jira] [Commented] (SOLR-6969) Just like we have to retry when the NameNode is in safemode on Solr startup, we also need to retry when opening a transaction log file for append when we get a RecoveryI
[ https://issues.apache.org/jira/browse/SOLR-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286485#comment-14286485 ] Mike Drob commented on SOLR-6969: - Is retrying always going to be safe? That works fine after we've lost a server and started a new one (albeit too quickly) but what about the case where two servers both think they are responsible for that tlog? This can happen if the original server partially dies, but still has some threads that are doing work and haven't been cleaned up. Looking at how other projects handle similar issues - HBase moves the entire directory[1] to break any existing leases and ensure any other processes gets kicked out. Maybe a retry is a good stop-gap, but is it going to be a full solution? [1]: https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L310 Just like we have to retry when the NameNode is in safemode on Solr startup, we also need to retry when opening a transaction log file for append when we get a RecoveryInProgressException. Key: SOLR-6969 URL: https://issues.apache.org/jira/browse/SOLR-6969 Project: Solr Issue Type: Bug Components: hdfs Reporter: Mark Miller Assignee: Mark Miller Priority: Critical Fix For: 5.0, Trunk This can happen after a hard crash and restart. The current workaround is to stop and wait it out and start again. We should retry and wait a given amount of time as we do when we detect safe mode though. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
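A bounded retry along the lines the issue asks for might look like the sketch below; the helper that recognizes the HDFS lease-recovery case and the timeout values are assumptions, and Mike's question about whether retrying is always safe applies to this sketch as much as to any real implementation:
{code}
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class TlogAppendRetry {

  /** Opens an existing tlog for append, retrying while HDFS lease recovery is still in progress. */
  static FSDataOutputStream openForAppend(FileSystem fs, Path tlog, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      try {
        return fs.append(tlog);
      } catch (IOException e) {
        if (!isRecoveryInProgress(e) || System.nanoTime() > deadline) {
          throw e;             // not the lease-recovery case, or we waited long enough
        }
        Thread.sleep(500);     // give the NameNode time to finish lease recovery, then retry
      }
    }
  }

  // Hypothetical helper: a real version would unwrap RemoteException and check the exception class.
  private static boolean isRecoveryInProgress(IOException e) {
    String msg = String.valueOf(e.getMessage());
    return e.getClass().getSimpleName().equals("RecoveryInProgressException")
        || msg.contains("RecoveryInProgressException");
  }
}
{code}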
[jira] [Updated] (SOLR-6928) solr.cmd stop works only in english
[ https://issues.apache.org/jira/browse/SOLR-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-6928: -- Attachment: SOLR-6928.patch Slightly improved patch
* No need for case insensitive find
* Require a space after port number to avoid false match
solr.cmd stop works only in english --- Key: SOLR-6928 URL: https://issues.apache.org/jira/browse/SOLR-6928 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: german windows 7 Reporter: john.work Assignee: Timothy Potter Priority: Minor Attachments: SOLR-6928.patch, SOLR-6928.patch in solr.cmd the stop doesn't work while executing 'netstat -nao ^| find /i "listening" ^| find ":%SOLR_PORT%"', so "listening" is not found. e.g. in german cmd.exe the netstat -nao prints the following output:
{noformat}
Proto  Lokale Adresse    Remoteadresse    Status    PID
TCP    0.0.0.0:80        0.0.0.0:0        ABHÖREN   4
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286502#comment-14286502 ] Uwe Schindler commented on SOLR-6991: - [~steve_rowe]: Can you commit to all 3 branches, I wanted to go sleeping? Thanks. Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6928) solr.cmd stop works only in english
[ https://issues.apache.org/jira/browse/SOLR-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286540#comment-14286540 ] ASF subversion and git services commented on SOLR-6928: --- Commit 1653700 from [~thelabdude] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1653700 ] SOLR-6928: solr.cmd stop works only in english solr.cmd stop works only in english --- Key: SOLR-6928 URL: https://issues.apache.org/jira/browse/SOLR-6928 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: german windows 7 Reporter: john.work Assignee: Timothy Potter Priority: Minor Attachments: SOLR-6928.patch, SOLR-6928.patch in solr.cmd the stop doesnt work while executing 'netstat -nao ^| find /i listening ^| find :%SOLR_PORT%' so listening is not found. e.g. in german cmd.exe the netstat -nao prints the following output: {noformat} Proto Lokale Adresse Remoteadresse Status PID TCP0.0.0.0:80 0.0.0.0:0 ABHÖREN 4 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286558#comment-14286558 ] Steve Rowe commented on SOLR-6991: -- bq. Steve Rowe: Can you commit to all 3 branches, I wanted to go sleeping? Thanks. Will do. Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286562#comment-14286562 ] Steve Rowe commented on SOLR-6991: -- bq. I'm running all Solr tests now with this patch and -Dtests.slow=true -Dtests.locale=tr_TR. All Solr tests passed with the patch. Committing now. Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6188) Remove HTML verification from checkJavaDocs.py
[ https://issues.apache.org/jira/browse/LUCENE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286584#comment-14286584 ] Ramkumar Aiyengar commented on LUCENE-6188: --- Rob, the logic I have nuked is actually not as a duplicate of doclint (i just didnt check that, and as you mention there might be differences) but the checkJavadocLinks.py script which is run prior to this script in the documentation-lint. That does the exact same check in Python, except it uses a real parser rather than regex hacks.. Remove HTML verification from checkJavaDocs.py -- Key: LUCENE-6188 URL: https://issues.apache.org/jira/browse/LUCENE-6188 Project: Lucene - Core Issue Type: Improvement Components: general/javadocs Reporter: Ramkumar Aiyengar Assignee: Erick Erickson Priority: Minor Attachments: LUCENE-6188.patch, LUCENE-6188.patch Currently, the broken HTML verification in {{checkJavaDocs.py}} has issues in some cases (see SOLR-6902). On looking further to fix it with the {{html.parser}} package instead, noticed that there is broken HTML verification already present (using {{html.parser}}!)in {{checkJavadocLinks.py}} anyway which takes care of validation, and probably {{jTidy}} does it as well, going by the output (haven't verified it). Given this, the validation in {{checkJavaDocs.py}} doesn't seem to add any further value, so here's a patch to just nuke it instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved SOLR-6991. -- Resolution: Fixed Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7011) fix OverseerCollectionProcessor.deleteCollection removal-done check
[ https://issues.apache.org/jira/browse/SOLR-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286609#comment-14286609 ] ASF subversion and git services commented on SOLR-7011: --- Commit 1653716 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1653716 ] SOLR-7011: Delete collection returns before collection is actually removed fix OverseerCollectionProcessor.deleteCollection removal-done check --- Key: SOLR-7011 URL: https://issues.apache.org/jira/browse/SOLR-7011 Project: Solr Issue Type: Bug Affects Versions: 4.10.3, 5.0, Trunk Reporter: Christine Poerschke Assignee: Shalin Shekhar Mangar Priority: Minor {{OverseerCollectionProcessor.java}} line 1184 has a {{.hasCollection(message.getStr(collection))}} call which should be either {{.hasCollection(message.getStr(name))}} or {{.hasCollection(collection)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
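For readers who want to see the shape of the fix, a paraphrased sketch of the removal-done wait is below; the loop and timeout are illustrative, not the actual OverseerCollectionProcessor code:
{code}
import org.apache.solr.common.cloud.ZkNodeProps;
import org.apache.solr.common.cloud.ZkStateReader;

class DeleteCollectionWait {

  /** Polls cluster state until the deleted collection is really gone (or the timeout expires). */
  static void waitForRemoval(ZkStateReader zkStateReader, ZkNodeProps message, long timeoutMs)
      throws InterruptedException {
    // The delete message carries the collection name under "name"; reading "collection"
    // here (which is null for this message) was the bug this issue fixes.
    String collection = message.getStr("name");
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (zkStateReader.getClusterState().hasCollection(collection)
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
  }
}
{code}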
[jira] [Commented] (LUCENE-6188) Remove HTML verification from checkJavaDocs.py
[ https://issues.apache.org/jira/browse/LUCENE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286498#comment-14286498 ] Erick Erickson commented on LUCENE-6188: Hmmm, thanks for pointing this out, but it makes things... complicated. Problem is that until this is done, SOLR-6902 is blocked as that patch fails precommit. For no good reason I can find. If I'm reading things right, doclint is only in Java 8, so is simply not an option for 5x even if the problems you point out are fixed up. If I'm reading this right, Ramkumar's claim is that the html checking in this patch that is being removed is unnecessary anyway, so removing it doesn't lose us anything. And it's incorrectly failing this doc for some reason. I checked the generated doc file and it looks fine, I think I even ran it through an XML validator. I could always have missed something of course. That said, the proposed changes in this JIRA take a lot of code out of checkJavaDocs.py, and I'll very much admit I haven't gone through the changes in much detail, but they do appear to just be doing HTML validation. I can treat this somewhat as a black box and do something like apply this patch locally and:
1. create some invalid JavaDoc links and ensure that they're flagged if this patch is applied (any suggestions for a candidate list)? If that works (or, more accurately, fails the invalid javadocs), commit this patch to trunk and 5x and then commit SOLR-6902, or
2. just remove the javadocs from SOLR-6902 or possibly munge them until that code succeeds precommit, or
3. try to figure out what the false failure is here and fix checkJavaDocs.py
I think 1 is my first choice, and 3 is a very distant third. Spending time debugging code that it sounds like we're going to remove on trunk seems like a waste. I may do 2 anyway: remove the javaDocs and put them back if one of the other approaches works. SOLR-6902 is hard to keep up to date since it touches so much; Alan's checkin is already going to be a headache to reconcile. So keeping it out of the code line just because of a bad (and possibly redundant) bit of non-standard HTML checking seems like a poor tradeoff. This last can be argued of course. Anyway, I'll do some poking around and report back before committing anything.
Remove HTML verification from checkJavaDocs.py -- Key: LUCENE-6188 URL: https://issues.apache.org/jira/browse/LUCENE-6188 Project: Lucene - Core Issue Type: Improvement Components: general/javadocs Reporter: Ramkumar Aiyengar Assignee: Erick Erickson Priority: Minor Attachments: LUCENE-6188.patch, LUCENE-6188.patch Currently, the broken HTML verification in {{checkJavaDocs.py}} has issues in some cases (see SOLR-6902). On looking further to fix it with the {{html.parser}} package instead, noticed that there is broken HTML verification already present (using {{html.parser}}!) in {{checkJavadocLinks.py}} anyway which takes care of validation, and probably {{jTidy}} does it as well, going by the output (haven't verified it). Given this, the validation in {{checkJavaDocs.py}} doesn't seem to add any further value, so here's a patch to just nuke it instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6193) Collapse identical catch branches in try-catch statements
Shalin Shekhar Mangar created LUCENE-6193: - Summary: Collapse identical catch branches in try-catch statements Key: LUCENE-6193 URL: https://issues.apache.org/jira/browse/LUCENE-6193 Project: Lucene - Core Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. We did the same for solr in SOLR-7014. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7014) Collapse identical catch branches in try-catch statements
[ https://issues.apache.org/jira/browse/SOLR-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286532#comment-14286532 ] ASF subversion and git services commented on SOLR-7014: --- Commit 1653698 from sha...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1653698 ] SOLR-7014: Collapse identical catch branches in try-catch statements in morphlines-core Collapse identical catch branches in try-catch statements - Key: SOLR-7014 URL: https://issues.apache.org/jira/browse/SOLR-7014 Project: Solr Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 Attachments: SOLR-7014-more.patch, SOLR-7014.patch We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286623#comment-14286623 ] Anshum Gupta commented on SOLR-6991: Thanks for fixing this everyone! Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2523 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2523/ 5 tests failed. REGRESSION: org.apache.solr.handler.component.SuggestComponentTest.testBuildOnStartupWithNewCores Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([D0016418FD804417:4AC564B67B9CDB93]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:748) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:715) at org.apache.solr.handler.component.SuggestComponentTest.doTestBuildOnStartup(SuggestComponentTest.java:395) at org.apache.solr.handler.component.SuggestComponentTest.testBuildOnStartupWithNewCores(SuggestComponentTest.java:374) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286500#comment-14286500 ] Uwe Schindler commented on SOLR-6991: - Ah you already posted a patch. Thanks for testing. I have only Windows ready to use on my laptop :-) Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6193) Collapse identical catch branches in try-catch statements
[ https://issues.apache.org/jira/browse/LUCENE-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated LUCENE-6193: -- Attachment: LUCENE-6193.patch The only places where I did not make these changes are where the catch blocks have different comments or the code wasn't ASL. The following were excluded: # org.apache.lucene.analysis.core.TestFactories # org.apache.lucene.index.TestReaderClosed # org.apache.lucene.queryparser.flexible.messages.NLS (one instance) # org.egothor.stemmer.Diff (license different from ASL) # org.tartarus.snowball.SnowballProgram (license different from ASL) Collapse identical catch branches in try-catch statements - Key: LUCENE-6193 URL: https://issues.apache.org/jira/browse/LUCENE-6193 Project: Lucene - Core Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 Attachments: LUCENE-6193.patch We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. We did the same for solr in SOLR-7014. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6928) solr.cmd stop works only in english
[ https://issues.apache.org/jira/browse/SOLR-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286538#comment-14286538 ] ASF subversion and git services commented on SOLR-6928: --- Commit 1653699 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1653699 ] SOLR-6928: solr.cmd stop works only in english solr.cmd stop works only in english --- Key: SOLR-6928 URL: https://issues.apache.org/jira/browse/SOLR-6928 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 4.10.3 Environment: german windows 7 Reporter: john.work Assignee: Timothy Potter Priority: Minor Attachments: SOLR-6928.patch, SOLR-6928.patch in solr.cmd the stop doesnt work while executing 'netstat -nao ^| find /i listening ^| find :%SOLR_PORT%' so listening is not found. e.g. in german cmd.exe the netstat -nao prints the following output: {noformat} Proto Lokale Adresse Remoteadresse Status PID TCP0.0.0.0:80 0.0.0.0:0 ABHÖREN 4 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-7014) Collapse identical catch branches in try-catch statements
[ https://issues.apache.org/jira/browse/SOLR-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-7014. - Resolution: Fixed I opened LUCENE-6193 for the lucene changes. Collapse identical catch branches in try-catch statements - Key: SOLR-7014 URL: https://issues.apache.org/jira/browse/SOLR-7014 Project: Solr Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 Attachments: SOLR-7014-more.patch, SOLR-7014.patch We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6193) Collapse identical catch branches in try-catch statements
[ https://issues.apache.org/jira/browse/LUCENE-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286570#comment-14286570 ] ASF subversion and git services commented on LUCENE-6193: - Commit 1653707 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1653707 ] LUCENE-6193: Collapse identical catch branches in try-catch statements Collapse identical catch branches in try-catch statements - Key: LUCENE-6193 URL: https://issues.apache.org/jira/browse/LUCENE-6193 Project: Lucene - Core Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: Trunk, 5.1 Attachments: LUCENE-6193.patch We are on Java 7+ so we can reduce verbosity by collapsing identical catch statements into one. We did the same for solr in SOLR-7014. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286572#comment-14286572 ] ASF subversion and git services commented on SOLR-6991: --- Commit 1653708 from [~sar...@syr.edu] in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1653708 ] SOLR-6991,SOLR-6387: Under Turkish locale, don't run solr-cell and dataimporthandler-extras tests that use Tika (merged trunk r1653704) Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6387) Solr specific work around for JDK bug #8047340: posix_spawn error with turkish locale
[ https://issues.apache.org/jira/browse/SOLR-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286573#comment-14286573 ] ASF subversion and git services commented on SOLR-6387: --- Commit 1653708 from [~sar...@syr.edu] in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1653708 ] SOLR-6991,SOLR-6387: Under Turkish locale, don't run solr-cell and dataimporthandler-extras tests that use Tika (merged trunk r1653704) Solr specific work around for JDK bug #8047340: posix_spawn error with turkish locale - Key: SOLR-6387 URL: https://issues.apache.org/jira/browse/SOLR-6387 Project: Solr Issue Type: Bug Environment: Linux, MacOSX, POSIX in general Reporter: Hoss Man Assignee: Uwe Schindler Priority: Minor Labels: Java7, Java8 Fix For: 4.10, Trunk Attachments: SOLR-6387.patch, SOLR-6387.patch Various versions of the Sun/Oracle/OpenJDK JVM have issues executing new processes if the default langauge of the JVM is Turkish. The root bug reports of this affecting Runtime.exec() are here... * https://bugs.openjdk.java.net/browse/JDK-8047340 * https://bugs.openjdk.java.net/browse/JDK-8055301 On systems runining the affected JVMs, with a default langauge of Turkish, this problem has historically manifested itself in Solr in a few ways: * SystemInfoHandler would throw nasty exceptions on these systems due to an attempt at conditionally executing some native process to check system stats * RunExecutableListener would fail cryptically * some solr tests involving either the SystemInfoHandler or the Hadoop MapReduce code would fail if the test framework randomly selected a turkish language based locale. Starting with Solr 4.10, We have worked around this jvm bug in Solr in 3 ways: * RunExecutableListener makes it more clear in the logs why it can't be used * SystemInfoHandler traps and ignores any Error related to posix_span in the same way it traps and ignores other errors related to it's conditional attempts at exec'ing (ie: permission problems, executable not found ,etc...) * our map reduce based tests that depend on exec'ing external processes now skip themselves automatically if a turkish local is randomly selected. Users affected by this issue who, for whatever reasons, can not upgrade to Solr 4.10, may wish to consider setting the jdk.lang.Process.launchMechanism system property explicitly (see below) {panel:title=original issue report} Jenkin's tests occasionally fail with the following cryptic error... {noformat} java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. at __randomizedtesting.SeedInfo.seed([9219CAA3BCAA7365:7F07719937A772E1]:0) at java.lang.UNIXProcess$1.run(UNIXProcess.java:104) at java.lang.UNIXProcess$1.run(UNIXProcess.java:93) at java.security.AccessController.doPrivileged(Native Method) at java.lang.UNIXProcess.clinit(UNIXProcess.java:91) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) at java.lang.Runtime.exec(Runtime.java:617) {noformat} A commonality of most of these failures is that the turkish locale has been randomly selected, and apparently the Runtime.exec is busted whtn you use turkish... 
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8047340 http://java.thedizzyheights.com/2014/07/java-error-posix_spawn-is-not-a-supported-process-launch-mechanism-on-this-platform-when-trying-to-spawn-a-process/ We should consider hardcoding the jdk.lang.Process.launchMechanism sys property mentioned as a workaround in the jdk bug report {panel} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
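For users who do need the system-property workaround mentioned in the report, it has to take effect before the JVM launches its first child process. The usual route is the command-line flag -Djdk.lang.Process.launchMechanism=FORK; the in-code placement and the FORK value in this sketch are assumptions, not something prescribed by the issue:
{code}
public class LaunchMechanismWorkaround {
  public static void main(String[] args) throws Exception {
    // java.lang.UNIXProcess reads this property in its static initializer, which runs the
    // first time a process is launched, so it must be set before any Runtime.exec() call.
    System.setProperty("jdk.lang.Process.launchMechanism", "FORK");
    Process p = Runtime.getRuntime().exec(new String[] {"/bin/true"});
    System.out.println("exit=" + p.waitFor());
  }
}
{code}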
[jira] [Commented] (SOLR-7011) fix OverseerCollectionProcessor.deleteCollection removal-done check
[ https://issues.apache.org/jira/browse/SOLR-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286611#comment-14286611 ] ASF subversion and git services commented on SOLR-7011: --- Commit 1653718 from sha...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1653718 ] SOLR-7011: Delete collection returns before collection is actually removed fix OverseerCollectionProcessor.deleteCollection removal-done check --- Key: SOLR-7011 URL: https://issues.apache.org/jira/browse/SOLR-7011 Project: Solr Issue Type: Bug Affects Versions: 4.10.3, 5.0, Trunk Reporter: Christine Poerschke Assignee: Shalin Shekhar Mangar Priority: Minor {{OverseerCollectionProcessor.java}} line 1184 has a {{.hasCollection(message.getStr(collection))}} call which should be either {{.hasCollection(message.getStr(name))}} or {{.hasCollection(collection)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6991) Update to Apache TIKA 1.7
[ https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286470#comment-14286470 ] Hoss Man commented on SOLR-6991: bq. The last comment was just an idea, but doesn't work. ... you fought a good fight uwe, but alas... +1 to your SOLR-6991-forkfix.patch for 5.0 .. but don't we need similar assumes in dataimporthandler-extras tests that use TikaEntityProcessor? (i'm not sure why those wouldn't fail with turkish now as well) Update to Apache TIKA 1.7 - Key: SOLR-6991 URL: https://issues.apache.org/jira/browse/SOLR-6991 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, Trunk, 5.1 Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch Apache TIKA 1.7 was released: [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt] This is more or less a dependency update, so replacements. Not sure if we should do this for 5.0. In 5.0 we currently have the previous version, which was not yet released with Solr. If we now bring this into 5.0, we wouldn't have a new release 2 times. I can change the stuff this evening and let it bake in 5.x, so maybe we backport this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org