[jira] [Commented] (SOLR-2366) Facet Range Gaps

2014-07-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074116#comment-14074116
 ] 

Tomás Fernández Löbbe commented on SOLR-2366:
-

I didn't see this Jira before (just saw it now while updating the faceting 
wiki). Part of what's described here can be achieved by Interval Faceting 
(SOLR-6216). The implementation is different, though, because it relies on 
DocValues instead of filters.
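For reference, a minimal sketch of an interval-faceting request in the SOLR-6216 
style (field name and interval bounds are illustrative, not from this issue):

{code}
q=*:*&facet=true
&facet.interval=price
&f.price.facet.interval.set=[0,10]
&f.price.facet.interval.set=(10,100]
{code}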

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different-sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+).  We should be able to 
 quantize the results into arbitrarily sized buckets.
 (Original syntax proposal removed, see discussion for concrete syntax)
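(For comparison, the arbitrary-bucket effect is achievable today with one 
facet.query per bucket; a sketch, assuming a hypothetical spatial field 
{{store}} and query point:)

{code}
q=*:*&facet=true
&facet.query={!frange key=walking l=0 u=5}geodist(store,45.15,-93.85)
&facet.query={!frange key=driving l=5 u=150}geodist(store,45.15,-93.85)
&facet.query={!frange key=other l=150}geodist(store,45.15,-93.85)
{code}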



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 1696 - Failure!

2014-07-25 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/1696/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseSerialGC

7 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.SolrExampleBinaryTest.testExampleConfig

Error Message:
Expected mime type application/octet-stream but got text/html. <html> <head> 
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/> 
<title>Error 500 {msg=Could not initialize class 
java.lang.UNIXProcess,trace=java.lang.NoClassDefFoundError: Could not 
initialize class java.lang.UNIXProcess  at 
java.lang.ProcessImpl.start(ProcessImpl.java:130)  at 
java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)  at 
java.lang.Runtime.exec(Runtime.java:617)  at 
java.lang.Runtime.exec(Runtime.java:450)  at 
java.lang.Runtime.exec(Runtime.java:347)  at 
org.apache.solr.handler.admin.SystemInfoHandler.execute(SystemInfoHandler.java:220)
  at 
org.apache.solr.handler.admin.SystemInfoHandler.getSystemInfo(SystemInfoHandler.java:176)
  at 
org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:97)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at 
org.apache.solr.handler.admin.InfoHandler.handleRequestBody(InfoHandler.java:88)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
  at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
  at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at 
org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:137)
  at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) 
 at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
  at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)  
at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)  
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)  
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)  
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) 
 at org.eclipse.jetty.server.Server.handle(Server.java:368)  at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)  at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)  at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
  at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
  at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) 
 at java.lang.Thread.run(Thread.java:745) ,code=500}</title> </head> <body> 
<h2>HTTP ERROR: 500</h2> <p>Problem accessing /solr/admin/info/system. Reason: 
<pre>{msg=Could not initialize class 
java.lang.UNIXProcess,trace=java.lang.NoClassDefFoundError: Could not 
initialize class java.lang.UNIXProcess  at 
java.lang.ProcessImpl.start(ProcessImpl.java:130)  at 
java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)  at 
java.lang.Runtime.exec(Runtime.java:617)  at 
java.lang.Runtime.exec(Runtime.java:450)  at 
java.lang.Runtime.exec(Runtime.java:347)  at 
org.apache.solr.handler.admin.SystemInfoHandler.execute(SystemInfoHandler.java:220)
  at 
org.apache.solr.handler.admin.SystemInfoHandler.getSystemInfo(SystemInfoHandler.java:176)
  at 
org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:97)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at 
org.apache.solr.handler.admin.InfoHandler.handleRequestBody(InfoHandler.java:88)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
  at 

[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 27188 - Failure!

2014-07-25 Thread builder
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/27188/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterThreadsToSegments.testManyThreadsClose

Error Message:
Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, 
state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
at 
__randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0)
at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258)
Caused by: java.lang.NullPointerException
at 
org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300)
at 
org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110)
at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253)




Build Log:
[...truncated 1608 lines...]
   [junit4] Suite: org.apache.lucene.index.TestIndexWriterThreadsToSegments
   [junit4]   2 NOTE: reproduce with: ant test  
-Dtestcase=TestIndexWriterThreadsToSegments -Dtests.method=testManyThreadsClose 
-Dtests.seed=23CFC8083A310D8C -Dtests.slow=true -Dtests.locale=fr_FR 
-Dtests.timezone=Pacific/Midway -Dtests.file.encoding=UTF-8
   [junit4] ERROR   2.55s J4 | 
TestIndexWriterThreadsToSegments.testManyThreadsClose 
   [junit4] Throwable #1: 
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   [junit4]at 
__randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0)
   [junit4] Caused by: java.lang.RuntimeException: 
java.lang.NullPointerException
   [junit4]at 
__randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0)
   [junit4]at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258)
   [junit4] Caused by: java.lang.NullPointerException
   [junit4]at 
org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300)
   [junit4]at 
org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473)
   [junit4]at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437)
   [junit4]at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
   [junit4]at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222)
   [junit4]at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149)
   [junit4]at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110)
   [junit4]at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253)
   [junit4]   2 NOTE: test params are: codec=Lucene49: 
{field=PostingsFormat(name=SimpleText)}, docValues:{}, sim=DefaultSimilarity, 
locale=fr_FR, timezone=Pacific/Midway
   [junit4]   2 NOTE: Linux 3.2.0-26-generic amd64/Oracle Corporation 1.7.0_55 
(64-bit)/cpus=8,threads=1,free=57393592,total=242221056
   [junit4]   2 NOTE: All tests run in this JVM: [TestScorerPerf, 
TestDeterminism, TestDemo, TestAutomatonQuery, TestLookaheadTokenFilter, 
TestTermVectorsFormat, TestBinaryDocValuesUpdates, TestSumDocFreq, 
Test2BNumericDocValues, Nested, Nested2, TestDateSort, TestPrefixFilter, 
ThrowInUncaught, TestCrashCausesCorruptIndex, TestDocValuesWithThreads, 
TestLucene3xStoredFieldsFormat, InBeforeClass, InAfterClass, InTestMethod, 
NonStringProperties, TestReaderClosed, TestBitVector, TestLucene42NormsFormat, 
TestOmitNorms, TestQueryRescorer, TestSimpleAttributeImpl, TestDocCount, 
TestDocIdBitSet, TestOpenBitSet, TestBasics, TestNorms, TestCompoundFile, 
TestIndexWriterUnicode, TestBufferedIndexInput, TestConsistentFieldNumbers, 
TestLockFactory, TestSegmentMerger, TestIndexWriterNRTIsCurrent, 
TestFieldsReader, TestDocValuesIndexing, TestHugeRamFile, 
TestSpanSearchEquivalence, 

[jira] [Created] (SOLR-6278) add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional

2014-07-25 Thread Christine Poerschke (JIRA)
Christine Poerschke created SOLR-6278:
-

 Summary: add admin/collections?action=DELETEREPLICA&core=... 
support, make collection=... and shard=... parameters optional
 Key: SOLR-6278
 URL: https://issues.apache.org/jira/browse/SOLR-6278
 Project: Solr
  Issue Type: Improvement
Reporter: Christine Poerschke


To add {{core=...}} as an alternative to {{replica=...}} for identifying 
what is to be deleted, and to make {{collection=...}} and {{shard=...}} optional 
provided the other parameters uniquely identify exactly one deletion target.
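A sketch of the two forms side by side (host and names are hypothetical):

{code}
# existing: identify the replica via collection/shard/replica
curl "http://host:port/solr/admin/collections?action=DELETEREPLICA&collection=collection1&shard=shard2&replica=core_node3"

# proposed: identify it by core name, with collection/shard optional
curl "http://host:port/solr/admin/collections?action=DELETEREPLICA&core=collection1_shard2_replica1"
{code}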



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: SOLR-6278: add admin/collections?action=...

2014-07-25 Thread cpoerschke
GitHub user cpoerschke opened a pull request:

https://github.com/apache/lucene-solr/pull/71

SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support

SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support, 
make collection=... and shard=... parameters optional

https://issues.apache.org/jira/i#browse/SOLR-6278

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr 
trunk-DELETEREPICA-by-core-name

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/71.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #71


commit 93bb1104bff3c0d7a30d8f8e0bf2d2989f7ce79c
Author: Christine Poerschke cpoersc...@bloomberg.net
Date:   2014-07-09T10:39:35Z

solr: add admin/collections?action=DELETEREPLICA&core=... support, make 
collection=... and shard=... parameters optional




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6278) add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional

2014-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074301#comment-14074301
 ] 

ASF GitHub Bot commented on SOLR-6278:
--

GitHub user cpoerschke opened a pull request:

https://github.com/apache/lucene-solr/pull/71

SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support

SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support, 
make collection=... and shard=... parameters optional

https://issues.apache.org/jira/i#browse/SOLR-6278

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr 
trunk-DELETEREPICA-by-core-name

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/71.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #71


commit 93bb1104bff3c0d7a30d8f8e0bf2d2989f7ce79c
Author: Christine Poerschke cpoersc...@bloomberg.net
Date:   2014-07-09T10:39:35Z

solr: add admin/collections?action=DELETEREPLICA&core=... support, make 
collection=... and shard=... parameters optional




 add admin/collections?action=DELETEREPLICA&core=... support, make 
 collection=... and shard=... parameters optional
 --

 Key: SOLR-6278
 URL: https://issues.apache.org/jira/browse/SOLR-6278
 Project: Solr
  Issue Type: Improvement
Reporter: Christine Poerschke

 To add {{core=...}} as an alternative to {{replica=...}} for identifying 
 what is to be deleted, and to make {{collection=...}} and {{shard=...}} optional 
 provided the other parameters uniquely identify exactly one deletion target.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: SOLR-6279: cores?action=UNLOAD can unreg...

2014-07-25 Thread cpoerschke
GitHub user cpoerschke opened a pull request:

https://github.com/apache/lucene-solr/pull/72

SOLR-6279: cores?action=UNLOAD can unregister unclosed core


https://issues.apache.org/jira/i#browse/SOLR-6279

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr 
trunk-UNLOAD-can-unregister-unclosed-close

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #72


commit 2a80ca5dfb47b7cd0416be2ae72bade0fe8f3ad0
Author: Christine Poerschke cpoersc...@bloomberg.net
Date:   2014-07-22T12:07:58Z

solr: cores?action=UNLOAD can unregister unclosed core

Changing CoreContainer.unload to wait for core to close before 
unregistering it from ZK. Adding testMidUseUnload method to TestLazyCores.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6279) cores?action=UNLOAD can unregister unclosed core

2014-07-25 Thread Christine Poerschke (JIRA)
Christine Poerschke created SOLR-6279:
-

 Summary: cores?action=UNLOAD can unregister unclosed core
 Key: SOLR-6279
 URL: https://issues.apache.org/jira/browse/SOLR-6279
 Project: Solr
  Issue Type: Bug
Reporter: Christine Poerschke


baseline:

{code}
  /somewhere/instanceA/collection1_shard1/core.properties
  /somewhere/instanceA/collection1_shard1/data
  /somewhere/instanceA/collection1_shard2/core.properties
  /somewhere/instanceA/collection1_shard2/data

  /somewhere/instanceB
{code}

actions:

{code}
  curl "http://host:port/solr/admin/cores?action=UNLOAD&core=collection1_shard2"

  # since UNLOAD completed we should now be free to move the unloaded core's 
files as we wish

  mv /somewhere/instanceA/collection1_shard2 
/somewhere/instanceB/collection1_shard2
{code}

expected result:

{code}
  /somewhere/instanceA/collection1_shard1/core.properties
  /somewhere/instanceA/collection1_shard1/data

  # collection1_shard2 files have been fully relocated

  /somewhere/instanceB/collection1_shard2/core.properties.unloaded
  /somewhere/instanceB/collection1_shard2/data
{code}

actual result:

{code}
  /somewhere/instanceA/collection1_shard1/core.properties
  /somewhere/instanceA/collection1_shard1/data
  /somewhere/instanceA/collection1_shard2/data

  # collection1_shard2 files have not been fully relocated and/or some files 
were left behind in instanceA because the UNLOAD action had returned prior to 
the core being closed

  /somewhere/instanceB/collection1_shard2/core.properties.unloaded
  /somewhere/instanceB/collection1_shard2/data
{code}


+proposed fix:+ Changing CoreContainer.unload to wait for core to close before 
unregistering it from ZK. Adding testMidUseUnload method to TestLazyCores.
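A minimal sketch of the proposed ordering (not the actual patch; 
unregisterFromZk() is a hypothetical stand-in for the ZK cleanup, while 
SolrCore.isClosed() is real):

{code:java}
void unloadAndWait(SolrCore core, String name) throws InterruptedException {
  core.close();                 // releases this container's reference
  while (!core.isClosed()) {    // in-flight request threads may still hold refs
    Thread.sleep(100);
  }
  unregisterFromZk(name);       // hypothetical: ZK cleanup only after full close
}
{code}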




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6279) cores?action=UNLOAD can unregister unclosed core

2014-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074307#comment-14074307
 ] 

ASF GitHub Bot commented on SOLR-6279:
--

GitHub user cpoerschke opened a pull request:

https://github.com/apache/lucene-solr/pull/72

SOLR-6279: cores?action=UNLOAD can unregister unclosed core


https://issues.apache.org/jira/i#browse/SOLR-6279

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr 
trunk-UNLOAD-can-unregister-unclosed-close

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #72


commit 2a80ca5dfb47b7cd0416be2ae72bade0fe8f3ad0
Author: Christine Poerschke cpoersc...@bloomberg.net
Date:   2014-07-22T12:07:58Z

solr: cores?action=UNLOAD can unregister unclosed core

Changing CoreContainer.unload to wait for core to close before 
unregistering it from ZK. Adding testMidUseUnload method to TestLazyCores.




 cores?action=UNLOAD can unregister unclosed core
 

 Key: SOLR-6279
 URL: https://issues.apache.org/jira/browse/SOLR-6279
 Project: Solr
  Issue Type: Bug
Reporter: Christine Poerschke

 baseline:
 {code}
   /somewhere/instanceA/collection1_shard1/core.properties
   /somewhere/instanceA/collection1_shard1/data
   /somewhere/instanceA/collection1_shard2/core.properties
   /somewhere/instanceA/collection1_shard2/data
   /somewhere/instanceB
 {code}
 actions:
 {code}
   curl 
 "http://host:port/solr/admin/cores?action=UNLOAD&core=collection1_shard2"
   # since UNLOAD completed we should now be free to move the unloaded core's 
 files as we wish
   mv /somewhere/instanceA/collection1_shard2 
 /somewhere/instanceB/collection1_shard2
 {code}
 expected result:
 {code}
   /somewhere/instanceA/collection1_shard1/core.properties
   /somewhere/instanceA/collection1_shard1/data
   # collection1_shard2 files have been fully relocated
   /somewhere/instanceB/collection1_shard2/core.properties.unloaded
   /somewhere/instanceB/collection1_shard2/data
 {code}
 actual result:
 {code}
   /somewhere/instanceA/collection1_shard1/core.properties
   /somewhere/instanceA/collection1_shard1/data
   /somewhere/instanceA/collection1_shard2/data
   # collection1_shard2 files have not been fully relocated and/or some files 
 were left behind in instanceA because the UNLOAD action had returned prior to 
 the core being closed
   /somewhere/instanceB/collection1_shard2/core.properties.unloaded
   /somewhere/instanceB/collection1_shard2/data
 {code}
 +proposed fix:+ Changing CoreContainer.unload to wait for core to close 
 before unregistering it from ZK. Adding testMidUseUnload method to 
 TestLazyCores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074333#comment-14074333
 ] 

ASF subversion and git services commented on SOLR-5847:
---

Commit 1613406 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1613406 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements

 The Admin GUI doesn't allow to abort a running dataimport
 -

 Key: SOLR-5847
 URL: https://issues.apache.org/jira/browse/SOLR-5847
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler, web gui
Affects Versions: 4.7
Reporter: Paco Garcia
Priority: Minor

 With the changes introduced in the 4.7.0 release by SOLR-5517 (Return HTTP error 
 on POST requests with no Content-Type), the jQuery invocation to abort a 
 running dataimport fails with HTTP error code 415: a POST request is now 
 expected to carry some content in the body.
 See comments in SOLR-5517



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3622) DIH should not do rollbacks.

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074332#comment-14074332
 ] 

ASF subversion and git services commented on SOLR-3622:
---

Commit 1613406 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1613406 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements

 DIH should not do rollbacks.
 

 Key: SOLR-3622
 URL: https://issues.apache.org/jira/browse/SOLR-3622
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Reporter: Mark Miller
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10


 This is not playing nice.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6269) Change rollback to error in DIH

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074335#comment-14074335
 ] 

ASF subversion and git services commented on SOLR-6269:
---

Commit 1613406 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1613406 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements

 Change rollback to error in DIH
 ---

 Key: SOLR-6269
 URL: https://issues.apache.org/jira/browse/SOLR-6269
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.9
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10

 Attachments: SOLR-6269.patch


 Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
 mode, let's rename most things "rollback" to "error", such as the new 
 onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6194) Allow access to DataImporter and DIHConfiguration

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074334#comment-14074334
 ] 

ASF subversion and git services commented on SOLR-6194:
---

Commit 1613406 from [~ehatcher] in branch 'dev/trunk'
[ https://svn.apache.org/r1613406 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements

 Allow access to DataImporter and DIHConfiguration
 -

 Key: SOLR-6194
 URL: https://issues.apache.org/jira/browse/SOLR-6194
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.10
Reporter: Aaron LaBella
Assignee: Shalin Shekhar Mangar
 Fix For: 4.10

 Attachments: SOLR-6194.patch, SOLR-6194.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 I'd like to change the visibility and access to a couple of the internal 
 classes of DataImportHandler, specifically DataImporter and DIHConfiguration. 
  My reasoning is that I've added the ability for a new data import handler 
 command called *getquery* that will return the exact queries (fully 
 resolved) that are executed for an entity within the data import 
 configuration.  This makes it much easier to debug the DIH, rather than 
 turning on debug/verbose flags and digging through the raw response.  
 Additionally, it gives me a service from which I can take the queries 
 and run them.
 Here's a snippet of Java code that I can execute now that I have access 
 to the DIHConfiguration:
 {code:title=Snippet.java|borderStyle=solid}
   /**
    * @return a map of all the queries for each entity in the given config
    */
   protected Map<String,String> getEntityQueries(DIHConfiguration config, Map<String,Object> params)
   {
     Map<String,String> queries = new LinkedHashMap<String,String>();
     if (config != null && config.getEntities() != null)
     {
       // make a new variable resolver
       VariableResolver vr = new VariableResolver();
       vr.addNamespace("dataimporter.request", params);
       // for each entity
       for (Entity e : config.getEntities())
       {
         // get the query and resolve it
         if (e.getAllAttributes().containsKey(SqlEntityProcessor.QUERY))
         {
           String query = e.getAllAttributes().get(SqlEntityProcessor.QUERY);
           query = query.replaceAll("\\s+", " ").trim();
           String resolved = vr.replaceTokens(query);
           resolved = resolved.replaceAll("\\s+", " ").trim();
           queries.put(e.getName(), resolved);
           queries.put(e.getName() + "_raw", query);
         }
       }
     }
     return queries;
   }
 {code}
 I'm attaching a patch that I would appreciate someone having a look at for 
 consideration.  It's fully tested -- please let me know if there is something 
 else I need to do/test.
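For illustration only (host, core name and parameters are hypothetical; the 
issue only names the command), the proposed command might be invoked like:

{code}
curl "http://localhost:8983/solr/db/dataimport?command=getquery&wt=json"
{code}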



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3622) DIH should not do rollbacks.

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074336#comment-14074336
 ] 

ASF subversion and git services commented on SOLR-3622:
---

Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1613409 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements 
(merged from r1613406)

 DIH should not do rollbacks.
 

 Key: SOLR-3622
 URL: https://issues.apache.org/jira/browse/SOLR-3622
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Reporter: Mark Miller
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10


 This is not playing nice.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074337#comment-14074337
 ] 

ASF subversion and git services commented on SOLR-5847:
---

Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1613409 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements 
(merged from r1613406)

 The Admin GUI doesn't allow to abort a running dataimport
 -

 Key: SOLR-5847
 URL: https://issues.apache.org/jira/browse/SOLR-5847
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler, web gui
Affects Versions: 4.7
Reporter: Paco Garcia
Priority: Minor

 With the changes introduced in the 4.7.0 release by SOLR-5517 (Return HTTP error 
 on POST requests with no Content-Type), the jQuery invocation to abort a 
 running dataimport fails with HTTP error code 415: a POST request is now 
 expected to carry some content in the body.
 See comments in SOLR-5517



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6269) Change rollback to error in DIH

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074340#comment-14074340
 ] 

ASF subversion and git services commented on SOLR-6269:
---

Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1613409 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements 
(merged from r1613406)

 Change rollback to error in DIH
 ---

 Key: SOLR-6269
 URL: https://issues.apache.org/jira/browse/SOLR-6269
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.9
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10

 Attachments: SOLR-6269.patch


 Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
 mode, let's rename most things "rollback" to "error", such as the new 
 onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6194) Allow access to DataImporter and DIHConfiguration

2014-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074339#comment-14074339
 ] 

ASF subversion and git services commented on SOLR-6194:
---

Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1613409 ]

SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements 
(merged from r1613406)

 Allow access to DataImporter and DIHConfiguration
 -

 Key: SOLR-6194
 URL: https://issues.apache.org/jira/browse/SOLR-6194
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.10
Reporter: Aaron LaBella
Assignee: Shalin Shekhar Mangar
 Fix For: 4.10

 Attachments: SOLR-6194.patch, SOLR-6194.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 I'd like to change the visibility and access to a couple of the internal 
 classes of DataImportHandler, specifically DataImporter and DIHConfiguration. 
  My reasoning is that I've added the ability for a new data import handler 
 command called *getquery* that will return the exact queries (fully 
 resolved) that are executed for an entity within the data import 
 configuration.  This makes it much easier to debug the DIH, rather than 
 turning on debug/verbose flags and digging through the raw response.  
 Additionally, it gives me a service from which I can take the queries 
 and run them.
 Here's a snippet of Java code that I can execute now that I have access 
 to the DIHConfiguration:
 {code:title=Snippet.java|borderStyle=solid}
   /**
    * @return a map of all the queries for each entity in the given config
    */
   protected Map<String,String> getEntityQueries(DIHConfiguration config, Map<String,Object> params)
   {
     Map<String,String> queries = new LinkedHashMap<String,String>();
     if (config != null && config.getEntities() != null)
     {
       // make a new variable resolver
       VariableResolver vr = new VariableResolver();
       vr.addNamespace("dataimporter.request", params);
       // for each entity
       for (Entity e : config.getEntities())
       {
         // get the query and resolve it
         if (e.getAllAttributes().containsKey(SqlEntityProcessor.QUERY))
         {
           String query = e.getAllAttributes().get(SqlEntityProcessor.QUERY);
           query = query.replaceAll("\\s+", " ").trim();
           String resolved = vr.replaceTokens(query);
           resolved = resolved.replaceAll("\\s+", " ").trim();
           queries.put(e.getName(), resolved);
           queries.put(e.getName() + "_raw", query);
         }
       }
     }
     return queries;
   }
 {code}
 I'm attaching a patch that I would appreciate someone having a look at for 
 consideration.  It's fully tested -- please let me know if there is something 
 else I need to do/test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport

2014-07-25 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved SOLR-5847.


Resolution: Fixed
  Assignee: Erik Hatcher

 The Admin GUI doesn't allow to abort a running dataimport
 -

 Key: SOLR-5847
 URL: https://issues.apache.org/jira/browse/SOLR-5847
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler, web gui
Affects Versions: 4.7
Reporter: Paco Garcia
Assignee: Erik Hatcher
Priority: Minor

 With the changes introduced in the 4.7.0 release by SOLR-5517 (Return HTTP error 
 on POST requests with no Content-Type), the jQuery invocation to abort a 
 running dataimport fails with HTTP error code 415: a POST request is now 
 expected to carry some content in the body.
 See comments in SOLR-5517



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6269) Change rollback to error in DIH

2014-07-25 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved SOLR-6269.


Resolution: Fixed

 Change rollback to error in DIH
 ---

 Key: SOLR-6269
 URL: https://issues.apache.org/jira/browse/SOLR-6269
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.9
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10

 Attachments: SOLR-6269.patch


 Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
 mode, let's rename most things "rollback" to "error", such as the new 
 onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport

2014-07-25 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074342#comment-14074342
 ] 

Erik Hatcher commented on SOLR-5847:


I simply changed the method to GET instead of POST.
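That is, the abort now goes out as a plain GET, along the lines of (host and 
core name hypothetical):

{code}
curl "http://localhost:8983/solr/db/dataimport?command=abort"
{code}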

 The Admin GUI doesn't allow to abort a running dataimport
 -

 Key: SOLR-5847
 URL: https://issues.apache.org/jira/browse/SOLR-5847
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler, web gui
Affects Versions: 4.7
Reporter: Paco Garcia
Assignee: Erik Hatcher
Priority: Minor

 With the changes introduced in the 4.7.0 release by SOLR-5517 (Return HTTP error 
 on POST requests with no Content-Type), the jQuery invocation to abort a 
 running dataimport fails with HTTP error code 415: a POST request is now 
 expected to carry some content in the body.
 See comments in SOLR-5517



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6194) Allow access to DataImporter and DIHConfiguration

2014-07-25 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved SOLR-6194.


   Resolution: Fixed
Fix Version/s: 5.0

 Allow access to DataImporter and DIHConfiguration
 -

 Key: SOLR-6194
 URL: https://issues.apache.org/jira/browse/SOLR-6194
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.10
Reporter: Aaron LaBella
Assignee: Shalin Shekhar Mangar
 Fix For: 5.0, 4.10

 Attachments: SOLR-6194.patch, SOLR-6194.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 I'd like to change the visibility and access to a couple of the internal 
 classes of DataImportHandler, specifically DataImporter and DIHConfiguration. 
  My reasoning is that I've added the ability for a new data import handler 
 command called *getquery* that will return the exact queries (fully 
 resolved) that are executed for an entity within the data import 
 configuration.  This makes it much easier to debug the DIH, rather than 
 turning on debug/verbose flags and digging through the raw response.  
 Additionally, it gives me a service from which I can take the queries 
 and run them.
 Here's a snippet of Java code that I can execute now that I have access 
 to the DIHConfiguration:
 {code:title=Snippet.java|borderStyle=solid}
   /**
    * @return a map of all the queries for each entity in the given config
    */
   protected Map<String,String> getEntityQueries(DIHConfiguration config, Map<String,Object> params)
   {
     Map<String,String> queries = new LinkedHashMap<String,String>();
     if (config != null && config.getEntities() != null)
     {
       // make a new variable resolver
       VariableResolver vr = new VariableResolver();
       vr.addNamespace("dataimporter.request", params);
       // for each entity
       for (Entity e : config.getEntities())
       {
         // get the query and resolve it
         if (e.getAllAttributes().containsKey(SqlEntityProcessor.QUERY))
         {
           String query = e.getAllAttributes().get(SqlEntityProcessor.QUERY);
           query = query.replaceAll("\\s+", " ").trim();
           String resolved = vr.replaceTokens(query);
           resolved = resolved.replaceAll("\\s+", " ").trim();
           queries.put(e.getName(), resolved);
           queries.put(e.getName() + "_raw", query);
         }
       }
     }
     return queries;
   }
 {code}
 I'm attaching a patch that I would appreciate someone having a look at for 
 consideration.  It's fully tested -- please let me know if there is something 
 else I need to do/test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6269) Change rollback to error in DIH

2014-07-25 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074347#comment-14074347
 ] 

Noble Paul commented on SOLR-6269:
--

I just went through the patch. It's fine. 

 Change rollback to error in DIH
 ---

 Key: SOLR-6269
 URL: https://issues.apache.org/jira/browse/SOLR-6269
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.9
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10

 Attachments: SOLR-6269.patch


 Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
 mode, let's rename most things "rollback" to "error", such as the new 
 onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3622) DIH should not do rollbacks.

2014-07-25 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved SOLR-3622.


Resolution: Fixed

 DIH should not do rollbacks.
 

 Key: SOLR-3622
 URL: https://issues.apache.org/jira/browse/SOLR-3622
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Reporter: Mark Miller
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10


 This is not playing nice.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: SOLR-6163: special chars and ManagedSyno...

2014-07-25 Thread timoschmidt
GitHub user timoschmidt opened a pull request:

https://github.com/apache/lucene-solr/pull/73

SOLR-6163: special chars and ManagedSynonymFilterFactory

Special characters could not be used for update or deletion because the URL 
was not decoded before the resource was used.
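A sketch of the fix direction (not necessarily the PR verbatim): decode the 
resource name before using it as a managed-synonym key.

{code:java}
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeSketch {
  public static void main(String[] args) throws UnsupportedEncodingException {
    String raw = "%C3%A9%C3%A9%C3%A9";            // as received in the request URL
    String key = URLDecoder.decode(raw, "UTF-8"); // -> "ééé"
    System.out.println(key);
  }
}
{code}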

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/timoschmidt/lucene-solr origin/branch_4x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/73.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #73


commit 0168e160e4a9236b047b2e24909d1f59dfd3eb7b
Author: timo.schmidt timo-schm...@gmx.net
Date:   2014-07-25T12:44:26Z

SOLR-6163: special chars and ManagedSynonymFilterFactory




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6163) special chars and ManagedSynonymFilterFactory

2014-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074359#comment-14074359
 ] 

ASF GitHub Bot commented on SOLR-6163:
--

GitHub user timoschmidt opened a pull request:

https://github.com/apache/lucene-solr/pull/73

SOLR-6163: special chars and ManagedSynonymFilterFactory

Special characters could not be used for update or deletion because the URL 
was not decoded before the resource was used.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/timoschmidt/lucene-solr origin/branch_4x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/73.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #73


commit 0168e160e4a9236b047b2e24909d1f59dfd3eb7b
Author: timo.schmidt timo-schm...@gmx.net
Date:   2014-07-25T12:44:26Z

SOLR-6163: special chars and ManagedSynonymFilterFactory




 special chars and ManagedSynonymFilterFactory
 -

 Key: SOLR-6163
 URL: https://issues.apache.org/jira/browse/SOLR-6163
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Wim Kumpen

 Hey,
 I was playing with the ManagedSynonymFilterFactory to create a synonym list 
 with the API. But I have difficulties deleting keys that contain special 
 characters (or spaces)...
 I added a key "ééé" that matches with some other words. It's saved in the 
 synonym file as "ééé".
 When I try to delete it, I do:
 curl -X DELETE 
 "http://localhost/solr/mycore/schema/analysis/synonyms/english/ééé"
 error message: %C3%A9%C3%A9%C3%A9%C2%B5 not found in 
 /schema/analysis/synonyms/english
 A wild guess from me is that %C3%A9 isn't decoded back to ééé, and that's why 
 it can't find the keyword?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser

2014-07-25 Thread Steve Molloy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074360#comment-14074360
 ] 

Steve Molloy commented on SOLR-6248:


In this case it cannot replace the current MoreLikeThisHandler implementation 
which can analyze incoming text (as opposed to searching for a matching 
document in the index) in order to find similar documents in the index. Being 
able to query by unique field and returning similar documents is already 
covered by the MoreLikeThisComponent if you use rows=1 to get a single document 
and its set of similar ones. The use case that forces the MoreLikeThisHandler 
currently (at least that I know of) is really this on-the-fly analysis of text 
that is nowhere in the index.
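(For context, the on-the-fly path is the handler's ContentStream input; a 
sketch with an illustrative core and field:)

{code}
curl "http://localhost:8983/solr/collection1/mlt?mlt.fl=body&stream.body=some+free+text+to+analyze"
{code}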

 MoreLikeThis Query Parser
 -

 Key: SOLR-6248
 URL: https://issues.apache.org/jira/browse/SOLR-6248
 Project: Solr
  Issue Type: New Feature
Reporter: Anshum Gupta
 Attachments: SOLR-6248.patch


 The MLT component doesn't let people highlight/paginate and the handler comes 
 with the cost of maintaining another piece in the config. Also, any changes to 
 the default (number of results to be fetched etc.) /select handler need to be 
 copied/synced with this handler too.
 Having an MLT QParser would let users get back docs based on a query for them 
 to paginate, highlight etc. It would also give them the flexibility to use 
 this anywhere, i.e. q, fq, bq etc.
 A bit of history about MLT (thanks to Hoss):
 The MLT handler pre-dates the existence of QParsers and was meant to take an 
 arbitrary query as input, find docs that match that 
 query, club them together to find "interesting" terms, and then use those 
 terms as if they were "my main query" to generate a main result set.
 This result would then be used as the set to facet, highlight etc.
 The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)
 The MLT component on the other hand solved a very different purpose: 
 augmenting the main result set. It is used to get similar docs for each of 
 the docs in the main result set.
 DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
 The new approach:
 All of this can be done better and cleaner (and makes more sense too) using 
 an MLT QParser.
 An important thing to handle here is the case where the user doesn't have 
 TermVectors, in which case it does what happens right now, i.e. parsing 
 stored fields.
 Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
 field would need to be a TextField with an index analyzer defined. This 
 analyzer will then be used to extract terms for MLT.
 In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
 the schema (if TermVectors are enabled for the field). If not, a /get call 
 can be used to fetch the field and parse it.
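One possible usage shape, purely illustrative (the patch defines the actual 
syntax and parameter names):

{code}
q={!mlt qf=title,description}1234&rows=10&hl=true&hl.fl=description
{code}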



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6276) DIH test failures with invalid locale in derby JDBC driver

2014-07-25 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6276:
-

Attachment: SOLR-6276.patch

 DIH test failures with invalid locale in derby JDBC driver
 --

 Key: SOLR-6276
 URL: https://issues.apache.org/jira/browse/SOLR-6276
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.8.1
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Attachments: SOLR-6276.patch


 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10879/testReport/junit/org.apache.solr.handler.dataimport/TestJdbcDataSourceConvertType/testConvertType/
 We should pass the locale explicitly in the connection URL params
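For example (database name illustrative), Derby accepts a territory attribute 
directly on the connection URL:

{code}
jdbc:derby:memory:dihtest;create=true;territory=en_US
{code}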



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6216) Better faceting for multiple intervals on DV fields

2014-07-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074391#comment-14074391
 ] 

David Smiley commented on SOLR-6216:


[~tomasflobbe], do you have any scripts for your performance testing?  I am 
impressed with that part specifically, as I've got some interval faceting on my 
horizon (LUCENE-5735) that is somewhat similar to what you've got here.

 Better faceting for multiple intervals on DV fields
 ---

 Key: SOLR-6216
 URL: https://issues.apache.org/jira/browse/SOLR-6216
 Project: Solr
  Issue Type: Improvement
Reporter: Tomás Fernández Löbbe
Assignee: Erick Erickson
 Fix For: 4.10

 Attachments: SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
 SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
 SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch


 There are two ways to have faceting on values ranges in Solr right now: 
 “Range Faceting” and “Query Faceting” (doing range queries). They both end up 
 doing something similar:
 {code:java}
 searcher.numDocs(rangeQ , docs)
 {code}
 The good thing about this implementation is that it can benefit from caching. 
 The bad thing is that it may be slow with cold caches, and that there will be 
 a query for each of the ranges.
 A different implementation would be one that works similarly to regular field 
 faceting, using doc values and validating ranges for each value of the 
 matching documents. This implementation would sometimes be faster than Range 
 Faceting / Query Faceting, especially in cases where caches are not very 
 effective, such as with a high update rate or where ranges change frequently.
 Functionally, the result should be exactly the same as the one obtained by 
 doing a facet query for every interval
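A sketch of the doc-values idea (assuming a numeric DV field named price, a 
precomputed list of matching doc ids, and a hypothetical Interval#contains 
helper; not the attached patch):

{code:java}
NumericDocValues values = atomicReader.getNumericDocValues("price");
int[] counts = new int[intervals.length];
for (int docId : matchingDocIds) {
  long v = values.get(docId);                  // 4.x DV API: random access by docID
  for (int i = 0; i < intervals.length; i++) {
    if (intervals[i].contains(v)) {
      counts[i]++;                             // one pass, no per-range query
    }
  }
}
{code}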



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5847) Improved implementation of language models in lucene

2014-07-25 Thread Hadas Raviv (JIRA)
Hadas Raviv created LUCENE-5847:
---

 Summary: Improved implementation of language models in lucene 
 Key: LUCENE-5847
 URL: https://issues.apache.org/jira/browse/LUCENE-5847
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Hadas Raviv
Priority: Minor
 Fix For: 5.0
 Attachments: LUCENE-2507.patch

The current implementation of language models in lucene is based on the paper 
"A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information 
Retrieval" by Zhai and Lafferty ('01). Specifically, LMDirichletSimilarity and 
LMJelinekMercerSimilarity use a normalized smoothed score for a matching term 
in a document, as suggested in the above mentioned paper.

However, lucene doesn't assign a score to query terms that do not appear in a 
matched document. According to the pure LM approach, these terms should be 
assigned a collection-probability background score. If one uses the 
Jelinek-Mercer smoothing method, the final result list produced by lucene is 
rank equivalent to the one that would have been created by a full LM 
implementation. However, this is not the case for the Dirichlet smoothing 
method, because the background score is document dependent. Documents in which 
not all query terms appear are missing the document-dependent background score 
for the missing terms. This component affects the final ranking of documents in 
the list.
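For reference, the Dirichlet-smoothed estimate from Zhai and Lafferty (my 
notation, not text from the patch) makes the document-length dependence 
explicit:

p_\mu(w \mid d) = \frac{c(w;d) + \mu \, p(w \mid C)}{|d| + \mu}

For a query term with c(w;d) = 0 this reduces to \mu \, p(w \mid C) / (|d| + \mu), 
which still depends on the document length |d|, so omitting it can change the 
final ranking.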

Since LM is a baseline method in many works in the IR research field, I attach 
a patch that implements a full LM in lucene. The basic issue that should be 
addressed here is assigning a document a score that depends on *all* the 
query terms, collection statistics and the document length. The general idea of 
what I did is adding a new getBackGroundScore(int docID) method to similarity, 
scorer and bulkScorer. Then, when a collector assigns a score to a document 
(score = scorer.score()) I added the background score 
(score = scorer.score() + scorer.background(doc)) that is assigned by the 
similarity class used for ranking. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5847) Improved implementation of language models in lucene

2014-07-25 Thread Hadas Raviv (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hadas Raviv updated LUCENE-5847:


Attachment: LUCENE-2507.patch

 Improved implementation of language models in lucene 
 -

 Key: LUCENE-5847
 URL: https://issues.apache.org/jira/browse/LUCENE-5847
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Hadas Raviv
Priority: Minor
 Fix For: 5.0

 Attachments: LUCENE-2507.patch


 The current implementation of language models in lucene is based on the paper 
 "A Study of Smoothing Methods for Language Models Applied to Ad Hoc 
 Information Retrieval" by Zhai and Lafferty ('01). Specifically, 
 LMDirichletSimilarity and LMJelinekMercerSimilarity use a normalized smoothed 
 score for a matching term in a document, as suggested in the above mentioned 
 paper.
 However, lucene doesn't assign a score to query terms that do not appear in a 
 matched document. According to the pure LM approach, these terms should be 
 assigned a collection-probability background score. If one uses the 
 Jelinek-Mercer smoothing method, the final result list produced by lucene is 
 rank equivalent to the one that would have been created by a full LM 
 implementation. However, this is not the case for the Dirichlet smoothing 
 method, because the background score is document dependent. Documents in which 
 not all query terms appear are missing the document-dependent background score 
 for the missing terms. This component affects the final ranking of documents 
 in the list.
 Since LM is a baseline method in many works in the IR research field, I 
 attach a patch that implements a full LM in lucene. The basic issue that 
 should be addressed here is assigning a document a score that depends on 
 *all* the query terms, collection statistics and the document length. The 
 general idea of what I did is adding a new getBackGroundScore(int docID) 
 method to similarity, scorer and bulkScorer. Then, when a collector assigns a 
 score to a document (score = scorer.score()) I added the background score 
 (score = scorer.score() + scorer.background(doc)) that is assigned by the 
 similarity class used for ranking. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 27188 - Failure!

2014-07-25 Thread Michael McCandless
I don't like this failure ... I'll dig.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jul 25, 2014 at 6:02 AM,  buil...@flonkings.com wrote:
 Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/27188/

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterThreadsToSegments.testManyThreadsClose

 Error Message:
 Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, 
 state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

 Stack Trace:
 com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
 uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, 
 group=TGRP-TestIndexWriterThreadsToSegments]
 at 
 __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0)
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
 at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0)
 at 
 org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300)
 at 
 org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437)
 at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
 at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222)
 at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149)
 at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110)
 at 
 org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253)




 Build Log:
 [...truncated 1608 lines...]
[junit4] Suite: org.apache.lucene.index.TestIndexWriterThreadsToSegments
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=TestIndexWriterThreadsToSegments 
 -Dtests.method=testManyThreadsClose -Dtests.seed=23CFC8083A310D8C 
 -Dtests.slow=true -Dtests.locale=fr_FR -Dtests.timezone=Pacific/Midway 
 -Dtests.file.encoding=UTF-8
[junit4] ERROR   2.55s J4 | 
 TestIndexWriterThreadsToSegments.testManyThreadsClose 
[junit4] Throwable #1: 
 com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
 uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, 
 group=TGRP-TestIndexWriterThreadsToSegments]
[junit4]at 
 __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0)
[junit4] Caused by: java.lang.RuntimeException: 
 java.lang.NullPointerException
[junit4]at 
 __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0)
[junit4]at 
 org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258)
[junit4] Caused by: java.lang.NullPointerException
[junit4]at 
 org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300)
[junit4]at 
 org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473)
[junit4]at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437)
[junit4]at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
[junit4]at 
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222)
[junit4]at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149)
[junit4]at 
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110)
[junit4]at 
 org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253)
[junit4]   2 NOTE: test params are: codec=Lucene49: 
 {field=PostingsFormat(name=SimpleText)}, docValues:{}, sim=DefaultSimilarity, 
 locale=fr_FR, timezone=Pacific/Midway
[junit4]   2 NOTE: Linux 3.2.0-26-generic amd64/Oracle Corporation 
 1.7.0_55 (64-bit)/cpus=8,threads=1,free=57393592,total=242221056
[junit4]   2 NOTE: All tests run in this JVM: [TestScorerPerf, 
 TestDeterminism, TestDemo, TestAutomatonQuery, TestLookaheadTokenFilter, 
 TestTermVectorsFormat, TestBinaryDocValuesUpdates, TestSumDocFreq, 
 Test2BNumericDocValues, Nested, Nested2, TestDateSort, TestPrefixFilter, 
 ThrowInUncaught, TestCrashCausesCorruptIndex, TestDocValuesWithThreads, 
 TestLucene3xStoredFieldsFormat, InBeforeClass, InAfterClass, InTestMethod, 
 NonStringProperties, TestReaderClosed, TestBitVector, 
 TestLucene42NormsFormat, TestOmitNorms, TestQueryRescorer, 
 TestSimpleAttributeImpl, TestDocCount, TestDocIdBitSet, TestOpenBitSet, 
 TestBasics, TestNorms, 

[jira] [Created] (SOLR-6280) Collapse QParser should give error for multiValued field

2014-07-25 Thread David Smiley (JIRA)
David Smiley created SOLR-6280:
--

 Summary: Collapse QParser should give error for multiValued field
 Key: SOLR-6280
 URL: https://issues.apache.org/jira/browse/SOLR-6280
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: David Smiley
Priority: Minor


The Collapse QParser does give results if you collapse on a multi-valued field, 
but the document-value is somewhat arbitrarily chosen based on the internals 
of the FieldCache (FieldCacheImpl.SortedDocValuesCache).

Note that the Grouping functionality accesses values via 
FieldType.getValueSource which is a layer of abstraction above that includes a 
multiValued error check.

Collapse should throw an error here.

p.s. easy to test with exampledocs collapsing on cat
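
For example, with the stock example documents (a hedged illustration; cat is 
the multiValued field mentioned above), the arbitrary pick shows up with a 
request like:

{noformat}
q=*:*&fq={!collapse field=cat}
{noformat}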



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6281) PostingsSolrHighlighter should be more configurable

2014-07-25 Thread David Smiley (JIRA)
David Smiley created SOLR-6281:
--

 Summary: PostingsSolrHighlighter should be more configurable
 Key: SOLR-6281
 URL: https://issues.apache.org/jira/browse/SOLR-6281
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor


The DefaultSolrHighlighter (works on non-FVH and FVH modes) and 
PostingsSolrHighlighter are quite different, although they do share some 
highlighting parameters where it's directly applicable.  DSH has its 
fragListBuilder, fragmentsBuilder, boundaryScanner, configurable by letting you 
define your own class in solrconfig.xml.  PSH does not; it uses the Lucene 
default implementations of DefaultPassageFormatter, PassageScorer, and Java 
BreakIterator, though it configures each of them based on options. I have a 
case where I need to provide a custom PassageFormatter, for example.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #663: POMs out of sync

2014-07-25 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/663/

2 tests failed.
FAILED:  
org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.handler.TestReplicationHandlerBackup: 
   1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at java.net.URL.openStream(URL.java:1037)
at 
org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.handler.TestReplicationHandlerBackup: 
   1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at java.net.URL.openStream(URL.java:1037)
at 
org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314)
at __randomizedtesting.SeedInfo.seed([9D01DF06CFE991E2]:0)


FAILED:  
org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at java.net.URL.openStream(URL.java:1037)
at 
org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie 
threads that couldn't be terminated:
   1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at 

[jira] [Commented] (SOLR-6281) PostingsSolrHighlighter should be more configurable

2014-07-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074478#comment-14074478
 ] 

David Smiley commented on SOLR-6281:


A simple good-enough solution is to make the inner class extending 
PostingsHighlighter delegate out to protected methods on PSH for getting & 
initializing the PassageFormatter and a few other things.  Then I could extend 
PostingsSolrHighlighter to override the method.

Following from that step, the components could have their classes declared in 
solrconfig.xml.  But that would probably mean a new SolrPassageFormatter class. 
 I'm not sure if I want to bother going that far down the configurability road 
right now. 

As an aside... it's a shame that these different highlighters don't share more 
of the abstractions and terminology that seem orthogonal to their differences.  
That is, a Passage is conceptually the same as a Fragment.  I appreciate that 
different highlighters work differently and thus have somewhat different data 
associated with them.
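
As a rough sketch of the delegation idea (illustrative only, not the final 
patch), the Lucene-level override it would enable looks like this, since 
PostingsHighlighter already exposes getFormatter(String) as a protected method:

{code:java}
import org.apache.lucene.search.postingshighlight.DefaultPassageFormatter;
import org.apache.lucene.search.postingshighlight.PassageFormatter;
import org.apache.lucene.search.postingshighlight.PostingsHighlighter;

// Sketch: a subclass plugs in its own formatter. A real custom
// PassageFormatter implementation would be returned here instead of
// DefaultPassageFormatter configured with different tags.
public class CustomFormatterHighlighter extends PostingsHighlighter {
  @Override
  protected PassageFormatter getFormatter(String field) {
    return new DefaultPassageFormatter("<em>", "</em>", "... ", false);
  }
}
{code}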

 PostingsSolrHighlighter should be more configurable
 ---

 Key: SOLR-6281
 URL: https://issues.apache.org/jira/browse/SOLR-6281
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor

 The DefaultSolrHighlighter (works on non-FVH and FVH modes) and 
 PostingsSolrHighlighter are quite different, although they do share some 
 highlighting parameters where it's directly applicable.  DSH has its 
 fragListBuilder, fragmentsBuilder, boundaryScanner, configurable by letting 
 you define your own class in solrconfig.xml.  PSH does not; it uses the 
 Lucene default implementations of DefaultPassageFormatter, PassageScorer, and 
 Java BreakIterator, though it configures each of them based on options. I 
 have a case where I need to provide a custom PassageFormatter, for example.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6282) ArrayIndexOutOfBoundsException during search

2014-07-25 Thread Jason Emeric (JIRA)
Jason Emeric created SOLR-6282:
--

 Summary: ArrayIndexOutOfBoundsException during search
 Key: SOLR-6282
 URL: https://issues.apache.org/jira/browse/SOLR-6282
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.8
Reporter: Jason Emeric
Priority: Critical


When executing a search with the following query strings, a

ERROR org.apache.solr.servlet.SolrDispatchFilter – 
null:java.lang.ArrayIndexOutOfBoundsException

error is thrown and no stack trace is provided.  This is happening on searches 
that seem to share no common pattern (special characters, length, spaces, 
etc.).


q=((work_title_search:(%22+zoe%22%20)%20OR%20work_title_search:%22+zoe%22^100)%20AND%20(performer_name_search:(+big~0.75%20+b%27z%20%20)^7%20OR%20performer_name_search:%22+big%20+b%27z%20%20%22^30))


q=((work_title_search:(%22+rtb%22%20)%20OR%20work_title_search:%22+rtb%22^100)%20AND%20(performer_name_search:(+fly~0.75%20+street~0.75%20+gang~0.75%20)^7%20OR%20performer_name_search:%22+fly%20+street%20+gang%20%22^30))



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074516#comment-14074516
 ] 

David Smiley commented on LUCENE-5740:
--

Due to tag balancing of embedded elements, it doesn't look so simple after all. 
 The current implementation only strips SCRIPT & STYLE tags, which have special 
rules but conveniently have no child elements.  There's no need to balance 
embedded elements because there aren't any.  But to do this more generally, if 
you strip foo, you'd want to ensure that it strips 
<foo><bar><foo>hi</foo></bar></foo> correctly.  Admittedly, the particular 
application I'm working on strips link text content (<a>) and I'm not expecting 
embedded tags of the same type... but nonetheless it seems wrong to have this 
limitation.

If it did track the state, it would just need an integer depth counter 
(tagDepthWithinStrippedTag) that would be incremented for each opening element 
and decremented for each closing element within the current tag being stripped. 
 Not bad really.
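
A minimal standalone sketch of that counter logic (illustrative Java, not the 
JFlex-based implementation the filter actually uses):

{code:java}
import java.util.Set;

// Once a stripped tag opens, bump the depth on every open tag and drop it on
// every close tag; character output is suppressed while the depth is nonzero.
class StripDepthTracker {
  private final Set<String> stripTags;        // element names to strip
  private int tagDepthWithinStrippedTag = 0;

  StripDepthTracker(Set<String> stripTags) {
    this.stripTags = stripTags;
  }

  void onOpenTag(String name) {
    if (tagDepthWithinStrippedTag > 0) {
      tagDepthWithinStrippedTag++;            // nested element inside stripped region
    } else if (stripTags.contains(name.toLowerCase())) {
      tagDepthWithinStrippedTag = 1;          // entering a stripped region
    }
  }

  void onCloseTag() {
    if (tagDepthWithinStrippedTag > 0) {
      tagDepthWithinStrippedTag--;            // region ends when this reaches zero
    }
  }

  boolean suppressText() {
    return tagDepthWithinStrippedTag > 0;
  }
}
{code}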

What do you think [~steve_rowe]?

 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser

2014-07-25 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074514#comment-14074514
 ] 

Anshum Gupta commented on SOLR-6248:


My bad, this was my mistake. The last time I'd looked at this patch was about 
10 months ago.

This works like a component but also lets you paginate and do other stuff with 
it.
Let me check out if accepting text would make sense here (or if we could have 
something on similar lines). 
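
For reference, the kind of request the qparser targets would look roughly like 
this (hypothetical, not-yet-final syntax; the local-param name and document id 
are illustrative):

{noformat}
q={!mlt qf=name}doc-1234&start=0&rows=10&hl=true
{noformat}

i.e. the similar-docs query becomes a regular main query, so pagination and 
highlighting just work.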

 MoreLikeThis Query Parser
 -

 Key: SOLR-6248
 URL: https://issues.apache.org/jira/browse/SOLR-6248
 Project: Solr
  Issue Type: New Feature
Reporter: Anshum Gupta
 Attachments: SOLR-6248.patch


 MLT Component doesn't let people highlight/paginate and the handler comes 
 with a cost of maintaining another piece in the config. Also, any changes to 
 the default (number of results to be fetched etc.) /select handler need to be 
 copied/synced with this handler too.
 Having an MLT QParser would let users get back docs based on a query for them 
 to paginate, highlight etc. It would also give them the flexibility to use 
 this anywhere i.e. q,fq,bq etc.
 A bit of history about MLT (thanks to Hoss)
 MLT Handler pre-dates the existence of QParsers and was meant to take an 
 arbitrary query as input, find docs that match that 
 query, club them together to find interesting terms, and then use those 
 terms as if they were my main query to generate a main result set.
 This result would then be used as the set to facet, highlight etc.
 The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)
 The MLT component on the other hand solved a very different purpose of 
 augmenting the main result set. It is used to get similar docs for each of 
 the doc in the main result set.
 DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
 The new approach:
 All of this can be done better and cleaner (and makes more sense too) using 
 an MLT QParser.
 An important thing to handle here is the case where the user doesn't have 
 TermVectors, in which case, it does what happens right now i.e. parsing 
 stored fields.
 Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
 field would need to be a TextField with an index analyzer defined. This 
 analyzer will then be used to extract terms for MLT.
 In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
 the schema (if TermVectors are enabled for the field). If not, a /get call 
 can be used to fetch the field and parse it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6216) Better faceting for multiple intervals on DV fields

2014-07-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-6216:


Attachment: FacetTester.java

[~dsmiley] I used the attached Java class for running the queries. As I said, 
the dataset was geonames, added 4 times (with different IDs) so that the index 
had 33M docs total. 
The queries are all boolean queries with two OR terms, generated by taking 
terms from the “name” field of the dataset. An example: 

{noformat}
name:cemetery name:lake
name:el name:historical
name:church name:el
name:dam name:la
name:al name:church
name:al name:creek
name:baptist name:la
name:la name:mount
name:creek name:de
name:center name:park
name:church name:creek
...
{noformat}

Eyeballing the logs, most of those queries matched a high number of docs from 
the index. In addition, I had a bash script running to add documents every one 
second: 

{noformat}
#!/bin/bash
IFS='\n'
while read q   
do
echo $q > tmp.doc
curl -v 
'http://localhost:8983/solr/geonames/update?stream.file=/absolute/path/to/tmp.doc&stream.contentType=text/csv;charset=utf-8&separator=%09&encapsulator=%0E&header=false&fieldnames=id,name,,alternatenames,latitude,longitude,feature_class,feature_code,country_code,cc2,admin1_code,admin2_code,admin3_code,admin4_code,population,elevation,dem,timezone,modification_date&f.alternatenames.split=true&f.alternatenames.separator=,&f.alternatenames.encapsulator=%0E&f.cc2.split=true&f.cc2.separator=,&f.cc2.encapsulator=%0E'
sleep 1
done < allCountries.txt
{noformat}
Unfortunately, it looks like I deleted the schema file I used; however, there 
was nothing crazy about it: population is an int field with docValues=true. 
autoSoftCommit is configured to run every second. 

For the second test, I can’t upload the code because it’s full of customer 
specific data, but the test is very similar. I took some production queries, 
which had “intervals” in 6 fields, around 40 “intervals” total (originally 
using facet queries for each of them). For that test I used a similar bash 
script to upload data every second too. 

I have been testing this code in an environment mirroring production for around 
2/3 weeks now and QTimes have improved dramatically (on a multi-shard 
collection). I haven’t seen errors related to this. 
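
For reference, the requests in the first test looked roughly like the following 
(a sketch using the interval faceting parameter names from the patch; the field 
and interval bounds are illustrative):

{noformat}
q=name:cemetery name:lake&facet=true&facet.interval=population&f.population.facet.interval.set=[0,1000)&f.population.facet.interval.set=[1000,100000)&f.population.facet.interval.set=[100000,*]
{noformat}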

 Better faceting for multiple intervals on DV fields
 ---

 Key: SOLR-6216
 URL: https://issues.apache.org/jira/browse/SOLR-6216
 Project: Solr
  Issue Type: Improvement
Reporter: Tomás Fernández Löbbe
Assignee: Erick Erickson
 Fix For: 4.10

 Attachments: FacetTester.java, SOLR-6216.patch, SOLR-6216.patch, 
 SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
 SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch


 There are two ways to have faceting on values ranges in Solr right now: 
 “Range Faceting” and “Query Faceting” (doing range queries). They both end up 
 doing something similar:
 {code:java}
 searcher.numDocs(rangeQ , docs)
 {code}
 The good thing about this implementation is that it can benefit from caching. 
 The bad thing is that it may be slow with cold caches, and that there will be 
 a query for each of the ranges.
 A different implementation would be one that works similar to regular field 
 faceting, using doc values and validating ranges for each value of the 
 matching documents. This implementation would sometimes be faster than Range 
 Faceting / Query Faceting, especially in cases where caches are not very 
 effective, like a high update rate, or where ranges change frequently.
 Functionally, the result should be exactly the same as the one obtained by 
 doing a facet query for every interval.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074546#comment-14074546
 ] 

Shai Erera commented on LUCENE-5843:


{quote}
I added public static final int IndexWriter.MAX_DOCS with the limit
set to ArrayUtil.MAX_ARRAY_LENGTH.
{quote}

But MAX_ARRAY_LENGTH is dynamic, and depends on the JRE (32/64-bit). So that's 
not fixed across JVMs, right?

 IndexWriter should refuse to create an index with more than INT_MAX docs
 

 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5843.patch


 It's more and more common for users these days to create very large indices, 
 e.g.  indexing lines from log files, or packets on a network, etc., and it's 
 not hard to accidentally exceed the maximum number of documents in one index.
 I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
 value as a sentinel during searching.
 I'm not sure what IW does today if you create a too-big index but it's 
 probably horrible; it may succeed and then at search time you hit nasty 
 exceptions when we overflow int.
 I think it should throw an IndexFullException instead.  It'd be nice if we 
 could do this on the very doc that when added would go over the limit, but I 
 would also settle for just throwing at flush as well ... i.e. I think what's 
 really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074551#comment-14074551
 ] 

Robert Muir commented on LUCENE-5843:
-

Can it be INT_MAX-8 for this reason?

 IndexWriter should refuse to create an index with more than INT_MAX docs
 

 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5843.patch


 It's more and more common for users these days to create very large indices, 
 e.g.  indexing lines from log files, or packets on a network, etc., and it's 
 not hard to accidentally exceed the maximum number of documents in one index.
 I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
 value as a sentinel during searching.
 I'm not sure what IW does today if you create a too-big index but it's 
 probably horrible; it may succeed and then at search time you hit nasty 
 exceptions when we overflow int.
 I think it should throw an IndexFullException instead.  It'd be nice if we 
 could do this on the very doc that when added would go over the limit, but I 
 would also settle for just throwing at flush as well ... i.e. I think what's 
 really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074554#comment-14074554
 ] 

Shai Erera commented on LUCENE-5843:


Yes, I think we shouldn't try to be too smart here. It can even be MAX_INT-1024 
for all practical purposes (and if we want to be on the safe side w/ int[] 
allocations), as I doubt anyone will complain he cannot put MAX_INT-1023 docs 
in an index...

 IndexWriter should refuse to create an index with more than INT_MAX docs
 

 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5843.patch


 It's more and more common for users these days to create very large indices, 
 e.g.  indexing lines from log files, or packets on a network, etc., and it's 
 not hard to accidentally exceed the maximum number of documents in one index.
 I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
 value as a sentinel during searching.
 I'm not sure what IW does today if you create a too-big index but it's 
 probably horrible; it may succeed and then at search time you hit nasty 
 exceptions when we overflow int.
 I think it should throw an IndexFullException instead.  It'd be nice if we 
 could do this on the very doc that when added would go over the limit, but I 
 would also settle for just throwing at flush as well ... i.e. I think what's 
 really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_20-ea-b23) - Build # 10771 - Failure!

2014-07-25 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10771/
Java: 64bit/jdk1.8.0_20-ea-b23 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

2 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.SolrExampleStreamingBinaryTest.testChildDoctransformer

Error Message:
Expected mime type application/octet-stream but got text/html. <html> <head> 
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/> 
<title>Error 500 Server Error</title> </head> <body> <h2>HTTP ERROR: 500</h2> 
<p>Problem accessing /solr/collection1/select. Reason: <pre>Server 
Error</pre></p> <hr /><i><small>Powered by Jetty://</small></i> </body> </html> 

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected 
mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 Server Error</title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /solr/collection1/select. Reason:
<pre>Server Error</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>

at 
__randomizedtesting.SeedInfo.seed([2E3620E90BB99307:5DEC3F7387A1E401]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:513)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:281)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
org.apache.solr.client.solrj.SolrExampleTests.testChildDoctransformer(SolrExampleTests.java:1373)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 

[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074572#comment-14074572
 ] 

Michael McCandless commented on LUCENE-5843:


Sorry, my description was stale: in the patch I settled on MAX_INT - 128 as a 
defensive attempt to be hopefully well below the min value for the 
MAX_ARRAY_LENGTH over normal JVMs ...
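
A minimal sketch of the guard under discussion (the constant matches the value 
above; the class and helper names are illustrative, not the committed API):

{code:java}
// Reserve headroom below Integer.MAX_VALUE so that int[] allocations sized
// by doc count stay safely under MAX_ARRAY_LENGTH on normal JVMs.
public final class MaxDocsSketch {
  public static final int MAX_DOCS = Integer.MAX_VALUE - 128;

  /** Throws if adding numDocs more documents would exceed the limit. */
  static void ensureCanAdd(int currentDocCount, int numDocs) {
    // Written as a subtraction so the check itself cannot overflow int.
    if (numDocs > MAX_DOCS - currentDocCount) {
      throw new IllegalStateException(
          "number of documents in the index cannot exceed " + MAX_DOCS);
    }
  }
}
{code}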

 IndexWriter should refuse to create an index with more than INT_MAX docs
 

 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5843.patch


 It's more and more common for users these days to create very large indices, 
 e.g.  indexing lines from log files, or packets on a network, etc., and it's 
 not hard to accidentally exceed the maximum number of documents in one index.
 I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
 value as a sentinel during searching.
 I'm not sure what IW does today if you create a too-big index but it's 
 probably horrible; it may succeed and then at search time you hit nasty 
 exceptions when we overflow int.
 I think it should throw an IndexFullException instead.  It'd be nice if we 
 could do this on the very doc that when added would go over the limit, but I 
 would also settle for just throwing at flush as well ... i.e. I think what's 
 really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074575#comment-14074575
 ] 

Shai Erera commented on LUCENE-5843:


oh good, I didn't read the patch before, but I see you even explain why we 
don't use the constant from ArrayUtil! +1

 IndexWriter should refuse to create an index with more than INT_MAX docs
 

 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5843.patch


 It's more and more common for users these days to create very large indices, 
 e.g.  indexing lines from log files, or packets on a network, etc., and it's 
 not hard to accidentally exceed the maximum number of documents in one index.
 I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
 value as a sentinel during searching.
 I'm not sure what IW does today if you create a too-big index but it's 
 probably horrible; it may succeed and then at search time you hit nasty 
 exceptions when we overflow int.
 I think it should throw an IndexFullException instead.  It'd be nice if we 
 could do this on the very doc that when added would go over the limit, but I 
 would also settle for just throwing at flush as well ... i.e. I think what's 
 really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-5740:


Assignee: David Smiley

 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.

2014-07-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074632#comment-14074632
 ] 

Hoss Man commented on SOLR-5776:


One of the comments steve made when opening SOLR-6254...

{quote}
I found some info about /dev/random problems on FreeBSD here: 
https://wiki.freebsd.org/201308DevSummit/Security/DevRandom, which led me to 
/etc/rc.d/initrandom, which gets around the limited entropy by cat'ing a bunch 
of shit to /dev/random:
...
I think we should try the same strategy in a crontab every X minutes, to see if 
that addresses the test failures.
{quote}

miller's response to that specific suggestion...

bq. I think it's fine as a short term workaround, but not a great solution. We 
probably should just disable SSL unless we can address it in a portable way.

Here's my straw man counter proposal:

* update the solr tests so that:
** SSL randomization only happens if a tests.randomssl sys prop is set - 
default is false
*** NOTE: would mean updates to the reproduce with line formatting
*** should be updated in test-help as well
*** could be used in lucene/replicator module as well -- it already has a 
tests.jettySSL  (doh! ... not included in the reproduce line!)
** sanity check that we have at least some basic coverage of Solr w/SSL that is 
*not* randomized (ie: SSLMigrationTest and at least one new test that _always_ 
uses SSL to bring up a few nodes, index a few docs, do a query, and shut down)
** remove most of the \@SuppressSSL annotations currently in place (should only 
be used for tests that truly *need* to suppress SSL because of the nature of 
the test: ie explicitly verifying something about non-ssl mode)
* update the jenkins boxes to:
** have cron like steve suggests
** set tests.randomssl to true when running builds

The end result, if everything works properly should be:

* no matter who runs the tests, some basic sanity checking of SSL is done
* on our jenkins builds, we do extensive randomized testing of SSL with all the 
cloud (and lucene/replicator) functionality
* users who have enough entropy on their system can run 
{{-Dtests.randomssl=true}} if they choose.

Obviously though, before putting any work into the tests framework to support 
something like tests.randomssl as a first class sysprop,  the first baby step 
to see if this plan is even viable would be the cron steve mentioned to create 
lots of entropy -- if that doesn't work, then the whole plan is moot.



 Look at speeding up using SSL with tests.
 -

 Key: SOLR-5776
 URL: https://issues.apache.org/jira/browse/SOLR-5776
 Project: Solr
  Issue Type: Test
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.9, 5.0

 Attachments: SOLR-5776.patch, SOLR-5776.patch


 We have to disable SSL on a bunch of tests now because it appears to sometime 
 be ridiculously slow - especially in slow envs (I never see timeouts on my 
 machine).
 I was talking to Robert about this, and he mentioned that there might be some 
 settings we could change to speed it up.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 27214 - Failure!

2014-07-25 Thread builder
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/27214/

All tests passed

Build Log:
[...truncated 653 lines...]
   [junit4] JVM J1: stdout was not empty, see: 
/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/core/test/temp/junit4-J1-20140725_193232_926.sysout
   [junit4]  JVM J1: stdout (verbatim) 
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  Internal Error (ciMethodData.cpp:142), pid=28848, 
tid=139935109641984
   [junit4] #  Error: ShouldNotReachHere()
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 
1.7.0_55-b13)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode 
linux-amd64 compressed oops)
   [junit4] # Failed to write core dump. Core dumps have been disabled. To 
enable core dumping, try ulimit -c unlimited before starting Java again
   [junit4] #
   [junit4] # An error report file with more information is saved as:
   [junit4] # 
/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/core/test/J1/hs_err_pid28848.log
   [junit4] #
   [junit4] # If you would like to submit a bug report, please visit:
   [junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
   [junit4] #
   [junit4]  JVM J1: EOF 

[...truncated 955 lines...]
   [junit4] ERROR: JVM J1 ended with an exception, command line: 
/var/lib/jenkins/tools/hudson.model.JDK/Java_7_64bit_u55/jre/bin/java 
-Dtests.prefix=tests -Dtests.seed=21E34C94184201BA -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.10 
-Dtests.cleanthreads=perMethod 
-Djava.util.logging.config.file=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false 
-Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 
-DtempDir=. -Djava.io.tmpdir=. 
-Djunit4.tempDir=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/core/test/temp
 
-Dclover.db.dir=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/tools/junit4/tests.policy
 -Dlucene.version=4.10-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.leaveTemporary=false -Dtests.filterstacks=true -classpath 

[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074644#comment-14074644
 ] 

Steve Rowe commented on LUCENE-5740:


bq. If it did track the state, it would just need an integer depth counter 
(tagDepthWithinStrippedTag) that would be incremented for each opening element 
and decremented for each closing element within the current tag being stripped.

Don't forget that HTMLStripCharFilter must be able to handle (i.e. not throw an 
error, and maximize useful extracted content).  Assuming you'll see closing 
tags could be a problem here; some HTML doesn't have these in some cases.

It might be better to just track nested tags of the same type as the current 
tag being stripped, rather than all tags - the other contained tags should be 
ignorable, I think.  (This condition - nested same-type tags - should be fairly 
rare, but will need to be handled, e.g. <ul><li><ul><li></li></ul></li></ul>.)

The other thing to worry about is the possible lack of closing tags for a tag 
the contents of which are to be stripped.  I'm not sure how to handle this - 
maybe look at how other HTML parsers do it?  (I.e., how to limit scope of 
never-closed tags.)


 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074644#comment-14074644
 ] 

Steve Rowe edited comment on LUCENE-5740 at 7/25/14 5:39 PM:
-

bq. If it did track the state, it would just need an integer depth counter 
(tagDepthWithinStrippedTag) that would be incremented for each opening element 
and decremented for each closing element within the current tag being stripped.

Don't forget that HTMLStripCharFilter must be able to handle (i.e. not throw an 
error, and maximize useful extracted content) non-well-formed content.  
Assuming you'll see closing tags could be a problem here; some HTML doesn't 
have these in some cases.

It might be better to just track nested tags of the same type as the current 
tag being stripped, rather than all tags - the other contained tags should be 
ignorable, I think.  (This condition - nested same-type tags - should be fairly 
rare, but will need to be handled, e.g. <ul><li><ul><li></li></ul></li></ul>.)

The other thing to worry about is the possible lack of closing tags for a tag 
the contents of which are to be stripped.  I'm not sure how to handle this - 
maybe look at how other HTML parsers do it?  (I.e., how to limit scope of 
never-closed tags.)



was (Author: steve_rowe):
bq. If it did track the state, it would just need an integer depth counter 
(tagDepthWithinStrippedTag) that would be incremented for each opening element 
and decremented for each closing element within the current tag being stripped.

Don't forget that HTMLStripCharFilter must be able to handle (i.e. not throw an 
error, and maximize useful extracted content).  Assuming you'll see closing 
tags could be a problem here; some HTML doesn't have these in some cases.

It might be better to just track nested tags of the same type as the current 
tag being stripped, rather than all tags - the other contained tags should be 
ignorable, I think.  (This condition - nested same-type tags - should be fairly 
rare, but will need to be handled, e.g. <ul><li><ul><li></li></ul></li></ul>.)

The other thing to worry about is the possible lack of closing tags for a tag 
the contents of which are to be stripped.  I'm not sure how to handle this - 
maybe look at how other HTML parsers do it?  (I.e., how to limit scope of 
never-closed tags.)


 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074648#comment-14074648
 ] 

Steve Rowe commented on LUCENE-5740:


David, the other thing I worry about is: a fully fledged version of this would 
allow finer-grained specification of tags, along the lines of XPath, but that 
would be a much much bigger task... I don't think such a goal should hold up 
what you're thinking about.

 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074743#comment-14074743
 ] 

David Smiley commented on LUCENE-5740:
--

bq. David, the other thing I worry about is: a fully fledged version of this 
would allow finer-grained specification of tags, along the lines of XPath, but 
that would be a much much bigger task... I don't think such a goal should hold 
up what you're thinking about.

Yeah I thought of that and agree it's not worth worrying about right now.  My 
initial use of this will strip generated HTML that is already fairly clean, and 
will strip these tags purely by element name.  I have no need/plans for more 
complicated matching.

bq. Don't forget that HTMLStripCharFilter must be able to handle (i.e. not 
throw an error, and maximize useful extracted content) non-well-formed content. 
Assuming you'll see closing tags could be a problem here; some HTML doesn't 
have these in some cases.

If a tag that is to be stripped opens, then I propose the next close tag at the 
same level (whatever its name may be) is where the strip ends:
{noformat}<body> bodyStart <p> paraStart <foo>   <b>bold</b>   paraEnd </p> 
bodyEnd</body>{noformat}
Notice there is no {{</foo>}}.  Stripping tag foo would yield only the text 
tokens bodyStart, paraStart, and bodyEnd.  I think it's not realistic to expect 
better than that, not to mention that this issue is optional and would come 
with disclaimers on this matter.

bq. It might be better to just track nested tags of the same type as the 
current tag being stripped, rather than all tags

I don't think that adds any value (at least I don't see it yet), and it hurts 
the bad-html case like the foo example above.  In that same example, only 
same-name tags would mean that bodyEnd would not get emitted.  Right?

 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074752#comment-14074752
 ] 

David Smiley commented on LUCENE-5740:
--

I think I may get your concern.   If the tag that I was stripping was paragraph 
<p> instead of <foo>, then the paragraph stripping would continue on to 
</body>.  So it may appear that the stripping should end at the *sooner* of a 
closing tag at the starting depth, or a *matching* close of the current element 
name.  A *matching* close means I need to keep track of two embedded tag depth 
integers, one for any element name, one for those that have the same name as 
what I'm stripping.  Yeah?
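
In sketch form (illustrative only), the two counters would interact like this:

{code:java}
// Stripping ends at the *sooner* of: any close tag returning the overall
// depth to zero, or a matching-name close returning the same-name depth to
// zero. Both counters start at 1 when the stripped tag opens.
class DualDepthTracker {
  private boolean stripping = false;
  private String strippedName;
  private int anyTagDepth;    // open elements of any name since stripping began
  private int sameNameDepth;  // open elements whose name matches strippedName

  void beginStrip(String name) {
    stripping = true;
    strippedName = name;
    anyTagDepth = 1;
    sameNameDepth = 1;
  }

  void onOpenTag(String name) {
    if (!stripping) return;
    anyTagDepth++;
    if (name.equalsIgnoreCase(strippedName)) sameNameDepth++;
  }

  void onCloseTag(String name) {
    if (!stripping) return;
    anyTagDepth--;
    if (name.equalsIgnoreCase(strippedName)) sameNameDepth--;
    if (anyTagDepth == 0 || sameNameDepth == 0) stripping = false;  // strip ends
  }

  boolean suppressText() {
    return stripping;
  }
}
{code}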

 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074764#comment-14074764
 ] 

Steve Rowe commented on LUCENE-5740:


Real HTML is more complicated: e.g. <p> within <p> is not allowed (or rather is 
parsed as sibling non-closed elements).

Some pertinent discussion here in the javadocs of the Jericho HTML parser: 
http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/Element.html.

In particular, the "Single Tag Element" and "Implicitly Terminated Element" 
sections, and the link in the latter section in the sentence "See the element 
parsing rules for HTML elements with optional end tags for details on which 
tags can implicitly terminate a given element."

 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE but it should be 
 configurable to add more.  I don't want certain elements to have their 
 contents to be searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk7) - Build # 10772 - Still Failing!

2014-07-25 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10772/
Java: 64bit/ibm-j9-jdk7 
-Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

1 tests failed.
REGRESSION:  
org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings

Error Message:
some thread(s) failed

Stack Trace:
java.lang.RuntimeException: some thread(s) failed
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:535)
at 
org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:946)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:619)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at java.lang.Thread.run(Thread.java:853)




Build Log:
[...truncated 5726 lines...]
   [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
   [junit4]   2 TEST FAIL: useCharFilter=true text='ctnyf 
\ud8e6\ude46\udbfb\ude9d\ud9b1\udcbf cynpqojvoxa 
\u0d61\u0d58\u0d73\u0d6e\u0d31\u0d37\u0d31\u0d54 cprsu wvzjus'
   [junit4]   2 TEST FAIL: useCharFilter=true text='uukvyuql  z{1,5}hzk 
\u077f\u075c\u075d\u0774'
   

[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter

2014-07-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074785#comment-14074785
 ] 

David Smiley commented on LUCENE-5740:
--

Yeah... to do this feature right, it needs to know about these cases.  Then 
perhaps the stripping part might be easier if it had an accurate depth, as it 
wouldn't include implicitly closing elements (e.g. IMG).  But in the end this 
feature is optional, so a best-effort attempt with rules about known HTML tags 
is fine.  I agree this means looking for a close of the same element name.  
I'll try working on something this weekend or Monday.

 Add stripContentOfTags option to HTMLStripCharFilter
 

 Key: LUCENE-5740
 URL: https://issues.apache.org/jira/browse/LUCENE-5740
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

 HTMLStripCharFilter should have an option to strip out the sub-content of 
 certain elements. It already does this for SCRIPT & STYLE, but it should be 
 configurable to add more.  I don't want certain elements' contents to be 
 searchable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6163) special chars and ManagedSynonymFilterFactory

2014-07-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074952#comment-14074952
 ] 

Hoss Man commented on SOLR-6163:


A quick glance at Timo's patch and the javadocs for the associated Restlet 
classes seems to suggest that this is the correct general course of action...

http://restlet.com/learn/javadocs/2.1/jse/api/org/restlet/data/Reference.html#getPath%28%29
"Note that no URI decoding is done by this method."

A cleaner fix is probably to use this alternative Restlet method that can 
decode for us...

http://restlet.com/learn/javadocs/2.1/jse/api/org/restlet/data/Reference.html#getPath%28boolean%29

There are lots of similar "Note that no URI decoding is done by this method." 
and "Returns the optionally decoded __" combinations in the Request class 
-- we should probably audit all of our usages of this class.
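
For illustration, a minimal sketch of the difference between the two methods 
(using the Reference API linked above; the synonym URL is just an example):

{code}
import org.restlet.data.Reference;

public class DecodeExample {
  public static void main(String[] args) {
    Reference ref = new Reference(
        "http://localhost/solr/mycore/schema/analysis/synonyms/english/%C3%A9%C3%A9%C3%A9");
    // getPath() does no URI decoding, so the key stays percent-encoded:
    System.out.println(ref.getPath());      // .../synonyms/english/%C3%A9%C3%A9%C3%A9
    // getPath(true) returns the optionally decoded path:
    System.out.println(ref.getPath(true));  // .../synonyms/english/ééé
  }
}
{code}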


 special chars and ManagedSynonymFilterFactory
 -

 Key: SOLR-6163
 URL: https://issues.apache.org/jira/browse/SOLR-6163
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Wim Kumpen

 Hey,
 I was playing with the ManagedSynonymFilterFactory to create a synonym list 
 with the API. But I have difficulties deleting my keys when they contain 
 special characters (or spaces)...
 I added a key ééé that matches some other words. It's saved in the 
 synonym file as ééé.
 When I try to delete it, I do:
 curl -X DELETE 
 "http://localhost/solr/mycore/schema/analysis/synonyms/english/ééé"
 error message: %C3%A9%C3%A9%C3%A9%C2%B5 not found in 
 /schema/analysis/synonyms/english
 A wild guess from me is that %C3%A9 isn't decoded back to ééé, and that's why 
 it can't find the keyword?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5848) StopwordAnalyzerBase hides Analyzer's constructor with ReuseStrategy

2014-07-25 Thread JIRA
Claude Quézel created LUCENE-5848:
-

 Summary: StopwordAnalyzerBase hides Analyzer's constructor with 
ReuseStrategy
 Key: LUCENE-5848
 URL: https://issues.apache.org/jira/browse/LUCENE-5848
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Affects Versions: 4.9
Reporter: Claude Quézel
Priority: Minor


StopwordAnalyzerBase hides Analyzer's constructor with ReuseStrategy.

It is not possible to set a ReuseStrategy when extending from 
StopwordAnalyzerBase.

The fix is trivial: add two extra constructors to StopwordAnalyzerBase.

The same is true for all classes extending StopwordAnalyzerBase.
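
A rough sketch of what the two extra constructors might look like (the field 
handling below mirrors the existing constructors; treat it as illustrative, 
not a patch):

{code}
// Hypothetical additions inside StopwordAnalyzerBase:
protected StopwordAnalyzerBase(Version version, CharArraySet stopwords,
                               ReuseStrategy reuseStrategy) {
  super(reuseStrategy);
  this.matchVersion = version;
  this.stopwords = stopwords == null
      ? CharArraySet.EMPTY_SET
      : CharArraySet.unmodifiableSet(CharArraySet.copy(version, stopwords));
}

protected StopwordAnalyzerBase(Version version, ReuseStrategy reuseStrategy) {
  this(version, null, reuseStrategy);
}
{code}

Subclasses could then forward a ReuseStrategy through an analogous constructor 
of their own.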



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 584 - Still Failing

2014-07-25 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/584/

1 tests failed.
FAILED:  org.apache.solr.cloud.OverseerTest.testOverseerFailure

Error Message:
Could not register as the leader because creating the ephemeral registration 
node in ZooKeeper failed

Stack Trace:
org.apache.solr.common.SolrException: Could not register as the leader because 
creating the ephemeral registration node in ZooKeeper failed
at 
__randomizedtesting.SeedInfo.seed([3C225866AD0181FE:382AD795BFA46EDF]:0)
at 
org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:144)
at 
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:155)
at 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:314)
at 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221)
at 
org.apache.solr.cloud.OverseerTest$MockZKController.publishState(OverseerTest.java:155)
at 
org.apache.solr.cloud.OverseerTest.testOverseerFailure(OverseerTest.java:660)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Updated] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5843:
---

Attachment: LUCENE-5843.patch

New patch, I think it's ready: I resolved the nocommits, and removed
Test2BDocs (I think its tests are folded into
TestIndexWriterMaxDocs).


 IndexWriter should refuse to create an index with more than INT_MAX docs
 

 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5843.patch, LUCENE-5843.patch


 It's more and more common for users these days to create very large indices, 
 e.g.  indexing lines from log files, or packets on a network, etc., and it's 
 not hard to accidentally exceed the maximum number of documents in one index.
 I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
 value as a sentinel during searching.
 I'm not sure what IW does today if you create a too-big index but it's 
 probably horrible; it may succeed and then at search time you hit nasty 
 exceptions when we overflow int.
 I think it should throw an IndexFullException instead.  It'd be nice if we 
 could do this on the very doc that, when added, would go over the limit, but I 
 would also settle for just throwing at flush as well ... i.e. I think what's 
 really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075135#comment-14075135
 ] 

Hoss Man commented on LUCENE-5843:
--

Mike: couple quick suggestions...

* {code}
private static int actualMaxDocs = MAX_DOCS;
static void setMaxDocs(int docs) {
  if (MAX_DOCS < docs) {
    throw new IllegalArgumentException("docs must not exceed IndexWriter.MAX_DOCS");
  }
  actualMaxDocs = docs;
}
{code}...that way some poor bastard who sees it in the code and tries to be 
crafty and add a stub class to that package to set it to Integer.MAX_VALUE will 
get an immediate error instead of a timebomb.
* add a *public* method to the test-framework that wraps this package-protected 
setter, so that _tests_ in other packages besides {{org.apache.lucene.index}} 
can mutate this.
** then we can add tests for clean behavior in Solr as well (not to mention 
anybody else who writes a Lucene app and wants to test how their app 
behaves when the index gets too big) w/o adding an {{org/apache/lucene/index}} 
dir to their test source.
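
A minimal sketch of such a public test-framework wrapper (the class name is 
hypothetical; it just needs to live in the {{org.apache.lucene.index}} package 
of the test-framework so it can reach the package-protected setter sketched 
above):

{code}
package org.apache.lucene.index;

/** Sketch only: exposes IndexWriter's package-protected max-docs setter
 *  to tests in any package. */
public final class TestIndexWriterMaxDocsUtil {

  private TestIndexWriterMaxDocsUtil() {} // static utility, no instances

  /** Lower the effective doc limit for a test, e.g. setMaxDocs(10). */
  public static void setMaxDocs(int docs) {
    IndexWriter.setMaxDocs(docs);
  }
}
{code}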

 IndexWriter should refuse to create an index with more than INT_MAX docs
 

 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5843.patch, LUCENE-5843.patch


 It's more and more common for users these days to create very large indices, 
 e.g.  indexing lines from log files, or packets on a network, etc., and it's 
 not hard to accidentally exceed the maximum number of documents in one index.
 I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
 value as a sentinel during searching.
 I'm not sure what IW does today if you create a too-big index but it's 
 probably horrible; it may succeed and then at search time you hit nasty 
 exceptions when we overflow int.
 I think it should throw an IndexFullException instead.  It'd be nice if we 
 could do this on the very doc that, when added, would go over the limit, but I 
 would also settle for just throwing at flush as well ... i.e. I think what's 
 really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting

2014-07-25 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2894:
---

Attachment: SOLR-2894.patch

Making good progress (only ~1600 lines of diff left to review!)

updates in this patch...

* PivotFacetFieldValueCollection
** some javadocs
** refactor away method: nonNullValueIterator()
*** only called in one place

* PivotFacetField
** some javadocs
** made createFromListOfNamedLists smart enough to return null on null input
*** simplified PivotFacetValue.createFromNamedList
** made contributeFromShard smart enough to be a no-op on null input
*** simplified all callers (PivotFacet & PivotFacetValue)
** made some vars final where possible via refactoring constructor & 
createFromListOfNamedLists
** refactor skipRefinementAtThisLevel out of the method and up to an instance 
var since it never changes once the facet params are set in the constructor
** consolidate skipRefinementAtThisLevel + hasBeenRefined into a single var: 
needRefinementAtThisLevel
** simplify BitSet iteration (nextSetBit is always < length; see the sketch 
after this list)
*** processDefiniteCandidateElement
*** processPossibleCandidateElement

* PivotFacetValue
** some javadocs
** made variables private and added method accessors (w/jdocs) as needed
*** updated other classes as needed to call these new methods instead of the 
old pub vars
** made some vars final where possible via refactoring createFromNamedList & 
constructor

* PivotFacet
** some javadocs
** added getQueuedRefinements(int)
** made some variables final where possible
** renamed noRefinementsRequired -> isRefinementsRequired
** eliminate unused method: areAnyRefinementsQueued

* FacetComponent
** switched direct use of PivotFacet.queuedRefinements to use 
PivotFacet.getQueuedRefinements
*** simplified error checking in several places
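
For reference, the BitSet iteration being simplified boils down to the standard 
idiom below (a generic sketch, not the exact patch code): since nextSetBit never 
returns a bit at or beyond length(), no separate bounds check is needed.

{code}
import java.util.BitSet;

public class BitSetIterationExample {
  public static void main(String[] args) {
    BitSet knownShards = new BitSet();
    knownShards.set(1);
    knownShards.set(3);
    // nextSetBit(i) returns -1 past the last set bit, and every bit it
    // returns is necessarily < length():
    for (int i = knownShards.nextSetBit(0); i >= 0; i = knownShards.nextSetBit(i + 1)) {
      System.out.println("shard " + i + " participated");
    }
  }
}
{code}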



One new question I want to go back and revisit later...

* do we really need to track knownShards in PivotFacet ?
** ResponseBuilder already maintains a String[] of all shards, with 
getShardNum derived from it
** can't we just loop from 0 to shards.length? does it ever matter if a shard 
hasn't participated?
** ie: is it really important that we skip any unset bits in knownShards when 
looping?  (all the current usages seem safe even if a shard has no data for the 
current pivot)



 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Hoss Man
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-mincount-minification.patch, 
 SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch, 
 dateToObject.patch, pivot_mincount_problem.sh


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6283) Add support for Interval Faceting in SolrJ

2014-07-25 Thread JIRA
Tomás Fernández Löbbe created SOLR-6283:
---

 Summary: Add support for Interval Faceting in SolrJ
 Key: SOLR-6283
 URL: https://issues.apache.org/jira/browse/SOLR-6283
 Project: Solr
  Issue Type: Improvement
Reporter: Tomás Fernández Löbbe


Interval Faceting was added in SOLR-6216. Add support for it in SolrJ
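
Until dedicated SolrJ setters exist, a sketch of what is already possible via 
raw parameters (the price field is just an example; the param names come from 
SOLR-6216):

{code}
import org.apache.solr.client.solrj.SolrQuery;

public class IntervalFacetParamsExample {
  public static void main(String[] args) {
    SolrQuery query = new SolrQuery("*:*");
    query.setFacet(true);
    // Raw interval faceting params until SolrJ grows first-class support:
    query.add("facet.interval", "price");
    query.add("f.price.facet.interval.set", "[0,100)");
    query.add("f.price.facet.interval.set", "[100,*]");
    System.out.println(query); // prints the encoded request parameters
  }
}
{code}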



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5244) Exporting Full Sorted Result Sets

2014-07-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5244:
-

Description: 
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically 
done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr calls two different Solr collections and returns the 
results sorted by a join key. The client iterates through both streams and 
performs a merge join.

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr makes individual calls to all the servers in a single 
collection and returns results sorted by the collapse key. The client merge 
joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.

*Large Scale Time Series Rollups*

A client outside Solr makes individual calls to all servers in a collection and 
sorts on time dimensions. The client merge joins the sorted result sets and 
rolls up the time dimensions as it iterates through the data.

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.
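
To make the client-side pattern concrete, here is a minimal merge-join sketch 
over two already-sorted streams (generic code, assuming unique join keys per 
stream; this is not code from the patch):

{code}
import java.util.Arrays;
import java.util.Iterator;

public class MergeJoinSketch {
  public static void main(String[] args) {
    // Stand-ins for two Solr export streams, each sorted by the join key:
    Iterator<String> left = Arrays.asList("a", "b", "d").iterator();
    Iterator<String> right = Arrays.asList("b", "c", "d").iterator();
    String l = left.hasNext() ? left.next() : null;
    String r = right.hasNext() ? right.next() : null;
    while (l != null && r != null) {
      int cmp = l.compareTo(r);
      if (cmp == 0) {
        System.out.println("joined: " + l);        // keys match: emit joined record
        l = left.hasNext() ? left.next() : null;
        r = right.hasNext() ? right.next() : null;
      } else if (cmp < 0) {
        l = left.hasNext() ? left.next() : null;   // advance the smaller side
      } else {
        r = right.hasNext() ? right.next() : null;
      }
    }
  }
}
{code}

The same two-pointer loop generalizes to the collapse, aggregation, and rollup 
cases: each is a single pass over k sorted streams, merging on the sort key.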







  was:
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically 
done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr calls two different Solr collections and returns the 
results sorted by a join key. The client iterates through both streams and 
performs a merge join.

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr makes individual calls to all the servers in a single 
collection and returns results sorted by the collapse key. The client merge 
joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.

*Large Scale Time Series Rollups*

A client outside Solr makes individual calls to all servers in a collection and 
sorts on time dimensions. The client merge joins the sorted result sets and 
rolls up the time dimensions as it iterates through the data.







 Exporting Full Sorted Result Sets
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0, 4.10

 Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch


 This ticket allows Solr to export full sorted result sets. The proposed 
 syntax is:
 {code}
 q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
 {code}
 Under the covers, the rows=-1 parameter will signal Solr to use the 
 ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
 results. The SortingResponseWriter will sort the results based on the sort 
 criteria and stream the results out.
 This capability will open up Solr for a whole range of uses that were 
 typically done using aggregation engines like Hadoop. For example:
 *Large Distributed Joins*
 A client outside of Solr calls two different Solr collections and returns the 
 results sorted by a join key. The client iterates through both streams and 
 performs a merge join.
 *Fully Distributed Field 

[jira] [Updated] (SOLR-5244) Exporting Full Sorted Result Sets

2014-07-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5244:
-

Description: 
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically 
done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr calls two different Solr collections and returns the 
results sorted by a join key. The client iterates through both streams and 
performs a merge join.

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr makes individual calls to all the servers in a single 
collection and returns results sorted by the collapse key. The client merge 
joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.

*Large Scale Time Series Rollups*

A client outside Solr makes individual calls to all servers in a collection and 
sorts on time dimensions. The client merge joins the sorted result sets and 
rolls up the time dimensions as it iterates through the data.






  was:
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically 
done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr calls two different Solr collections and returns the 
results sorted by a join key. The client iterates through both streams and 
performs a merge join.

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr makes individual calls to all the servers in a single 
collection and returns results sorted by the collapse key. The client merge 
joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.







 Exporting Full Sorted Result Sets
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0, 4.10

 Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch


 This ticket allows Solr to export full sorted result sets. The proposed 
 syntax is:
 {code}
 q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
 {code}
 Under the covers, the rows=-1 parameter will signal Solr to use the 
 ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
 results. The SortingResponseWriter will sort the results based on the sort 
 criteria and stream the results out.
 This capability will open up Solr for a whole range of uses that were 
 typically done using aggregation engines like Hadoop. For example:
 *Large Distributed Joins*
 A client outside of Solr calls two different Solr collections and returns the 
 results sorted by a join key. The client iterates through both streams and 
 performs a merge join.
 *Fully Distributed Field Collapsing/Grouping*
 A client outside of Solr makes individual calls to all the servers in a 
 single collection and returns results sorted by the collapse key. The client 
 merge joins the sorted lists on the collapse key to perform the field 
 collapse.
 *High Cardinality