[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074116#comment-14074116 ] Tomás Fernández Löbbe commented on SOLR-2366: - I didn't see this Jira before (I just saw it now while updating the faceting wiki). Part of what's described here can be achieved by Interval Faceting (SOLR-6216). The implementation is different, though, because it relies on DocValues instead of filters. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.9, 5.0 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different-sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+). We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
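The arbitrarily sized buckets described in the issue can be sketched, outside Solr, as a plain quantization step. This is only an illustration of the idea, not Solr's faceting code; the bucket labels and boundaries are taken from the walking/driving/everything-else example above:

```java
// Quantize a distance (in KM) into arbitrarily sized, labeled buckets,
// mirroring the walking/driving/everything-else example from SOLR-2366.
public class RangeBuckets {
    // Upper bounds (exclusive) for each bucket; the last bucket is open-ended.
    static final double[] UPPER = {5.0, 150.0};
    static final String[] LABELS = {"0-5KM", "5KM-150KM", "150KM+"};

    static String bucketFor(double km) {
        for (int i = 0; i < UPPER.length; i++) {
            if (km < UPPER[i]) return LABELS[i];
        }
        return LABELS[LABELS.length - 1]; // open-ended tail bucket
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(2.3));   // 0-5KM
        System.out.println(bucketFor(42.0));  // 5KM-150KM
        System.out.println(bucketFor(900.0)); // 150KM+
    }
}
```

The point of the issue is exactly this: bucket boundaries need not be evenly spaced, so the facet counts are just per-label tallies of `bucketFor` over the result set.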
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 1696 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/1696/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseSerialGC 7 tests failed. REGRESSION: org.apache.solr.client.solrj.SolrExampleBinaryTest.testExampleConfig Error Message: Expected mime type application/octet-stream but got text/html. <html> <head> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/> <title>Error 500 {msg=Could not initialize class java.lang.UNIXProcess,trace=java.lang.NoClassDefFoundError: Could not initialize class java.lang.UNIXProcess at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) at java.lang.Runtime.exec(Runtime.java:617) at java.lang.Runtime.exec(Runtime.java:450) at java.lang.Runtime.exec(Runtime.java:347) at org.apache.solr.handler.admin.SystemInfoHandler.execute(SystemInfoHandler.java:220) at org.apache.solr.handler.admin.SystemInfoHandler.getSystemInfo(SystemInfoHandler.java:176) at org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:97) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.handler.admin.InfoHandler.handleRequestBody(InfoHandler.java:88) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:137) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) ,code=500}</title> </head> <body> <h2>HTTP ERROR: 500</h2> <p>Problem accessing /solr/admin/info/system. 
Reason: <pre>{msg=Could not initialize class java.lang.UNIXProcess,trace=java.lang.NoClassDefFoundError: Could not initialize class java.lang.UNIXProcess at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) at java.lang.Runtime.exec(Runtime.java:617) at java.lang.Runtime.exec(Runtime.java:450) at java.lang.Runtime.exec(Runtime.java:347) at org.apache.solr.handler.admin.SystemInfoHandler.execute(SystemInfoHandler.java:220) at org.apache.solr.handler.admin.SystemInfoHandler.getSystemInfo(SystemInfoHandler.java:176) at org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:97) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.handler.admin.InfoHandler.handleRequestBody(InfoHandler.java:88) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729) at
[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 27188 - Failure!
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/27188/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterThreadsToSegments.testManyThreadsClose Error Message: Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments] at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0) Caused by: java.lang.RuntimeException: java.lang.NullPointerException at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0) at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258) Caused by: java.lang.NullPointerException at org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300) at org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110) at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253) Build Log: [...truncated 1608 lines...] 
[junit4] Suite: org.apache.lucene.index.TestIndexWriterThreadsToSegments [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterThreadsToSegments -Dtests.method=testManyThreadsClose -Dtests.seed=23CFC8083A310D8C -Dtests.slow=true -Dtests.locale=fr_FR -Dtests.timezone=Pacific/Midway -Dtests.file.encoding=UTF-8 [junit4] ERROR 2.55s J4 | TestIndexWriterThreadsToSegments.testManyThreadsClose [junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments] [junit4] > at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0) [junit4] > Caused by: java.lang.RuntimeException: java.lang.NullPointerException [junit4] > at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0) [junit4] > at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258) [junit4] > Caused by: java.lang.NullPointerException [junit4] > at org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300) [junit4] > at org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473) [junit4] > at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437) [junit4] > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507) [junit4] > at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222) [junit4] > at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149) [junit4] > at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110) [junit4] > at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253) [junit4] 2> NOTE: test params are: codec=Lucene49: {field=PostingsFormat(name=SimpleText)}, docValues:{}, sim=DefaultSimilarity, locale=fr_FR, timezone=Pacific/Midway [junit4] 2> NOTE: Linux 
3.2.0-26-generic amd64/Oracle Corporation 1.7.0_55 (64-bit)/cpus=8,threads=1,free=57393592,total=242221056 [junit4] 2> NOTE: All tests run in this JVM: [TestScorerPerf, TestDeterminism, TestDemo, TestAutomatonQuery, TestLookaheadTokenFilter, TestTermVectorsFormat, TestBinaryDocValuesUpdates, TestSumDocFreq, Test2BNumericDocValues, Nested, Nested2, TestDateSort, TestPrefixFilter, ThrowInUncaught, TestCrashCausesCorruptIndex, TestDocValuesWithThreads, TestLucene3xStoredFieldsFormat, InBeforeClass, InAfterClass, InTestMethod, NonStringProperties, TestReaderClosed, TestBitVector, TestLucene42NormsFormat, TestOmitNorms, TestQueryRescorer, TestSimpleAttributeImpl, TestDocCount, TestDocIdBitSet, TestOpenBitSet, TestBasics, TestNorms, TestCompoundFile, TestIndexWriterUnicode, TestBufferedIndexInput, TestConsistentFieldNumbers, TestLockFactory, TestSegmentMerger, TestIndexWriterNRTIsCurrent, TestFieldsReader, TestDocValuesIndexing, TestHugeRamFile, TestSpanSearchEquivalence,
[jira] [Created] (SOLR-6278) add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional
Christine Poerschke created SOLR-6278: - Summary: add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional Key: SOLR-6278 URL: https://issues.apache.org/jira/browse/SOLR-6278 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke To add {{core=...}} as an alternative to {{replica=...}} as the way of identifying what is to be deleted, with {{collection=...}} and {{shard=...}} becoming optional provided the other parameters uniquely identify exactly one deletion target.
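The "uniquely identify exactly one deletion target" rule proposed above can be sketched as a filter-then-check step: apply every supplied parameter as a filter and fail unless exactly one replica survives. The `Replica` record and method names below are illustrative stand-ins, not Solr's actual classes:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of resolving a DELETEREPLICA target from partial parameters.
// null means "parameter not supplied"; exactly one match is required.
// Replica here is a hypothetical stand-in, not Solr's real class.
public class DeleteTargetResolver {
    record Replica(String collection, String shard, String replica, String core) {}

    static Replica resolveUnique(List<Replica> all, String collection, String shard,
                                 String replica, String core) {
        List<Replica> matches = all.stream()
            .filter(r -> collection == null || r.collection().equals(collection))
            .filter(r -> shard == null || r.shard().equals(shard))
            .filter(r -> replica == null || r.replica().equals(replica))
            .filter(r -> core == null || r.core().equals(core))
            .collect(Collectors.toList());
        if (matches.size() != 1) {
            throw new IllegalArgumentException(
                "parameters match " + matches.size() + " replicas, expected exactly 1");
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        List<Replica> all = List.of(
            new Replica("c1", "shard1", "replica1", "c1_shard1_replica1"),
            new Replica("c1", "shard2", "replica1", "c1_shard2_replica1"));
        // the core name alone is unique here, so collection= and shard= can be omitted
        System.out.println(resolveUnique(all, null, null, null, "c1_shard2_replica1").shard()); // prints shard2
    }
}
```

The error on zero or multiple matches is what keeps the relaxed parameters safe: omitting `collection=` and `shard=` is only allowed when it cannot be ambiguous.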
[GitHub] lucene-solr pull request: SOLR-6278: add admin/collections?action=...
GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/71 SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional https://issues.apache.org/jira/i#browse/SOLR-6278 You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr trunk-DELETEREPICA-by-core-name Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/71.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #71 commit 93bb1104bff3c0d7a30d8f8e0bf2d2989f7ce79c Author: Christine Poerschke cpoersc...@bloomberg.net Date: 2014-07-09T10:39:35Z solr: add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (SOLR-6278) add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional
[ https://issues.apache.org/jira/browse/SOLR-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074301#comment-14074301 ] ASF GitHub Bot commented on SOLR-6278: -- GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/71 SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support SOLR-6278: add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional https://issues.apache.org/jira/i#browse/SOLR-6278 You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr trunk-DELETEREPICA-by-core-name Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/71.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #71 commit 93bb1104bff3c0d7a30d8f8e0bf2d2989f7ce79c Author: Christine Poerschke cpoersc...@bloomberg.net Date: 2014-07-09T10:39:35Z solr: add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional add admin/collections?action=DELETEREPLICA&core=... support, make collection=... and shard=... parameters optional -- Key: SOLR-6278 URL: https://issues.apache.org/jira/browse/SOLR-6278 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke To add {{core=...}} as an alternative to {{replica=...}} as the way of identifying what is to be deleted, with {{collection=...}} and {{shard=...}} becoming optional provided the other parameters uniquely identify exactly one deletion target.
[GitHub] lucene-solr pull request: SOLR-6279: cores?action=UNLOAD can unreg...
GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/72 SOLR-6279: cores?action=UNLOAD can unregister unclosed core https://issues.apache.org/jira/i#browse/SOLR-6279 You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr trunk-UNLOAD-can-unregister-unclosed-close Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/72.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #72 commit 2a80ca5dfb47b7cd0416be2ae72bade0fe8f3ad0 Author: Christine Poerschke cpoersc...@bloomberg.net Date: 2014-07-22T12:07:58Z solr: cores?action=UNLOAD can unregister unclosed core Changing CoreContainer.unload to wait for core to close before unregistering it from ZK. Adding testMidUseUnload method to TestLazyCores.
[jira] [Created] (SOLR-6279) cores?action=UNLOAD can unregister unclosed core
Christine Poerschke created SOLR-6279: - Summary: cores?action=UNLOAD can unregister unclosed core Key: SOLR-6279 URL: https://issues.apache.org/jira/browse/SOLR-6279 Project: Solr Issue Type: Bug Reporter: Christine Poerschke baseline: {code} /somewhere/instanceA/collection1_shard1/core.properties /somewhere/instanceA/collection1_shard1/data /somewhere/instanceA/collection1_shard2/core.properties /somewhere/instanceA/collection1_shard2/data /somewhere/instanceB {code} actions: {code} curl "http://host:port/solr/admin/cores?action=UNLOAD&core=collection1_shard2" # since UNLOAD completed we should now be free to move the unloaded core's files as we wish mv /somewhere/instanceA/collection1_shard2 /somewhere/instanceB/collection1_shard2 {code} expected result: {code} /somewhere/instanceA/collection1_shard1/core.properties /somewhere/instanceA/collection1_shard1/data # collection1_shard2 files have been fully relocated /somewhere/instanceB/collection1_shard2/core.properties.unloaded /somewhere/instanceB/collection1_shard2/data {code} actual result: {code} /somewhere/instanceA/collection1_shard1/core.properties /somewhere/instanceA/collection1_shard1/data /somewhere/instanceA/collection1_shard2/data # collection1_shard2 files have not been fully relocated and/or some files were left behind in instanceA because the UNLOAD action had returned prior to the core being closed /somewhere/instanceB/collection1_shard2/core.properties.unloaded /somewhere/instanceB/collection1_shard2/data {code} +proposed fix:+ Changing CoreContainer.unload to wait for core to close before unregistering it from ZK. Adding testMidUseUnload method to TestLazyCores.
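The proposed fix (have unload wait for the core to actually close before unregistering it) is a wait-for-shutdown handshake. A minimal standalone sketch of that pattern using a `CountDownLatch`; `Core` and its methods are hypothetical stand-ins, not Solr's actual API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the race described in SOLR-6279: unload must not report
// completion (and unregister the core) until the core has really closed.
// Core here is a hypothetical stand-in, not Solr's actual class.
public class UnloadWaitsForClose {
    static class Core {
        private final CountDownLatch closed = new CountDownLatch(1);

        void close() {
            // ... release searchers, flush, reach a move-the-files-safely point ...
            closed.countDown();
        }

        boolean awaitClosed(long timeout, TimeUnit unit) throws InterruptedException {
            return closed.await(timeout, unit);
        }
    }

    // Before the fix, unload could return while close() was still running on
    // another thread; with the wait, callers may safely move the core's files.
    static boolean unload(Core core) throws InterruptedException {
        new Thread(core::close).start();              // close happens asynchronously
        return core.awaitClosed(5, TimeUnit.SECONDS); // ...but we wait for it to finish
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(unload(new Core())); // true only after close() completed
    }
}
```

With this ordering, the `mv` in the reproduction steps above could not observe a half-closed core, because UNLOAD would not have returned yet.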
[jira] [Commented] (SOLR-6279) cores?action=UNLOAD can unregister unclosed core
[ https://issues.apache.org/jira/browse/SOLR-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074307#comment-14074307 ] ASF GitHub Bot commented on SOLR-6279: -- GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/72 SOLR-6279: cores?action=UNLOAD can unregister unclosed core https://issues.apache.org/jira/i#browse/SOLR-6279 You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr trunk-UNLOAD-can-unregister-unclosed-close Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/72.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #72 commit 2a80ca5dfb47b7cd0416be2ae72bade0fe8f3ad0 Author: Christine Poerschke cpoersc...@bloomberg.net Date: 2014-07-22T12:07:58Z solr: cores?action=UNLOAD can unregister unclosed core Changing CoreContainer.unload to wait for core to close before unregistering it from ZK. Adding testMidUseUnload method to TestLazyCores. 
cores?action=UNLOAD can unregister unclosed core Key: SOLR-6279 URL: https://issues.apache.org/jira/browse/SOLR-6279 Project: Solr Issue Type: Bug Reporter: Christine Poerschke baseline: {code} /somewhere/instanceA/collection1_shard1/core.properties /somewhere/instanceA/collection1_shard1/data /somewhere/instanceA/collection1_shard2/core.properties /somewhere/instanceA/collection1_shard2/data /somewhere/instanceB {code} actions: {code} curl "http://host:port/solr/admin/cores?action=UNLOAD&core=collection1_shard2" # since UNLOAD completed we should now be free to move the unloaded core's files as we wish mv /somewhere/instanceA/collection1_shard2 /somewhere/instanceB/collection1_shard2 {code} expected result: {code} /somewhere/instanceA/collection1_shard1/core.properties /somewhere/instanceA/collection1_shard1/data # collection1_shard2 files have been fully relocated /somewhere/instanceB/collection1_shard2/core.properties.unloaded /somewhere/instanceB/collection1_shard2/data {code} actual result: {code} /somewhere/instanceA/collection1_shard1/core.properties /somewhere/instanceA/collection1_shard1/data /somewhere/instanceA/collection1_shard2/data # collection1_shard2 files have not been fully relocated and/or some files were left behind in instanceA because the UNLOAD action had returned prior to the core being closed /somewhere/instanceB/collection1_shard2/core.properties.unloaded /somewhere/instanceB/collection1_shard2/data {code} +proposed fix:+ Changing CoreContainer.unload to wait for core to close before unregistering it from ZK. Adding testMidUseUnload method to TestLazyCores.
[jira] [Commented] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport
[ https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074333#comment-14074333 ] ASF subversion and git services commented on SOLR-5847: --- Commit 1613406 from [~ehatcher] in branch 'dev/trunk' [ https://svn.apache.org/r1613406 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements The Admin GUI doesn't allow to abort a running dataimport - Key: SOLR-5847 URL: https://issues.apache.org/jira/browse/SOLR-5847 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler, web gui Affects Versions: 4.7 Reporter: Paco Garcia Priority: Minor With the changes introduced in the 4.7.0 release by SOLR-5517 (Return HTTP error on POST requests with no Content-Type), the jquery invocation to abort a running dataimport fails with HTTP error code 415. The POST method should have some content in the body. See comments in SOLR-5517
[jira] [Commented] (SOLR-3622) DIH should not do rollbacks.
[ https://issues.apache.org/jira/browse/SOLR-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074332#comment-14074332 ] ASF subversion and git services commented on SOLR-3622: --- Commit 1613406 from [~ehatcher] in branch 'dev/trunk' [ https://svn.apache.org/r1613406 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements DIH should not do rollbacks. Key: SOLR-3622 URL: https://issues.apache.org/jira/browse/SOLR-3622 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Reporter: Mark Miller Assignee: Erik Hatcher Fix For: 5.0, 4.10 This is not playing nice.
[jira] [Commented] (SOLR-6269) Change rollback to error in DIH
[ https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074335#comment-14074335 ] ASF subversion and git services commented on SOLR-6269: --- Commit 1613406 from [~ehatcher] in branch 'dev/trunk' [ https://svn.apache.org/r1613406 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements Change rollback to error in DIH --- Key: SOLR-6269 URL: https://issues.apache.org/jira/browse/SOLR-6269 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.9 Reporter: Erik Hatcher Assignee: Erik Hatcher Fix For: 5.0, 4.10 Attachments: SOLR-6269.patch Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud mode, let's rename most things "rollback" to "error", such as the new onRollback handler.
[jira] [Commented] (SOLR-6194) Allow access to DataImporter and DIHConfiguration
[ https://issues.apache.org/jira/browse/SOLR-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074334#comment-14074334 ] ASF subversion and git services commented on SOLR-6194: --- Commit 1613406 from [~ehatcher] in branch 'dev/trunk' [ https://svn.apache.org/r1613406 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements Allow access to DataImporter and DIHConfiguration - Key: SOLR-6194 URL: https://issues.apache.org/jira/browse/SOLR-6194 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 4.10 Reporter: Aaron LaBella Assignee: Shalin Shekhar Mangar Fix For: 4.10 Attachments: SOLR-6194.patch, SOLR-6194.patch Original Estimate: 2h Remaining Estimate: 2h I'd like to change the visibility and access to a couple of the internal classes of DataImportHandler, specifically DataImporter and DIHConfiguration. My reasoning is that I've added the ability for a new data import handler command called *getquery* that will return the exact queries (fully resolved) that are executed for an entity within the data import configuration. This makes it much easier to debug the DIH, rather than turning on debug/verbose flags and digging through the raw response. Additionally, it gives me a service that I can then go take the queries from and run them. 
Here's a snippet of Java code that I can now execute now that I have access to the DIHConfiguration: {code:title=Snippet.java|borderStyle=solid} /** * @return a map of all the queries for each entity in the given config */ protected Map<String,String> getEntityQueries(DIHConfiguration config, Map<String,Object> params) { Map<String,String> queries = new LinkedHashMap<>(); if (config != null && config.getEntities() != null) { // make a new variable resolver VariableResolver vr = new VariableResolver(); vr.addNamespace("dataimporter.request", params); // for each entity for (Entity e : config.getEntities()) { // get the query and resolve it if (e.getAllAttributes().containsKey(SqlEntityProcessor.QUERY)) { String query = e.getAllAttributes().get(SqlEntityProcessor.QUERY); query = query.replaceAll("\\s+", " ").trim(); String resolved = vr.replaceTokens(query); resolved = resolved.replaceAll("\\s+", " ").trim(); queries.put(e.getName(), resolved); queries.put(e.getName() + "_raw", query); } } } return queries; } {code} I'm attaching a patch that I would appreciate someone having a look at for consideration. It's fully tested -- please let me know if there is something else I need to do/test.
[jira] [Commented] (SOLR-3622) DIH should not do rollbacks.
[ https://issues.apache.org/jira/browse/SOLR-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074336#comment-14074336 ] ASF subversion and git services commented on SOLR-3622: --- Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1613409 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements (merged from r1613406) DIH should not do rollbacks. Key: SOLR-3622 URL: https://issues.apache.org/jira/browse/SOLR-3622 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Reporter: Mark Miller Assignee: Erik Hatcher Fix For: 5.0, 4.10 This is not playing nice.
[jira] [Commented] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport
[ https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074337#comment-14074337 ] ASF subversion and git services commented on SOLR-5847: --- Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1613409 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements (merged from r1613406) The Admin GUI doesn't allow to abort a running dataimport - Key: SOLR-5847 URL: https://issues.apache.org/jira/browse/SOLR-5847 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler, web gui Affects Versions: 4.7 Reporter: Paco Garcia Priority: Minor With the changes introduced in the 4.7.0 release by SOLR-5517 (Return HTTP error on POST requests with no Content-Type), the jquery invocation to abort a running dataimport fails with HTTP error code 415. The POST method should have some content in the body. See comments in SOLR-5517
[jira] [Commented] (SOLR-6269) Change rollback to error in DIH
[ https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074340#comment-14074340 ] ASF subversion and git services commented on SOLR-6269: --- Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1613409 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements (merged from r1613406) Change rollback to error in DIH --- Key: SOLR-6269 URL: https://issues.apache.org/jira/browse/SOLR-6269 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.9 Reporter: Erik Hatcher Assignee: Erik Hatcher Fix For: 5.0, 4.10 Attachments: SOLR-6269.patch Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud mode, let's rename most things "rollback" to "error", such as the new onRollback handler.
[jira] [Commented] (SOLR-6194) Allow access to DataImporter and DIHConfiguration
[ https://issues.apache.org/jira/browse/SOLR-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074339#comment-14074339 ] ASF subversion and git services commented on SOLR-6194: --- Commit 1613409 from [~ehatcher] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1613409 ] SOLR-3622, SOLR-5847, SOLR-6194, SOLR-6269: Several DIH fixes/improvements (merged from r1613406) Allow access to DataImporter and DIHConfiguration - Key: SOLR-6194 URL: https://issues.apache.org/jira/browse/SOLR-6194 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 4.10 Reporter: Aaron LaBella Assignee: Shalin Shekhar Mangar Fix For: 4.10 Attachments: SOLR-6194.patch, SOLR-6194.patch Original Estimate: 2h Remaining Estimate: 2h I'd like to change the visibility and access to a couple of the internal classes of DataImportHandler, specifically DataImporter and DIHConfiguration. My reasoning is that I've added the ability for a new data import handler command called *getquery* that will return the exact queries (fully resolved) that are executed for an entity within the data import configuration. This makes it much easier to debug the DIH, rather than turning on debug/verbose flags and digging through the raw response. Additionally, it gives me a service that I can then go take the queries from and run them. 
Here's a snippet of Java code that I can execute now that I have access to the DIHConfiguration:

{code:title=Snippet.java|borderStyle=solid}
/**
 * @return a map of all the queries for each entity in the given config
 */
protected Map<String,String> getEntityQueries(DIHConfiguration config, Map<String,Object> params) {
  Map<String,String> queries = new LinkedHashMap<String,String>();
  if (config != null && config.getEntities() != null) {
    // make a new variable resolver
    VariableResolver vr = new VariableResolver();
    vr.addNamespace("dataimporter.request", params);
    // for each entity
    for (Entity e : config.getEntities()) {
      // get the query and resolve it
      if (e.getAllAttributes().containsKey(SqlEntityProcessor.QUERY)) {
        String query = e.getAllAttributes().get(SqlEntityProcessor.QUERY);
        query = query.replaceAll("\\s+", " ").trim();
        String resolved = vr.replaceTokens(query);
        resolved = resolved.replaceAll("\\s+", " ").trim();
        queries.put(e.getName(), resolved);
        queries.put(e.getName() + "_raw", query);
      }
    }
  }
  return queries;
}
{code}

I'm attaching a patch that I would appreciate someone having a look at for consideration. It's fully tested -- please let me know if there is something else I need to do/test.
[jira] [Resolved] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport
[ https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved SOLR-5847. Resolution: Fixed Assignee: Erik Hatcher The Admin GUI doesn't allow to abort a running dataimport - Key: SOLR-5847 URL: https://issues.apache.org/jira/browse/SOLR-5847 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler, web gui Affects Versions: 4.7 Reporter: Paco Garcia Assignee: Erik Hatcher Priority: Minor With the changes introduced in 4.7.0 Release by SOLR-5517 (Return HTTP error on POST requests with no Content-Type), the jquery invocation to abort a running dataimport fails with HTTP error code 415. The method POST should have some content in the body See comments in SOLR-5517 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6269) Change rollback to error in DIH
[ https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved SOLR-6269. Resolution: Fixed Change rollback to error in DIH --- Key: SOLR-6269 URL: https://issues.apache.org/jira/browse/SOLR-6269 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.9 Reporter: Erik Hatcher Assignee: Erik Hatcher Fix For: 5.0, 4.10 Attachments: SOLR-6269.patch Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud mode, let's rename most things rollback to error, such as the new onRollback handler. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5847) The Admin GUI doesn't allow to abort a running dataimport
[ https://issues.apache.org/jira/browse/SOLR-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074342#comment-14074342 ] Erik Hatcher commented on SOLR-5847: I simply changed the method to GET instead of POST. The Admin GUI doesn't allow to abort a running dataimport - Key: SOLR-5847 URL: https://issues.apache.org/jira/browse/SOLR-5847 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler, web gui Affects Versions: 4.7 Reporter: Paco Garcia Assignee: Erik Hatcher Priority: Minor With the changes introduced in 4.7.0 Release by SOLR-5517 (Return HTTP error on POST requests with no Content-Type), the jquery invocation to abort a running dataimport fails with HTTP error code 415. The method POST should have some content in the body See comments in SOLR-5517 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6194) Allow access to DataImporter and DIHConfiguration
[ https://issues.apache.org/jira/browse/SOLR-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved SOLR-6194. Resolution: Fixed Fix Version/s: 5.0 Allow access to DataImporter and DIHConfiguration - Key: SOLR-6194 URL: https://issues.apache.org/jira/browse/SOLR-6194 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 4.10 Reporter: Aaron LaBella Assignee: Shalin Shekhar Mangar Fix For: 5.0, 4.10 Attachments: SOLR-6194.patch, SOLR-6194.patch Original Estimate: 2h Remaining Estimate: 2h I'd like to change the visibility and access to a couple of the internal classes of DataImportHandler, specifically DataImporter and DIHConfiguration. My reasoning is that I've added the ability for a new data import handler command called *getquery* that will return the exact queries (fully resolved) that are executed for an entity within the data import configuration. This makes it much easier to debug the dih, rather than turning on debug/verbose flags and digging through the raw response. Additionally, it gives me a service that I can then go take the queries from and run them. 
Here's a snippet of Java code that I can execute now that I have access to the DIHConfiguration:

{code:title=Snippet.java|borderStyle=solid}
/**
 * @return a map of all the queries for each entity in the given config
 */
protected Map<String,String> getEntityQueries(DIHConfiguration config, Map<String,Object> params) {
  Map<String,String> queries = new LinkedHashMap<String,String>();
  if (config != null && config.getEntities() != null) {
    // make a new variable resolver
    VariableResolver vr = new VariableResolver();
    vr.addNamespace("dataimporter.request", params);
    // for each entity
    for (Entity e : config.getEntities()) {
      // get the query and resolve it
      if (e.getAllAttributes().containsKey(SqlEntityProcessor.QUERY)) {
        String query = e.getAllAttributes().get(SqlEntityProcessor.QUERY);
        query = query.replaceAll("\\s+", " ").trim();
        String resolved = vr.replaceTokens(query);
        resolved = resolved.replaceAll("\\s+", " ").trim();
        queries.put(e.getName(), resolved);
        queries.put(e.getName() + "_raw", query);
      }
    }
  }
  return queries;
}
{code}

I'm attaching a patch that I would appreciate someone having a look at for consideration. It's fully tested -- please let me know if there is something else I need to do/test.
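The snippet above normalizes each query with replaceAll("\\s+", " ") before and after variable resolution. As a standalone illustration (not part of the patch; the class name here is hypothetical), this is what that normalization does to a typical multi-line DIH query:

```java
public class QueryNormalizer {
    // Collapse every run of whitespace (spaces, tabs, newlines) to a single
    // space and trim the ends -- the same normalization applied in the
    // getEntityQueries snippet so raw and resolved queries compare cleanly.
    static String normalize(String query) {
        return query.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "select id,\n   name\n from   person\t where id > 0 ";
        // prints: select id, name from person where id > 0
        System.out.println(normalize(raw));
    }
}
```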
[jira] [Commented] (SOLR-6269) Change rollback to error in DIH
[ https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074347#comment-14074347 ] Noble Paul commented on SOLR-6269: -- I just went through the patch. It's fine. Change rollback to error in DIH --- Key: SOLR-6269 URL: https://issues.apache.org/jira/browse/SOLR-6269 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.9 Reporter: Erik Hatcher Assignee: Erik Hatcher Fix For: 5.0, 4.10 Attachments: SOLR-6269.patch Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud mode, let's rename most things rollback to error, such as the new onRollback handler.
[jira] [Resolved] (SOLR-3622) DIH should not do rollbacks.
[ https://issues.apache.org/jira/browse/SOLR-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher resolved SOLR-3622. Resolution: Fixed DIH should not do rollbacks. Key: SOLR-3622 URL: https://issues.apache.org/jira/browse/SOLR-3622 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Reporter: Mark Miller Assignee: Erik Hatcher Fix For: 5.0, 4.10 This is not playing nice. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request: SOLR-6163: special chars and ManagedSyno...
GitHub user timoschmidt opened a pull request: https://github.com/apache/lucene-solr/pull/73 SOLR-6163: special chars and ManagedSynonymFilterFactory Special characters could not be used for update or deletion because the url was not decoded before the resource was used. You can merge this pull request into a Git repository by running: $ git pull https://github.com/timoschmidt/lucene-solr origin/branch_4x Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/73.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #73 commit 0168e160e4a9236b047b2e24909d1f59dfd3eb7b Author: timo.schmidt timo-schm...@gmx.net Date: 2014-07-25T12:44:26Z SOLR-6163: special chars and ManagedSynonymFilterFactory --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6163) special chars and ManagedSynonymFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074359#comment-14074359 ] ASF GitHub Bot commented on SOLR-6163: -- GitHub user timoschmidt opened a pull request: https://github.com/apache/lucene-solr/pull/73 SOLR-6163: special chars and ManagedSynonymFilterFactory Special characters could not be used for update or deletion because the url was not decoded before the resource was used. You can merge this pull request into a Git repository by running: $ git pull https://github.com/timoschmidt/lucene-solr origin/branch_4x Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/73.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #73 commit 0168e160e4a9236b047b2e24909d1f59dfd3eb7b Author: timo.schmidt timo-schm...@gmx.net Date: 2014-07-25T12:44:26Z SOLR-6163: special chars and ManagedSynonymFilterFactory special chars and ManagedSynonymFilterFactory - Key: SOLR-6163 URL: https://issues.apache.org/jira/browse/SOLR-6163 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: Wim Kumpen Hey, I was playing with the ManagedSynonymFilterFactory to create a synonym list with the API. But I have difficulties deleting keys that contain special characters (or spaces)... I added a key ééé that matches with some other words. It's saved in the synonym file as ééé. When I try to delete it, I do: curl -X DELETE "http://localhost/solr/mycore/schema/analysis/synonyms/english/ééé" error message: %C3%A9%C3%A9%C3%A9%C2%B5 not found in /schema/analysis/synonyms/english A wild guess from me is that %C3%A9 isn't decoded back to ééé, and that's why it can't find the keyword?
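The reporter's guess is easy to check outside Solr: the percent-encoded form from the error message decodes back to the accented key once a UTF-8 decode is applied. A minimal sketch (plain JDK, nothing Solr-specific):

```java
import java.net.URLDecoder;

public class SynonymKeyDecode {
    public static void main(String[] args) throws Exception {
        // Percent-encoded form as it appears in the error message; decoding
        // it with UTF-8 recovers the original key, which supports the theory
        // that the managed-resource path segment is compared undecoded.
        String encoded = "%C3%A9%C3%A9%C3%A9";
        String decoded = URLDecoder.decode(encoded, "UTF-8");
        System.out.println(decoded); // prints: ééé
    }
}
```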
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074360#comment-14074360 ] Steve Molloy commented on SOLR-6248: In this case it cannot replace the current MoreLikeThisHandler implementation which can analyze incoming text (as opposed to searching for a matching document in the index) in order to find similar documents in the index. Being able to query by unique field and returning similar documents is already covered by the MoreLikeThisComponent if you use rows=1 to get a single document and its set of similar ones. The use case that forces the MoreLikeThisHandler currently (at least that I know of) is really this on-the-fly analysis of text that is nowhere in the index. MoreLikeThis Query Parser - Key: SOLR-6248 URL: https://issues.apache.org/jira/browse/SOLR-6248 Project: Solr Issue Type: New Feature Reporter: Anshum Gupta Attachments: SOLR-6248.patch MLT Component doesn't let people highlight/paginate and the handler comes with an cost of maintaining another piece in the config. Also, any changes to the default (number of results to be fetched etc.) /select handler need to be copied/synced with this handler too. Having an MLT QParser would let users get back docs based on a query for them to paginate, highlight etc. It would also give them the flexibility to use this anywhere i.e. q,fq,bq etc. A bit of history about MLT (thanks to Hoss) MLT Handler pre-dates the existence of QParsers and was meant to take an arbitrary query as input, find docs that match that query, club them together to find interesting terms, and then use those terms as if they were my main query to generate a main result set. This result would then be used as the set to facet, highlight etc. The flow: Query - DocList(m) - Bag (terms) - Query - DocList\(y) The MLT component on the other hand solved a very different purpose of augmenting the main result set. 
It is used to get similar docs for each of the doc in the main result set. DocSet\(n) - n * Bag (terms) - n * (Query) - n * DocList(m) The new approach: All of this can be done better and cleaner (and makes more sense too) using an MLT QParser. An important thing to handle here is the case where the user doesn't have TermVectors, in which case, it does what happens right now i.e. parsing stored fields. Also, in case the user doesn't have a field (to be used for MLT) indexed, the field would need to be a TextField with an index analyzer defined. This analyzer will then be used to extract terms for MLT. In case of SolrCloud mode, '/get-termvectors' can be used after looking at the schema (if TermVectors are enabled for the field). If not, a /get call can be used to fetch the field and parse it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6276) DIH test failures with invalid locale in derby JDBC driver
[ https://issues.apache.org/jira/browse/SOLR-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6276: - Attachment: SOLR-6276.patch DIH test failures with invalid locale in derby JDBC driver -- Key: SOLR-6276 URL: https://issues.apache.org/jira/browse/SOLR-6276 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.8.1 Reporter: Noble Paul Assignee: Noble Paul Priority: Minor Attachments: SOLR-6276.patch http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10879/testReport/junit/org.apache.solr.handler.dataimport/TestJdbcDataSourceConvertType/testConvertType/ We should pass the locale explicitly in the connection url params -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
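For reference, Derby allows the database locale to be pinned at creation time via the territory connection attribute, so a test connection URL along these lines (the in-memory database name is illustrative) would make the test independent of the JVM's default locale:

```
jdbc:derby:memory:dihtestdb;create=true;territory=en_US
```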
[jira] [Commented] (SOLR-6216) Better faceting for multiple intervals on DV fields
[ https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074391#comment-14074391 ] David Smiley commented on SOLR-6216: [~tomasflobbe] , do you have any scripts for your performance testing? I am impressed with that part specifically as I've got some interval faceting on my horizon to do what is somewhat similar to what you've got here LUCENE-5735 Better faceting for multiple intervals on DV fields --- Key: SOLR-6216 URL: https://issues.apache.org/jira/browse/SOLR-6216 Project: Solr Issue Type: Improvement Reporter: Tomás Fernández Löbbe Assignee: Erick Erickson Fix For: 4.10 Attachments: SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch There are two ways to have faceting on values ranges in Solr right now: “Range Faceting” and “Query Faceting” (doing range queries). They both end up doing something similar: {code:java} searcher.numDocs(rangeQ , docs) {code} The good thing about this implementation is that it can benefit from caching. The bad thing is that it may be slow with cold caches, and that there will be a query for each of the ranges. A different implementation would be one that works similar to regular field faceting, using doc values and validating ranges for each value of the matching documents. This implementation would sometimes be faster than Range Faceting / Query Faceting, specially on cases where caches are not very effective, like on a high update rate, or where ranges change frequently. Functionally, the result should be exactly the same as the one obtained by doing a facet query for every interval -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
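For concreteness, a request using the interval faceting parameters introduced by this issue would look roughly like the following (parameter names as in the SOLR-6216 patches; square vs. round brackets mark closed vs. open endpoints, and the field name is illustrative):

```
q=*:*&facet=true
&facet.interval=price
&f.price.facet.interval.set=[0,10)
&f.price.facet.interval.set=[10,100)
&f.price.facet.interval.set=[100,*]
```

Unlike the equivalent set of facet.query range queries, all intervals are evaluated in a single pass over the doc values of the matching documents.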
[jira] [Created] (LUCENE-5847) Improved implementation of language models in lucene
Hadas Raviv created LUCENE-5847: --- Summary: Improved implementation of language models in lucene Key: LUCENE-5847 URL: https://issues.apache.org/jira/browse/LUCENE-5847 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Hadas Raviv Priority: Minor Fix For: 5.0 Attachments: LUCENE-2507.patch The current implementation of language models in lucene is based on the paper A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval by Zhai and Lafferty ('01). Specifically, LMDirichletSimilarity and LMJelinekMercerSimilarity use a normalized smoothed score for a matching term in a document, as suggested in the above-mentioned paper. However, lucene doesn't assign a score to query terms that do not appear in a matched document. According to the pure LM approach, these terms should be assigned a collection probability background score. If one uses the Jelinek-Mercer smoothing method, the final result list produced by lucene is rank-equivalent to the one that would have been created by a full LM implementation. However, this is not the case for the Dirichlet smoothing method, because the background score is document-dependent. Documents in which not all query terms appear are missing the document-dependent background score for the missing terms. This component affects the final ranking of documents in the list. Since LM is a baseline method in many works in the IR research field, I attach a patch that implements a full LM in lucene. The basic issue that should be addressed here is assigning a document with a score that depends on *all* the query terms, collection statistics and the document length. The general idea of what I did is adding a new getBackGroundScore(int docID) method to similarity, scorer and bulkScorer.
Then, when a collector assigns a score to a document (score = scorer.score()), I added the background score (score = scorer.score() + scorer.background(doc)) that is assigned by the similarity class used for ranking.
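The rank-equivalence argument above can be made concrete with the two smoothed estimates from Zhai and Lafferty's paper:

```latex
% Jelinek–Mercer: for a query term w absent from d (p_{ml}(w \mid d) = 0),
% the contribution \log\big(\lambda\, p(w \mid C)\big) is identical for every
% document, so dropping it preserves the ranking.
p_\lambda(w \mid d) = (1-\lambda)\, p_{ml}(w \mid d) + \lambda\, p(w \mid C)

% Dirichlet: with term count c(w;d) = 0, the contribution
% \log\!\left(\frac{\mu\, p(w \mid C)}{|d| + \mu}\right) still depends on the
% document length |d|, so omitting it changes the ranking.
p_\mu(w \mid d) = \frac{c(w;d) + \mu\, p(w \mid C)}{|d| + \mu}
```

This is why the missing-term background score can be ignored under Jelinek-Mercer but must be added back, per document, under Dirichlet smoothing.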
[jira] [Updated] (LUCENE-5847) Improved implementation of language models in lucene
[ https://issues.apache.org/jira/browse/LUCENE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hadas Raviv updated LUCENE-5847: Attachment: LUCENE-2507.patch Improved implementation of language models in lucene - Key: LUCENE-5847 URL: https://issues.apache.org/jira/browse/LUCENE-5847 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Hadas Raviv Priority: Minor Fix For: 5.0 Attachments: LUCENE-2507.patch The current implementation of language models in lucene is based on the paper A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval by Zhai and Lafferty ('01). Specifically, LMDirichletSimilarity and LMJelinekMercerSimilarity use a normalized smoothed score for a matching term in a document, as suggested in the above-mentioned paper. However, lucene doesn't assign a score to query terms that do not appear in a matched document. According to the pure LM approach, these terms should be assigned a collection probability background score. If one uses the Jelinek-Mercer smoothing method, the final result list produced by lucene is rank-equivalent to the one that would have been created by a full LM implementation. However, this is not the case for the Dirichlet smoothing method, because the background score is document-dependent. Documents in which not all query terms appear are missing the document-dependent background score for the missing terms. This component affects the final ranking of documents in the list. Since LM is a baseline method in many works in the IR research field, I attach a patch that implements a full LM in lucene. The basic issue that should be addressed here is assigning a document with a score that depends on *all* the query terms, collection statistics and the document length. The general idea of what I did is adding a new getBackGroundScore(int docID) method to similarity, scorer and bulkScorer.
Then, when a collector assigns a score to a document (score = scorer.score()), I added the background score (score = scorer.score() + scorer.background(doc)) that is assigned by the similarity class used for ranking.
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 27188 - Failure!
I don't like this failure ... I'll dig. Mike McCandless http://blog.mikemccandless.com On Fri, Jul 25, 2014 at 6:02 AM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/27188/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterThreadsToSegments.testManyThreadsClose Error Message: Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments] at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0) Caused by: java.lang.RuntimeException: java.lang.NullPointerException at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0) at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258) Caused by: java.lang.NullPointerException at org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300) at org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110) at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253) Build Log: [...truncated 1608 lines...] 
[junit4] Suite: org.apache.lucene.index.TestIndexWriterThreadsToSegments [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterThreadsToSegments -Dtests.method=testManyThreadsClose -Dtests.seed=23CFC8083A310D8C -Dtests.slow=true -Dtests.locale=fr_FR -Dtests.timezone=Pacific/Midway -Dtests.file.encoding=UTF-8 [junit4] ERROR 2.55s J4 | TestIndexWriterThreadsToSegments.testManyThreadsClose [junit4] Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=222, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments] [junit4]at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C:11760FEA478C456A]:0) [junit4] Caused by: java.lang.RuntimeException: java.lang.NullPointerException [junit4]at __randomizedtesting.SeedInfo.seed([23CFC8083A310D8C]:0) [junit4]at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:258) [junit4] Caused by: java.lang.NullPointerException [junit4]at org.apache.lucene.index.DocumentsWriterPerThreadPool.release(DocumentsWriterPerThreadPool.java:300) [junit4]at org.apache.lucene.index.DocumentsWriterFlushControl.obtainAndLock(DocumentsWriterFlushControl.java:473) [junit4]at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:437) [junit4]at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507) [junit4]at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222) [junit4]at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:149) [junit4]at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:110) [junit4]at org.apache.lucene.index.TestIndexWriterThreadsToSegments$3.run(TestIndexWriterThreadsToSegments.java:253) [junit4] 2 NOTE: test params are: codec=Lucene49: {field=PostingsFormat(name=SimpleText)}, docValues:{}, sim=DefaultSimilarity, locale=fr_FR, timezone=Pacific/Midway [junit4] 2 NOTE: Linux 
3.2.0-26-generic amd64/Oracle Corporation 1.7.0_55 (64-bit)/cpus=8,threads=1,free=57393592,total=242221056 [junit4] 2 NOTE: All tests run in this JVM: [TestScorerPerf, TestDeterminism, TestDemo, TestAutomatonQuery, TestLookaheadTokenFilter, TestTermVectorsFormat, TestBinaryDocValuesUpdates, TestSumDocFreq, Test2BNumericDocValues, Nested, Nested2, TestDateSort, TestPrefixFilter, ThrowInUncaught, TestCrashCausesCorruptIndex, TestDocValuesWithThreads, TestLucene3xStoredFieldsFormat, InBeforeClass, InAfterClass, InTestMethod, NonStringProperties, TestReaderClosed, TestBitVector, TestLucene42NormsFormat, TestOmitNorms, TestQueryRescorer, TestSimpleAttributeImpl, TestDocCount, TestDocIdBitSet, TestOpenBitSet, TestBasics, TestNorms,
[jira] [Created] (SOLR-6280) Collapse QParser should give error for multiValued field
David Smiley created SOLR-6280: -- Summary: Collapse QParser should give error for multiValued field Key: SOLR-6280 URL: https://issues.apache.org/jira/browse/SOLR-6280 Project: Solr Issue Type: Improvement Components: search Reporter: David Smiley Priority: Minor The Collapse QParser does give results if you collapse on a multi-valued field, but the document-value is somewhat arbitrarily chosen based on the internals of the FieldCache (FieldCacheImpl.SortedDocValuesCache). Note that the Grouping functionality accesses values via FieldType.getValueSource which is a layer of abstraction above that includes a multiValued error check. Collapse should throw an error here. p.s. easy to test with exampledocs collapsing on cat -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6281) PostingsSolrHighlighter should be more configurable
David Smiley created SOLR-6281: -- Summary: PostingsSolrHighlighter should be more configurable Key: SOLR-6281 URL: https://issues.apache.org/jira/browse/SOLR-6281 Project: Solr Issue Type: Improvement Components: highlighter Reporter: David Smiley Assignee: David Smiley Priority: Minor The DefaultSolrHighlighter (works on non-FVH and FVH modes) and PostingsSolrHighlighter are quite different, although they do share some highlighting parameters where it's directly applicable. DSH has its fragListBuilder, fragmentsBuilder, boundaryScanner, configurable by letting you define your own class in solrconfig.xml. PSH does not; it uses the Lucene default implementations of DefaultPassageFormatter, PassageScorer, and Java BreakIterator, though it configures each of them based on options. I have a case where I need to provide a custom PassageFormatter, for example.
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #663: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/663/ 2 tests failed. FAILED: org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup Error Message: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestReplicationHandlerBackup: 1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at java.net.URL.openStream(URL.java:1037) at org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestReplicationHandlerBackup: 1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at 
java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at java.net.URL.openStream(URL.java:1037) at org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314) at __randomizedtesting.SeedInfo.seed([9D01DF06CFE991E2]:0) FAILED: org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at java.net.URL.openStream(URL.java:1037) at org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated: 1) Thread[id=47558, name=Thread-9695, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at
[jira] [Commented] (SOLR-6281) PostingsSolrHighlighter should be more configurable
[ https://issues.apache.org/jira/browse/SOLR-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074478#comment-14074478 ] David Smiley commented on SOLR-6281: A simple good-enough solution is to make the inner class extending PostingsHighlighter delegate out to protected methods on PSH for getting/initializing the PassageFormatter and a few other things. Then I could extend PostingsSolrHighlighter to override the method. Following from that step, the components could have their classes declared in solrconfig.xml. But that would probably mean a new SolrPassageFormatter class. I'm not sure if I want to bother going that far down the configurability road right now. As an aside... it's a shame that these different highlighters don't share more abstractions and terminology for concepts that are essentially the same. That is, a Passage is conceptually the same as a Fragment. I appreciate that different highlighters work differently and thus have somewhat different data associated with them. PostingsSolrHighlighter should be more configurable --- Key: SOLR-6281 URL: https://issues.apache.org/jira/browse/SOLR-6281 Project: Solr Issue Type: Improvement Components: highlighter Reporter: David Smiley Assignee: David Smiley Priority: Minor The DefaultSolrHighlighter (works in non-FVH and FVH modes) and PostingsSolrHighlighter are quite different, although they do share some highlighting parameters where directly applicable. DSH has its fragListBuilder, fragmentsBuilder, and boundaryScanner, configurable by letting you define your own class in solrconfig.xml. PSH does not; it uses the Lucene default implementations of DefaultPassageFormatter, PassageScorer, and Java BreakIterator, though it configures each of them based on options. I have a case where I need to provide a custom PassageFormatter, for example. 
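The delegation David describes is essentially the template-method pattern: a protected factory method on the base highlighter that a subclass can override. A minimal sketch in plain Java follows; the class and method names (BaseHighlighter, getPassageFormatter) are illustrative stand-ins, not the actual Solr/Lucene API:

```java
// Sketch only: stand-in names, not the real PostingsSolrHighlighter internals.
class BaseHighlighter {
    // Protected factory method: the override point a subclass uses to
    // swap in its own formatter without reimplementing the highlighter.
    protected String getPassageFormatter() {
        return "DefaultPassageFormatter";
    }

    String highlight(String text) {
        return "[" + getPassageFormatter() + "] " + text;
    }
}

class CustomHighlighter extends BaseHighlighter {
    @Override
    protected String getPassageFormatter() {
        return "CustomPassageFormatter";
    }
}

public class HighlighterSketch {
    public static void main(String[] args) {
        BaseHighlighter h = new CustomHighlighter();
        // The base class's highlight() picks up the subclass's formatter.
        System.out.println(h.highlight("hello"));
    }
}
```

The second step David mentions (declaring the class in solrconfig.xml) would replace the `new CustomHighlighter()` call with reflection-based instantiation from configuration.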
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6282) ArrayIndexOutOfBoundsException during search
Jason Emeric created SOLR-6282: -- Summary: ArrayIndexOutOfBoundsException during search Key: SOLR-6282 URL: https://issues.apache.org/jira/browse/SOLR-6282 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.8 Reporter: Jason Emeric Priority: Critical When executing a search with the following query strings, an ERROR org.apache.solr.servlet.SolrDispatchFilter - null:java.lang.ArrayIndexOutOfBoundsException error is thrown and no stack trace is provided. This is happening on searches that seem to have no pattern in common (special characters, length, spaces, etc.) q=((work_title_search:(%22+zoe%22%20)%20OR%20work_title_search:%22+zoe%22^100)%20AND%20(performer_name_search:(+big~0.75%20+b%27z%20%20)^7%20OR%20performer_name_search:%22+big%20+b%27z%20%20%22^30)) q=((work_title_search:(%22+rtb%22%20)%20OR%20work_title_search:%22+rtb%22^100)%20AND%20(performer_name_search:(+fly~0.75%20+street~0.75%20+gang~0.75%20)^7%20OR%20performer_name_search:%22+fly%20+street%20+gang%20%22^30)) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074516#comment-14074516 ] David Smiley commented on LUCENE-5740: -- Due to tag balancing of embedded elements, it doesn't look so simple after all. The current implementation only strips SCRIPT and STYLE tags, which have special rules but conveniently have no child elements. There's no need to balance embedded elements because there aren't any. But to do this more generally, if you strip <foo>, you'd want to ensure that it strips <foo><bar><foo>hi</foo></bar></foo> correctly. Admittedly, the particular application I'm working on strips link text content (<a>) and I'm not expecting embedded tags of the same type... but nonetheless it seems wrong to have this limitation. If it did track the state, it would just need an integer depth counter (tagDepthWithinStrippedTag) that would be incremented for each opening element and decremented for each closing element within the current tag being stripped. Not bad really. What do you think [~steve_rowe]? Add stripContentOfTags option to HTMLStripCharFilter Key: LUCENE-5740 URL: https://issues.apache.org/jira/browse/LUCENE-5740 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: David Smiley HTMLStripCharFilter should have an option to strip out the sub-content of certain elements. It already does this for SCRIPT and STYLE but it should be configurable to add more. I don't want certain elements to have their contents be searchable. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
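The depth-counter idea can be sketched outside of HTMLStripCharFilter's JFlex grammar. This simplified scanner illustrates only the counting logic; it is not the real implementation, which also has to cope with malformed markup, attributes, and self-closing tags:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagDepthSketch {
    // Strip everything between <tag> and its matching </tag>, tracking a
    // depth counter so nested tags of the same type are handled correctly.
    static String stripContentOfTag(String html, String tag) {
        Matcher m = Pattern.compile("</?" + tag + ">").matcher(html);
        StringBuilder out = new StringBuilder();
        int depth = 0, last = 0;
        while (m.find()) {
            // Only text outside any stripped region is kept.
            if (depth == 0) out.append(html, last, m.start());
            boolean closing = html.charAt(m.start() + 1) == '/';
            depth += closing ? -1 : 1;
            if (depth < 0) depth = 0;  // tolerate a stray closing tag
            last = m.end();
        }
        // If an opening tag was never closed, its tail is dropped here;
        // real HTML parsers need a smarter policy for never-closed tags.
        if (depth == 0) out.append(html.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // Nested <foo> inside <foo>: the counter keeps "hi" stripped.
        System.out.println(
            stripContentOfTag("a<foo><bar><foo>hi</foo></bar></foo>b", "foo"));
        // -> ab
    }
}
```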
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074514#comment-14074514 ] Anshum Gupta commented on SOLR-6248: My bad, this was my mistake. The last time I'd looked at this patch was about 10 months ago. This works like a component but also lets you paginate and do other stuff with it. Let me check out if accepting text would make sense here (or if we could have something on similar lines). MoreLikeThis Query Parser - Key: SOLR-6248 URL: https://issues.apache.org/jira/browse/SOLR-6248 Project: Solr Issue Type: New Feature Reporter: Anshum Gupta Attachments: SOLR-6248.patch MLT Component doesn't let people highlight/paginate and the handler comes with a cost of maintaining another piece in the config. Also, any changes to the default (number of results to be fetched etc.) /select handler need to be copied/synced with this handler too. Having an MLT QParser would let users get back docs based on a query for them to paginate, highlight etc. It would also give them the flexibility to use this anywhere i.e. q, fq, bq etc. A bit of history about MLT (thanks to Hoss): MLT Handler pre-dates the existence of QParsers and was meant to take an arbitrary query as input, find docs that match that query, club them together to find interesting terms, and then use those terms as if they were my main query to generate a main result set. This result would then be used as the set to facet, highlight etc. The flow: Query -> DocList(m) -> Bag(terms) -> Query -> DocList(y) The MLT component on the other hand solved a very different purpose of augmenting the main result set. It is used to get similar docs for each of the docs in the main result set. DocSet(n) -> n * Bag(terms) -> n * Query -> n * DocList(m) The new approach: All of this can be done better and cleaner (and makes more sense too) using an MLT QParser. 
An important thing to handle here is the case where the user doesn't have TermVectors, in which case it does what happens right now, i.e. parses stored fields. Also, in case the user doesn't have a field (to be used for MLT) indexed, the field would need to be a TextField with an index analyzer defined. This analyzer will then be used to extract terms for MLT. In case of SolrCloud mode, '/get-termvectors' can be used after looking at the schema (if TermVectors are enabled for the field). If not, a /get call can be used to fetch the field and parse it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
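The flow Hoss describes (Query -> DocList(m) -> Bag(terms) -> Query) can be sketched in plain Java. The tokenization and "interesting terms" selection below are crude stand-ins for what the real MLT code does against the index (term vectors, idf weighting, stopwords), so treat it as an illustration of the pipeline shape only:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MltFlowSketch {
    // Query -> DocList(m): 'matchedDocs' stands in for the docs matching
    // the input query. Bag(terms): count terms across them. -> Query:
    // join the top terms into a new disjunction query string.
    static String buildMltQuery(List<String> matchedDocs, int topN) {
        Map<String, Long> bag = matchedDocs.stream()
            .flatMap(d -> Arrays.stream(d.toLowerCase().split("\\W+")))
            .filter(t -> t.length() > 2)  // crude stand-in for stopword removal
            .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return bag.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(topN)
            .map(Map.Entry::getKey)
            .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("big city lights", "big river city");
        // The most frequent terms across the matched docs become the new query.
        System.out.println(buildMltQuery(docs, 2));
    }
}
```

A QParser wrapping this idea could then be dropped into q, fq, or bq, which is exactly the flexibility the issue is after.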
[jira] [Updated] (SOLR-6216) Better faceting for multiple intervals on DV fields
[ https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-6216: Attachment: FacetTester.java [~dsmiley] I used the attached Java class for running the queries. As I said, the dataset was geonames, added 4 times (with different IDs) so that the index had 33M docs total. The queries are all boolean queries with two OR terms, generated by taking terms from the “name” field of the dataset. An example: {noformat}
name:cemetery name:lake
name:el name:historical
name:church name:el
name:dam name:la
name:al name:church
name:al name:creek
name:baptist name:la
name:la name:mount
name:creek name:de
name:center name:park
name:church name:creek
...
{noformat} Eyeballing the logs, most of those queries matched a high number of docs from the index. In addition, I had a bash script running to add documents every second: {noformat}
#!/bin/bash
IFS='\n'
while read q
do
  echo $q > tmp.doc
  curl -v 'http://localhost:8983/solr/geonames/update?stream.file=/absolute/path/to/tmp.doc&stream.contentType=text/csv;charset=utf-8&separator=%09&encapsulator=%0E&header=false&fieldnames=id,name,,alternatenames,latitude,longitude,feature_class,feature_code,country_code,cc2,admin1_code,admin2_code,admin3_code,admin4_code,population,elevation,dem,timezone,modification_date&f.alternatenames.split=true&f.alternatenames.separator=,&f.alternatenames.encapsulator=%0E&f.cc2.split=true&f.cc2.separator=,&f.cc2.encapsulator=%0E'
  sleep 1
done < allCountries.txt
{noformat} Unfortunately, it looks like I deleted the schema file I used; however, there was nothing crazy about it: population is an int field with docValues=true. autoSoftCommit is configured to run every second. For the second test, I can’t upload the code because it’s full of customer-specific data, but the test is very similar. 
I took some production queries, which had “intervals” in 6 fields, around 40 “intervals” total (originally using facet queries for each of them). For that test I used a similar bash script to upload data every second too. I have been testing this code in an environment mirroring production for around 2-3 weeks now and QTimes have improved dramatically (on a multi-shard collection). I haven’t seen errors related to this. Better faceting for multiple intervals on DV fields --- Key: SOLR-6216 URL: https://issues.apache.org/jira/browse/SOLR-6216 Project: Solr Issue Type: Improvement Reporter: Tomás Fernández Löbbe Assignee: Erick Erickson Fix For: 4.10 Attachments: FacetTester.java, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch There are two ways to have faceting on value ranges in Solr right now: “Range Faceting” and “Query Faceting” (doing range queries). They both end up doing something similar: {code:java} searcher.numDocs(rangeQ, docs) {code} The good thing about this implementation is that it can benefit from caching. The bad thing is that it may be slow with cold caches, and that there will be a query for each of the ranges. A different implementation would be one that works similar to regular field faceting, using doc values and validating ranges for each value of the matching documents. This implementation would sometimes be faster than Range Faceting / Query Faceting, especially in cases where caches are not very effective, like with a high update rate, or where ranges change frequently. Functionally, the result should be exactly the same as the one obtained by doing a facet query for every interval -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
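The core difference from one facet query per interval can be sketched as a single pass over per-document values (standing in for DocValues), incrementing every interval that contains the value. The actual patch is more elaborate (sorted intervals, per-segment iteration over matching docs), so this is an illustration of the approach only:

```java
import java.util.Arrays;

public class IntervalFacetSketch {
    // One pass over the values; each [start, end) interval that contains
    // the value gets its counter bumped. No per-interval query, no cache.
    static int[] countIntervals(long[] values, long[][] intervals) {
        int[] counts = new int[intervals.length];
        for (long v : values) {
            for (int i = 0; i < intervals.length; i++) {
                if (v >= intervals[i][0] && v < intervals[i][1]) counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // 'population' stands in for the docValues of the matching docs.
        long[] population = {500, 15_000, 2_000_000, 80};
        long[][] intervals = {{0, 1_000}, {1_000, 100_000}, {100_000, Long.MAX_VALUE}};
        System.out.println(Arrays.toString(countIntervals(population, intervals)));
        // -> [2, 1, 1]
    }
}
```

Because nothing here depends on cached filters, the cost is the same on a cold cache as on a warm one, which is why this wins under a high update rate.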
[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074546#comment-14074546 ] Shai Erera commented on LUCENE-5843: {quote} I added public static final int IndexWriter.MAX_DOCS with the limit set to ArrayUtil.MAX_ARRAY_LENGTH. {quote} But MAX_ARRAY_LENGTH is dynamic, and depends on the JRE (32/64-bit). So that's not fixed across JVMs, right? IndexWriter should refuse to create an index with more than INT_MAX docs Key: LUCENE-5843 URL: https://issues.apache.org/jira/browse/LUCENE-5843 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.10 Attachments: LUCENE-5843.patch It's more and more common for users these days to create very large indices, e.g. indexing lines from log files, or packets on a network, etc., and it's not hard to accidentally exceed the maximum number of documents in one index. I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that value as a sentinel during searching. I'm not sure what IW does today if you create a too-big index but it's probably horrible; it may succeed and then at search time you hit nasty exceptions when we overflow int. I think it should throw an IndexFullException instead. It'd be nice if we could do this on the very doc that when added would go over the limit, but I would also settle for just throwing at flush as well ... i.e. I think what's really important is that the index does not become unusable. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074551#comment-14074551 ] Robert Muir commented on LUCENE-5843: - Can it be INT_MAX-8 for this reason?
[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074554#comment-14074554 ] Shai Erera commented on LUCENE-5843: Yes, I think we shouldn't try to be too smart here. It can even be MAX_INT-1024 for all practical purposes (and if we want to be on the safe side w/ int[] allocations), as I doubt anyone will complain he cannot put MAX_INT-1023 docs in an index...
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_20-ea-b23) - Build # 10771 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10771/ Java: 64bit/jdk1.8.0_20-ea-b23 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 2 tests failed. REGRESSION: org.apache.solr.client.solrj.embedded.SolrExampleStreamingBinaryTest.testChildDoctransformer Error Message: Expected mime type application/octet-stream but got text/html. html head meta http-equiv=Content-Type content=text/html;charset=ISO-8859-1/ titleError 500 Server Error/title /head body h2HTTP ERROR: 500/h2 pProblem accessing /solr/collection1/select. Reason: preServer Error/pre/p hr /ismallPowered by Jetty:///small/i /body /html Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected mime type application/octet-stream but got text/html. html head meta http-equiv=Content-Type content=text/html;charset=ISO-8859-1/ titleError 500 Server Error/title /head body h2HTTP ERROR: 500/h2 pProblem accessing /solr/collection1/select. Reason: preServer Error/pre/p hr /ismallPowered by Jetty:///small/i /body /html at __randomizedtesting.SeedInfo.seed([2E3620E90BB99307:5DEC3F7387A1E401]:0) at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:513) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:281) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at org.apache.solr.client.solrj.SolrExampleTests.testChildDoctransformer(SolrExampleTests.java:1373) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at
[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074572#comment-14074572 ] Michael McCandless commented on LUCENE-5843: Sorry, my description was stale: in the patch I settled on MAX_INT - 128 as a defensive attempt to be hopefully well below the min value for the MAX_ARRAY_LENGTH over normal JVMs ...
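The guard being discussed amounts to a bounds check against a fixed cap. This sketch mirrors the MAX_INT - 128 choice from the patch but is not the actual IndexWriter code; the method and class names here are illustrative:

```java
public class MaxDocsSketch {
    // A hard cap comfortably below Integer.MAX_VALUE, so that int[]
    // allocations and the Integer.MAX_VALUE search-time sentinel stay
    // safe across JVMs. 128 is the defensive margin discussed above.
    static final int MAX_DOCS = Integer.MAX_VALUE - 128;

    // Check before adding docs; long arithmetic so the check itself
    // cannot overflow when current + adding exceeds Integer.MAX_VALUE.
    static void reserveDocs(int current, int adding) {
        if ((long) current + adding > MAX_DOCS) {
            throw new IllegalStateException(
                "number of documents in the index cannot exceed " + MAX_DOCS);
        }
    }

    public static void main(String[] args) {
        reserveDocs(0, 1_000_000);           // fine
        try {
            reserveDocs(MAX_DOCS - 1, 2);    // would exceed the cap
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking at add time (rather than at flush) is what keeps the index from ever becoming unusable, per the issue description.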
[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074575#comment-14074575 ] Shai Erera commented on LUCENE-5843: oh good, I didn't read the patch before, but I see you even explain why we don't use the constant from ArrayUtil! +1
[jira] [Assigned] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned LUCENE-5740: Assignee: David Smiley
[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074632#comment-14074632 ] Hoss Man commented on SOLR-5776: One of the comments steve made when opening SOLR-6254... {quote} I found some info about /dev/random problems on FreeBSD here: https://wiki.freebsd.org/201308DevSummit/Security/DevRandom, which led me to /etc/rc.d/initrandom, which gets around the limited entropy by cat'ing a bunch of shit to /dev/random: ... I think we should try the same strategy in a crontab every X minutes, to see if that addresses the test failures. {quote} miller's response to that specific suggestion... bq. I think it's fine as a short term workaround, but not a great solution. We probably should just disable SSL unless we can address it in a portable way. Here's my straw man counter proposal: * update the solr tests so that: ** SSL randomization only happens if a tests.randomssl sys prop is set - default is false *** NOTE: would mean updates to the reproduce with line formatting *** should be updated in test-help as well *** could be used in lucene/replicator module as well -- it already has a tests.jettySSL (doh! ... not included in the reproduce line!) 
** sanity check that we have at least some basic coverage of Solr w/SSL that is *not* randomized (ie: SSLMigrationTest and at least one new test that _always_ uses SSL to bring up a few nodes, index a few docs, do a query, and shut down) ** remove most of the @SuppressSSL annotations currently in place (should only be used for tests that truly *need* to suppress SSL because of the nature of the test: ie explicitly verifying something about non-ssl mode) * update the jenkins boxes to: ** have a cron job like steve suggests ** set tests.randomssl to true when running builds The end result, if everything works properly, should be: * no matter who runs the tests, some basic sanity checking of SSL is done * on our jenkins builds, we do extensive randomized testing of SSL with all the cloud (and lucene/replicator) functionality * users who have enough entropy on their system can run {{-Dtests.randomssl=true}} if they choose. Obviously though, before putting any work into the tests framework to support something like tests.randomssl as a first class sysprop, the first baby step to see if this plan is even viable would be the cron steve mentioned to create lots of entropy -- if that doesn't work, then the whole plan is moot. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-5776.patch, SOLR-5776.patch We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
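The proposed tests.randomssl gating is essentially a one-line system-property check. Sketched here with the property name from the proposal; the surrounding test-framework wiring is omitted and the class name is hypothetical:

```java
import java.util.Random;

public class SslRandomizationSketch {
    // Boolean.getBoolean returns false when -Dtests.randomssl is absent,
    // which yields the proposed default of no SSL randomization unless
    // the sysprop is explicitly set to true (e.g. on jenkins).
    static boolean useSsl(Random random) {
        return Boolean.getBoolean("tests.randomssl") && random.nextBoolean();
    }

    public static void main(String[] args) {
        System.out.println("SSL this run: " + useSsl(new Random()));
    }
}
```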
[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 27214 - Failure!
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/27214/ All tests passed Build Log: [...truncated 653 lines...] [junit4] JVM J1: stdout was not empty, see: /var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/core/test/temp/junit4-J1-20140725_193232_926.sysout [junit4] JVM J1: stdout (verbatim) [junit4] # [junit4] # A fatal error has been detected by the Java Runtime Environment: [junit4] # [junit4] # Internal Error (ciMethodData.cpp:142), pid=28848, tid=139935109641984 [junit4] # Error: ShouldNotReachHere() [junit4] # [junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 1.7.0_55-b13) [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode linux-amd64 compressed oops) [junit4] # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try ulimit -c unlimited before starting Java again [junit4] # [junit4] # An error report file with more information is saved as: [junit4] # /var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/core/test/J1/hs_err_pid28848.log [junit4] # [junit4] # If you would like to submit a bug report, please visit: [junit4] # http://bugreport.sun.com/bugreport/crash.jsp [junit4] # [junit4] JVM J1: EOF [...truncated 955 lines...] 
[junit4] ERROR: JVM J1 ended with an exception, command line: /var/lib/jenkins/tools/hudson.model.JDK/Java_7_64bit_u55/jre/bin/java -Dtests.prefix=tests -Dtests.seed=21E34C94184201BA -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.10 -Dtests.cleanthreads=perMethod -Djava.util.logging.config.file=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/core/test/temp -Dclover.db.dir=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/var/lib/jenkins/workspace/Lucene-4x-Linux-Java7-64-test-only/checkout/lucene/tools/junit4/tests.policy -Dlucene.version=4.10-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -Dtests.leaveTemporary=false -Dtests.filterstacks=true -classpath
[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074644#comment-14074644 ] Steve Rowe commented on LUCENE-5740: bq. If it did track the state, it would just need an integer depth counter (tagDepthWithinStrippedTag) that would be incremented for each opening element and decremented for each closing element within the current tag being stripped. Don't forget that HTMLStripCharFilter must be able to handle (i.e. not throw an error, and maximize useful extracted content). Assuming you'll see closing tags could be a problem here; some HTML doesn't have these in some cases. It might be better to just track nested tags of the same type as the current tag being stripped, rather than all tags - the other contained tags should be ignorable, I think. (This condition - nested same-type tags - should be fairly rare, but will need to be handled, e.g. <ul><li><ul><li></li></ul></li></ul>.) The other thing to worry about is the possible lack of closing tags for a tag the contents of which are to be stripped. I'm not sure how to handle this - maybe look at how other HTML parsers do it? (I.e., how to limit scope of never-closed tags.) Add stripContentOfTags option to HTMLStripCharFilter Key: LUCENE-5740 URL: https://issues.apache.org/jira/browse/LUCENE-5740 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: David Smiley Assignee: David Smiley HTMLStripCharFilter should have an option to strip out the sub-content of certain elements. It already does this for SCRIPT and STYLE but it should be configurable to add more. I don't want certain elements to have their contents be searchable. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074644#comment-14074644 ] Steve Rowe edited comment on LUCENE-5740 at 7/25/14 5:39 PM: - bq. If it did track the state, it would just need an integer depth counter (tagDepthWithinStrippedTag) that would be incremented for each opening element and decremented for each closing element within the current tag being stripped. Don't forget that HTMLStripCharFilter must be able to handle (i.e. not throw an error, and maximize useful extracted content) non-well-formed content. Assuming you'll see closing tags could be a problem here; some HTML doesn't have these in some cases. It might be better to just track nested tags of the same type as the current tag being stripped, rather than all tags - the other contained tags should be ignorable, I think. (This condition - nested same-type tags - should be fairly rare, but will need to be handled, e.g. <ul><li><ul><li></li></ul></li></ul>.) The other thing to worry about is the possible lack of closing tags for a tag the contents of which are to be stripped. I'm not sure how to handle this - maybe look at how other HTML parsers do it? (I.e., how to limit scope of never-closed tags.) was (Author: steve_rowe): bq. If it did track the state, it would just need an integer depth counter (tagDepthWithinStrippedTag) that would be incremented for each opening element and decremented for each closing element within the current tag being stripped. Don't forget that HTMLStripCharFilter must be able to handle (i.e. not throw an error, and maximize useful extracted content). Assuming you'll see closing tags could be a problem here; some HTML doesn't have these in some cases. It might be better to just track nested tags of the same type as the current tag being stripped, rather than all tags - the other contained tags should be ignorable, I think. 
(This condition - nested same-type tags - should be fairly rare, but will need to be handled, e.g. <ul><li><ul><li></li></ul></li></ul>.) The other thing to worry about is the possible lack of closing tags for a tag the contents of which are to be stripped. I'm not sure how to handle this - maybe look at how other HTML parsers do it? (I.e., how to limit scope of never-closed tags.)
[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074648#comment-14074648 ] Steve Rowe commented on LUCENE-5740: David, the other thing I worry about is: a fully fledged version of this would allow finer-grained specification of tags, along the lines of XPath, but that would be a much much bigger task... I don't think such a goal should hold up what you're thinking about.
[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074743#comment-14074743 ] David Smiley commented on LUCENE-5740: -- bq. David, the other thing I worry about is: a fully fledged version of this would allow finer-grained specification of tags, along the lines of XPath, but that would be a much much bigger task... I don't think such a goal should hold up what you're thinking about. Yeah I thought of that and agree it's not worth worrying about right now. My initial use of this will strip generated HTML that is already fairly clean, and will strip these tags purely by element name. I have no need/plans for more complicated matching. bq. Don't forget that HTMLStripCharFilter must be able to handle (i.e. not throw an error, and maximize useful extracted content) non-well-formed content. Assuming you'll see closing tags could be a problem here; some HTML doesn't have these in some cases. If a tag that is to be stripped opens, then I propose the next close tag at the same level (whatever its name may be) is where the strip ends: {noformat}<body>bodyStart <p>paraStart <foo> <b>bold</b> paraEnd </p> bodyEnd</body>{noformat} Notice there is no {{</foo>}}. Stripping tag foo would yield only the text tokens bodyStart, paraStart, and bodyEnd. I think it's not realistic to expect better than that, not to mention that this issue is optional and would come with disclaimers on this matter. bq. It might be better to just track nested tags of the same type as the current tag being stripped, rather than all tags I don't think that adds any value (at least I don't see it yet), and it hurts the bad-html case like the foo example above. In that same example, only same-name tags would mean that bodyEnd would not get emitted. Right? 
[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074752#comment-14074752 ] David Smiley commented on LUCENE-5740: -- I think I may get your concern. If the tag that I was stripping was paragraph <p> instead of <foo>, then the paragraph stripping would continue on to </body>. So it may appear that the stripping should end at the *sooner* of a closing tag at the starting depth, or a *matching* close of the current element name. A *matching* close means I need to keep track of two embedded tag depth integers, one for any element name, one for those that have the same name as what I'm stripping. Yeah?
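A minimal sketch of the two-counter rule described above. This is not HTMLStripCharFilter's real implementation: the regex-based tag scanner, class name, and method names are all illustrative, and attributes, self-closing tags, and comments are ignored. Stripping ends at the *sooner* of a close tag that would drop below the depth where stripping began, or a close tag matching the stripped element's name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripSketch {
    // Illustration only: matches attribute-free tags like <p> and </p>.
    static final Pattern TAG = Pattern.compile("</?([a-zA-Z]+)>");

    public static String strip(String html, String target) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int last = 0;
        int anyDepth = 0;   // tags of any name opened since stripping began
        int sameDepth = 0;  // nested tags with the target's own name
        boolean stripping = false;
        while (m.find()) {
            String name = m.group(1);
            boolean close = m.group().startsWith("</");
            if (!stripping) {
                out.append(html, last, m.start());  // emit text outside strips
                if (!close && name.equals(target)) {
                    stripping = true;
                    anyDepth = 0;
                    sameDepth = 0;
                }
            } else if (!close) {
                anyDepth++;
                if (name.equals(target)) sameDepth++;
            } else {
                if (name.equals(target)) {
                    if (sameDepth == 0) stripping = false;  // matching close
                    else sameDepth--;
                }
                if (stripping) {
                    if (anyDepth == 0) stripping = false;   // close at start level
                    else anyDepth--;
                }
            }
            last = m.end();  // tags themselves are never emitted
        }
        // If a stripped tag is never closed, the remainder is dropped.
        if (!stripping) out.append(html, last, html.length());
        return out.toString();
    }
}
```

Running this on the {{foo}} example above yields the tokens bodyStart, paraStart, and bodyEnd; stripping {{p}} instead ends at the matching {{</p>}} rather than running on to {{</body>}}.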
[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074764#comment-14074764 ] Steve Rowe commented on LUCENE-5740: Real HTML is more complicated: e.g. <p> within <p> is not allowed (or rather is parsed as sibling non-closed elements). Some pertinent discussion here in the javadocs of the Jericho HTML parser: http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/Element.html. In particular, the Single Tag Element and Implicitly Terminated Element sections, and the link in the latter section in the sentence "See the element parsing rules for HTML elements with optional end tags" for details on which tags can implicitly terminate a given element.
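The "implicitly terminated element" idea can be modeled as a lookup table mapping an open element to the start tags that implicitly close it. The entries below are a small hand-picked illustration, not Jericho's (or the HTML spec's) actual rule set:

```java
import java.util.Map;
import java.util.Set;

public class ImplicitClose {
    // Illustrative subset only: element -> start tags that implicitly close it.
    static final Map<String, Set<String>> TERMINATED_BY = Map.of(
        "p",  Set.of("p", "div", "ul", "ol", "table"),
        "li", Set.of("li"),
        "td", Set.of("td", "th", "tr"));

    // True if seeing <nextStartTag> should act as an implicit </openElement>.
    static boolean terminates(String openElement, String nextStartTag) {
        Set<String> t = TERMINATED_BY.get(openElement);
        return t != null && t.contains(nextStartTag);
    }
}
```

A stripper consulting such a table could decrement its depth counters on an implicit close instead of waiting for an end tag that never arrives.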
[JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk7) - Build # 10772 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10772/ Java: 64bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 1 tests failed. REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings Error Message: some thread(s) failed Stack Trace: java.lang.RuntimeException: some thread(s) failed at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:535) at org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:946) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:853) Build Log: [...truncated 5726 lines...] [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains [junit4] 2 TEST FAIL: useCharFilter=true text='ctnyf \ud8e6\ude46\udbfb\ude9d\ud9b1\udcbf cynpqojvoxa \u0d61\u0d58\u0d73\u0d6e\u0d31\u0d37\u0d31\u0d54 cprsu wvzjus' [junit4] 2 TEST FAIL: useCharFilter=true text='uukvyuql z{1,5}hzk \u077f\u075c\u075d\u0774'
[jira] [Commented] (LUCENE-5740) Add stripContentOfTags option to HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074785#comment-14074785 ] David Smiley commented on LUCENE-5740: -- Yeah... to do this feature right, it needs to know about these cases. Then perhaps the stripping part might be easier if it has an accurate depth, as it wouldn't include implicitly closing elements (e.g. IMG). But in the end this feature is optional so a best effort attempt with rules about known HTML tags is fine. I agree this means looking for a close of the same element name. I'll try working on something this weekend, or Monday.
[jira] [Commented] (SOLR-6163) special chars and ManagedSynonymFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074952#comment-14074952 ] Hoss Man commented on SOLR-6163: A quick glance at timo's patch and the javadocs for the associated restlet classes seems to suggest that this is the correct general course of action... http://restlet.com/learn/javadocs/2.1/jse/api/org/restlet/data/Reference.html#getPath%28%29 "Note that no URI decoding is done by this method." A cleaner fix is probably to use this alternative restlet method that can decode for us ... http://restlet.com/learn/javadocs/2.1/jse/api/org/restlet/data/Reference.html#getPath%28boolean%29 There are lots of similar "Note that no URI decoding is done by this method." and "Returns the optionnally decoded __" combinations in the Request class -- we should probably audit all of our usages of this class. special chars and ManagedSynonymFilterFactory - Key: SOLR-6163 URL: https://issues.apache.org/jira/browse/SOLR-6163 Project: Solr Issue Type: Bug Affects Versions: 4.8 Reporter: Wim Kumpen Hey, I was playing with the ManagedSynonymFilterFactory to create a synonym list with the API. But I have difficulties when my keys contain special characters (or spaces) and I try to delete them... I added a key ééé that matches with some other words. It's saved in the synonym file as ééé. When I try to delete it, I do: curl -X DELETE http://localhost/solr/mycore/schema/analysis/synonyms/english/ééé; error message: %C3%A9%C3%A9%C3%A9%C2%B5 not found in /schema/analysis/synonyms/english A wild guess from me is that %C3%A9 isn't decoded back to ééé. And that's why it can't find the keyword?
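A sketch of the decoding gap described above: the path segment arrives percent-encoded and must be UTF-8 decoded before comparing it against stored keys like "ééé". The JDK's URLDecoder is used here purely for illustration; the actual fix would use restlet's decoding variant of Reference.getPath:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    // Decode one percent-encoded path segment. Note: URLDecoder also turns
    // '+' into a space, which is wrong for URL *paths*; acceptable for this
    // illustration, but not for a production path decoder.
    public static String decodeSegment(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }
}
```

With this, the raw segment "%C3%A9%C3%A9%C3%A9" compares equal to the stored key "ééé" instead of failing the lookup.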
[jira] [Created] (LUCENE-5848) StopwordAnalyzerBase hides Analyzer's constructor with ReuseStrategy
Claude Quézel created LUCENE-5848: - Summary: StopwordAnalyzerBase hides Analyzer's constructor with ReuseStrategy Key: LUCENE-5848 URL: https://issues.apache.org/jira/browse/LUCENE-5848 Project: Lucene - Core Issue Type: Improvement Components: core/other Affects Versions: 4.9 Reporter: Claude Quézel Priority: Minor StopwordAnalyzerBase hides Analyzer's constructor with ReuseStrategy, so it is not possible to set a ReuseStrategy when extending StopwordAnalyzerBase. The fix is trivial: add two extra constructors to StopwordAnalyzerBase. The same is true for all classes extending StopwordAnalyzerBase.
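The constructor-hiding problem can be shown with hypothetical stand-ins (these are not the real Lucene classes; names and bodies are placeholders). The proposed fix is simply to forward a ReuseStrategy argument through the intermediate base class:

```java
public class CtorSketch {
    static class ReuseStrategy {}

    // Stand-in for Analyzer: offers both a default and a strategy constructor.
    abstract static class Analyzer {
        final ReuseStrategy strategy;
        Analyzer(ReuseStrategy s) { strategy = s; }
        Analyzer() { this(new ReuseStrategy()); }
    }

    // Stand-in for StopwordAnalyzerBase: today only the no-arg form exists,
    // so subclasses cannot choose a strategy. The second constructor is the
    // proposed trivial fix.
    abstract static class StopwordBase extends Analyzer {
        StopwordBase() { super(); }
        StopwordBase(ReuseStrategy s) { super(s); }
    }

    // With the extra constructor, a concrete analyzer can pick its strategy.
    static class MyAnalyzer extends StopwordBase {
        MyAnalyzer(ReuseStrategy s) { super(s); }
    }
}
```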
[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 584 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/584/ 1 tests failed. FAILED: org.apache.solr.cloud.OverseerTest.testOverseerFailure Error Message: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed Stack Trace: org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed at __randomizedtesting.SeedInfo.seed([3C225866AD0181FE:382AD795BFA46EDF]:0) at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:144) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:155) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:314) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:221) at org.apache.solr.cloud.OverseerTest$MockZKController.publishState(OverseerTest.java:155) at org.apache.solr.cloud.OverseerTest.testOverseerFailure(OverseerTest.java:660) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Updated] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5843: --- Attachment: LUCENE-5843.patch New patch, I think it's ready: I resolved the nocommits, and removed Test2BDocs (I think its tests are folded into TestIndexWriterMaxDocs). IndexWriter should refuse to create an index with more than INT_MAX docs Key: LUCENE-5843 URL: https://issues.apache.org/jira/browse/LUCENE-5843 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.10 Attachments: LUCENE-5843.patch, LUCENE-5843.patch It's more and more common for users these days to create very large indices, e.g. indexing lines from log files, or packets on a network, etc., and it's not hard to accidentally exceed the maximum number of documents in one index. I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that value as a sentinel during searching. I'm not sure what IW does today if you create a too-big index but it's probably horrible; it may succeed and then at search time you hit nasty exceptions when we overflow int. I think it should throw an IndexFullException instead. It'd be nice if we could do this on the very doc that when added would go over the limit, but I would also settle for just throwing at flush as well ... i.e. I think what's really important is that the index does not become unusable. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075135#comment-14075135 ] Hoss Man commented on LUCENE-5843: -- Mike: couple quick suggestions...
* {code}
private static int actualMaxDocs = MAX_DOCS;
static void setMaxDocs(int docs) {
  if (MAX_DOCS < docs) throw UglyException;
  actualMaxDocs = docs;
}
{code} ...that way some poor bastard who sees it in the code and tries to be crafty and add a stub class to that package to set it to Integer.MAX_VALUE will get an immediate error instead of a timebomb.
* add a *public* method to the test-framework that wraps this package-protected setter, so that _tests_ in other packages besides {{org.apache.lucene.index}} can mutate this.
** then we can add tests for clean behavior in solr as well (not to mention anybody else who writes a Lucene app and wants to test how their app behaves when the index gets too big w/o adding an {{org/apache/lucene/index}} dir to their test source)
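A runnable sketch of the guarded setter suggested above. The class name, exception type, and cap value are placeholders, not Lucene's actual code: the test-only setter may lower the limit, but refuses to raise it past the hard maximum.

```java
public class MaxDocsSketch {
    // Illustrative hard cap (Lucene's real sentinel math differs).
    static final int MAX_DOCS = Integer.MAX_VALUE - 128;
    private static int actualMaxDocs = MAX_DOCS;

    // Package-private in the real proposal; tests reach it via a public
    // wrapper in the test framework.
    static void setMaxDocs(int docs) {
        if (MAX_DOCS < docs) {
            // Immediate error instead of a timebomb at search time.
            throw new IllegalArgumentException("cannot raise limit past " + MAX_DOCS);
        }
        actualMaxDocs = docs;
    }

    static int getActualMaxDocs() { return actualMaxDocs; }
}
```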
[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2894: --- Attachment: SOLR-2894.patch Making good progress (only ~1600 lines of diff left to review!) updates in this patch...
* PivotFacetFieldValueCollection
** some javadocs
** refactor away method: nonNullValueIterator()
*** only called in one place
* PivotFacetField
** some javadocs
** made createFromListOfNamedLists smart enough to return null on null input
*** simplified PivotFacetValue.createFromNamedList
** made contributeFromShard smart enough to be a no-op on null input
*** simplified all callers (PivotFacet & PivotFacetValue)
** made some vars final where possible via refactoring constructor & createFromListOfNamedLists
** refactor skipRefinementAtThisLevel out of the method and up to an instance var since it never changes once the facet params are set in the constructor
** consolidate skipRefinementAtThisLevel + hasBeenRefined into a single var: needRefinementAtThisLevel
** simplify BitSet iteration (nextSetBit is always < length)
*** processDefiniteCandidateElement
*** processPossibleCandidateElement
* PivotFacetValue
** some javadocs
** made variables private and added method accessors (w/jdocs) as needed
*** updated other classes as needed to call these new methods instead of the old pub vars
** made some vars final where possible via refactoring createFromNamedList & constructor
* PivotFacet
** some javadocs
** added getQueuedRefinements(int)
** made some variables final where possible
** renamed noRefinementsRequired -> isRefinementsRequired
** eliminate unused method: areAnyRefinementsQueued
* FacetComponent
** switched direct use of PivotFacet.queuedRefinements to use PivotFacet.getQueuedRefinements
*** simplified error checking in several places
One new question i want to go back and revisit later...
* do we really need to track knownShards in PivotFacet?
** ResponseBuilder already maintains a String[] of all shards, & getShardNum derived from it
** can't we just loop from 0 to shards.length? does it ever matter if a shard hasn't participated?
** ie: is it really important that we skip any unset bits in knownShards when looping? (all the current usages seem safe even if a shard has no data for the current pivot)
Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Assignee: Hoss Man Fix For: 4.9, 5.0 Attachments: SOLR-2894-mincount-minification.patch, SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch, dateToObject.patch, pivot_mincount_problem.sh Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
[jira] [Created] (SOLR-6283) Add support for Interval Faceting in SolrJ
Tomás Fernández Löbbe created SOLR-6283: --- Summary: Add support for Interval Faceting in SolrJ Key: SOLR-6283 URL: https://issues.apache.org/jira/browse/SOLR-6283 Project: Solr Issue Type: Improvement Reporter: Tomás Fernández Löbbe Interval Faceting was added in SOLR-6216. Add support for it in SolrJ -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5244) Exporting Full Sorted Result Sets
[ https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5244: - Description: This ticket allows Solr to export full sorted result sets. The proposed syntax is: {code} q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc {code} Under the covers, the rows=-1 parameter will signal Solr to use the ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the results. The SortingResponseWriter will sort the results based on the sort criteria and stream the results out. This capability will open up Solr for a whole range of uses that were typically done using aggregation engines like Hadoop. For example: *Large Distributed Joins* A client outside of Solr calls two different Solr collections and returns the results sorted by a join key. The client iterates through both streams and performs a merge join. *Fully Distributed Field Collapsing/Grouping* A client outside of Solr makes individual calls to all the servers in a single collection and returns results sorted by the collapse key. The client merge joins the sorted lists on the collapse key to perform the field collapse. *High Cardinality Distributed Aggregation* A client outside of Solr makes individual calls to all the servers in a single collection and sorts on a high cardinality field. The client then merge joins the sorted lists to perform the high cardinality aggregation. *Large Scale Time Series Rollups* A client outside Solr makes individual calls to all servers in a collection and sorts on time dimensions. The client merge joins the sorted result sets and rolls up the time dimensions as it iterates through the data. In these scenarios Solr is being used as a distributed sorting engine. Developers can write clients that take advantage of this sorting capability in any way they wish. was: This ticket allows Solr to export full sorted result sets. 
The proposed syntax is: {code} q=*:*rows=-1wt=xsortfl=a,b,csort=a desc,b desc {code} Under the covers, the rows=-1 parameter will signal Solr to use the ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the results. The SortingResponseWriter will sort the results based on the sort criteria and stream the results out. This capability will open up Solr for a whole range of uses that were typically done using aggregation engines like Hadoop. For example: *Large Distributed Joins* A client outside of Solr calls two different Solr collections and returns the results sorted by a join key. The client iterates through both streams and performs a merge join. *Fully Distributed Field Collapsing/Grouping* A client outside of Solr makes individual calls to all the servers in a single collection and returns results sorted by the collapse key. The client merge joins the sorted lists on the collapse key to perform the field collapse. *High Cardinality Distributed Aggregation* A client outside of Solr makes individual calls to all the servers in a single collection and sorts on a high cardinality field. The client then merge joins the sorted lists to perform the high cardinality aggregation. In these scenarios Solr is being used as a distributed sorting engine. Developers can write clients that take advantage of this sorting capability in any way they wish. *Large Scale Time Series Rollups* A client outside Solr makes individual calls to all servers in a collection and sorts on time dimensions. The client merge joins the sorted result sets and rolls up the time dimensions as it iterates through the data. 
Exporting Full Sorted Result Sets - Key: SOLR-5244 URL: https://issues.apache.org/jira/browse/SOLR-5244 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 5.0, 4.10 Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch This ticket allows Solr to export full sorted result sets. The proposed syntax is: {code} q=*:*rows=-1wt=xsortfl=a,b,csort=a desc,b desc {code} Under the covers, the rows=-1 parameter will signal Solr to use the ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the results. The SortingResponseWriter will sort the results based on the sort criteria and stream the results out. This capability will open up Solr for a whole range of uses that were typically done using aggregation engines like Hadoop. For example: *Large Distributed Joins* A client outside of Solr calls two different Solr collections and returns the results sorted by a join key. The client iterates through both streams and performs a merge join. *Fully Distributed Field
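The "merge join" pattern the ticket describes above — walking two streams that are already sorted on the join key and pairing equal keys — can be sketched client-side. This is a minimal illustration over in-memory lists, not code from the patch; a real client would instead iterate two streamed Solr exports.

```java
import java.util.*;

// Illustrative merge join of two streams pre-sorted on the join key,
// as a client would do after pulling sorted exports from two collections.
public class MergeJoin {
    public static List<String> join(List<Map.Entry<String, String>> left,
                                    List<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).getKey().compareTo(right.get(j).getKey());
            if (cmp < 0) {
                i++;            // left key is smaller: advance the left stream
            } else if (cmp > 0) {
                j++;            // right key is smaller: advance the right stream
            } else {
                // Equal keys: pair this left row with every matching right row.
                int j0 = j;
                while (j0 < right.size()
                        && left.get(i).getKey().equals(right.get(j0).getKey())) {
                    out.add(left.get(i).getKey() + ":"
                            + left.get(i).getValue() + "," + right.get(j0).getValue());
                    j0++;
                }
                i++;
            }
        }
        return out;
    }
}
```

Because both inputs arrive pre-sorted, the join runs in a single pass with constant memory, which is exactly why the client needs Solr to do the distributed sorting.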
[jira] [Updated] (SOLR-5244) Exporting Full Sorted Result Sets
[ https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-5244:
---------------------------------
Description:

This ticket allows Solr to export full sorted result sets. The proposed syntax is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the results. The SortingResponseWriter will sort the results based on the sort criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*
A client outside of Solr calls two different Solr collections and returns the results sorted by a join key. The client iterates through both streams and performs a merge join.

*Fully Distributed Field Collapsing/Grouping*
A client outside of Solr makes individual calls to all the servers in a single collection and returns results sorted by the collapse key. The client merge joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*
A client outside of Solr makes individual calls to all the servers in a single collection and sorts on a high cardinality field. The client then merge joins the sorted lists to perform the high cardinality aggregation.

In these scenarios Solr is being used as a distributed sorting engine. Developers can write clients that take advantage of this sorting capability in any way they wish.

*Large Scale Time Series Rollups*
A client outside Solr makes individual calls to all servers in a collection and sorts on time dimensions. The client merge joins the sorted result sets and rolls up the time dimensions as it iterates through the data.

was:
This ticket allows Solr to export full sorted result sets. The proposed syntax is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the results. The SortingResponseWriter will sort the results based on the sort criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*
A client outside of Solr calls two different Solr collections and returns the results sorted by a join key. The client iterates through both streams and performs a merge join.

*Fully Distributed Field Collapsing/Grouping*
A client outside of Solr makes individual calls to all the servers in a single collection and returns results sorted by the collapse key. The client merge joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*
A client outside of Solr makes individual calls to all the servers in a single collection and sorts on a high cardinality field. The client then merge joins the sorted lists to perform the high cardinality aggregation.

In these scenarios Solr is being used as a distributed sorting engine. Developers can write clients that take advantage of this sorting capability in any way they wish.

Exporting Full Sorted Result Sets
---------------------------------

Key: SOLR-5244
URL: https://issues.apache.org/jira/browse/SOLR-5244
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
Fix For: 5.0, 4.10
Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch

This ticket allows Solr to export full sorted result sets. The proposed syntax is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the results. The SortingResponseWriter will sort the results based on the sort criteria and stream the results out. This capability will open up Solr for a whole range of uses that were typically done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*
A client outside of Solr calls two different Solr collections and returns the results sorted by a join key. The client iterates through both streams and performs a merge join.

*Fully Distributed Field Collapsing/Grouping*
A client outside of Solr makes individual calls to all the servers in a single collection and returns results sorted by the collapse key. The client merge joins the sorted lists on the collapse key to perform the field collapse.

*High Cardinality
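The time-series rollup use case above relies on the same property: once results arrive sorted on the time dimension, the client can aggregate each bucket in a single streaming pass, emitting a bucket's total the moment its key changes. A minimal sketch of that streaming rollup (illustrative only, not from the patch; the bucket labels are invented):

```java
import java.util.*;

// Illustrative streaming rollup over entries pre-sorted on a time bucket,
// as a client would do while iterating a merged sorted export.
public class TimeRollup {
    // Each entry is (timeBucket, value); input MUST be sorted by timeBucket.
    public static List<String> rollup(Iterator<Map.Entry<String, Long>> sorted) {
        List<String> out = new ArrayList<>();
        String current = null;
        long sum = 0;
        while (sorted.hasNext()) {
            Map.Entry<String, Long> e = sorted.next();
            if (current != null && !current.equals(e.getKey())) {
                out.add(current + "=" + sum);  // bucket finished: emit its total
                sum = 0;
            }
            current = e.getKey();
            sum += e.getValue();
        }
        if (current != null) out.add(current + "=" + sum);  // flush last bucket
        return out;
    }
}
```

Only one bucket's running total is held in memory at a time, so the rollup scales to arbitrarily large result sets, which is the point of streaming full sorted exports rather than paging.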