[JENKINS] Lucene-Solr-5.x-Windows (32bit/jdk1.8.0_20) - Build # 4333 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4333/
Java: 32bit/jdk1.8.0_20 -client -XX:+UseSerialGC (asserts: true)

2 tests failed.

REGRESSION:  org.apache.solr.handler.TestReplicationHandlerBackup.doTestBackup

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
	at __randomizedtesting.SeedInfo.seed([B331C6C78431C249]:0)

FAILED:  junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandlerBackup

Error Message:
Suite timeout exceeded (= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (= 720 msec).
	at __randomizedtesting.SeedInfo.seed([B331C6C78431C249]:0)

Build Log:
[...truncated 11625 lines...]
   [junit4] Suite: org.apache.solr.handler.TestReplicationHandlerBackup
   [junit4]   2> Creating dataDir: C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\init-core-data-001
   [junit4]   2> 2210851 T5379 oas.SolrTestCaseJ4.setUp ###Starting doTestBackup
   [junit4]   2> 2210870 T5379 oejs.Server.doStart jetty-8.1.10.v20130312
   [junit4]   2> 2210878 T5379 oejs.AbstractConnector.doStart Started SelectChannelConnector@127.0.0.1:62247
   [junit4]   2> 2210879 T5379 oass.SolrDispatchFilter.init SolrDispatchFilter.init()
   [junit4]   2> 2210879 T5379 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx)
   [junit4]   2> 2210879 T5379 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001
   [junit4]   2> 2210879 T5379 oasc.SolrResourceLoader.init new SolrResourceLoader for directory: 'C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\'
   [junit4]   2> 2210908 T5379 oasc.ConfigSolr.fromFile Loading container configuration from C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\solr.xml
   [junit4]   2> 2210926 T5379 oasc.CoreContainer.init New CoreContainer 22745234
   [junit4]   2> 2210927 T5379 oasc.CoreContainer.load Loading cores into CoreContainer [instanceDir=C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\]
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting socketTimeout to: 9
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting urlScheme to:
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting connTimeout to: 15000
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maxConnectionsPerHost to: 20
   [junit4]   2> 2210930 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maxConnections to: 1
   [junit4]   2> 2210931 T5379 oashc.HttpShardHandlerFactory.getParameter Setting corePoolSize to: 0
   [junit4]   2> 2210931 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maximumPoolSize to: 2147483647
   [junit4]   2> 2210932 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maxThreadIdleTime to: 5
   [junit4]   2> 2210932 T5379 oashc.HttpShardHandlerFactory.getParameter Setting sizeOfQueue to: -1
   [junit4]   2> 2210932 T5379 oashc.HttpShardHandlerFactory.getParameter Setting fairnessPolicy to: false
   [junit4]   2> 2210932 T5379 oasu.UpdateShardHandler.init Creating UpdateShardHandler HTTP client with params: socketTimeout=34&connTimeout=45000&retry=false
   [junit4]   2> 2210933 T5379 oasl.LogWatcher.createWatcher SLF4J impl is org.slf4j.impl.Log4jLoggerFactory
   [junit4]   2> 2210933 T5379 oasl.LogWatcher.newRegisteredLogWatcher Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
   [junit4]   2> 2210934 T5379 oasc.CoreContainer.load Host Name: 127.0.0.1
   [junit4]   2> 2210938 T5391 oasc.SolrResourceLoader.init new SolrResourceLoader for directory: 'C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\collection1\'
   [junit4]   2> 2210971 T5391 oasc.SolrConfig.init Using Lucene MatchVersion: 5.0.0
   [junit4]   2> 2210983 T5391 oasc.SolrConfig.init Loaded SolrConfig: solrconfig.xml
   [junit4]   2> 2210983 T5391 oass.IndexSchema.readSchema Reading Solr Schema from C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\collection1\conf\schema.xml
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 686 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/686/

3 tests failed.

REGRESSION:  org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch

Error Message:
The Monkey ran for over 20 seconds and no jetties were stopped - this is worth investigating!

Stack Trace:
java.lang.AssertionError: The Monkey ran for over 20 seconds and no jetties were stopped - this is worth investigating!
	at __randomizedtesting.SeedInfo.seed([C3742817002DCD1:8DD1CC99075DBCED]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.apache.solr.cloud.ChaosMonkey.stopTheMonkey(ChaosMonkey.java:535)
	at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:140)
	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[jira] [Commented] (SOLR-1387) Add more search options for filtering field facets.
[ https://issues.apache.org/jira/browse/SOLR-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216072#comment-14216072 ]

Tom Winch commented on SOLR-1387:
---------------------------------

As the name suggests, CharacterUtils works on a char[] whereas we have a BytesRef (essentially a byte[]). But I think CharacterUtils.toLowerCase() does essentially the same thing as I'm doing in StringHelper.contains(), in that it converts using Unicode case-mapping information (via Character.toLowerCase(int)).

Yes, sadly, making ignoreCase more general would spoil the efficiency of facet.prefix, so I thought it safest to leave it as a sub-parameter of facet.contains, which spoils that efficiency already.

> Add more search options for filtering field facets.
> ---------------------------------------------------
>
>                 Key: SOLR-1387
>                 URL: https://issues.apache.org/jira/browse/SOLR-1387
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Anil Khadka
>            Assignee: Alan Woodward
>             Fix For: 4.9, Trunk
>
>         Attachments: SOLR-1387.patch
>
> Currently, to filter facets we have to use a prefix (which uses String.startsWith() in Java). We could add parameters such as:
> * facet.iPrefix: this would act like a case-insensitive prefix search (or facet.prefix=a&facet.caseinsense=on)
> * facet.regex: a pure regular-expression search (which would obviously be expensive if issued).
> Moreover, allowing multiple filters on the same field would be great, e.g. facet.prefix=a OR facet.prefix=A.
> All of the above concepts are equally applicable to TermsComponent.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
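The Unicode case-mapping approach Tom describes (lower-casing via Character.toLowerCase(int) rather than naive per-char mapping) can be sketched in plain Java. This is only an illustration of the idea; FacetContains and its method names are invented for this example and are not the patch's actual code, which operates on BytesRef bytes rather than Strings.

```java
// Illustrative sketch, not Solr code: case-insensitive "contains" using
// Unicode case mapping per code point, as Character.toLowerCase(int) does.
class FacetContains {
    // Lower-case a string code point by code point (handles supplementary
    // characters correctly, unlike iterating over chars).
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().map(Character::toLowerCase).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // True if `fragment` occurs anywhere in `term`, ignoring case.
    static boolean containsIgnoreCase(String term, String fragment) {
        return lowerByCodePoint(term).contains(lowerByCodePoint(fragment));
    }
}
```

Note the trade-off discussed above: facet.prefix can binary-search the sorted term index, whereas a contains-style (and especially case-insensitive) match must scan every term, which is why ignoreCase is proposed only as a sub-parameter of facet.contains.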
Re: Slow searching limited but high rows across many shards all with high hits
On 17/11/14 18:47, Toke Eskildsen wrote:
> Per Steffensen [st...@liace.dk] wrote:
> I understand that the request is for rows * #shards IDs+score in total, but if you have presented your alternative, I have failed to see that.

I deliberately did not present the solution we implemented, so that you would not focus on whether this particular solution has already been implemented after 4.4.0 (the version of Apache Solr we currently base our version of Solr on). The problem can presumably be solved in numerous ways, so I just wanted you to focus on whether it has been solved in some way (I do not care which way).

> Your third factoid: A high number of hits/shard, suggests that there is a possibility of all the final top-1000 hits to originate from a single shard.

I am not sure what you are aiming at with this comment, but I can say that it is very, very unlikely that the overall top-1000 all originate from a single shard. Since we are not routing on anything that has to do with the content text field, it is likely that the overall top-1000 are fairly evenly distributed among the 1000 shards.

> I was about to suggest collapsing to 2 or 3 months/shard, but that would be ruining a logistically nice setup.

Yes, we are also considering options in that area, but we would really rather not go that way. There are many additional reasons (besides the ones I mentioned in my previous mail). E.g. we are (maybe) about to introduce a bloom filter at shard level, which will improve our indexing performance significantly: the bloom filter lets us quickly say "a document with this particular id definitely does not exist" when doing optimistic locking (including version lookup). First-iteration tests have shown that it can reduce the resources/time spent on indexing by up to 80%. Bloom-filter data does not merge very well.

> 5-50 billion records/server? That seems very high, but after hearing about many different Solr setups at Lucene/Solr Revolution, I try to adopt a "sounds insane, but it's probably correct" mindset.

We are not in the business of ms response times for thousands of searches per second/minute. We can accept response times measured in seconds, and there are not thousands of searches performed per minute. We are, however, in the business of indexing enormous amounts of data per second. But this issue is about searches - we really do not like 10-30-60 min response times on searches that ought to run much faster.

> Anyway, setup accepted, problem acknowledged, your possibly re-usable solution not understood.

What we did in our solution is the following. We introduced the concept of a "distributed query algorithm", controlled by a request parameter dqa. We name the existing (default) query algorithm (not knowing about SOLR-5768) "find-id-relevance_fetch-by-ids" (short alias firfbi), and we introduce a new alternative distributed query algorithm called "find-relevance_find-ids-limited-rows_fetch-by-ids" (short alias frfilrfbi :-) ).

* find-id-relevance_fetch-by-ids does as always:
** Find (by query) the id and score (score being the measure of relevance) of the top-X (1000 in my example) documents on each shard
** Sort out the ids of the overall top-X and group them by shard; ids(S) is the set of ids among the overall top-X that live on shard S
** For each shard S, fetch by the ids in ids(S) the full documents (or whatever is pointed out by the fl parameter)
* find-relevance_find-ids-limited-rows_fetch-by-ids does it differently:
** Find (by query) the score only (score being the measure of relevance) of the top-X (1000 in my example) documents on each shard
** Sort out how many documents, count(S), of the overall top-X live on each individual shard S
** For each shard S, fetch (by query) the ids ids(S) of the count(S) most relevant documents
** For each shard S, fetch by the ids in ids(S) the full documents (or whatever is pointed out by the fl parameter)

Since "find score only" (step 1 of find-relevance_find-ids-limited-rows_fetch-by-ids) does not have to go to the store to fetch anything (ids are not needed), it can be optimized to perform much, much better than step 1 of find-id-relevance_fetch-by-ids (where ids are needed). In step 3 of find-relevance_find-ids-limited-rows_fetch-by-ids, when you do have to go to the store, we are not asking for 1000 docs per shard, but only for the number of documents among the overall top-1000 that live on that particular shard. This way we go from potentially visiting the store for 1 million docs across the cluster to never visiting the store for more than 1000 docs across the cluster. In our particular test setup (which simulates our production environment pretty well) this has given us a total response-time reduction of a factor of 60.

I believe SOLR-5768 (without having looked at it yet) has made the existing distributed query algorithm (what we call find-id-relevance_fetch-by-ids) do the following when sending the distrib.singlePass parameter
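The key step of the proposed find-relevance_find-ids-limited-rows_fetch-by-ids algorithm is computing count(S), the number of overall-top-X hits per shard, from the per-shard (shard, score) pairs alone. A minimal, self-contained Java sketch of that merge step follows; DqaSketch and its method are invented names for illustration and are not Solr code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Solr code) of computing count(S): given the
// shard and score of every candidate hit from the per-shard top-X lists,
// take the globally best X and count how many land on each shard, so the
// coordinator can later ask shard S for exactly count(S) ids instead of X.
class DqaSketch {
    // shardOf[i] is the shard name of candidate i; score[i] is its score.
    static Map<String, Integer> countPerShard(String[] shardOf, float[] score, int x) {
        Integer[] idx = new Integer[score.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort candidate indices by descending score (best first).
        Arrays.sort(idx, (a, b) -> Float.compare(score[b], score[a]));
        Map<String, Integer> counts = new HashMap<>();
        // Keep only the overall top-X and tally them per shard.
        for (int i = 0; i < Math.min(x, idx.length); i++) {
            counts.merge(shardOf[idx[i]], 1, Integer::sum);
        }
        return counts;
    }
}
```

With counts in hand, the coordinator issues one limited-rows id query plus one fetch-by-ids request per shard that actually contributes to the top-X, which is what bounds store visits to X documents cluster-wide.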
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216113#comment-14216113 ]

Modassar Ather commented on LUCENE-5205:
----------------------------------------

I am trying the following queries and facing an issue for which I need your suggestions. The environment is a 4-shard cluster with embedded ZooKeeper on one of them.

q=field:("SEARCH TOOLS PROVIDER CONSULTING COMPANY")
Gets transformed to the following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, field:company], 0, true)

field:("SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY")
Gets transformed to the following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, field:and, field:consulting, field:company], 0, true)

field:("SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.")
Gets stuck and does not return. We have set the query timeAllowed to 5 minutes, but it seems that it never reaches that point and just continues. While debugging I found that it gets stuck at m.find(), line 154 of SpanQueryLexer, after it has created tokens for the double quote and the term SEARCH. Whereas the same query without the apostrophe (') gets transformed as follows:

field:("SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.") => +spanNear([field:search, field:tools, field:solution, field:provider, field:technology, field:co, field:ltd], 0, true)

I need your help in understanding whether I am not using the query properly or whether this is an issue.

> [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5205
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5205
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>            Reporter: Tim Allison
>              Labels: patch
>             Fix For: 4.9
>
>         Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms (wildcard, fuzzy, regex, prefix)
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first-pass BooleanQuery/field parser, and then a span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~ operator: "jakarta apache"\~3
> * Can specify "not near": "fever bieber"!\~3,10 :: find "fever" but not if "bieber" appears within 3 words before or 10 words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find "jakarta" within 3 words of "apache", and that hit has to be within four words before "lucene"
> * Can also use \[\] for single-level phrasal queries instead of " ", as in: \[jakarta apache\]
> * Can use "or" grouping clauses in phrasal queries: "apache (lucene solr)"\~3 :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)\]\~10 :: find something like "jakarta" within two words of "ap*che", and that hit has to be within ten words of something like "solr" or that "lucene" regex.
> * Can require "at least x" hits at the boolean level: apache AND (lucene solr tika)~2
> * Can use a negative-only query: -jakarta :: find all docs that don't contain "jakarta"
> * Can use an edit distance > 2 for fuzzy queries via SlowFuzzyQuery (beware of potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance = 1, prefix = 2)
> * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance = 2: jakarta~1 (OSA) vs jakarta~1 (Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the
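The "not near" operator described above ("fever bieber"!~3,10) can be illustrated with a toy position check in plain Java. NotNearSketch is a hypothetical helper invented for this example, not the parser's implementation; token positions are assumed to come from the analyzer, and real span queries evaluate this lazily against postings rather than over position arrays.

```java
// Illustrative sketch of "not near" semantics: a hit at hitPos (e.g. a
// position of "fever") is accepted unless an excluded term (e.g. "bieber")
// occurs within `pre` positions before or `post` positions after it.
class NotNearSketch {
    static boolean matches(int hitPos, int[] excludePositions, int pre, int post) {
        for (int p : excludePositions) {
            // Excluded term too close: reject this hit.
            if (p >= hitPos - pre && p <= hitPos + post) return false;
        }
        return true;
    }
}
```

So for "fever bieber"!~3,10, a "fever" at position 10 is rejected if "bieber" sits anywhere in positions 7 through 20, and accepted otherwise.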
[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216113#comment-14216113 ]

Modassar Ather edited comment on LUCENE-5205 at 11/18/14 11:59 AM:
-------------------------------------------------------------------

I am trying the following queries and facing an issue for which I need your suggestions. The environment is a 4-shard cluster with embedded ZooKeeper on one of them.

q=field: ("SEARCH TOOLS PROVIDER CONSULTING COMPANY")
Gets transformed to the following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, field:company], 0, true)

field: ("SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY")
Gets transformed to the following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, field:and, field:consulting, field:company], 0, true)

field: ("SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.")
Gets stuck and does not return. We have set the query timeAllowed to 5 minutes, but it seems that it never reaches that point and just continues. While debugging I found that it gets stuck at m.find(), line 154 of SpanQueryLexer, after it has created tokens for the double quote and the term SEARCH. Whereas the same query without the apostrophe (') gets transformed as follows:

field: ("SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.") => +spanNear([field:search, field:tools, field:solution, field:provider, field:technology, field:co, field:ltd], 0, true)

I need your help in understanding whether I am not using the query properly or whether this is an issue.
[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216113#comment-14216113 ]

Modassar Ather edited comment on LUCENE-5205 at 11/18/14 12:01 PM:
-------------------------------------------------------------------

I am trying the following queries and facing an issue for which I need your suggestions. The environment is a 4-shard cluster with embedded ZooKeeper on one of them.

q=field: ("SEARCH TOOLS PROVIDER CONSULTING COMPANY")
Gets transformed to the following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, field:company], 0, true)

field: ("SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY")
Gets transformed to the following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, field:and, field:consulting, field:company], 0, true)

field: ("SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.")
Gets stuck and does not return. We have set the query timeAllowed to 5 minutes, but it seems that it never reaches that point and just continues. While debugging I found that it gets stuck at m.find(), line 154 of SpanQueryLexer, after it has created tokens for the double quote and the term SEARCH. Whereas the same query without the apostrophe (') gets transformed as follows:

field: ("SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.") => +spanNear([field:search, field:tools, field:solution, field:provider, field:technology, field:co, field:ltd], 0, true)

I need your help in understanding whether I am not using the query properly or whether this is an issue.

NOTE: A space between "field:" and the query is added to avoid transformation to smileys.
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2214 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2214/

1 tests failed.

FAILED: org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability

Error Message: No live SolrServers available to handle this request

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
    at __randomizedtesting.SeedInfo.seed([81E2D093E929F1C3:402A0DD5484F206A]:0)
    at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:539)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
    at org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:223)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
[jira] [Created] (SOLR-6754) ZkController.publish doesn't use the updateLastState parameter
Shalin Shekhar Mangar created SOLR-6754:
--
Summary: ZkController.publish doesn't use the updateLastState parameter
Key: SOLR-6754
URL: https://issues.apache.org/jira/browse/SOLR-6754
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.10.2
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Trivial
Fix For: 5.0, Trunk

One of ZkController's overloaded publish methods has the following:
{code}
public void publish(final CoreDescriptor cd, final String state, boolean updateLastState)
    throws KeeperException, InterruptedException {
  publish(cd, state, true, false);
}
{code}
Regardless of the updateLastState argument, the method calls publish with updateLastState set to true.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6754) ZkController.publish doesn't use the updateLastState parameter
[ https://issues.apache.org/jira/browse/SOLR-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-6754:
Attachment: SOLR-6754.patch

Trivial patch to use the method argument is attached.

ZkController.publish doesn't use the updateLastState parameter
--
Key: SOLR-6754
URL: https://issues.apache.org/jira/browse/SOLR-6754
Attachments: SOLR-6754.patch
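The shape of the fix can be illustrated with a minimal, self-contained sketch. This is not the real ZkController (the actual four-argument overload carries additional state and talks to ZooKeeper); the stub below only demonstrates the delegation bug and its one-line correction, with hypothetical simplified signatures:

```java
// Simplified illustration of SOLR-6754: an overload that hardcodes the flag
// it was supposed to forward, next to the fixed version.
public class PublishDelegation {
    // Records what the four-argument overload actually received.
    static boolean lastUpdateLastState;

    // Stub standing in for publish(cd, state, updateLastState, forcePublish).
    static void publish(String cd, String state, boolean updateLastState, boolean forcePublish) {
        lastUpdateLastState = updateLastState;
    }

    // Buggy version from the issue: ignores the caller's flag, always passes true.
    static void publishBuggy(String cd, String state, boolean updateLastState) {
        publish(cd, state, true, false);
    }

    // Fixed version: forwards the caller's flag.
    static void publishFixed(String cd, String state, boolean updateLastState) {
        publish(cd, state, updateLastState, false);
    }

    public static void main(String[] args) {
        publishBuggy("core1", "down", false);
        System.out.println("buggy forwards: " + lastUpdateLastState);   // true despite false argument
        publishFixed("core1", "down", false);
        System.out.println("fixed forwards: " + lastUpdateLastState);   // false, as requested
    }
}
```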
Re: Slow searching limited but high rows across many shards all with high hits
Your third factoid: A high number of hits/shard, suggests that there is a possibility of all the final top-1000 hits to originate from a single shard.

In fact if you ask for 1000 hits in a distributed SolrCloud, each shard has to retrieve 1000 hits to get the unique key of each match and send it back to the shard responsible for the merge. This means that even if your data is fairly distributed among the 1000 shards, they still have to decompress 1000 documents during the first phase of the search. There are ways to avoid this; for instance you can check this JIRA where the idea is discussed: https://issues.apache.org/jira/browse/SOLR-5478

Bottom line is that if you have 1000 shards the GET_FIELDS stage should be fast (if your data is fairly distributed) but the GET_TOP_IDS is not. You could avoid a lot of decompression/reads by using the field cache to retrieve the unique key in the first stage.

Cheers,
Jim

From: Per Steffensen st...@liace.dk
Sent: Tuesday, November 18, 2014 19:26
To: dev@lucene.apache.org
Subject: Re: Slow searching limited but high rows across many shards all with high hits

On 17/11/14 18:47, Toke Eskildsen wrote:

Per Steffensen [st...@liace.dk] wrote: I understand that the request is for rows * #shards IDs+score in total, but if you have presented your alternative, I have failed to see that.

I deliberately did not present the solution we did, in order for you guys not to focus on whether or not this particular solution to the problem has already been implemented after 4.4.0 (the version of Apache Solr we currently base our version of Solr on). Guess the problem can be solved in numerous ways, so I just wanted you to focus on whether or not it has been solved in some way (I do not care which way).

Your third factoid: A high number of hits/shard, suggests that there is a possibility of all the final top-1000 hits to originate from a single shard.

I'm not sure what you are aiming at with this comment. But I can say that it is very, very unlikely that the overall-top-1000 all originate from a single shard. It is likely (since we are not routing on anything that has to do with the content text-field) that the overall-top-1000 is fairly evenly distributed among the 1000 shards.

I was about to suggest collapsing to 2 or 3 months/shard, but that would be ruining a logistically nice setup.

Yes, we are also considering options in that area, but we really would like not to have to go this way. There are many additional reasons (besides the ones I mentioned in my previous mail). E.g. we are (maybe) about to introduce a bloom-filter on shard-level, which will help us improve indexing performance significantly. The bloom-filter will help quickly say "a document with this particular id does definitely not exist" when doing optimistic locking (including version-lookup). First-iteration tests have shown that it can reduce the resources/time spent on indexing by up to 80%. Bloom-filter data does not merge very well.

5-50 billion records/server? That seems very high, but after hearing about many different Solr setups at Lucene/Solr Revolution, I try to adopt a "sounds insane, but it's probably correct"-mindset.

We are not in the business of ms-response-times or thousands of searches per sec/min. We can accept response-times measured in secs, and there are not thousands of searches performed per minute. We are in the business of being able to index enormous amounts of data per second, though. But this issue is about searches - we really do not like 10-30-60 min response-times on searches that ought to run much faster.

Anyway, setup accepted, problem acknowledged, your possibly re-usable solution not understood.

What we did in our solution is the following: we introduced the concept of a "distributed query algorithm", controlled by request-param dqa. We are naming the existing (default) query algorithm (not knowing about SOLR-5768) find-id-relevance_fetch-by-ids (short-alias firfbi) and we introduce a new alternative distributed query algorithm called find-relevance_find-ids-limited-rows_fetch-by-ids (short-alias frfilrfbi :-) )

* find-id-relevance_fetch-by-ids does as always
** Find (by query) id and score (score is the measurement for relevance) for the top-X (1000 in my example) documents on each shard
** Sort out the ids of the overall-top-X and group them by shard. ids(S) is the set of ids among the overall-top-X that live on shard S
** For each shard S fetch by ids in ids(S) the full documents (or whatever is pointed out by fl-parameter)
* find-relevance_find-ids-limited-rows_fetch-by-ids does it in a different way
** Find (by query) score (score is the measurement for relevance) for the top-X (1000 in my example) documents on each shard
** Sort out how many documents count(S) of the overall-top-X documents live on each individual shard S
** For each shard S fetch (by query) the ids (ids(S)) for the count(S) most relevant documents
** For each
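The count(S) bookkeeping in the second algorithm can be sketched as a standalone example. This is illustrative code, not Solr internals: shard responses are modelled as plain arrays of descending scores, and the method works out how many of the overall top-X documents live on each shard, which is the per-shard row budget the follow-up id-fetch would use:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Illustrative sketch of the score-only merge step: given only the sorted
// scores returned by each shard, compute count(S) for every shard S.
public class ShardRowBudget {
    /** scoresPerShard[s] = descending scores of shard s's local top-X. Returns count(S). */
    static int[] topXCounts(double[][] scoresPerShard, int x) {
        // Merge (score, shardIndex) pairs; max-heap on score.
        PriorityQueue<double[]> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b[0], a[0]));
        for (int s = 0; s < scoresPerShard.length; s++)
            for (double score : scoresPerShard[s])
                heap.add(new double[]{score, s});
        // Pop the global top-X and count how many came from each shard.
        int[] counts = new int[scoresPerShard.length];
        for (int i = 0; i < x && !heap.isEmpty(); i++)
            counts[(int) heap.poll()[1]]++;
        return counts;
    }

    public static void main(String[] args) {
        double[][] shards = {
            {9.0, 7.0, 1.0},   // shard 0
            {8.0, 2.0, 1.5},   // shard 1
            {6.0, 5.0, 4.0},   // shard 2
        };
        // Overall top-4 scores are 9, 8, 7, 6 -> counts {2, 1, 1}.
        System.out.println(Arrays.toString(topXCounts(shards, 4)));
    }
}
```

With these counts in hand, each shard is asked for exactly count(S) ids instead of the full top-X, which is the point of the frfilrfbi algorithm described above.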
[jira] [Updated] (SOLR-5611) When documents are uniformly distributed over shards, enable returning approximated results in distributed query
[ https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manuel Lenormand updated SOLR-5611:
Attachment: lec5-distributedIndexing.pdf

The equation is on the 10th slide. Need to write an approximation for this, or calculate it offline for main values and make a 3d map out of it (#shards, rows, confidence level) that outputs shards.rows for each request.

When documents are uniformly distributed over shards, enable returning approximated results in distributed query
--
Key: SOLR-5611
URL: https://issues.apache.org/jira/browse/SOLR-5611
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Isaac Hebsh
Priority: Minor
Labels: distributed_search, shard, solrcloud
Fix For: 4.9, Trunk
Attachments: lec5-distributedIndexing.pdf

A query with rows=1000, sent to a collection of 100 shards (shard key behaviour is default - based on hash of the unique key), will generate 100 requests of rows=1000, one on each shard. This results in a total of rows*numShards unique keys to be retrieved. This behaviour gets worse as numShards grows. If the documents are uniformly distributed over the shards, the expected number of documents per shard should be ~ rows/numShards. Obviously, there might be extreme cases, when all of the top X documents are in a specific shard.

I suggest adding an optional parameter, say approxResults=true, which decides whether we should limit the rows in the shard requests to rows/numShards or not. Moreover, we can add a numeric parameter which increases the limit, to be more accurate. For example, the query {{approxResults=true&approxResults.factor=1.5}} will retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and rows=1000, each shard will return 15 documents.

Furthermore, this can reduce the problem of deep paging, because the same thing can be applied there: when start=10 is requested, Solr creates a shard request with start=0 and rows=START+ROWS.
In the approximated approach, start parameter (in the shard requests) can be set to 10/numShards. The idea of the approxResults.factor creates some difficulties here, though.
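The shard-request arithmetic from the proposal can be checked with a trivial standalone computation (the method name shardRows is ours, purely for illustration; it is not from the patch):

```java
// Quick check of the approxResults.factor arithmetic: each shard request is
// limited to factor * rows / numShards instead of the full rows.
public class ApproxRows {
    static int shardRows(int rows, int numShards, double factor) {
        return (int) Math.ceil(factor * rows / numShards);
    }

    public static void main(String[] args) {
        // The example from the issue: 100 shards, rows=1000, factor=1.5 -> 15 per shard.
        System.out.println(shardRows(1000, 100, 1.5));      // 15
        // Without the limit, the naive fan-out merges rows ids from every shard.
        System.out.println(1000 * 100);                     // 100000 unique keys retrieved
    }
}
```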
[jira] [Comment Edited] (SOLR-5611) When documents are uniformly distributed over shards, enable returning approximated results in distributed query
[ https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216167#comment-14216167 ]

Manuel Lenormand edited comment on SOLR-5611 at 11/18/14 1:32 PM:
--
The equation is on the 20th slide. Need to write an approximation for this, or calculate it offline for main values and make a 3d map out of it (#shards, rows, confidence level) that outputs shards.rows for each request.

was (Author: manuel lenormand):
The equation is on the 10th slide. Need to write an approximation for this, or calculate it offline for main values and make a 3d map out of it (#shards, rows, confidence level) that outputs shards.rows for each request.

When documents are uniformly distributed over shards, enable returning approximated results in distributed query
--
Key: SOLR-5611
URL: https://issues.apache.org/jira/browse/SOLR-5611
Attachments: lec5-distributedIndexing.pdf
Re: Slow searching limited but high rows across many shards all with high hits
On 18/11/14 14:24, Ferenczi, Jim | EURHQ wrote:

Your third factoid: A high number of hits/shard, suggests that there is a possibility of all the final top-1000 hits to originate from a single shard. In fact if you ask for 1000 hits in a distributed SolrCloud, each shard has to retrieve 1000 hits to get the unique key of each match and send it back to the shard responsible for the merge.

Yes, at least if each shard has 1000 hits. It is when each shard has a lot of actual hits that this issue becomes a problem.

This means that even if your data is fairly distributed among the 1000 shards, they still have to decompress 1000 documents during the first phase of the search.

Exactly!

There are ways to avoid this, for instance you can check this JIRA where the idea is discussed: https://issues.apache.org/jira/browse/SOLR-5478

Guess our solution (described in my previous mail) is kind of an alternative solution to SOLR-5478.

Bottom line is that if you have 1000 shards the GET_FIELDS stage should be fast (if your data is fairly distributed) but the GET_TOP_IDS is not.

Exactly!

You could avoid a lot of decompression/reads by using the field cache to retrieve the unique key in the first stage.

We have so much data and relatively little RAM, so we cannot use the field-cache, because it requires an amount of memory linearly dependent on the number of docs in the store. We can never fulfill this requirement. Doc-values is a valid approach for us, but currently our id-field is unfortunately not doc-valued - and it is not easy for us to just re-index all documents with id as doc-value. Besides that, our solution is orthogonal to a field-cache/doc-values solution in the way that one does not prevent the other, and if you do one of them you will still be able to benefit from doing the other one.

Cheers,
Jim

Thanks, Jim
Re: Slow searching limited but high rows across many shards all with high hits
On Tue, 2014-11-18 at 11:26 +0100, Per Steffensen wrote:

It is likely (since we are not routing on anything that has to do with the content text-field) that the overall-top-1000 is fairly evenly distributed among the 1000 shards

Streaming in Heliosearch might work out of the box: http://heliosearch.org/streaming-aggregation-for-solrcloud/#CloudSolrStream
Caveat: I haven't used streaming, so I can't say for sure and don't know how/if it handles early termination, which would be a prerequisite for speedup in your setup.

[Detailed description of solution] [SOLR-5798] Hope you get the idea, and why it makes us perform much much better?!

Yes, I got it. We discussed it a bit at the office and it seems like a really fine idea, new to Solr. As Solr is often used for log processing these days, the number of setups with many shards and non-trivial request sizes is growing: your solution would help others. The obvious next step would be a JIRA. However, I know that you have had very limited success there, even for simple patches. General JIRA-handling might be a relevant topic for another thread, but I don't have the energy for that discussion right now.

Of course, the concrete speed-up factor is highly dependent on how long it takes to resolve IDs. You state speeds of 10, 30, 60 minutes without the patch and a factor 60 speedup. As I understand it, the real difference is whether ~1000*#shards IDs are resolved or only 1000. With 50 shards, or 50,000 ID-lookups per machine, that puts your worst-case resolve-time at 50,000 IDs / (60 min * 60 s/min) ~= 13 IDs/s and the best case (10 min total) at ~83 IDs/s per machine (guessing spinning drives here).

With a setup with faster ID-resolving, the benefits from your patch might be too small for top-1000 to be really interesting, as ID-resolving would not take up as much of the overall processing time. But it would make it possible to scale that number up (top-1 or above).
- Toke Eskildsen, State and University Library, Denmark
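The resolve-rate arithmetic in the mail above can be verified with a trivial standalone computation (assuming, as the mail does, 1000 rows fanned out over 50 shards, i.e. ~50,000 id lookups per machine):

```java
// Sanity check of the back-of-the-envelope ID-resolve rates: total lookups
// divided by total wall-clock time.
public class ResolveRate {
    static double idsPerSecond(long idLookups, long minutes) {
        return idLookups / (minutes * 60.0);
    }

    public static void main(String[] args) {
        System.out.printf("worst case: %.1f IDs/s%n", idsPerSecond(50_000, 60)); // ~13.9
        System.out.printf("best case:  %.1f IDs/s%n", idsPerSecond(50_000, 10)); // ~83.3
    }
}
```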
Re: Slow searching limited but high rows across many shards all with high hits
Limiting my answer to the #shards*rows issue, you can have a look at https://issues.apache.org/jira/browse/SOLR-5611. The odds that all top docs are in the same shard of a uniformly distributed index are negligible, so you can use it to request much fewer docs per shard. There's a nice discrete equation that gives you the shards.rows you should request, depending on the #shards, rows, and the confidence level that all the top-rows are returned (confidence=0.95 would mean 95% of the responses will contain the exact top rows as if rows and shards.rows were equal). Our use case: 36 shards, 2000 rows, conf=99% -- shards.rows=49, which gave a good performance boost.

On Tue, Nov 18, 2014 at 3:24 PM, Ferenczi, Jim | EURHQ jim.feren...@mail.rakuten.com wrote:

Your third factoid: A high number of hits/shard, suggests that there is a possibility of all the final top-1000 hits to originate from a single shard. In fact if you ask for 1000 hits in a distributed SolrCloud, each shard has to retrieve 1000 hits to get the unique key of each match and send it back to the shard responsible for the merge. This means that even if your data is fairly distributed among the 1000 shards, they still have to decompress 1000 documents during the first phase of the search. There are ways to avoid this, for instance you can check this JIRA where the idea is discussed: https://issues.apache.org/jira/browse/SOLR-5478 Bottom line is that if you have 1000 shards the GET_FIELDS stage should be fast (if your data is fairly distributed) but the GET_TOP_IDS is not. You could avoid a lot of decompression/reads by using the field cache to retrieve the unique key in the first stage.

Cheers,
Jim
Re: Slow searching limited but high rows across many shards all with high hits
On 18/11/14 14:49, Manuel Le Normand wrote:

Limiting my answer to the #shards*rows issue, you can have a look at https://issues.apache.org/jira/browse/SOLR-5611.

Thanks for pointing out SOLR-5611! I did not know about it, and knowledge about work and ideas in this area was what I wanted to achieve with this mail to the mailing-list. I haven't dived much into SOLR-5611, but it seems to me that it will allow you to ask each shard for less than 1000 (if 1000 is the number in the outer super-request) rows in the get-id-score-sub-requests stage - ask each shard for 1000/#shards (maybe more?). Our solution (as specified in a previous mail) will also ask each shard for less than 1000 rows in the get-id-score-sub-request stage. The difference is that it will start out calculating the exact rows-value to use for each shard, issuing very inexpensive score-only sub-requests. I think this approach is at least as nice as, if not nicer than, the SOLR-5611 approach.

Regards, Steff
Re: Slow searching limited but high rows across many shards all with high hits
On 18/11/14 14:41, Toke Eskildsen wrote:

Your solution would help others.

I agree, and that, of course, would be great.

The obvious next step would be a JIRA. However, I know that you have had very limited success there, even for simple patches.

No shit :-) But hopefully better success in the future! One of the goals with this mailing-thread was, besides getting a feeling of whether or not this issue has already been fixed, to get an idea of whether the community was interested in getting (and helping shape) the solution. I can create a JIRA for sure, but I really do not want to merge our solution to branch_5x, trunk or whatever, and take the long discussions, if it is leading nowhere.

General JIRA-handling might be a relevant topic for another thread, but I don't have the energy for that discussion right now.

Me neither.

Of course, the concrete speed-up factor is highly dependent on how long it takes to resolve IDs.

Yes! I cannot guarantee a speed-up factor of 60 or anything like it. I just tried to state what we have seen, on our concrete setup, with our amount of data, with our data-distribution, on our hardware. There are lots of factors, for sure.
Re: Slow searching limited but high rows across many shards all with high hits
On Tue, 2014-11-18 at 14:42 +0100, Per Steffensen wrote:

Doc-values is a valid approach for us, but currently our id-field is unfortunately not doc-valued - and it is not easy for us to just re-index all documents with id as doc-value.

At Lucene/Solr Revolution 2014 I presented that exact problem at Stump the Chump. After stalling for a minute with obligatory derogatory comments on our project design and a just as obligatory car analogy, he pointed me in the direction of a filtered index reader and asked me to code it and make it Open Source. Thomas Egense and I plan to take a crack at it one of these days. If the field is stored, it should be possible to make it DocValued by optimizing the index with a custom reader.

Besides that, our solution is orthogonal to a field-cache/doc-values solution in the way that one does not prevent the other, and if you do one of them you will still be able to benefit from doing the other one.

I noticed that. Multiplying solutions are awesome.

- Toke Eskildsen, State and University Library, Denmark
Re: Slow searching limited but high rows across many shards all with high hits
On 18/11/14 15:52, Toke Eskildsen wrote:

Thomas Egense and I plan to take a crack at it one of these days. If the field is stored, it should be possible to make it DocValued by optimizing the index with a custom reader.

Yes, it ought to be, but AFAIK it currently is not. Looking forward to seeing the results of your work! Say hi to Thomas, BTW.
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216267#comment-14216267 ]

Tim Allison commented on LUCENE-5205:
--
Thank you for raising this, [~modassar]. The challenge is that the parser can use both " and ' to mark the beginnings and endings of SpanNear. As an initial hack, I was hoping that users would backslash single quotes within phrases, but that puts too much burden on users. I'll see if I can add a bit more smarts so that if the parser knows that it is in a " phrase, it will ignore ', and vice versa. Are you using my github standalone jars? Or, how are you using this?

[PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
--
Key: LUCENE-5205
URL: https://issues.apache.org/jira/browse/LUCENE-5205
Project: Lucene - Core
Issue Type: Improvement
Components: core/queryparser
Reporter: Tim Allison
Labels: patch
Fix For: 4.9
Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt

This parser extends QueryParserBase and includes functionality from:
* Classic QueryParser: most of its syntax
* SurroundQueryParser: recursive parsing for "near" and "not" clauses
* ComplexPhraseQueryParser: can handle "near" queries that include multiterms (wildcard, fuzzy, regex, prefix)
* AnalyzingQueryParser: has an option to analyze multiterms

At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases.

Same as classic syntax:
* term: test
* fuzzy: roam~0.8, roam~2
* wildcard: te?t, test*, t*st
* regex: /\[mb\]oat/
* phrase: "jakarta apache"
* phrase with slop: "jakarta apache"~3
* default "or" clause: jakarta apache
* grouping "or" clause: (jakarta apache)
* boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
* multiple fields: title:lucene author:hatcher

Main additions in SpanQueryParser syntax vs. classic syntax:
* Can require "in order" for phrases with slop with the \~ operator: "jakarta apache"\~3
* Can specify "not near": "fever bieber"!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it.
* Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene
* Can also use \[\] for single level phrasal queries instead of " ", as in: \[jakarta apache\]
* Can use "or" grouping clauses in phrasal queries: "apache (lucene solr)"\~3 :: find apache and then either lucene or solr within three words.
* Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
* Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: find something like jakarta within two words of ap*che, and that hit has to be within ten words of something like solr or that lucene regex.
* Can require "at least x" number of hits at the boolean level: apache AND (lucene solr tika)~2
* Can use negative only query: -jakarta :: find all docs that don't contain jakarta
* Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!).

Trivial additions:
* Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance = 1, prefix = 2)
* Can specify Optimal String Alignment (OSA) vs Levenshtein for distance <= 2: jakarta~1 (OSA) vs jakarta~1 (Levenshtein)

This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. Thank you.
[jira] [Commented] (SOLR-6625) HttpClient callback in HttpSolrServer
[ https://issues.apache.org/jira/browse/SOLR-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216313#comment-14216313 ] Per Steffensen commented on SOLR-6625: -- bq. One thing I wanted to avoid with this patch is putting authentication-type specific details in HttpSolrServer. SOLR-4470 has a little logic there that is basic-auth specific Actually SOLR-4470 aims at introducing a framework for any authentication type, and then (for now) implementing basic-auth using this framework. It is prepared for adding new authentication types. See {{AuthCredentials}}, which carries any kind of {{AbstractAuthMethod}} - currently the only {{AbstractAuthMethod}} implementation is {{BasicHttpAuth}}. Adding a new authentication type should basically be a matter of adding a new {{AbstractAuthMethod}} implementation. But sorry, I do not remember too many details. What I do know is that we have been using the SOLR-4470 solution in production for a long time, without any problems at all. bq. As for the suggestion of using a BufferedHttpEntity rather than the OPTIONS approach I describe above, that certainly may be an improvement. I do not know if it is an improvement compared to your approach. I just implemented it in a way that worked. Supporting non-preemptive authenticating POST requests was not the main focus of SOLR-4470, so I just quickly did it in the way that I found it could be done - without considering performance or anything HttpClient callback in HttpSolrServer - Key: SOLR-6625 URL: https://issues.apache.org/jira/browse/SOLR-6625 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Attachments: SOLR-6625.patch, SOLR-6625.patch Some of our setups use Solr in a SPNego/kerberos setup (we've done this by adding our own filters to the web.xml). 
We have an issue in that SPNego requires a negotiation step, but some HttpSolrServer requests are not repeatable, notably the PUT/POST requests. So, what happens is, HttpSolrServer sends the request, the server responds with a negotiation request, and the request fails because it is not repeatable. We've modified our code to send a repeatable request beforehand in these cases. It would be nicer if HttpSolrServer provided a pre/post callback when it was making an httpclient request. This would allow administrators to make changes to the request for authentication purposes, and would allow users to make per-request changes to the httpclient calls (e.g. modify the httpclient RequestConfig to change the timeout on a per-request basis).
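The pre/post callback proposed above might look something like the following sketch. All names here are hypothetical - this is not an actual SolrJ API - and the "execute" path is simulated; it only illustrates how a hook invoked before and after each httpclient call lets authentication code attach credentials without HttpSolrServer knowing about any particular authentication type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed pre/post callback -- not SolrJ's real API.
// A callback registered with the (imagined) client runs before and after each
// httpclient call, so an auth layer can mutate the outgoing request and
// observe the response without the client hard-coding any auth type.
public class CallbackSketch {
    public interface RequestCallback {
        void beforeExecute(Map<String, String> headers); // mutate the outgoing request
        void afterExecute(int statusCode);               // observe the result
    }

    // Stand-in for the client's execute path: "succeeds" only if the callback
    // supplied an Authorization header before the request went out.
    public static int execute(Map<String, String> headers, RequestCallback cb) {
        cb.beforeExecute(headers);
        int status = headers.containsKey("Authorization") ? 200 : 401;
        cb.afterExecute(status);
        return status;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        int status = execute(headers, new RequestCallback() {
            public void beforeExecute(Map<String, String> h) {
                h.put("Authorization", "Negotiate <token>"); // e.g. SPNego step
            }
            public void afterExecute(int code) { /* log, retry, etc. */ }
        });
        System.out.println(status); // 200
    }
}
```

The same shape would also cover the per-request timeout use case: the callback mutates request configuration rather than headers.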
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216353#comment-14216353 ] Modassar Ather commented on LUCENE-5205: Thanks [~talli...@apache.org] for your response. I am using it from lucene5205 branch(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene5205/) integrated as patch to latest Lucene core jar.
[jira] [Closed] (LUCENE-5861) CachingTokenFilter should use ArrayList not LinkedList
[ https://issues.apache.org/jira/browse/LUCENE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley closed LUCENE-5861. Resolution: Duplicate Fix Version/s: 5.0 CachingTokenFilter should use ArrayList not LinkedList -- Key: LUCENE-5861 URL: https://issues.apache.org/jira/browse/LUCENE-5861 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: David Smiley Assignee: David Smiley Priority: Minor Fix For: 5.0 CachingTokenFilter, to my surprise, puts each new AttributeSource.State onto a LinkedList. I think it should be an ArrayList. On large fields that get analyzed, there can be a ton of State objects to cache. I also observe that State is itself a linked list of other State objects. Perhaps we could take this one step further and do parallel arrays of AttributeImpl, thereby bypassing State.
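The access pattern the issue describes - append one state per token on the caching pass, then replay them in order - is exactly where ArrayList wins. A minimal sketch (illustrative only, not Lucene's actual CachingTokenFilter code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not Lucene's CachingTokenFilter: the filter appends
// one "state" per token on the first pass, then replays the cached states in
// order. ArrayList gives O(1) amortized append into a contiguous backing
// array; LinkedList pays a node allocation (plus two pointers) per cached
// state and offers nothing this pattern needs -- there are no mid-list
// inserts or removals.
public class CacheSketch {
    public static String cacheAndReplay(String[] tokens) {
        List<String> cache = new ArrayList<>(tokens.length); // was a LinkedList
        for (String t : tokens) {
            cache.add(t);                 // caching pass: append only
        }
        StringBuilder out = new StringBuilder();
        for (String state : cache) {      // replay pass: in-order iteration only
            if (out.length() > 0) out.append(' ');
            out.append(state);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cacheAndReplay(new String[] {"quick", "brown", "fox"})); // quick brown fox
    }
}
```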
[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues
[ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216393#comment-14216393 ] Yonik Seeley commented on SOLR-5944: I was just chatting with Shalin while we were both at ApacheCon. In addition to leader-replica reordering issues, we also need to handle realtime-get in the single-node case. The way to do this is just add the update to the tlog like normal (with some indication that it's a partial update and doesn't contain all the fields). When /get is invoked and we find an update from the in-memory tlog map for that document, we need to go through the same logic as a soft commit (open a new realtime-searcher and clear the tlog map), and then use the realtime-searcher to get the latest document. Oh, and _version_ will need to use DocValues so it can be updated at the same time of course. Support updates of numeric DocValues Key: SOLR-5944 URL: https://issues.apache.org/jira/browse/SOLR-5944 Project: Solr Issue Type: New Feature Reporter: Ishan Chattopadhyaya Assignee: Shalin Shekhar Mangar Attachments: SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch LUCENE-5189 introduced support for updates to numeric docvalues. It would be really nice to have Solr support this.
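The /get handling described in that comment can be sketched as follows. This is hypothetical Java with invented names, not Solr's code: a partial update only carries some fields, so the in-memory tlog map alone cannot answer a realtime-get for that document, and the sketch falls back to the same steps as a soft commit - make pending updates visible, clear the tlog map, then read the merged document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the realtime-get logic described above -- not Solr's
// actual code; all names are invented for illustration. The "index" map
// stands in for what the realtime searcher can see.
public class RealtimeGetSketch {
    public static final class TlogEntry {
        public final Map<String, Object> fields;
        public final boolean partial; // true => update does not contain all fields
        public TlogEntry(Map<String, Object> fields, boolean partial) {
            this.fields = fields;
            this.partial = partial;
        }
    }

    public final Map<String, TlogEntry> tlogMap = new HashMap<>();        // uncommitted updates
    public final Map<String, Map<String, Object>> index = new HashMap<>(); // visible docs

    public Map<String, Object> realtimeGet(String id) {
        TlogEntry e = tlogMap.get(id);
        if (e == null) return index.get(id); // nothing pending: read the visible doc
        if (!e.partial) return e.fields;     // full doc is in the tlog: serve it directly
        openNewRealtimeSearcher();           // partial update: soft-commit-like step
        return index.get(id);                // now read the merged document
    }

    // Simulates reopening the realtime searcher: pending updates become
    // visible and the tlog map is cleared.
    public void openNewRealtimeSearcher() {
        for (Map.Entry<String, TlogEntry> e : tlogMap.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                 .putAll(e.getValue().fields);
        }
        tlogMap.clear();
    }

    public static void main(String[] args) {
        RealtimeGetSketch s = new RealtimeGetSketch();
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("price", 10);
        s.index.put("1", doc);
        Map<String, Object> update = new HashMap<>();
        update.put("price", 20);
        s.tlogMap.put("1", new TlogEntry(update, true)); // partial: only "price"
        System.out.println(s.realtimeGet("1")); // full doc with the updated price
    }
}
```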
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216415#comment-14216415 ] Tim Allison commented on LUCENE-5205: - The permanent hang is surprising. When I isolate the single-quote regex, I get a permanent hang in Java, but not Perl.
{noformat}
String s = "SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD";
Matcher m = Pattern.compile("'((?:''|[^']+)+)'").matcher(s);
while (m.find()) {
  System.out.println(m.start());
}
System.out.println("done");
{noformat}
{noformat}
my $s = "SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD";
while ($s =~ /'((?:''|[^']+)+)'/g) {
  print "here\n";
}
print "done\n";
{noformat}
[jira] [Updated] (LUCENE-6062) Index corruption from numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-6062: Attachment: LUCENE-6062.patch Updated patch with a dedicated test (keeping the old one as it was). The logic is the same as the first patch, but I tried to clarify better what is going on. Additionally, I removed some extraneous parameters in some of the related methods. Index corruption from numeric DV updates Key: LUCENE-6062 URL: https://issues.apache.org/jira/browse/LUCENE-6062 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 4.10.3, 5.0, Trunk Attachments: LUCENE-6062.patch, LUCENE-6062.patch I hit this while working on LUCENE-6005: when cutting over TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled additional docValues in the test, and hit this:
{noformat}
There was 1 failure:
1) testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
java.io.FileNotFoundException: _1_Asserting_0.dvm in dir=RAMDirectory@259847e5 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
  at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
  at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
  at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
  at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.<init>(Lucene50DocValuesProducer.java:130)
  at org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
  at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
  at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:267)
  at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
  at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
  at org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
  at org.apache.lucene.index.SegmentDocValuesProducer.<init>(SegmentDocValuesProducer.java:63)
  at org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
  at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:109)
  at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
  at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
  at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
  at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
  at org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
{noformat}
A one-line change to the existing test (on trunk) causes this corruption:
{noformat}
Index: lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
===================================================================
--- lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (revision 1639580)
+++ lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (working copy)
@@ -750,6 +750,7 @@
     // second segment with no NDV
     doc = new Document();
     doc.add(new StringField("id", "doc1", Store.NO));
+    doc.add(new NumericDocValuesField("foo", 3));
     writer.addDocument(doc);
     doc = new Document();
     doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't updated
{noformat}
For some reason, the base doc values for the 2nd segment are not being written, but clearly should have been (to hold field "foo")... I'm not sure why.
[jira] [Assigned] (SOLR-6752) Buffer Cache allocate/lost should be exposed through JMX
[ https://issues.apache.org/jira/browse/SOLR-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-6752: - Assignee: Mark Miller Buffer Cache allocate/lost should be exposed through JMX Key: SOLR-6752 URL: https://issues.apache.org/jira/browse/SOLR-6752 Project: Solr Issue Type: Bug Reporter: Mike Drob Assignee: Mark Miller Labels: metrics Attachments: SOLR-6752.patch Currently, {{o.a.s.store.blockcache.Metrics}} has fields for tracking buffer allocations and losses, but they are never updated nor exposed to a receiving metrics system. We should do both.
[jira] [Resolved] (SOLR-6750) Solr adds RequestHandler SolrInfoMBeans twice to the JMX server.
[ https://issues.apache.org/jira/browse/SOLR-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-6750. --- Resolution: Duplicate Solr adds RequestHandler SolrInfoMBeans twice to the JMX server. Key: SOLR-6750 URL: https://issues.apache.org/jira/browse/SOLR-6750 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, Trunk I think we want to stop doing this for 5. It should be really cheap to enumerate and get stats for all of the SolrInfoMBeans, but between this and SOLR-6747, you will overall call getStatistics far too much. They are added twice because all request handlers are added using their path as the key, and then whatever the SolrResourceLoader has created is added using the default getName (the full class name) as the key. I think we should start only allowing an object to appear once in the bean map in 5.0. The way the code currently works, the replication handler objects would take precedence, which seems right to me.
[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?
[ https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216442#comment-14216442 ] Mark Miller commented on SOLR-3774: --- I duplicated this issue with SOLR-6750. The way I solved it is to not let SolrResourceLoader.inform add any of the same objects that already exist, with a simple check of !infoRegistry.containsValue(bean). I think it might be a better check than relying on names, because we don't really ever want to add the same object twice - especially considering SOLR-6586.
{code}
for (SolrInfoMBean bean : arr) {
  if (!infoRegistry.containsValue(bean)) {
    try {
      infoRegistry.put(bean.getName(), bean);
    } catch (Exception e) {
      log.warn("could not register MBean '" + bean.getName() + "'.", e);
    }
  }
}
{code}
/admin/mbean returning duplicate search handlers with names that map to their classes? -- Key: SOLR-3774 URL: https://issues.apache.org/jira/browse/SOLR-3774 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-3774.patch Offshoot of SOLR-3232... bq. Along with some valid entries with names equal to the request handler names (/get search /browse) it also turned up one with the name org.apache.solr.handler.RealTimeGetHandler and another with the name org.apache.solr.handler.component.SearchHandler ...seems that we may have a bug with request handlers getting registered multiple times, once under their real name and once using their class?
[jira] [Commented] (LUCENE-6060) Remove IndexWriter.unLock
[ https://issues.apache.org/jira/browse/LUCENE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216453#comment-14216453 ] Michael McCandless commented on LUCENE-6060: Well, Solr still has the unlockOnStartup option; I wasn't sure what to do with that, so I left it for now and opened SOLR-6737. Most Lucene apps shouldn't be using the legacy SimpleFSLockFactory, and if they are, 1) they must already be dealing with removing the lock on startup, and 2) if they are doing so via IndexWriter.unlock, they will see the deprecation/compilation error on upgrade, dig into CHANGES, find this issue, and then have to do their own scary things: I think this is healthy. I don't really like the deleteOnExit method. Remove IndexWriter.unLock - Key: LUCENE-6060 URL: https://issues.apache.org/jira/browse/LUCENE-6060 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.10, 5.0, Trunk Attachments: LUCENE-6060.patch This method used to be necessary, when our locking impls were buggy, but it's a godawful dangerous method: it invites index corruption. I think we should remove it. Apps that for some scary reason really need it can do their own thing...
[jira] [Created] (SOLR-6755) ClassCastException from CloudMLTQParserTest
Hoss Man created SOLR-6755: -- Summary: ClassCastException from CloudMLTQParserTest Key: SOLR-6755 URL: https://issues.apache.org/jira/browse/SOLR-6755 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Anshum Gupta The seed doesn't reproduce for me, but the ClassCastException seems hinky and worth looking into... {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 50.7s J1 | CloudMLTQParserTest.testDistribSearch [junit4] Throwable #1: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList [junit4]at __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0) [junit4]at org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124) [junit4]at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) [junit4]at java.lang.Thread.run(Thread.java:745) {noformat} http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true) At revision 1640267
Re: [JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_67) - Build # 11466 - Failure!
I can't reproduce this, and i don't really understand it, but i know anshum was working on this very recently so i filed a jira for him so we don't lose track of it... https://issues.apache.org/jira/browse/SOLR-6755 : Date: Tue, 18 Nov 2014 04:01:34 + (UTC) : From: Policeman Jenkins Server jenk...@thetaphi.de : Reply-To: dev@lucene.apache.org : To: jbern...@apache.org, dev@lucene.apache.org : Subject: [JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_67) - Build # 11466 - : Failure! : : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/ : Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true) : : 1 tests failed. : REGRESSION: org.apache.solr.search.mlt.CloudMLTQParserTest.testDistribSearch : : Error Message: : java.lang.String cannot be cast to java.util.ArrayList : : Stack Trace: : java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList : at __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0) : at org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124) : at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) : at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) : at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) : at java.lang.reflect.Method.invoke(Method.java:606) : at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) : at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) : at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) : at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) : at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) : at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) : at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) : at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) : at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) : at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) : at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) : at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) : at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) : at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) : at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) : at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) : at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) : at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) : at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) : at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) : at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) : at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) : at
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216477#comment-14216477 ] Tim Allison commented on LUCENE-5205: - Ha, turns out the hang isn't permanent; you just need to be patient. ;) The minimal code to reproduce this inefficiency:
{noformat}
String s = "'S SOLUTION a PROVIDER TESTABCD";
long start = new Date().getTime();
Matcher m = Pattern.compile("'(([^']+)+)'").matcher(s);
while (m.find()) {
  System.out.println(m.start());
}
System.out.println("elapsed: " + (new Date().getTime() - start));
{noformat}
When I ran this against different length strings, I got these times. I did two runs for each string.
||String||MILLIS_RUN1||MILLIS_RUN2||
|'S SOLUTION a PROVIDER TE|937|933|
|'S SOLUTION a PROVIDER TES|1671|1310|
|'S SOLUTION a PROVIDER TEST|3165|2643|
|'S SOLUTION a PROVIDER TESTA|5165|5227|
|'S SOLUTION a PROVIDER TESTAB|9335|9872|
|'S SOLUTION a PROVIDER TESTABC|19964|18437|
|'S SOLUTION a PROVIDER TESTABCD|39387|35961|
I fixed the regex inefficiency on my github [site|https://github.com/tballison/lucene-addons]. I set that up for standalone addons that track with the latest stable builds. I'll respond to your other issues shortly. Thank you [~modassar] for raising this issue! 
[PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.9 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. 
* Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of "" as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance =2: jakarta~1 (OSA) vs jakarta~1 (Levenshtein) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser.
[jira] [Created] (SOLR-6756) The cloud-dev scripts do not seem to work with the new example layout.
Mark Miller created SOLR-6756: - Summary: The cloud-dev scripts do not seem to work with the new example layout. Key: SOLR-6756 URL: https://issues.apache.org/jira/browse/SOLR-6756 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: Trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6757) SolrInfoMBean should be an abstract class rather than an interface.
Mark Miller created SOLR-6757: - Summary: SolrInfoMBean should be an abstract class rather than an interface. Key: SOLR-6757 URL: https://issues.apache.org/jira/browse/SOLR-6757 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, Trunk This will give us greater flexibility around adding things with back compat support in minor releases.
[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?
[ https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216493#comment-14216493 ] Tomás Fernández Löbbe commented on SOLR-3774: - I think that makes sense /admin/mbean returning duplicate search handlers with names that map to their classes? -- Key: SOLR-3774 URL: https://issues.apache.org/jira/browse/SOLR-3774 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-3774.patch Offshoot of SOLR-3232... bq. Along with some valid entries with names equal to the request handler names (/get search /browse) it also turned up one with the name org.apache.solr.handler.RealTimeGetHandler and another with the name org.apache.solr.handler.component.SearchHandler ...seems that we may have a bug with request handlers getting registered multiple times, once under their real name and once using their class?
[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting
[ https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216504#comment-14216504 ] Mark Miller commented on SOLR-4735: --- bq. feel free to gut what we have in Solr5 We don't have a lot of time, but it would be great to solve SOLR-6586 - it really requires a different stats API to be sensible I think. It's a little tricky to make nice, but really the API calls for each individual attribute should be able to be calculated independently. Otherwise, there is just so much recalculation that it's hard to have everything be live and fast and even if you only want to fetch a single fast attribute, you will be penalized by the slowest. If you currently use a tool to enumerate and look at each attribute for monitoring, because of the duplicate bean issue and SOLR-6586, you can check the size of a directory like 40 times or something crazy when it really only had to be checked once. There is an API mismatch. Improve Solr metrics reporting -- Key: SOLR-4735 URL: https://issues.apache.org/jira/browse/SOLR-4735 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch Following on from a discussion on the mailing list: http://search-lucene.com/m/IO0EI1qdyJF1/codahalesubj=Solr+metrics+in+Codahale+metrics+and+Graphite+ It would be good to make Solr play more nicely with existing devops monitoring systems, such as Graphite or Ganglia. Stats monitoring at the moment is poll-only, either via JMX or through the admin stats page. I'd like to refactor things a bit to make this more pluggable. This patch is a start. 
It adds a new interface, InstrumentedBean, which extends SolrInfoMBean to return a [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a couple of MetricReporters (which basically just duplicate the JMX and admin page reporting that's there at the moment, but which should be more extensible). The patch includes a change to RequestHandlerBase showing how this could work. The idea would be to eventually replace the getStatistics() call on SolrInfoMBean with this instead. The next step would be to allow more MetricReporters to be defined in solrconfig.xml. The Metrics library comes with ganglia and graphite reporting modules, and we can add contrib plugins for both of those. There's some more general cleanup that could be done around SolrInfoMBean (we've got two plugin handlers at /mbeans and /plugins that basically do the same thing, and the beans themselves have some weirdly inconsistent data on them - getVersion() returns different things for different impls, and getSource() seems pretty useless), but maybe that's for another issue.
[jira] [Commented] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)
[ https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216536#comment-14216536 ] Michael McCandless commented on LUCENE-6061: I think you could do this with PH using an appropriate tokenizer. Ie, you'd have a custom tokenizer that tokenizes your markup into the 4 different cases (so you are still indexing 4 different fields), but that tokenizer carefully sets the token offsets into the original text (which you'd store with no markup). At search time, regardless of which of the 4 fields was used for searching, you'd then use the token offsets against the same original stored field. You should be able to do this by overriding PostingsHighlighter.loadFieldValues... though maybe we could make this easier somehow, to say when I highlight field X, load its content from field Y... Add Support for something different than Strings in Highlighting (FastVectorHighlighter) Key: LUCENE-6061 URL: https://issues.apache.org/jira/browse/LUCENE-6061 Project: Lucene - Core Issue Type: Wish Components: core/search, modules/highlighter Affects Versions: Trunk Reporter: Martin Braun Priority: Critical Labels: FastVectorHighlighter, Highlighter, Highlighting Fix For: 4.10.2, 5.0, Trunk In my application I need Highlighting and I stumbled upon the really neat FastVectorHighlighter. One problem appeared though. It lacks a way to render the Highlights into something different than Strings, so I rearranged some of the code to support that: https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java Is there a specific reason to only support String[] as a return type? If not, I would be happy to write a new class that supports rendering into a generic Type and rewire that into the existing class (or just do it as an addition and leave the current class be). 
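The override Michael describes might look roughly like the sketch below. This is an unverified sketch against the Lucene 4.x PostingsHighlighter API (loadFieldValues is assumed to have its 4.x signature); the class name and the shared content field name are hypothetical:

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.postingshighlight.PostingsHighlighter;

// Highlight against any of the indexed markup variants, but always load the
// text of one shared stored field, so the token offsets (which the custom
// tokenizer set against the original text) line up for every variant.
public class RedirectingHighlighter extends PostingsHighlighter {
    private final String contentField; // e.g. "body_plain" (hypothetical name)

    public RedirectingHighlighter(String contentField) {
        this.contentField = contentField;
    }

    @Override
    protected String[][] loadFieldValues(IndexSearcher searcher, String[] fields,
                                         int[] docids, int maxLength) throws IOException {
        // Replace every requested field with the shared content field; the
        // returned values stay parallel to the original fields array.
        String[] redirected = new String[fields.length];
        Arrays.fill(redirected, contentField);
        return super.loadFieldValues(searcher, redirected, docids, maxLength);
    }
}
```

A "when I highlight field X, load content from field Y" hook in PostingsHighlighter itself, as suggested above, would make this subclass unnecessary.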
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_20) - Build # 4439 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/ Java: 64bit/jdk1.8.0_20 -XX:-UseCompressedOops -XX:+UseSerialGC (asserts: true) 1 tests failed. REGRESSION: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates Error Message: unpaired high surrogate: d86c, followed by: e28f Stack Trace: java.lang.AssertionError: unpaired high surrogate: d86c, followed by: e28f at __randomizedtesting.SeedInfo.seed([A2044F8C235991A:5660FE2D40DB7620]:0) at org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:191) at org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:136) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:403) at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:352) at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362) at org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216571#comment-14216571 ] Tim Allison commented on LUCENE-5205: - {quote} field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to following: +spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, field:and, field:consulting, field:company], 0, true) {quote} Unfortunately, I can't think of a way around this. In the SpanQueryParser, single quotes should be used to mark a token that should not be further parsed, i.e. '/files/a/b/c/path.html' should be treated as a string not a regex. I toyed with requiring a space before the start ' and space after the ', but that seemed hacky. If you escape your apostrophes, you should get the results you expect (this is with a whitespace analyzer, you may get different results with StandardAnalyzer): {noformat} SEARCH TOOL\\'S SOLUTION PROVIDER\\'S TECHNOLOGY CO., LTD{noformat} yields: {noformat} f1:search f1:tool's f1:solution f1:provider's f1:technology f1:co., f1:ltd {noformat} {quote}q=field: (SEARCH TOOLS PROVIDER CONSULTING COMPANY) Gets transformed to following: +spanNear([field:search, field:tools, field:provider, field:, field:consulting, field:company], 0, true) {quote} I think this is fixed on github. What Analyzer chain are you using?
[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216571#comment-14216571 ] Tim Allison edited comment on LUCENE-5205 at 11/18/14 6:47 PM: --- {quote} field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to following: +spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, field:and, field:consulting, field:company], 0, true) {quote} Unfortunately, I can't think of a way around this. In the SpanQueryParser, single quotes should be used to mark a token that should not be further parsed, i.e. '/files/a/b/c/path.html' should be treated as a string not a regex. I toyed with requiring a space before the start ' and space after the ', but that seemed hacky. If you escape your apostrophes, you should get the results you expect (this is with a whitespace analyzer, you may get different results with StandardAnalyzer): {noformat} SEARCH TOOL\\'S SOLUTION PROVIDER\\'S TECHNOLOGY CO., LTD{noformat} yields: {noformat} f1:search f1:tool's f1:solution f1:provider's f1:technology f1:co., f1:ltd {noformat} {quote}q=field: (SEARCH TOOLS PROVIDER CONSULTING COMPANY) Gets transformed to following: +spanNear([field:search, field:tools, field:provider, field:, field:consulting, field:company], 0, true) {quote} I think this is fixed on github. What Analyzer chain are you using? was (Author: talli...@mitre.org): {quote} field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to following: +spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, field:and, field:consulting, field:company], 0, true) {quote} Unfortunately, I can't think of a way around this. In the SpanQueryParser, single quotes should be used to mark a token that should not be further parsed, i.e. '/files/a/b/c/path.html' should be treated as a string not a regex. 
I toyed with requiring a space before the start ' and space after the ', but that seemed hacky. If you escape your apostrophes, you should get the results you expect (this is with a whitespace analyzer, you may get different results with StandardAnalyzer): {noformat} SEARCH TOOL\\'S SOLUTION PROVIDER\\'S TECHNOLOGY CO., LTD{noformat} yields:f1:search f1:tool's f1:solution f1:provider's f1:technology f1:co., f1:ltd {noformat} {quote}q=field: (SEARCH TOOLS PROVIDER CONSULTING COMPANY) Gets transformed to following: +spanNear([field:search, field:tools, field:provider, field:, field:consulting, field:company], 0, true) {quote} I think this is fixed on github. What Analyzer chain are you using?
[jira] [Commented] (SOLR-6625) HttpClient callback in HttpSolrServer
[ https://issues.apache.org/jira/browse/SOLR-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216585#comment-14216585 ] Gregory Chanan commented on SOLR-6625: -- bq. Actually SOLR-4470 aims at introducing a framework for any authentication-type, and then (for now) implement basic-auth using this framework Ah, I see, I misinterpreted the SOLR-4470 code in HttpSolrServer -- it uses BasicAuthCache and BasicScheme which I thought were in reference to basic auth, but they are really just default implementations. What I'm really arguing -- and it's my fault I didn't make it clear with example code -- is that the authentication type may affect how you want the http requests to look, beyond just the credentials. For example, I'm using an authentication filter based off of Hadoop's AuthenticationFilter (https://github.com/apache/hadoop/blob/7250b0bf914a55d0fa4802834de7f1909f1b0d6b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationFilter.java). That filter does SPNego negotiation on the first request, but sets a cookie you can use to avoid the negotiation on subsequent requests. So, I wouldn't want the SOLR-4470 implementation where I buffer up every request; I only want to do that on the first request to the server on the connection. From seeing the SOLR-4470 code, though, it looks like I was thinking about this incorrectly. Instead of the HttpClientCallback being a function of the HttpSolrServer, it's really a function of the AuthCredentials implementation. So, the default implementation would just be the credentialsButNonPreemptive/getHttpContextForRequest code you have in HttpSolrServer in SOLR-4470, but other AuthCredentials implementations could override. Does that sound right to you, [~steff1193]? bq. I do not know if it is an improvement compared to your approach. I just implemented in a way that worked.
Supporting non-preemptive authenticating POST-requests was not the main focus of SOLR-4470, so I just quickly did it in the way that I found it could be done - without considering performance or anything. Cool, I'll investigate in another jira. HttpClient callback in HttpSolrServer - Key: SOLR-6625 URL: https://issues.apache.org/jira/browse/SOLR-6625 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Attachments: SOLR-6625.patch, SOLR-6625.patch Some of our setups use Solr in a SPNego/kerberos setup (we've done this by adding our own filters to the web.xml). We have an issue in that SPNego requires a negotiation step, but some HttpSolrServer requests are not repeatable, notably the PUT/POST requests. So, what happens is, HttpSolrServer sends the requests, the server responds with a negotiation request, and the request fails because the request is not repeatable. We've modified our code to send a repeatable request beforehand in these cases. It would be nicer if HttpSolrServer provided a pre/post callback when it was making an httpclient request. This would allow administrators to make changes to the request for authentication purposes, and would allow users to make per-request changes to the httpclient calls (i.e. modify httpclient requestconfig to modify the timeout on a per-request basis).
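The negotiate-then-cookie flow described above (and why only the first request to a server needs to be repeatable) can be modeled with a toy, stdlib-only sketch. None of the names here are Solr or HttpClient API; they are purely illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the SPNego flow: the server challenges the first request,
// negotiation sets an auth cookie, and later requests carry the cookie and
// skip the challenge. Only the first request must therefore be repeatable.
public class SpnegoFlowDemo {
    private String cookie = null;                 // client-side auth cookie
    final List<String> wire = new ArrayList<>();  // requests seen on the wire

    // Returns true if the request succeeded without a negotiation round-trip.
    boolean send(String request) {
        if (cookie == null) {
            wire.add(request + " -> 401 Negotiate");  // server challenges
            cookie = "auth=token";                    // negotiation sets cookie
            wire.add(request + " [retry] -> 200");    // request is replayed
            return false;
        }
        wire.add(request + "; Cookie: " + cookie + " -> 200");
        return true;
    }

    public static void main(String[] args) {
        SpnegoFlowDemo client = new SpnegoFlowDemo();
        client.send("POST /update");  // first request: challenged, retried
        client.send("POST /update");  // later requests: cookie, no retry
        System.out.println(client.wire);
    }
}
```

A pre-request callback in the client would be the natural place to decide, per request, whether buffering (making the body repeatable) is still needed, i.e. only while the cookie is absent.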
Re: [JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_20) - Build # 4439 - Failure!
I can't reproduce this, but it's also not a random test, just very simple asserts. I tried reproducing on linux with the master seed, same jvm version and flags, no luck. On Tue, Nov 18, 2014 at 1:32 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/ Java: 64bit/jdk1.8.0_20 -XX:-UseCompressedOops -XX:+UseSerialGC (asserts: true) 1 tests failed. REGRESSION: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates Error Message: unpaired high surrogate: d86c, followed by: e28f
[jira] [Updated] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?
[ https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3774: -- Attachment: SOLR-3774.patch /admin/mbean returning duplicate search handlers with names that map to their classes? -- Key: SOLR-3774 URL: https://issues.apache.org/jira/browse/SOLR-3774 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-3774.patch, SOLR-3774.patch Offshoot of SOLR-3232... bq. Along with some valid entries with names equal to the request handler names (/get search /browse) it also turned up one with the name org.apache.solr.handler.RealTimeGetHandler and another with the name org.apache.solr.handler.component.SearchHandler ...seems that we may have a bug with request handlers getting registered multiple times, once under their real name and once using their class?
[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?
[ https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216668#comment-14216668 ] Gregory Chanan commented on SOLR-3774: -- +1 /admin/mbean returning duplicate search handlers with names that map to their classes? -- Key: SOLR-3774 URL: https://issues.apache.org/jira/browse/SOLR-3774 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-3774.patch, SOLR-3774.patch Offshoot of SOLR-3232... bq. Along with some valid entries with names equal to the request handler names (/get search /browse) it also turned up one with the name org.apache.solr.handler.RealTimeGetHandler and another with the name org.apache.solr.handler.component.SearchHandler ...seems that we may have a bug with request handlers getting registered multiple times, once under their real name and once using their class? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest
[ https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216687#comment-14216687 ] Anshum Gupta commented on SOLR-6755: I can't seem to reproduce it even after multiple runs. I'm adding some safety checks in the test though and will commit a patch that handles this. ClassCastException from CloudMLTQParserTest --- Key: SOLR-6755 URL: https://issues.apache.org/jira/browse/SOLR-6755 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Anshum Gupta The seed doesn't reproduce for me, but the ClassCastException seems hinky and worth looking into... {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 50.7s J1 | CloudMLTQParserTest.testDistribSearch [junit4] Throwable #1: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList [junit4] at __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0) [junit4] at org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124) [junit4] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true) At revision 1640267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest
[ https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216714#comment-14216714 ] ASF subversion and git services commented on SOLR-6755: --- Commit 1640416 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1640416 ] SOLR-6755: Fix the test to always return 2 parsedqueries i.e. have more than 2 shards ClassCastException from CloudMLTQParserTest --- Key: SOLR-6755 URL: https://issues.apache.org/jira/browse/SOLR-6755 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Anshum Gupta The seed doesn't reproduce for me, but the ClassCastException seems hinky and worth looking into... {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 50.7s J1 | CloudMLTQParserTest.testDistribSearch [junit4] Throwable #1: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList [junit4] at __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0) [junit4] at org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124) [junit4] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true) At revision 1640267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest
[ https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216716#comment-14216716 ] ASF subversion and git services commented on SOLR-6755: --- Commit 1640417 from [~anshumg] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1640417 ] SOLR-6755: Fix the test to always return 2 parsedqueries i.e. have more than 2 shards (merge from trunk) ClassCastException from CloudMLTQParserTest --- Key: SOLR-6755 URL: https://issues.apache.org/jira/browse/SOLR-6755 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Anshum Gupta The seed doesn't reproduce for me, but the ClassCastException seems hinky and worth looking into... {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 50.7s J1 | CloudMLTQParserTest.testDistribSearch [junit4] Throwable #1: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList [junit4] at __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0) [junit4] at org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124) [junit4] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true) At revision 1640267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest
[ https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216719#comment-14216719 ] Anshum Gupta commented on SOLR-6755: This commit should fix the issue. Changed the test to always have 2 shards, i.e. never have 1 shard, which returns a String instead of an ArrayList<String> in the debug response. ClassCastException from CloudMLTQParserTest --- Key: SOLR-6755 URL: https://issues.apache.org/jira/browse/SOLR-6755 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Anshum Gupta The seed doesn't reproduce for me, but the ClassCastException seems hinky and worth looking into... {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 50.7s J1 | CloudMLTQParserTest.testDistribSearch [junit4] Throwable #1: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList [junit4] at __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0) [junit4] at org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124) [junit4] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true) At revision 1640267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
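The shape mismatch behind this ClassCastException can be sketched in plain Java. The helper below is illustrative only (not the actual test code); it assumes, as the comment above describes, that a distributed debug entry arrives as a bare String when a single shard responds and as a List when two or more do, and normalizes both cases:

```java
import java.util.Arrays;
import java.util.List;

public class DebugValueExample {
    // Hypothetical helper: normalize a debug entry that may be a bare String
    // (single-shard response) or a List<String> (multi-shard response).
    // Casting blindly to ArrayList is what triggered the reported exception.
    static List<String> asList(Object debugValue) {
        if (debugValue instanceof String) {
            return Arrays.asList((String) debugValue); // single-shard case
        }
        @SuppressWarnings("unchecked")
        List<String> values = (List<String>) debugValue; // multi-shard case
        return values;
    }

    public static void main(String[] args) {
        System.out.println(asList("q1"));                      // prints [q1]
        System.out.println(asList(Arrays.asList("q1", "q2"))); // prints [q1, q2]
    }
}
```

The committed fix took the other route: it pins the test to 2 shards so the List form is always produced.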
[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216742#comment-14216742 ] Michael McCandless commented on LUCENE-6062: +1 Index corruption from numeric DV updates Key: LUCENE-6062 URL: https://issues.apache.org/jira/browse/LUCENE-6062 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 4.10.3, 5.0, Trunk Attachments: LUCENE-6062.patch, LUCENE-6062.patch I hit this while working on LUCENE-6005: when cutting over TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled additional docValues in the test, and hit this:
{noformat}
There was 1 failure:
1) testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
java.io.FileNotFoundException: _1_Asserting_0.dvm in dir=RAMDirectory@259847e5 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.<init>(Lucene50DocValuesProducer.java:130)
at org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:267)
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
at org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
at org.apache.lucene.index.SegmentDocValuesProducer.<init>(SegmentDocValuesProducer.java:63)
at org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:109)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
{noformat}
A one-line change to the existing test (on trunk) causes this corruption:
{noformat}
Index: lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
===
--- lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (revision 1639580)
+++ lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (working copy)
@@ -750,6 +750,7 @@
 // second segment with no NDV
 doc = new Document();
 doc.add(new StringField("id", "doc1", Store.NO));
+doc.add(new NumericDocValuesField("foo", 3));
 writer.addDocument(doc);
 doc = new Document();
 doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't updated
{noformat}
For some reason, the base doc values for the 2nd segment are not being written, but clearly should be (to hold field "foo")... I'm not sure why. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6708) Smoke tester couldn't communicate with Solr started using 'bin/solr start'
[ https://issues.apache.org/jira/browse/SOLR-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216750#comment-14216750 ] ASF subversion and git services commented on SOLR-6708: --- Commit 1640419 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1640419 ] SOLR-6708: wrap the kill existing Solr command in a try/except block Smoke tester couldn't communicate with Solr started using 'bin/solr start' -- Key: SOLR-6708 URL: https://issues.apache.org/jira/browse/SOLR-6708 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Steve Rowe Assignee: Timothy Potter Attachments: solr-example.log The nightly-smoke target failed on ASF Jenkins [https://builds.apache.org/job/Lucene-Solr-SmokeRelease-5.x/208/]:
{noformat}
[smoker] unpack solr-5.0.0.tgz...
[smoker] verify JAR metadata/identity/no javax.* or java.* classes...
[smoker] unpack lucene-5.0.0.tgz...
[smoker] **WARNING**: skipping check of /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar: it has javax.* classes
[smoker] **WARNING**: skipping check of /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/activation-1.1.1.jar: it has javax.* classes
[smoker] verify WAR metadata/contained JAR identity/no javax.* or java.* classes...
[smoker] unpack lucene-5.0.0.tgz...
[smoker] copying unpacked distribution for Java 7 ...
[smoker] test solr example w/ Java 7...
[smoker] start Solr instance (log=/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0-java7/solr-example.log)...
[smoker] startup done
[smoker] Failed to determine the port of a local Solr instance, cannot create core!
[smoker] test utf8...
[smoker]
[smoker] command "sh ./exampledocs/test_utf8.sh http://localhost:8983/solr/techproducts" failed:
[smoker] ERROR: Could not curl to Solr - is curl installed? Is Solr not running?
[smoker]
[smoker]
[smoker] stop server using: bin/solr stop -p 8983
[smoker] No process found for Solr node running on port 8983
[smoker] ***WARNING***: Solr instance didn't respond to SIGINT; using SIGKILL now...
[smoker] ***WARNING***: Solr instance didn't respond to SIGKILL; ignoring...
[smoker] Traceback (most recent call last):
[smoker] File /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py, line 1526, in <module>
[smoker] main()
[smoker] File /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py, line 1471, in main
[smoker] smokeTest(c.java, c.url, c.revision, c.version, c.tmp_dir, c.is_signed, ' '.join(c.test_args))
[smoker] File /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py, line 1515, in smokeTest
[smoker] unpackAndVerify(java, 'solr', tmpDir, artifact, svnRevision, version, testArgs, baseURL)
[smoker] File /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py, line 616, in unpackAndVerify
[smoker] verifyUnpacked(java, project, artifact, unpackPath, svnRevision, version, testArgs, tmpDir, baseURL)
[smoker] File /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py, line 783, in verifyUnpacked
[smoker] testSolrExample(java7UnpackPath, java.java7_home, False)
[smoker] File /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py, line 888, in testSolrExample
[smoker] run('sh ./exampledocs/test_utf8.sh http://localhost:8983/solr/techproducts', 'utf8.log')
[smoker] File /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py, line 541, in run
[smoker] raise RuntimeError('command %s failed; see log file %s' % (command, logPath))
[smoker] RuntimeError: command "sh ./exampledocs/test_utf8.sh http://localhost:8983/solr/techproducts" failed; see log file /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0-java7/example/utf8.log
{noformat}
BUILD FAILED
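The committed fix ("wrap the kill existing Solr command in a try/except block") is a general pattern: best-effort cleanup that may legitimately fail should not abort the rest of a run. The sketch below transposes that idea to Java with invented names; it is not the smoke tester's actual code:

```java
public class SafeCleanupExample {
    // Hypothetical stand-in for the "kill existing Solr" step, which fails
    // when there is no stale process to kill.
    static void killExistingServer() {
        throw new RuntimeException("No process found for Solr node on port 8983");
    }

    public static void main(String[] args) {
        try {
            killExistingServer();
        } catch (RuntimeException e) {
            // Expected when no stale instance is running; log and continue
            // instead of failing the whole smoke test.
            System.out.println("ignoring: " + e.getMessage());
        }
        System.out.println("smoke test continues");
    }
}
```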
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216757#comment-14216757 ] Tim Allison commented on LUCENE-5205: - [~paul.elsc...@xs4all.nl], I'm sorry for taking so long to get back to you. I just merged trunk and made updates to my fork of the lucene5205 [branch|https://github.com/tballison/lucene-solr/tree/lucene5205]. Let me know if that is of any use to you. [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.9 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. 
classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of quotes, as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance =2: (jakarta~1 (OSA) vs jakarta~1 (Levenshtein)) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-5.x #761: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-5.x/761/ 2 tests failed. FAILED: org.apache.solr.hadoop.MorphlineBasicMiniMRTest.testPathParts Error Message: Test abandoned because suite timeout was reached. Stack Trace: java.lang.Exception: Test abandoned because suite timeout was reached. at __randomizedtesting.SeedInfo.seed([99737C6C17DEC09]:0) FAILED: org.apache.solr.hadoop.MorphlineBasicMiniMRTest.org.apache.solr.hadoop.MorphlineBasicMiniMRTest Error Message: Suite timeout exceeded (= 720 msec). Stack Trace: java.lang.Exception: Suite timeout exceeded (= 720 msec). at __randomizedtesting.SeedInfo.seed([99737C6C17DEC09]:0) Build Log: [...truncated 53887 lines...] BUILD FAILED /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:548: The following error occurred while executing this line: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:200: The following error occurred while executing this line: : Java returned: 1 Total time: 415 minutes 34 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6064) throw exception during sort for misconfigured field
Robert Muir created LUCENE-6064: --- Summary: throw exception during sort for misconfigured field Key: LUCENE-6064 URL: https://issues.apache.org/jira/browse/LUCENE-6064 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir If you sort on field X, and it has no docvalues, today it will silently treat it as all values missing. This can be very confusing since it just means nothing will happen at all. But there is a distinction between "no docs happen to have a value for this field" and "field isn't configured correctly". The latter should get an exception, telling the user to index docvalues, or wrap the reader with UninvertingReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
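The distinction the issue draws can be made concrete with a toy example. This is not Lucene code; the names and the map-based "doc values" are invented purely to illustrate the two cases: per-document gaps in an existing field are legal, while a field that was never set up for sorting at all should fail loudly instead of silently sorting as if every value were missing:

```java
import java.util.HashMap;
import java.util.Map;

public class MissingVsMisconfigured {
    // Sentinel for "this doc has no value" (a legal state).
    static final Object MISSING = new Object();

    // Hypothetical lookup: field -> (docId -> value).
    static Object sortValue(Map<String, Map<Integer, Object>> docValues,
                            String field, int doc) {
        Map<Integer, Object> values = docValues.get(field);
        if (values == null) {
            // Field was never indexed for sorting: misconfiguration, not data.
            throw new IllegalStateException(
                "no docvalues indexed for field '" + field + "'");
        }
        return values.getOrDefault(doc, MISSING); // per-doc gaps are fine
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Object>> dv = new HashMap<>();
        dv.put("price", new HashMap<>());
        dv.get("price").put(0, 10L);
        System.out.println(sortValue(dv, "price", 0));            // prints 10
        System.out.println(sortValue(dv, "price", 1) == MISSING); // prints true
        try {
            sortValue(dv, "title", 0); // field never configured for sorting
        } catch (IllegalStateException expected) {
            System.out.println("misconfigured field rejected");
        }
    }
}
```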
[jira] [Updated] (LUCENE-6064) throw exception during sort for misconfigured field
[ https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-6064: Attachment: LUCENE-6064.patch Attached is an initial patch: it's largish because this check found numerous test bugs. throw exception during sort for misconfigured field --- Key: LUCENE-6064 URL: https://issues.apache.org/jira/browse/LUCENE-6064 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6064.patch If you sort on field X, and it has no docvalues, today it will silently treat it as all values missing. This can be very confusing since it just means nothing will happen at all. But there is a distinction between "no docs happen to have a value for this field" and "field isn't configured correctly". The latter should get an exception, telling the user to index docvalues, or wrap the reader with UninvertingReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6732) Back-compat break for LIR state in 4.10.2
[ https://issues.apache.org/jira/browse/SOLR-6732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216906#comment-14216906 ] ASF subversion and git services commented on SOLR-6732: --- Commit 1640432 from [~thelabdude] in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1640432 ] SOLR-6732: fix back-compat issue with unit test to verify solution Back-compat break for LIR state in 4.10.2 - Key: SOLR-6732 URL: https://issues.apache.org/jira/browse/SOLR-6732 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.2 Reporter: Shalin Shekhar Mangar Assignee: Timothy Potter Priority: Blocker Fix For: 4.10.3 Attachments: SOLR-6732.patch, SOLR-6732.patch We changed the LIR state to be kept as a map but it is not back-compatible. The problem is that we're checking for map or string after parsing JSON but if the key has "down" as a string then json parsing will fail. This was introduced in SOLR-6511. This error will prevent anyone from upgrading to 4.10.2 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201411.mbox/%3c54636ed2.8040...@cytainment.de%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
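The back-compat shape of the problem can be sketched as follows. This is an illustrative reader, not Solr's actual LIR code; it assumes, per the issue description, that older versions stored the state as a bare string such as "down" while newer versions store a map carrying a "state" key, so a robust reader must accept both:

```java
import java.util.HashMap;
import java.util.Map;

public class LirStateExample {
    // Hypothetical back-compat reader: the parsed znode payload may be a
    // legacy bare String or a newer Map with a "state" entry.
    static String readState(Object parsed) {
        if (parsed instanceof String) {
            return (String) parsed;         // legacy format, e.g. "down"
        }
        Map<?, ?> map = (Map<?, ?>) parsed; // current map-based format
        return (String) map.get("state");
    }

    public static void main(String[] args) {
        Map<String, Object> newFormat = new HashMap<>();
        newFormat.put("state", "down");
        System.out.println(readState("down"));    // legacy payload: prints down
        System.out.println(readState(newFormat)); // map payload: prints down
    }
}
```

Only accepting the map form would break on data written by the older versions, which is the upgrade blocker described above.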
[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?
[ https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216911#comment-14216911 ] Shalin Shekhar Mangar commented on SOLR-3774: - +1 /admin/mbean returning duplicate search handlers with names that map to their classes? -- Key: SOLR-3774 URL: https://issues.apache.org/jira/browse/SOLR-3774 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-3774.patch, SOLR-3774.patch Offshoot of SOLR-3232... bq. Along with some valid entries with names equal to the request handler names (/get search /browse) it also turned up one with the name org.apache.solr.handler.RealTimeGetHandler and another with the name org.apache.solr.handler.component.SearchHandler ...seems that we may have a bug with request handlers getting registered multiple times, once under their real name and once using their class? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6732) Back-compat break for LIR state in 4.10.2
[ https://issues.apache.org/jira/browse/SOLR-6732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter resolved SOLR-6732. -- Resolution: Fixed Back-compat break for LIR state in 4.10.2 - Key: SOLR-6732 URL: https://issues.apache.org/jira/browse/SOLR-6732 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.2 Reporter: Shalin Shekhar Mangar Assignee: Timothy Potter Priority: Blocker Fix For: 4.10.3 Attachments: SOLR-6732.patch, SOLR-6732.patch We changed the LIR state to be kept as a map but it is not back-compatible. The problem is that we're checking for map or string after parsing JSON but if the key has "down" as a string then json parsing will fail. This was introduced in SOLR-6511. This error will prevent anyone from upgrading to 4.10.2 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201411.mbox/%3c54636ed2.8040...@cytainment.de%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6732) Back-compat break for LIR state in 4.10.2
[ https://issues.apache.org/jira/browse/SOLR-6732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216912#comment-14216912 ] ASF subversion and git services commented on SOLR-6732: --- Commit 1640434 from [~thelabdude] in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1640434 ] SOLR-6732: mention in changes Back-compat break for LIR state in 4.10.2 - Key: SOLR-6732 URL: https://issues.apache.org/jira/browse/SOLR-6732 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.10.2 Reporter: Shalin Shekhar Mangar Assignee: Timothy Potter Priority: Blocker Fix For: 4.10.3 Attachments: SOLR-6732.patch, SOLR-6732.patch We changed the LIR state to be kept as a map but it is not back-compatible. The problem is that we're checking for map or string after parsing JSON but if the key has "down" as a string then json parsing will fail. This was introduced in SOLR-6511. This error will prevent anyone from upgrading to 4.10.2 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201411.mbox/%3c54636ed2.8040...@cytainment.de%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6729) createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA
[ https://issues.apache.org/jira/browse/SOLR-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-6729: - Assignee: Mark Miller createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA Key: SOLR-6729 URL: https://issues.apache.org/jira/browse/SOLR-6729 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke Assignee: Mark Miller Priority: Minor The 'Replica placement strategy for solrcloud' SOLR-6220 ticket will allow more sophisticated replica placement logic but in the meantime this simple change here would allow more predictable locating of replicas via the ordering of the createNodeSet list provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6086) Replica active during Warming
[ https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-6086: --- Assignee: Shalin Shekhar Mangar Replica active during Warming - Key: SOLR-6086 URL: https://issues.apache.org/jira/browse/SOLR-6086 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6.1, 4.8.1 Reporter: ludovic Boutros Assignee: Shalin Shekhar Mangar Labels: difficulty-medium, impact-medium Attachments: SOLR-6086.patch, SOLR-6086.patch Original Estimate: 72h Remaining Estimate: 72h At least with Solr 4.6.1, replicas are considered active during the warming process. This means that if you restart a replica or create a new one, queries will be sent to this replica and the query will hang until the end of the warming process (if cold searchers are not used). You cannot add or restart a node silently anymore. I think that the fact that the replica is active is not a bad thing. But, the HttpShardHandler and the CloudSolrServer class should take the warming process into account. Currently, I have developed a new very simple component which checks that a searcher is registered. I am also developing custom HttpShardHandler and CloudSolrServer classes which will check the warming process in addition to the ACTIVE status in the cluster state. This seems to be more a workaround than a solution but that's all I can do in this version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6064) throw exception during sort for misconfigured field
[ https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216974#comment-14216974 ] Adrien Grand commented on LUCENE-6064: -- +1 throw exception during sort for misconfigured field --- Key: LUCENE-6064 URL: https://issues.apache.org/jira/browse/LUCENE-6064 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6064.patch If you sort on field X, and it has no docvalues, today it will silently treat it as all values missing. This can be very confusing since it just means nothing will happen at all. But there is a distinction between no docs happen to have a value for this field and field isn't configured correctly. The latter should get an exception, telling the user to index docvalues, or wrap the reader with UninvertingReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
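The behavior change described above can be illustrated with a small, self-contained sketch. This is not Lucene's actual implementation; the class and method names here are invented for illustration. The point is the distinction the issue draws: a field that was never configured with docvalues should raise an exception, while a configured field whose documents merely lack values sorts them as missing.

```java
import java.util.*;

// Illustrative sketch only: distinguish "field not configured for sorting"
// (throw) from "field configured, but some docs have no value" (sort as missing).
public class SortCheckDemo {
    // Per-field docvalues; a null entry means the field was never indexed with docvalues.
    private final Map<String, Map<Integer, Long>> docValuesByField = new HashMap<>();

    public void addField(String field) {
        docValuesByField.put(field, new HashMap<>());
    }

    public void setValue(String field, int doc, long value) {
        docValuesByField.get(field).put(doc, value);
    }

    /** Sort doc ids by the field's value; docs with a missing value sort first. */
    public List<Integer> sort(String field, List<Integer> docs) {
        Map<Integer, Long> values = docValuesByField.get(field);
        if (values == null) {
            // Misconfigured field: fail loudly instead of silently treating
            // every document as having a missing value.
            throw new IllegalStateException("no docvalues for field '" + field
                + "': index the field with docvalues or wrap the reader");
        }
        List<Integer> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(d -> values.getOrDefault(d, Long.MIN_VALUE)));
        return sorted;
    }
}
```

The key design point matches the comment thread: only the "no docvalues at all" case is an error; an empty-but-configured field is a legitimate state.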
[jira] [Commented] (SOLR-6747) Add an optional caching option as a workaround for SOLR-6586.
[ https://issues.apache.org/jira/browse/SOLR-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216987#comment-14216987 ] Gregory Chanan commented on SOLR-6747: --
{code}
NamedList cachedStats = this.cachedDynamicStats;
NamedList stats;
if (useCachedStatsBetweenGetMBeanInfoCalls && cachedStats != null) {
  stats = cachedStats;
} else {
  stats = infoBean.getStatistics();
}
{code}
small optimization, but maybe better to avoid reading the volatile value if useCachedStatsBetweenGetMBeanInfoCalls is false? i.e.
{code}
NamedList stats = null;
if (useCachedStatsBetweenGetMBeanInfoCalls) {
  NamedList cachedStats = this.cachedDynamicStats;
  if (cachedStats != null) {
    stats = cachedStats;
  }
}
if (stats == null) {
  stats = infoBean.getStatistics();
}
{code}
could optimize further by eliminating the conditional when useCachedStatsBetweenGetMBeanInfoCalls is false but perhaps not worth it. Otherwise, looks good, +1. Add an optional caching option as a workaround for SOLR-6586. - Key: SOLR-6747 URL: https://issues.apache.org/jira/browse/SOLR-6747 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-6747.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
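The two snippets in the comment above differ only in when the volatile cache field is read. A minimal, runnable sketch of that pattern follows; the class and field names are invented stand-ins for Solr's useCachedStatsBetweenGetMBeanInfoCalls flag and cachedDynamicStats field, and the stats type is reduced to Object:

```java
// Sketch of the caching pattern under discussion: check the cheap final flag
// first, and only then pay the cost of reading the volatile cache field.
public class StatsCacheDemo {
    private final boolean useCachedStats;        // stand-in for useCachedStatsBetweenGetMBeanInfoCalls
    private volatile Object cachedDynamicStats;  // stand-in for the cached NamedList

    public StatsCacheDemo(boolean useCachedStats) {
        this.useCachedStats = useCachedStats;
    }

    /** Returns cached stats when caching is enabled and populated, else fresh stats. */
    public Object getStatistics() {
        Object stats = null;
        if (useCachedStats) {              // cheap check first, as suggested
            stats = cachedDynamicStats;    // single volatile read, only when enabled
        }
        if (stats == null) {
            stats = computeStatistics();   // the expensive path
            if (useCachedStats) {
                cachedDynamicStats = stats;
            }
        }
        return stats;
    }

    /** Stand-in for infoBean.getStatistics(); returns a fresh object each call. */
    protected Object computeStatistics() {
        return new Object();
    }
}
```

With caching enabled, repeated calls return the same cached object; with it disabled, the volatile field is never touched and each call recomputes.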
[jira] [Commented] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)
[ https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217002#comment-14217002 ] Martin Braun commented on LUCENE-6061: -- Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the "when I highlight field X, load its content from field Y" part; I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index) and that lets me do the highlighting as well, so I am not that dependent on the Highlighting API anymore. Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. Add Support for something different than Strings in Highlighting (FastVectorHighlighter) Key: LUCENE-6061 URL: https://issues.apache.org/jira/browse/LUCENE-6061 Project: Lucene - Core Issue Type: Wish Components: core/search, modules/highlighter Affects Versions: Trunk Reporter: Martin Braun Priority: Critical Labels: FastVectorHighlighter, Highlighter, Highlighting Fix For: 4.10.2, 5.0, Trunk In my application I need Highlighting and I stumbled upon the really neat FastVectorHighlighter. One problem appeared though: it lacks a way to render the Highlights into something different than Strings, so I rearranged some of the code to support that: https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java Is there a specific reason to only support String[] as a return type? 
If not, I would be happy to write a new class that supports rendering into a generic Type and rewire that into the existing class (or just do it as an addition and leave the current class be). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)
[ https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002 ] Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:04 PM: - Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets). Like that I can implement a cool way of handling synonym searching as well) and enables me to do highlighting as well (I think it's similar to the approach PH uses? But with storing the results into an index) so I am not that dependent on the Highlighting API anymore. Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. was (Author: s4ke): Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index) and that let's me do the highlighting as well so I am not that dependent on the Highlighting API anymore. 
Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. Add Support for something different than Strings in Highlighting (FastVectorHighlighter) Key: LUCENE-6061 URL: https://issues.apache.org/jira/browse/LUCENE-6061 Project: Lucene - Core Issue Type: Wish Components: core/search, modules/highlighter Affects Versions: Trunk Reporter: Martin Braun Priority: Critical Labels: FastVectorHighlighter, Highlighter, Highlighting Fix For: 4.10.2, 5.0, Trunk In my application I need Highlighting and I stumbled upon the really neat FastVectorHighlighter. One problem appeared though. It lacks a way to render the Highlights into something different than Strings, so I rearranged some of the code to support that: https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java Is there a specific reason to only support String[] as a return type? If not, I would be happy to write a new class that supports rendering into a generic Type and rewire that into the existing class (or just do it as an addition and leave the current class be). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)
[ https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002 ] Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:05 PM: - Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets) by using the same analyzer chain that the complete documents use and extracting the attributes. Like that I can implement a cool way of handling synonym searching as well) and enables me to do highlighting as well (I think it's similar to the approach PH uses? But with storing the results into an index) so I am not that dependent on the Highlighting API anymore. Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. was (Author: s4ke): Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets). 
Like that I can implement a cool way of handling synonym searching as well) and enables me to do highlighting as well (I think it's similar to the approach PH uses? But with storing the results into an index) so I am not that dependent on the Highlighting API anymore. Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. Add Support for something different than Strings in Highlighting (FastVectorHighlighter) Key: LUCENE-6061 URL: https://issues.apache.org/jira/browse/LUCENE-6061 Project: Lucene - Core Issue Type: Wish Components: core/search, modules/highlighter Affects Versions: Trunk Reporter: Martin Braun Priority: Critical Labels: FastVectorHighlighter, Highlighter, Highlighting Fix For: 4.10.2, 5.0, Trunk In my application I need Highlighting and I stumbled upon the really neat FastVectorHighlighter. One problem appeared though. It lacks a way to render the Highlights into something different than Strings, so I rearranged some of the code to support that: https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java Is there a specific reason to only support String[] as a return type? If not, I would be happy to write a new class that supports rendering into a generic Type and rewire that into the existing class (or just do it as an addition and leave the current class be). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)
[ https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002 ] Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:08 PM: - Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets) by using the same analyzer chain that the complete documents use and extracting the attributes. (I think it's similar to the approach PH uses? But with storing the results into an index) Like that I can implement a cool way of handling synonym searching as well) and this enables me to do highlighting without the need of one of the Highlighters in Lucene so I am not that dependent on the Highlighting API anymore. But I think I might need the Highlighter API some time in the near future so I am keeping my _FastVectorHighlighterUtil_ Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. was (Author: s4ke): Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). 
I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets) by using the same analyzer chain that the complete documents use and extracting the attributes. (I think it's similar to the approach PH uses? But with storing the results into an index) Like that I can implement a cool way of handling synonym searching as well) and this enables me to do highlighting without the need of one of the Highlighters in Lucene so I am not that dependent on the Highlighting API anymore. But I think I might need the Highlighter API some time in the near future so I am keeping my _FastVectorHighlighterUtil_ Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. Add Support for something different than Strings in Highlighting (FastVectorHighlighter) Key: LUCENE-6061 URL: https://issues.apache.org/jira/browse/LUCENE-6061 Project: Lucene - Core Issue Type: Wish Components: core/search, modules/highlighter Affects Versions: Trunk Reporter: Martin Braun Priority: Critical Labels: FastVectorHighlighter, Highlighter, Highlighting Fix For: 4.10.2, 5.0, Trunk In my application I need Highlighting and I stumbled upon the really neat FastVectorHighlighter. One problem appeared though. It lacks a way to render the Highlights into something different than Strings, so I rearranged some of the code to support that: https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java Is there a specific reason to only support String[] as a return type? If not, I would be happy to write a new class that supports rendering into a generic Type and rewire that into the existing class (or just do it as an addition and leave the current class be). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)
[ https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002 ] Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:08 PM: - Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets) by using the same analyzer chain that the complete documents use and extracting the attributes. (I think it's similar to the approach PH uses? But with storing the results into an index) Like that I can implement a cool way of handling synonym searching as well) and this enables me to do highlighting without the need of one of the Highlighters in Lucene so I am not that dependent on the Highlighting API anymore. But I think I might need the Highlighter API some time in the near future so I am keeping my _FastVectorHighlighterUtil_ Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. was (Author: s4ke): Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). 
I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets) by using the same analyzer chain that the complete documents use and extracting the attributes. Like that I can implement a cool way of handling synonym searching as well) and enables me to do highlighting as well (I think it's similar to the approach PH uses? But with storing the results into an index) so I am not that dependent on the Highlighting API anymore. Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. Add Support for something different than Strings in Highlighting (FastVectorHighlighter) Key: LUCENE-6061 URL: https://issues.apache.org/jira/browse/LUCENE-6061 Project: Lucene - Core Issue Type: Wish Components: core/search, modules/highlighter Affects Versions: Trunk Reporter: Martin Braun Priority: Critical Labels: FastVectorHighlighter, Highlighter, Highlighting Fix For: 4.10.2, 5.0, Trunk In my application I need Highlighting and I stumbled upon the really neat FastVectorHighlighter. One problem appeared though. It lacks a way to render the Highlights into something different than Strings, so I rearranged some of the code to support that: https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java Is there a specific reason to only support String[] as a return type? If not, I would be happy to write a new class that supports rendering into a generic Type and rewire that into the existing class (or just do it as an addition and leave the current class be). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)
[ https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002 ] Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:09 PM: - Well I am doing the synonym approach for other parts of my analysis already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets) by using the same analyzer chain that the complete documents use and extracting the attributes. (I think it's similar to the approach PH uses? But with storing the results into an index) Like that I can implement a cool way of handling synonym searching as well) and this enables me to do highlighting without the need of one of the Highlighters in Lucene so I am not that dependent on the Highlighting API anymore. But I think I might need the Highlighter API some time in the near future so I am keeping my _FastVectorHighlighterUtil_ Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. was (Author: s4ke): Well I am doing the synonym approach for other parts already, but I think the FastVectorHighlighter approach is better as it does exactly the part with when I highlight field X, load its content from field Y, but I just want it to be able to render into arbitrary objects (or in my case I just want the plain offsets). 
I am currently working on a more sophisticated approach that lets me search for more information about one single token (I am reindexing the document's tokens into a new index with all it's occurences (offsets) by using the same analyzer chain that the complete documents use and extracting the attributes. (I think it's similar to the approach PH uses? But with storing the results into an index) Like that I can implement a cool way of handling synonym searching as well) and this enables me to do highlighting without the need of one of the Highlighters in Lucene so I am not that dependent on the Highlighting API anymore. But I think I might need the Highlighter API some time in the near future so I am keeping my _FastVectorHighlighterUtil_ Generally I just want to make the Highlighter API (I am talking about _FastVectorHighlighter_ here) easier to use and more intuitive than what I would need to do with the indexing trick. Add Support for something different than Strings in Highlighting (FastVectorHighlighter) Key: LUCENE-6061 URL: https://issues.apache.org/jira/browse/LUCENE-6061 Project: Lucene - Core Issue Type: Wish Components: core/search, modules/highlighter Affects Versions: Trunk Reporter: Martin Braun Priority: Critical Labels: FastVectorHighlighter, Highlighter, Highlighting Fix For: 4.10.2, 5.0, Trunk In my application I need Highlighting and I stumbled upon the really neat FastVectorHighlighter. One problem appeared though. It lacks a way to render the Highlights into something different than Strings, so I rearranged some of the code to support that: https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java Is there a specific reason to only support String[] as a return type? If not, I would be happy to write a new class that supports rendering into a generic Type and rewire that into the existing class (or just do it as an addition and leave the current class be). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6729) createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA
[ https://issues.apache.org/jira/browse/SOLR-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217015#comment-14217015 ] Mark Miller commented on SOLR-6729: --- Looks good. We should probably add a simple test though. createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA Key: SOLR-6729 URL: https://issues.apache.org/jira/browse/SOLR-6729 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke Assignee: Mark Miller Priority: Minor The 'Replica placement strategy for solrcloud' SOLR-6220 ticket will allow more sophisticated replica placement logic but in the meantime this simple change here would allow more predictable locating of replicas via the ordering of the createNodeSet list provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
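A hedged sketch of what the createNodeSet.shuffle parameter is meant to do (this is not Solr's actual placement code; the class and method names are invented): with shuffle disabled, replicas land on nodes in exactly the order the caller listed them, which is what makes placement predictable.

```java
import java.util.*;

// Illustrative node picker: honor the caller's createNodeSet order unless
// shuffling is requested, then assign replicas round-robin over the list.
public class NodePickerDemo {
    public static List<String> pickNodes(List<String> createNodeSet,
                                         boolean shuffle,
                                         int numReplicas,
                                         Random random) {
        List<String> nodes = new ArrayList<>(createNodeSet);
        if (shuffle) {
            // Randomized placement: the historical behavior.
            Collections.shuffle(nodes, random);
        }
        List<String> assignment = new ArrayList<>();
        for (int i = 0; i < numReplicas; i++) {
            assignment.add(nodes.get(i % nodes.size()));
        }
        return assignment;
    }
}
```

With shuffle=false the assignment is a deterministic function of the provided list, so callers can steer where each replica goes.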
[jira] [Commented] (LUCENE-6064) throw exception during sort for misconfigured field
[ https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217017#comment-14217017 ] Michael McCandless commented on LUCENE-6064: +1 throw exception during sort for misconfigured field --- Key: LUCENE-6064 URL: https://issues.apache.org/jira/browse/LUCENE-6064 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6064.patch If you sort on field X, and it has no docvalues, today it will silently treat it as all values missing. This can be very confusing since it just means nothing will happen at all. But there is a distinction between no docs happen to have a value for this field and field isn't configured correctly. The latter should get an exception, telling the user to index docvalues, or wrap the reader with UninvertingReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4792) stop shipping a war in trunk (6.0)
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217025#comment-14217025 ] Mark Miller commented on SOLR-4792: --- bq. The issue title is wrong ... we WILL be shipping a .war in 5.x versions. I think we just need to backport this to 5.x. No reason we have to wait until 6.x. stop shipping a war in trunk (6.0) -- Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Robert Muir Fix For: Trunk Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-4792) stop shipping a war in trunk (6.0)
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened SOLR-4792: --- stop shipping a war in trunk (6.0) -- Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4792) stop shipping a war in trunk (6.0)
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4792: -- Fix Version/s: 5.0 Assignee: Mark Miller (was: Robert Muir) stop shipping a war in trunk (6.0) -- Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4792) stop shipping a war in trunk (6.0)
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217031#comment-14217031 ] Mark Miller commented on SOLR-4792: --- [~andyetitmoves] brought this up at Lucene Revolution - 5.x could be a long line and this is a fairly simple change internally - it just takes a volunteer - let's do it in 5.x as originally planned and voted on. stop shipping a war in trunk (6.0) -- Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4792) stop shipping a war in 5.0
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4792: -- Summary: stop shipping a war in 5.0 (was: stop shipping a war in trunk (6.0)) stop shipping a war in 5.0 -- Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6747) Add an optional caching option as a workaround for SOLR-6586.
[ https://issues.apache.org/jira/browse/SOLR-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217043#comment-14217043 ] Mark Miller commented on SOLR-6747: --- bq. small optimization, but maybe better to avoid reading the volatile value if useCachedStatsBetweenGetMBeanInfoCalls is false? +1 Add an optional caching option as a workaround for SOLR-6586. - Key: SOLR-6747 URL: https://issues.apache.org/jira/browse/SOLR-6747 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-6747.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2412) Multipath hierarchical faceting
[ https://issues.apache.org/jira/browse/SOLR-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217040#comment-14217040 ] Toke Eskildsen commented on SOLR-2412: -- Frankly, I am not sure it ever will. SOLR-2412 is huge and it is a completely separate facet implementation, of which Solr already has too many. We are not currently using it at my organization as we don't have the need for hierarchical faceting and since SOLR-5894 gives us a similar speed-boost when using multiple facets. I hope to add the hierarchical capabilities as an overlay to the existing Solr facet code at some point, but I really cannot say when or if that will work out. Sorry about that and apologies for taking so long to come to that realization. Multipath hierarchical faceting --- Key: SOLR-2412 URL: https://issues.apache.org/jira/browse/SOLR-2412 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 4.0 Environment: Fast IO when huge hierarchies are used Reporter: Toke Eskildsen Labels: contrib, patch Attachments: SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch Hierarchical faceting with slow startup, low memory overhead and fast response. Distinguishing features as compared to SOLR-64 and SOLR-792 are * Multiple paths per document * Query-time analysis of the facet-field; no special requirements for indexing besides retaining separator characters in the terms used for faceting * Optional custom sorting of tag values * Recursive counting of references to tags at all levels of the output This is a shell around LUCENE-2369, making it work with the Solr API. The underlying principle is to reference terms by their ordinals and create an index-wide documents-to-tags map, augmented with a compressed representation of hierarchical levels. 
[jira] [Commented] (LUCENE-6063) Allow overriding ConcurrentMergeScheduler's denial-of-service protection
[ https://issues.apache.org/jira/browse/LUCENE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217085#comment-14217085 ] ASF subversion and git services commented on LUCENE-6063: - Commit 1640456 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1640456 ] LUCENE-6063: allow overriding whether/how ConcurrentMergeScheduler stalls incoming threads when merges are falling behind Allow overriding ConcurrentMergeScheduler's denial-of-service protection Key: LUCENE-6063 URL: https://issues.apache.org/jira/browse/LUCENE-6063 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6063.patch In LUCENE-5310 we explored improving CMS/SMS sharing/concurrency, but the issue never converged, so I want to break out one small part of it here: the ability to override CMS's default aggressive denial-of-service protection where it forcefully stalls the incoming threads that are responsible for creating too many segments. More advanced applications can more gracefully handle the too many merges by e.g. slowing down the incoming indexing rate at a higher level.
[jira] [Commented] (LUCENE-6063) Allow overriding ConcurrentMergeScheduler's denial-of-service protection
[ https://issues.apache.org/jira/browse/LUCENE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217088#comment-14217088 ] ASF subversion and git services commented on LUCENE-6063: - Commit 1640457 from [~mikemccand] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1640457 ] LUCENE-6063: allow overriding whether/how ConcurrentMergeScheduler stalls incoming threads when merges are falling behind Allow overriding ConcurrentMergeScheduler's denial-of-service protection Key: LUCENE-6063 URL: https://issues.apache.org/jira/browse/LUCENE-6063 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6063.patch In LUCENE-5310 we explored improving CMS/SMS sharing/concurrency, but the issue never converged, so I want to break out one small part of it here: the ability to override CMS's default aggressive denial-of-service protection where it forcefully stalls the incoming threads that are responsible for creating too many segments. More advanced applications can more gracefully handle the too many merges by e.g. slowing down the incoming indexing rate at a higher level.
[jira] [Resolved] (LUCENE-6063) Allow overriding ConcurrentMergeScheduler's denial-of-service protection
[ https://issues.apache.org/jira/browse/LUCENE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-6063. Resolution: Fixed Allow overriding ConcurrentMergeScheduler's denial-of-service protection Key: LUCENE-6063 URL: https://issues.apache.org/jira/browse/LUCENE-6063 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6063.patch In LUCENE-5310 we explored improving CMS/SMS sharing/concurrency, but the issue never converged, so I want to break out one small part of it here: the ability to override CMS's default aggressive denial-of-service protection where it forcefully stalls the incoming threads that are responsible for creating too many segments. More advanced applications can more gracefully handle the too many merges by e.g. slowing down the incoming indexing rate at a higher level.
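The pattern LUCENE-6063 enables — forceful stalling placed behind an overridable hook — can be sketched in a self-contained way. Everything below (StallingScheduler, shouldStallIncomingThreads) is an invented illustration, not Lucene's ConcurrentMergeScheduler API:

```java
// Invented sketch of the pattern: the "stall incoming threads" policy lives
// behind a protected hook, so an application can override it and instead
// apply backpressure at a higher level (e.g. throttle its indexing rate).
public class StallingScheduler {
    private final int maxPending;
    private int pending;

    public StallingScheduler(int maxPending) { this.maxPending = maxPending; }

    /** Called by an incoming (indexing) thread; may stall it under the default policy. */
    public synchronized boolean enqueue() {
        while (pending >= maxPending && shouldStallIncomingThreads()) {
            try {
                wait(50);                        // default: block the caller until backlog drains
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        pending++;
        return true;
    }

    public synchronized void taskDone() {
        pending--;
        notifyAll();                             // wake any stalled incoming threads
    }

    /** Override to disable the built-in denial-of-service protection. */
    protected boolean shouldStallIncomingThreads() { return true; }
}
```

A subclass that returns false from the hook accepts unbounded backlog and must handle the "too many merges" condition itself.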
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 678 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/678/ 2 tests failed. REGRESSION: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: java.lang.NullPointerException Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: java.lang.NullPointerException at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:569) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at
Re: [JENKINS] Lucene-Solr-4.10-Linux (32bit/jdk1.9.0-ea-b34) - Build # 93 - Failure!
This looks like https://bugs.openjdk.java.net/browse/JDK-8038348 still, only without asserts. It might only still happen in 4.10.x, the codec pull API makes the flush code look completely different. On Sun, Nov 16, 2014 at 9:31 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.10-Linux/93/ Java: 32bit/jdk1.9.0-ea-b34 -server -XX:+UseG1GC (asserts: false) 2 tests failed. REGRESSION: org.apache.lucene.codecs.simpletext.TestSimpleTextTermVectorsFormat.testRamBytesUsed Error Message: 8196 Stack Trace: java.lang.ArrayIndexOutOfBoundsException: 8196 at __randomizedtesting.SeedInfo.seed([868B4D2568A55A5E:74285F65A2DA4508]:0) at org.apache.lucene.index.ByteSliceReader.nextSlice(ByteSliceReader.java:109) at org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:76) at org.apache.lucene.store.DataInput.readVInt(DataInput.java:122) at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:454) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:80) at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:114) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:439) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:510) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:621) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3227) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3203) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1774) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1754) at org.apache.lucene.index.BaseIndexFileFormatTestCase.testRamBytesUsed(BaseIndexFileFormatTestCase.java:228) at org.apache.lucene.index.BaseTermVectorsFormatTestCase.testRamBytesUsed(BaseTermVectorsFormatTestCase.java:61) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
[jira] [Commented] (SOLR-6633) let /update/json/docs store the source json as well
[ https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217243#comment-14217243 ] Alexandre Rafalovitch commented on SOLR-6633: - This is truly just storing the original document, right? And only returning the whole thing as well? Because, in Elasticsearch, the *_src* field is actually used as a source for several operations. For example, it is used as the source for dynamic updates as - by default - fields are not stored individually. And, I think, the *_src* field also gets re-written/re-created on update, again because it is actually used as a source of truth. The second issue I wanted to raise is how this will interplay with UpdateRequestProcessors (ES does not really have those). I guess URPs will apply after the content goes into the field, so the actual fields may look quite different from what's in the *_src*. Finally, I am not clear on what this really means: ??all fields go into the 'df'?? . Do we mean, there is a magic copyField or something? I think we need a bit more specific use-case here than just an implementation/configuration. Especially since a similar-but-different implementation in Elasticsearch does not fully match Solr's setup. let /update/json/docs store the source json as well --- Key: SOLR-6633 URL: https://issues.apache.org/jira/browse/SOLR-6633 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Labels: EaseOfUse Fix For: 5.0, Trunk Attachments: SOLR-6633.patch, SOLR-6633.patch it is a common requirement to store the entire JSON as a field in Solr. we can have an extra param srcField=field_name to specify the field name. the /update/json/docs is only useful when all the json fields are predefined or in schemaless mode.
The better option would be to store the content in a store-only field and index the data in another field. In other modes, the relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means the uniqueKeyField will be extracted from the fields and all fields
         go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
  </lst>
</initParams>
{code}
[jira] [Commented] (SOLR-6655) Improve SimplePostTool to easily specify target port/collection etc.
[ https://issues.apache.org/jira/browse/SOLR-6655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217250#comment-14217250 ] Alexandre Rafalovitch commented on SOLR-6655: - [~janhoy]: spring.io may have a good base for a full-featured client with spring.data.solr, spring.shell and a bunch of other modules one could pull in. Might be a little *large* though :-) Improve SimplePostTool to easily specify target port/collection etc. Key: SOLR-6655 URL: https://issues.apache.org/jira/browse/SOLR-6655 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Erik Hatcher Labels: difficulty-easy, impact-medium Fix For: 5.0, Trunk Attachments: SOLR-6655.patch Right now, the SimplePostTool has a single parameter 'url' that can be used to send the request to a specific endpoint. It would make sense to allow users to specify just the collection name, port etc. explicitly and independently as separate parameters.
[jira] [Updated] (SOLR-6633) let /update/json/docs store the source json as well
[ https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6633: - Description: it is a common requirement to store the entire JSON as a field in Solr. we can have an extra param srcField=field_name to specify the field name. the /update/json/docs is only useful when all the json fields are predefined or in schemaless mode. The better option would be to store the content in a store-only field and index the data in another field. In other modes, the relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means the uniqueKeyField will be extracted from the fields and all fields
         go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
    <str name="df">text</str>
  </lst>
</initParams>
{code}
was: it is a common requirement to store the entire JSON as a field in Solr. we can have an extra param srcField=field_name to specify the field name. the /update/json/docs is only useful when all the json fields are predefined or in schemaless mode. The better option would be to store the content in a store-only field and index the data in another field. In other modes, the relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means the uniqueKeyField will be extracted from the fields and all fields
         go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
  </lst>
</initParams>
{code}
let /update/json/docs store the source json as well --- Key: SOLR-6633 URL: https://issues.apache.org/jira/browse/SOLR-6633 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Labels: EaseOfUse Fix For: 5.0, Trunk Attachments: SOLR-6633.patch, SOLR-6633.patch it is a common requirement to store the entire JSON as a field in Solr. we can have an extra param srcField=field_name to specify the field name. the /update/json/docs is only useful when all the json fields are predefined or in schemaless mode. The better option would be to store the content in a store-only field and index the data in another field. In other modes, the relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means the uniqueKeyField will be extracted from the fields and all fields
         go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
    <str name="df">text</str>
  </lst>
</initParams>
{code}
[jira] [Commented] (SOLR-6633) let /update/json/docs store the source json as well
[ https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217261#comment-14217261 ] Noble Paul commented on SOLR-6633: -- bq. Because, in Elasticsearch, the _src field is actually used as source for several operations. This feature is not the same. It is a feature of the {{/update/json/docs}} request handler. We can't do it like ES because the same document can be updated using other commands as well. bq. Finally, I am not clear on what this really means: all fields go into the 'df' Solr is strongly typed, so to say. So it means we can't just put the content somewhere for searching. Because all components use df as the default search field, this component chooses to piggyback on the same field. The user can configure any other field as 'df' here. The next problem we need to address is that of uniqueKey. The component must extract the uniqueKey field from the json itself or it should create one. That is the purpose of the mapUniqueKeyOnly param. We are not trying to be ES here. The use case is this: a user has a bunch of json documents and needs to index the data without configuring anything in the schema. The search result has to return some stored fields. Because Solr is strongly typed we can't store them in individual fields. So we must store the whole thing in some field, and it made sense to store it as json itself. let /update/json/docs store the source json as well --- Key: SOLR-6633 URL: https://issues.apache.org/jira/browse/SOLR-6633 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Labels: EaseOfUse Fix For: 5.0, Trunk Attachments: SOLR-6633.patch, SOLR-6633.patch it is a common requirement to store the entire JSON as a field in Solr. we can have an extra param srcField=field_name to specify the field name. the /update/json/docs is only useful when all the json fields are predefined or in schemaless mode.
The better option would be to store the content in a store-only field and index the data in another field. In other modes, the relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means the uniqueKeyField will be extracted from the fields and all fields
         go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
    <str name="df">text</str>
  </lst>
</initParams>
{code}
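Noble's description of the use case can be illustrated with a small sketch. This is not Solr's implementation — SrcFieldMapper and mapDoc are invented names, and it operates on an already-parsed map rather than parsing JSON — but it shows the shape of the mapping: raw JSON verbatim into srcField, only the uniqueKey mapped as a real field, and everything else funneled into the catch-all df field:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (invented names, not Solr's code) of the mapping:
// store the raw JSON in srcField, map only the uniqueKey, put all other
// values into the default search field so they remain searchable.
public class SrcFieldMapper {
    public static Map<String, String> mapDoc(String rawJson, Map<String, Object> parsed,
                                             String uniqueKey, String srcField, String df) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(srcField, rawJson);                       // whole document, verbatim
        StringBuilder catchAll = new StringBuilder();
        for (Map.Entry<String, Object> e : parsed.entrySet()) {
            if (e.getKey().equals(uniqueKey)) {
                doc.put(uniqueKey, String.valueOf(e.getValue()));  // the only mapped field
            } else {
                if (catchAll.length() > 0) catchAll.append(' ');
                catchAll.append(e.getValue());            // searchable via df, not stored per-field
            }
        }
        doc.put(df, catchAll.toString());
        return doc;
    }
}
```

This also makes Noble's point concrete: individual field values survive only inside the stored JSON, which is why the source has to be kept verbatim.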
[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217273#comment-14217273 ] ASF subversion and git services commented on LUCENE-6062: - Commit 1640464 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1640464 ] LUCENE-6062: throw exception instead of doing nothing, when sorting/grouping etc on misconfigured field Index corruption from numeric DV updates Key: LUCENE-6062 URL: https://issues.apache.org/jira/browse/LUCENE-6062 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 4.10.3, 5.0, Trunk Attachments: LUCENE-6062.patch, LUCENE-6062.patch I hit this while working on on LUCENE-6005: when cutting over TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled additional docValues in the test, and this this: {noformat} There was 1 failure: 1) testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates) java.io.FileNotFoundException: _1_Asserting_0.dvm in dir=RAMDirectory@259847e5 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0) at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645) at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110) at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.init(Lucene50DocValuesProducer.java:130) at org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182) at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.init(PerFieldDocValuesFormat.java:267) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357) at 
org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51) at org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68) at org.apache.lucene.index.SegmentDocValuesProducer.init(SegmentDocValuesProducer.java:63) at org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167) at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:109) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63) at org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769) {noformat} A one-line change to the existing test (on trunk) causes this corruption:
{noformat}
Index: lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
===
--- lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (revision 1639580)
+++ lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (working copy)
@@ -750,6 +750,7 @@
     // second segment with no NDV
     doc = new Document();
     doc.add(new StringField("id", "doc1", Store.NO));
+    doc.add(new NumericDocValuesField("foo", 3));
     writer.addDocument(doc);
     doc = new Document();
     doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't updated
{noformat}
For some reason, the base doc values for the 2nd segment are not being written, but clearly should have been (to hold field foo)... I'm not sure why.
[jira] [Commented] (SOLR-6658) SearchHandler should accept POST requests with JSON data in content stream for customized plug-in components
[ https://issues.apache.org/jira/browse/SOLR-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217290#comment-14217290 ] Mark Peng commented on SOLR-6658: - Is there a committer who could help with the patch for this issue? Or is there any alternative to solve it? It is very crucial for us. Best regards, Mark SearchHandler should accept POST requests with JSON data in content stream for customized plug-in components Key: SOLR-6658 URL: https://issues.apache.org/jira/browse/SOLR-6658 Project: Solr Issue Type: Improvement Components: search, SearchComponents - other Affects Versions: 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1, 4.9, 4.9.1, 4.10, 4.10.1 Reporter: Mark Peng Attachments: SOLR-6658.patch This issue relates to the following one: *Return HTTP error on POST requests with no Content-Type* [https://issues.apache.org/jira/browse/SOLR-5517] The original consideration of the above is to make sure that incoming POST requests to SearchHandler have a corresponding content-type specified.
That is quite reasonable; however, the following lines in the patch cause it to reject all POST requests with content stream data, which is not necessary for that issue:
{code}
Index: solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java
===
--- solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java (revision 1546817)
+++ solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java (working copy)
@@ -22,9 +22,11 @@
 import java.util.List;

 import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
 import org.apache.solr.common.params.CommonParams;
 import org.apache.solr.common.params.ModifiableSolrParams;
 import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.util.ContentStream;
 import org.apache.solr.core.CloseHook;
 import org.apache.solr.core.PluginInfo;
 import org.apache.solr.core.SolrCore;
@@ -165,6 +167,10 @@
   {
     // int sleep = req.getParams().getInt("sleep",0);
     // if (sleep > 0) {log.error("SLEEPING for " + sleep); Thread.sleep(sleep);}
+    if (req.getContentStreams() != null && req.getContentStreams().iterator().hasNext()) {
+      throw new SolrException(ErrorCode.BAD_REQUEST, "Search requests cannot accept content streams");
+    }
+
     ResponseBuilder rb = new ResponseBuilder(req, rsp, components);
     if (rb.requestInfo != null) {
       rb.requestInfo.setResponseBuilder(rb);
{code}
We are using Solr 4.5.1 in our production services and are considering upgrading to 4.9/5.0 to support more features. But due to this issue we cannot upgrade, because we have some important customized SearchComponent plug-ins that need to get POST data from SearchHandler for further processing. Therefore, we are requesting if it is possible to remove the content stream constraint shown above and to let SearchHandler accept POST requests with *Content-Type: application/json* to allow further components to get the data. Thank you.
Best regards, Mark Peng
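A sketch of the relaxation being requested — hypothetical, not a committed patch and not Solr's actual guard logic: keep SOLR-5517's rejection of POST bodies that declare no Content-Type, but let declared JSON through to downstream components:

```java
// Hypothetical sketch (invented class, not a committed Solr patch) of a
// narrower guard: reject only content streams the handler cannot attribute,
// instead of rejecting every POST body outright.
public class ContentStreamGuard {
    public static boolean reject(String contentType) {
        if (contentType == null) {
            return true;                          // SOLR-5517 case: no Content-Type at all
        }
        // Allow JSON bodies through so plug-in components can consume them.
        return !contentType.startsWith("application/json");
    }
}
```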
[jira] [Commented] (SOLR-6633) let /update/json/docs store the source json as well
[ https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217295#comment-14217295 ] Alexandre Rafalovitch commented on SOLR-6633: - Is this somehow superseding the behavior in SOLR-6304 and http://lucidworks.com/blog/indexing-custom-json-data/ ? I mean the field extraction code can already do ID mapping by specifying an appropriate path, right? And for 'df', would you need to specify it as a param (like in the example 4 in the article)? And I am still trying to wrap my head about the use case. I don't expect users not to want to configure *anything*. At least the dates would need to be parsed/detected. And, usually, after the initial dump, the users go back and start adding specific definitions field by field, type by type (and reindex). Is that part of this scenario as well? P.s. I know Solr cannot clone Elasticsearch. I was just making sure that we are not somehow missing Solr-specifics by assuming Elasticsearch like behavior. Perhaps having the field also called *_all* was what confused me. let /update/json/docs store the source json as well --- Key: SOLR-6633 URL: https://issues.apache.org/jira/browse/SOLR-6633 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Labels: EaseOfUse Fix For: 5.0, Trunk Attachments: SOLR-6633.patch, SOLR-6633.patch it is a common requirement to store the entire JSON as a field in Solr. we can have a extra param srcField=field_name to specify the field name the /update/json/docs is only useful when all the json fields are predefined or in schemaless mode. 
In other modes, the better option would be to store the content in a store-only field and index the data in another field. The relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means the uniqueKeyField will be extracted from the fields and all fields
         go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
    <str name="df">text</str>
  </lst>
</initParams>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
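[Editor's note] The net effect of the srcField/mapUniqueKeyOnly/df defaults above can be modeled in a few lines of plain Java. This is an illustrative sketch of the described behavior only, not Solr's implementation: the method name and the pre-extracted arguments (id, otherText) are hypothetical, since in Solr the JSON parsing and field extraction happen inside the update handler.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SrcFieldModel {
    // Models the mapping the config describes: verbatim source, extracted unique key,
    // and everything else funneled into the catch-all 'df' field.
    static Map<String, Object> toSolrDoc(String rawJson, String id, List<String> otherText) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_src", rawJson);                     // srcField=_src: whole JSON stored verbatim
        doc.put("id", id);                            // mapUniqueKeyOnly=true: only the uniqueKey is mapped
        doc.put("text", String.join(" ", otherText)); // df=text: remaining values indexed in 'text'
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc =
            toSolrDoc("{\"id\":\"1\",\"title\":\"hello\"}", "1", List.of("hello"));
        if (!"1".equals(doc.get("id")) || doc.get("_src") == null) {
            throw new AssertionError("unexpected mapping");
        }
        System.out.println("ok");
    }
}
```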
[jira] [Commented] (LUCENE-6064) throw exception during sort for misconfigured field
[ https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217307#comment-14217307 ] ASF subversion and git services commented on LUCENE-6064: - Commit 1640469 from [~rcmuir] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1640469 ] LUCENE-6064: throw exception instead of doing nothing, when sorting/grouping etc on misconfigured field throw exception during sort for misconfigured field --- Key: LUCENE-6064 URL: https://issues.apache.org/jira/browse/LUCENE-6064 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, Trunk Attachments: LUCENE-6064.patch If you sort on field X and it has no docvalues, today it will silently treat it as all values missing. This can be very confusing, since it just means nothing will happen at all. But there is a distinction between "no docs happen to have a value for this field" and "field isn't configured correctly". The latter should get an exception telling the user to index docvalues, or wrap the reader with UninvertingReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
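[Editor's note] The distinction the issue draws, between a field that is merely empty and a field that is not configured at all, can be sketched JDK-only. The names here are hypothetical, not Lucene API; a null value map models "field not configured" (no docvalues), while an empty map models "configured, but no doc has a value":

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FailFastSort {
    // Fail fast on misconfiguration; treat per-doc missing values as "sorts last".
    static List<String> sortByField(List<String> docIds, Map<String, Integer> fieldValues) {
        if (fieldValues == null) {
            throw new IllegalStateException(
                "field is not configured for sorting; index docvalues or wrap with UninvertingReader");
        }
        List<String> sorted = new ArrayList<>(docIds);
        // Docs without a value sort last, mirroring "treat as missing" per document.
        sorted.sort(Comparator.comparing(
            (String id) -> fieldValues.getOrDefault(id, Integer.MAX_VALUE)));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> vals = new HashMap<>();
        vals.put("b", 1);
        System.out.println(sortByField(List.of("a", "b"), vals)); // "b" has a value, sorts first
        try {
            sortByField(List.of("a"), null);
            throw new AssertionError("expected failure");
        } catch (IllegalStateException expected) {
            // misconfigured field now fails loudly instead of silently doing nothing
        }
    }
}
```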
[jira] [Resolved] (LUCENE-6064) throw exception during sort for misconfigured field
[ https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-6064. - Resolution: Fixed Fix Version/s: Trunk 5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217309#comment-14217309 ] Robert Muir commented on LUCENE-6062: - I will first go back to 5.x, then see if the test fails in 4.x, and how feasible it is to backport. The code differs significantly here, so the problem may have been introduced recently.
Index corruption from numeric DV updates
Key: LUCENE-6062 URL: https://issues.apache.org/jira/browse/LUCENE-6062 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 4.10.3, 5.0, Trunk Attachments: LUCENE-6062.patch, LUCENE-6062.patch
I hit this while working on LUCENE-6005: when cutting over TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled additional docValues in the test, and hit this:
{noformat}
There was 1 failure:
1) testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
java.io.FileNotFoundException: _1_Asserting_0.dvm in dir=RAMDirectory@259847e5 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
	at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
	at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.<init>(Lucene50DocValuesProducer.java:130)
	at org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
	at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
	at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:267)
	at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
	at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
	at org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
	at org.apache.lucene.index.SegmentDocValuesProducer.<init>(SegmentDocValuesProducer.java:63)
	at org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:109)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
	at org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
{noformat}
A one-line change to the existing test (on trunk) causes this corruption:
{noformat}
Index: lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
===
--- lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (revision 1639580)
+++ lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java (working copy)
@@ -750,6 +750,7 @@
 // second segment with no NDV
 doc = new Document();
 doc.add(new StringField("id", "doc1", Store.NO));
+doc.add(new NumericDocValuesField("foo", 3));
 writer.addDocument(doc);
 doc = new Document();
 doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't updated
{noformat}
For some reason, the base doc values for the 2nd segment are not being written, but clearly should have been (to hold field "foo")... I'm not sure why.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2215 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2215/

2 tests failed.

REGRESSION: org.apache.solr.SampleTest.testSimple

Error Message:
SolrCore 'collection1' is not available due to init failure: Error instantiating class: 'org.apache.lucene.util.LuceneTestCase$3'

Stack Trace:
org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Error instantiating class: 'org.apache.lucene.util.LuceneTestCase$3'
	at __randomizedtesting.SeedInfo.seed([2E6E8F9ADADFEACF:16DDAB64FD2C3E1E]:0)
	at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:763)
	at org.apache.solr.util.TestHarness.getCoreInc(TestHarness.java:219)
	at org.apache.solr.util.TestHarness.update(TestHarness.java:235)
	at org.apache.solr.util.BaseTestHarness.checkUpdateStatus(BaseTestHarness.java:282)
	at org.apache.solr.util.BaseTestHarness.validateUpdate(BaseTestHarness.java:252)
	at org.apache.solr.SolrTestCaseJ4.checkUpdateU(SolrTestCaseJ4.java:677)
	at org.apache.solr.SolrTestCaseJ4.assertU(SolrTestCaseJ4.java:656)
	at org.apache.solr.SampleTest.testSimple(SampleTest.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at
[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217328#comment-14217328 ] ASF subversion and git services commented on LUCENE-6062: - Commit 1640471 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1640471 ] LUCENE-6062: pass correct fieldinfos to dv producer when the segment has updates -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5317) [PATCH] Concordance capability
[ https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated LUCENE-5317: Attachment: lucene5317v1.patch I merged in my local updates and I pushed these to my fork on github [link|https://github.com/tballison/lucene-solr]. I didn't have luck posting this to the review board. When I tried to post it, I entered the base directory and was returned to the starting page without any error message. [PATCH] Concordance capability -- Key: LUCENE-5317 URL: https://issues.apache.org/jira/browse/LUCENE-5317 Project: Lucene - Core Issue Type: New Feature Components: core/search Affects Versions: 4.5 Reporter: Tim Allison Labels: patch Fix For: 4.9 Attachments: LUCENE-5317.patch, concordance_v1.patch.gz, lucene5317v1.patch This patch enables a Lucene-powered concordance search capability. Concordances are extremely useful for linguists, lawyers and other analysts performing analytic search vs. traditional snippeting/document retrieval tasks. By analytic search, I mean that the user wants to browse every time a term appears (or at least the topn) in a subset of documents and see the words before and after. Concordance technology is far simpler and less interesting than IR relevance models/methods, but it can be extremely useful for some use cases. Traditional concordance sort orders are available (sort on words before the target, words after, target then words before and target then words after). Under the hood, this is running SpanQuery's getSpans() and reanalyzing to obtain character offsets. There is plenty of room for optimizations and refactoring. Many thanks to my colleague, Jason Robinson, for input on the design of this patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5317) [PATCH] Concordance capability
[ https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217330#comment-14217330 ] Tim Allison edited comment on LUCENE-5317 at 11/19/14 3:12 AM: --- I merged in my local updates and I pushed these to my fork on github [link|https://github.com/tballison/lucene-solr]. I didn't have luck posting this to the review board. When I tried to post it, I entered the base directory and was returned to the starting page without any error message. For the record, I'm sure that this is user error. was (Author: talli...@mitre.org): I merged in my local updates and I pushed these to my fork on github [link|https://github.com/tballison/lucene-solr]. I didn't have luck posting this to the review board. When I tried to post it, I entered the base directory and was returned to the starting page without any error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217335#comment-14217335 ] ASF subversion and git services commented on LUCENE-6062: - Commit 1640472 from [~rcmuir] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1640472 ] LUCENE-6062: pass correct fieldinfos to dv producer when the segment has updates -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6062) Index corruption from numeric DV updates
[ https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-6062. - Resolution: Fixed Fix Version/s: (was: 4.10.3) The bug affects 4.10.x, but the fix would not be easy. On 5.0, fieldinfos handling has been simplified considerably around here, making it easy to pass the correct ones to producers. I think this is too much risk to backport. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org