[JENKINS] Lucene-Solr-5.x-Windows (32bit/jdk1.8.0_20) - Build # 4333 - Failure!

2014-11-18 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4333/
Java: 32bit/jdk1.8.0_20 -client -XX:+UseSerialGC (asserts: true)

2 tests failed.
REGRESSION:  org.apache.solr.handler.TestReplicationHandlerBackup.doTestBackup

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([B331C6C78431C249]:0)


FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandlerBackup

Error Message:
Suite timeout exceeded (>= 7200000 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (>= 7200000 msec).
at __randomizedtesting.SeedInfo.seed([B331C6C78431C249]:0)




Build Log:
[...truncated 11625 lines...]
   [junit4] Suite: org.apache.solr.handler.TestReplicationHandlerBackup
   [junit4]   2> Creating dataDir: C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\init-core-data-001
   [junit4]   2> 2210851 T5379 oas.SolrTestCaseJ4.setUp ###Starting doTestBackup
   [junit4]   2> 2210870 T5379 oejs.Server.doStart jetty-8.1.10.v20130312
   [junit4]   2> 2210878 T5379 oejs.AbstractConnector.doStart Started SelectChannelConnector@127.0.0.1:62247
   [junit4]   2> 2210879 T5379 oass.SolrDispatchFilter.init SolrDispatchFilter.init()
   [junit4]   2> 2210879 T5379 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx)
   [junit4]   2> 2210879 T5379 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001
   [junit4]   2> 2210879 T5379 oasc.SolrResourceLoader.<init> new SolrResourceLoader for directory: 'C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\'
   [junit4]   2> 2210908 T5379 oasc.ConfigSolr.fromFile Loading container configuration from C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\solr.xml
   [junit4]   2> 2210926 T5379 oasc.CoreContainer.<init> New CoreContainer 22745234
   [junit4]   2> 2210927 T5379 oasc.CoreContainer.load Loading cores into CoreContainer [instanceDir=C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\]
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting socketTimeout to: 90000
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting urlScheme to: 
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting connTimeout to: 15000
   [junit4]   2> 2210929 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maxConnectionsPerHost to: 20
   [junit4]   2> 2210930 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maxConnections to: 10000
   [junit4]   2> 2210931 T5379 oashc.HttpShardHandlerFactory.getParameter Setting corePoolSize to: 0
   [junit4]   2> 2210931 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maximumPoolSize to: 2147483647
   [junit4]   2> 2210932 T5379 oashc.HttpShardHandlerFactory.getParameter Setting maxThreadIdleTime to: 5
   [junit4]   2> 2210932 T5379 oashc.HttpShardHandlerFactory.getParameter Setting sizeOfQueue to: -1
   [junit4]   2> 2210932 T5379 oashc.HttpShardHandlerFactory.getParameter Setting fairnessPolicy to: false
   [junit4]   2> 2210932 T5379 oasu.UpdateShardHandler.<init> Creating UpdateShardHandler HTTP client with params: socketTimeout=340000&connTimeout=45000&retry=false
   [junit4]   2> 2210933 T5379 oasl.LogWatcher.createWatcher SLF4J impl is org.slf4j.impl.Log4jLoggerFactory
   [junit4]   2> 2210933 T5379 oasl.LogWatcher.newRegisteredLogWatcher Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
   [junit4]   2> 2210934 T5379 oasc.CoreContainer.load Host Name: 127.0.0.1
   [junit4]   2> 2210938 T5391 oasc.SolrResourceLoader.<init> new SolrResourceLoader for directory: 'C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\collection1\'
   [junit4]   2> 2210971 T5391 oasc.SolrConfig.<init> Using Lucene MatchVersion: 5.0.0
   [junit4]   2> 2210983 T5391 oasc.SolrConfig.<init> Loaded SolrConfig: solrconfig.xml
   [junit4]   2> 2210983 T5391 oass.IndexSchema.readSchema Reading Solr Schema from C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.TestReplicationHandlerBackup-B331C6C78431C249-001\solr-instance-001\collection1\conf\schema.xml

[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 686 - Still Failing

2014-11-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/686/

3 tests failed.
REGRESSION:  org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch

Error Message:
The Monkey ran for over 20 seconds and no jetties were stopped - this is worth 
investigating!

Stack Trace:
java.lang.AssertionError: The Monkey ran for over 20 seconds and no jetties 
were stopped - this is worth investigating!
at 
__randomizedtesting.SeedInfo.seed([C3742817002DCD1:8DD1CC99075DBCED]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.apache.solr.cloud.ChaosMonkey.stopTheMonkey(ChaosMonkey.java:535)
at 
org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:140)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (SOLR-1387) Add more search options for filtering field facets.

2014-11-18 Thread Tom Winch (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216072#comment-14216072 ]

Tom Winch commented on SOLR-1387:
-

As the name suggests, CharacterUtils works on a char[] whereas we have a 
BytesRef (essentially a byte[]). But I think CharacterUtils.toLowerCase() is 
doing essentially the same as I'm doing in StringHelper.contains() in that it 
converts using Unicode case mapping information (via 
Character.toLowerCase(int)).

Yes, sadly making ignoreCase more general would spoil the efficiency of 
facet.prefix, so I thought it safest to leave it as a sub-parameter of 
facet.contains, which spoils that efficiency already.
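
To make the comparison concrete, here is a minimal, illustrative sketch of per-code-point lowercasing via Character.toLowerCase(int) - this is not the SOLR-1387 patch, and the class/method names are invented:

{code}
// Illustrative only: case-insensitive "contains" by lowercasing each Unicode
// code point, the same case-mapping route CharacterUtils.toLowerCase() takes.
public final class ContainsIgnoreCase {
  public static boolean containsIgnoreCase(String haystack, String needle) {
    return lower(haystack).contains(lower(needle));
  }

  private static String lower(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);                     // handles surrogate pairs
      sb.appendCodePoint(Character.toLowerCase(cp)); // Unicode case mapping
      i += Character.charCount(cp);
    }
    return sb.toString();
  }
}
{code}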

 Add more search options for filtering field facets.
 ---

 Key: SOLR-1387
 URL: https://issues.apache.org/jira/browse/SOLR-1387
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Anil Khadka
Assignee: Alan Woodward
 Fix For: 4.9, Trunk

 Attachments: SOLR-1387.patch


 Currently for filtering the facets, we have to use prefix (which uses 
 String.startsWith() in Java). 
 We can add some parameters like:
 * facet.iPrefix : this would act like a case-insensitive search (or ---  
 facet.prefix=a&facet.caseinsense=on)
 * facet.regex : this is a pure regular expression search (which obviously would 
 be expensive if issued).
 Moreover, allowing multiple filters for the same field would be great, like
 facet.prefix=a OR facet.prefix=A ... something like this.
 All the above concepts could be equally applicable to TermsComponent.
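
For reference, the sub-parameter shape Tom describes in the comment above might look like this in a request (facet.contains is named in the comment; the exact spelling of the ignore-case sub-parameter is an assumption here):

{code}
q=*:*&facet=true&facet.field=name&facet.contains=Abc&facet.contains.ignoreCase=true
{code}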






Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Per Steffensen

On 17/11/14 18:47, Toke Eskildsen wrote:
> Per Steffensen [st...@liace.dk] wrote:
> I understand that the request is for rows * #shards IDs+score in
> total, but if you have presented your alternative, I have failed to
> see that.
I deliberately did not present the solution we did, in order for you 
guys not to focus on whether or not this particular solution to the 
problem has already been implemented after 4.4.0 (the version of Apache 
Solr we currently base our version of Solr on). I guess the problem can 
be solved in numerous ways, so I just wanted you to focus on whether or 
not it has been solved in some way (I do not care which way).

> Your third factoid: A high number of hits/shard, suggests that there is a
> possibility of all the final top-1000 hits to originate from a single shard.
I'm not sure what you are aiming at with this comment. But I can say that 
it is very, very unlikely that the overall-top-1000 all originate from a 
single shard. It is likely (since we are not routing on anything that 
has to do with the content text-field) that the overall-top-1000 is 
fairly evenly distributed among the 1000 shards.

> I was about to suggest collapsing to 2 or 3 months/shard, but that would be
> ruining a logistically nice setup.
Yes, we are also considering options in that area, but we would really 
prefer not to have to go this way.

There are many additional reasons (besides the ones I mentioned in my 
previous mail). E.g. we are (maybe) about to introduce a bloom-filter on 
shard-level, which will help us improve indexing performance 
significantly. The bloom-filter helps us quickly say "a document with this 
particular id definitely does not exist" when doing optimistic locking 
(including version-lookup). First-iteration tests have shown that it can 
reduce the resources/time spent on indexing by up to 80%. Bloom-filter 
data does not merge very well.

> 5-50 billion records/server? That seems very high, but after hearing
> about many different Solr setups at Lucene/Solr Revolution, I try to
> adopt a "sounds insane, but it's probably correct"-mindset.
We are not in the business of ms-response-times or thousands of searches 
per sec/min. We can accept response-times measured in secs, and there 
are not thousands of searches performed per minute. We are in the business 
of being able to index enormous amounts of data per second, though. But 
this issue is about searches - we really do not like 10-30-60 min 
response-times on searches that ought to run much faster.

> Anyway, setup accepted, problem acknowledged, your possibly re-usable
> solution not understood.

What we did in our solution is the following:

We introduced the concept of a distributed query algorithm, controlled by 
the request-param "dqa". We name the existing (default) 
query algorithm (not knowing about SOLR-5768) 
"find-id-relevance_fetch-by-ids" (short-alias "firfbi"), and we introduce 
a new alternative distributed query algorithm called 
"find-relevance_find-ids-limited-rows_fetch-by-ids" (short-alias 
"frfilrfbi" :-) )

* find-id-relevance_fetch-by-ids does as always:
** Find (by query) the id and score (score is the measurement for relevance) 
for the top-X (1000 in my example) documents on each shard
** Sort out the ids of the overall-top-X and group them by shard. ids(S) 
is the set of ids among the overall-top-X that live on shard S
** For each shard S, fetch by the ids in ids(S) the full documents (or 
whatever is pointed out by the fl-parameter)
* find-relevance_find-ids-limited-rows_fetch-by-ids does it in a 
different way:
** Find (by query) the score (score is the measurement for relevance) for 
the top-X (1000 in my example) documents on each shard
** Sort out how many documents count(S) of the overall-top-X documents 
live on each individual shard S
** For each shard S, fetch (by query) the ids ids(S) of the count(S) 
most relevant documents
** For each shard S, fetch by the ids in ids(S) the full documents (or 
whatever is pointed out by the fl-parameter)

Since "find score only" (step 1 of 
find-relevance_find-ids-limited-rows_fetch-by-ids) does not actually 
have to go into the store to fetch anything (the id is not needed), it can be 
optimized to perform much, much better than step 1 of 
find-id-relevance_fetch-by-ids (where the id is needed). In step 3 of 
find-relevance_find-ids-limited-rows_fetch-by-ids, when you have to go 
to the store, we are not asking for 1000 docs per shard, but only for the 
number of documents among the overall-top-1000 documents that live on this 
particular shard. This way we go from potentially visiting the store for 
1 million docs across the cluster, to never visiting the store for more than 
1000 docs across the cluster. In our particular test-setup (which 
simulates our production environment pretty well) it has given us a 
total response-time reduction of a factor of 60.
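
A compact, illustrative sketch of the frfilrfbi coordinator logic described above (ShardClient, Doc, and all method names are invented here; this is not our actual patch):

{code}
import java.util.*;

interface ShardClient {
  float[] topScores(int shard, String query, int rows);    // step 1: scores only, no store access
  List<String> topIds(int shard, String query, int count); // step 2: ids of count(S) docs
  List<Doc> fetchByIds(int shard, List<String> ids);       // step 3: full docs / fl fields
}
class Doc {}

class FrfilrfbiCoordinator {
  List<Doc> query(ShardClient client, int numShards, String q, int rows) {
    // Step 1: collect top-`rows` scores from every shard (cheap: score-only).
    float[][] scores = new float[numShards][];
    List<Float> all = new ArrayList<>();
    for (int s = 0; s < numShards; s++) {
      scores[s] = client.topScores(s, q, rows);
      for (float f : scores[s]) all.add(f);
    }
    List<Doc> results = new ArrayList<>();
    if (all.isEmpty()) return results;
    // The overall rows-th best score is the cut-off (ties may fetch a few extra).
    all.sort(Collections.reverseOrder());
    float cutoff = all.get(Math.min(rows, all.size()) - 1);
    // Steps 2+3: per shard, fetch ids and then docs for only count(S) hits.
    for (int s = 0; s < numShards; s++) {
      int count = 0;
      for (float f : scores[s]) if (f >= cutoff) count++;
      if (count > 0) results.addAll(client.fetchByIds(s, client.topIds(s, q, count)));
    }
    return results;
  }
}
{code}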


I believe SOLR-5768 (without having looked at it yet) has made the 
existing distributed query algorithm (what we call 
find-id-relevance_fetch-by-ids) do the following when sending the 
distrib.singlePass parameter

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216113#comment-14216113 ]

Modassar Ather commented on LUCENE-5205:


I am trying the following queries and facing an issue for which I need your 
suggestions. The environment is a 4-shard cluster with embedded ZooKeeper on one 
of them.

q=field:(SEARCH TOOLS PROVIDER  CONSULTING COMPANY) gets transformed to the 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field:(SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) gets transformed to the 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field:(SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) gets stuck and 
does not return. We have set the query timeAllowed to 5 minutes, but it seems 
the query never reaches that check and keeps running.
While debugging I found that it gets stuck at m.find(), line 154 of SpanQueryLexer, 
after it has created tokens for the double quote and the term SEARCH.

The same query without the apostrophe (') gets transformed to the following:
field:(SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

I need your help in understanding whether I am not using the query properly or 
whether this could be an issue.
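
For what it's worth, one classic way Matcher.find() can appear to hang - whether or not that is what happens at SpanQueryLexer line 154 - is catastrophic backtracking in the pattern. A self-contained illustration (not the SpanQueryLexer pattern):

{code}
import java.util.regex.Pattern;

public class BacktrackDemo {
  public static void main(String[] args) {
    // (a+)+b backtracks exponentially when the overall match must fail.
    Pattern p = Pattern.compile("(a+)+b");
    for (int n = 16; n <= 24; n++) {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < n; i++) sb.append('a');
      sb.append('X'); // no 'b' anywhere, so find() tries ~2^n split points
      long t0 = System.nanoTime();
      boolean found = p.matcher(sb).find();
      System.out.printf("n=%d found=%b %.1f ms%n",
          n, found, (System.nanoTime() - t0) / 1e6); // roughly doubles per extra 'a'
    }
  }
}
{code}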

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: "jakarta apache"
 * phrase with slop: "jakarta apache"~3
 * default "or" clause: jakarta apache
 * grouping "or" clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require "in order" for phrases with slop with the \~> operator: 
 "jakarta apache"\~>3
 * Can specify "not near": "fever bieber"!\~3,10 ::
 find "fever" but not if "bieber" appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~>4 :: 
 find "jakarta" within 3 words of "apache", and that hit has to be within 
 four words before "lucene"
 * Can also use \[\] for single level phrasal queries instead of "" as in: 
 \[jakarta apache\]
 * Can use "or" grouping clauses in phrasal queries: "apache (lucene solr)"\~3 
 :: find "apache" and then either "lucene" or "solr" within three words.
 * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)\]\~10 :: Find something like "jakarta" within two 
 words of "ap*che" and that hit has to be within ten words of something like 
 "solr" or that "lucene" regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use a negative-only query: -jakarta :: Find all docs that don't contain 
 "jakarta"
 * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance = 1, 
 prefix = 2)
 * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance 
 <= 2: jakarta~1 (OSA) vs jakarta~1 (Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the 

[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216113#comment-14216113 ]

Modassar Ather edited comment on LUCENE-5205 at 11/18/14 11:59 AM:
---

I am trying the following queries and facing an issue for which I need your 
suggestions. The environment is a 4-shard cluster with embedded ZooKeeper on one 
of them.

q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) gets transformed to the 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) gets transformed to the 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field: (SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) gets stuck and 
does not return. We have set the query timeAllowed to 5 minutes, but it seems 
the query never reaches that check and keeps running.
While debugging I found that it gets stuck at m.find(), line 154 of SpanQueryLexer, 
after it has created tokens for the double quote and the term SEARCH.

The same query without the apostrophe (') gets transformed to the following:
field: (SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

I need your help in understanding whether I am not using the query properly or 
whether this could be an issue.


was (Author: modassar):
I am trying following queries and facing an issue for which need your 
suggestions. The environment is 4 shard cluster with embedded zookeeper on one 
of them.

q=field:(SEARCH TOOLS PROVIDER  CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field:(SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field:(SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) Gets stuck and 
does not return. We have set query timeAllowed to 5 minutes but it seems that 
it is not reaching here and continues.
During debug I found that it gets stuck at m.find(), Line 154 of SpanQueryLexer 
after it has created token for double quotes and term SEARCH.

Whereas the above query without (') gets transformed to following
field:(SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

Need your help in understanding if I am not using the query properly or it can 
be an issue.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: "jakarta apache"
 * phrase with slop: "jakarta apache"~3
 * default "or" clause: jakarta apache
 * grouping "or" clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require "in order" for phrases with slop with the \~> operator: 
 "jakarta apache"\~>3
 * Can specify "not near": "fever bieber"!\~3,10 ::
 find "fever" but not if "bieber" appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~>4 :: 
 find "jakarta" within 3 words of "apache", and 

[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216113#comment-14216113 ]

Modassar Ather edited comment on LUCENE-5205 at 11/18/14 12:01 PM:
---

I am trying the following queries and facing an issue for which I need your 
suggestions. The environment is a 4-shard cluster with embedded ZooKeeper on one 
of them.

q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) gets transformed to the 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) gets transformed to the 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field: (SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) gets stuck and 
does not return. We have set the query timeAllowed to 5 minutes, but it seems 
the query never reaches that check and keeps running.
While debugging I found that it gets stuck at m.find(), line 154 of SpanQueryLexer, 
after it has created tokens for the double quote and the term SEARCH.

The same query without the apostrophe (') gets transformed to the following:
field: (SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

I need your help in understanding whether I am not using the query properly or 
whether this could be an issue.
NOTE: A space between field: and the query is added to avoid the transformation 
into smileys.


was (Author: modassar):
I am trying following queries and facing an issue for which need your 
suggestions. The environment is 4 shard cluster with embedded zookeeper on one 
of them.

q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field: (SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) Gets stuck and 
does not return. We have set query timeAllowed to 5 minutes but it seems that 
it is not reaching here and continues.
During debug I found that it gets stuck at m.find(), Line 154 of SpanQueryLexer 
after it has created token for double quotes and term SEARCH.

Whereas the above query without (') gets transformed to following
field: (SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

Need your help in understanding if I am not using the query properly or it can 
be an issue.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: "jakarta apache"
 * phrase with slop: "jakarta apache"~3
 * default "or" clause: jakarta apache
 * grouping "or" clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require "in order" for phrases with slop with the \~> operator: 
 "jakarta apache"\~>3
 * Can specify "not near": "fever bieber"!\~3,10 ::
 find "fever" but not if "bieber" appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: 

[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2214 - Still Failing

2014-11-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2214/

1 tests failed.
FAILED:  org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability

Error Message:
No live SolrServers available to handle this request

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request
at 
__randomizedtesting.SeedInfo.seed([81E2D093E929F1C3:402A0DD5484F206A]:0)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:539)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
 

[jira] [Created] (SOLR-6754) ZkController.publish doesn't use the updateLastState parameter

2014-11-18 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-6754:
---

 Summary: ZkController.publish doesn't use the updateLastState 
parameter
 Key: SOLR-6754
 URL: https://issues.apache.org/jira/browse/SOLR-6754
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.2
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 5.0, Trunk


One of ZkController's overloaded publish methods has the following:
{code}
public void publish(final CoreDescriptor cd, final String state,
    boolean updateLastState) throws KeeperException, InterruptedException {
  publish(cd, state, true, false);
}
{code}

Regardless of the updateLastState argument, the method calls publish with 
updateLastState set to true.
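
A minimal sketch of the presumable one-line fix - passing the caller's flag through - assuming the third parameter of the four-argument overload is updateLastState, as the description implies:

{code}
public void publish(final CoreDescriptor cd, final String state,
    boolean updateLastState) throws KeeperException, InterruptedException {
  // pass the caller's flag through instead of hard-coding true
  publish(cd, state, updateLastState, false);
}
{code}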






[jira] [Updated] (SOLR-6754) ZkController.publish doesn't use the updateLastState parameter

2014-11-18 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6754:

Attachment: SOLR-6754.patch

A trivial patch that uses the method argument is attached.

 ZkController.publish doesn't use the updateLastState parameter
 --

 Key: SOLR-6754
 URL: https://issues.apache.org/jira/browse/SOLR-6754
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.2
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 5.0, Trunk

 Attachments: SOLR-6754.patch


 One of ZkController's overloaded publish methods has the following:
 {code}
 public void publish(final CoreDescriptor cd, final String state, boolean 
 updateLastState) throws KeeperException, InterruptedException {
 publish(cd, state, true, false);
   }
 {code}
 Regardless of the updateLastState argument, the method calls publish with 
 updateLastState set to true.






Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Ferenczi, Jim | EURHQ
>> Your third factoid: A high number of hits/shard, suggests that there is a 
>> possibility of all the final top-1000 hits to originate from a single shard.
In fact if you ask for 1000 hits in a distributed SolrCloud, each shard has to 
retrieve 1000 hits to get the unique key of each match and send it back to the 
shard responsible for the merge. This means that even if your data is fairly 
distributed among the 1000 shards, they still have to decompress 1000 documents 
during the first phase of the search. There are ways to avoid this, for 
instance you can check this JIRA where the idea is discussed:
https://issues.apache.org/jira/browse/SOLR-5478
Bottom line is that if you have 1000 shards the GET_FIELDS stage should be fast 
(if your data is fairly distributed) but the GET_TOP_IDS is not. You could 
avoid a lot of decompression/reads by using the field cache to retrieve the 
unique key in the first stage.
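
For illustration, a minimal sketch of that first-stage key lookup via doc values rather than stored fields (Lucene 5.x API; the "id" field having doc values is an assumption):

{code}
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.util.BytesRef;

class UniqueKeyLookup {
  // Resolve a hit's unique key without decompressing stored fields.
  static BytesRef uniqueKey(LeafReader leaf, int docId) throws IOException {
    SortedDocValues ids = DocValues.getSorted(leaf, "id");
    return ids.get(docId); // random access in Lucene 5.x
  }
}
{code}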

Cheers,
Jim





[jira] [Updated] (SOLR-5611) When documents are uniformly distributed over shards, enable returning approximated results in distributed query

2014-11-18 Thread Manuel Lenormand (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuel Lenormand updated SOLR-5611:
---
Attachment: lec5-distributedIndexing.pdf

The equation is on the 10th slide. 

Need to write an approximation for this, or calculate it offline for the main 
values and make a 3D map out of it (#shards, rows, confidence level) that 
outputs shards.rows for each request

 When documents are uniformly distributed over shards, enable returning 
 approximated results in distributed query
 

 Key: SOLR-5611
 URL: https://issues.apache.org/jira/browse/SOLR-5611
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Isaac Hebsh
Priority: Minor
  Labels: distributed_search, shard, solrcloud
 Fix For: 4.9, Trunk

 Attachments: lec5-distributedIndexing.pdf


 Query with rows=1000, which is sent to a collection of 100 shards (shard key 
 behaviour is default - based on hash of the unique key), will generate 100 
 requests of rows=1000, one on each shard. 
 This results in a total of rows*numShards unique keys being retrieved. 
 This behaviour gets worse as numShards grows.
 If the documents are uniformly distributed over the shards, the expected 
 number of documents per shard should be ~ rows/numShards. Obviously, there 
 might be extreme cases, when all of the top X documents are in a specific shard.
 I suggest adding an optional parameter, say approxResults=true, which decides 
 whether we should limit the rows in the shard requests to rows/numShards or 
 not. Moreover, we can add a numeric parameter which increases the limit, to 
 be more accurate.
 For example, the query {{approxResults=true&approxResults.factor=1.5}} will 
 retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and 
 rows=1000, each shard will return 15 documents.
 Furthermore, this can reduce the problem of deep paging, because the same 
 thing can be applied there. When start=100000 is requested, Solr creates shard 
 requests with start=0 and rows=START+ROWS. In the approximated approach, the 
 start parameter (in the shard requests) can be set to 100000/numShards. The 
 idea of the approxResults.factor creates some difficulties here, though.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5611) When documents are uniformly distributed over shards, enable returning approximated results in distributed query

2014-11-18 Thread Manuel Lenormand (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216167#comment-14216167 ]

Manuel Lenormand edited comment on SOLR-5611 at 11/18/14 1:32 PM:
--

The equation is on the 20th slide. 

Need to write an approximation for this, or calculate it offline for the main 
values and make a 3D map out of it (#shards, rows, confidence level) that 
outputs shards.rows for each request


was (Author: manuel lenormand):
The equation is on the 10th slide. 

Need to write an approximation for this or calculating offline for main values 
and making a 3d map out of it (#shards, rows, confidence level) that outputs 
shards.rows for each request

 When documents are uniformly distributed over shards, enable returning 
 approximated results in distributed query
 

 Key: SOLR-5611
 URL: https://issues.apache.org/jira/browse/SOLR-5611
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Isaac Hebsh
Priority: Minor
  Labels: distributed_search, shard, solrcloud
 Fix For: 4.9, Trunk

 Attachments: lec5-distributedIndexing.pdf


 Query with rows=1000, which is sent to a collection of 100 shards (shard key 
 behaviour is default - based on hash of the unique key), will generate 100 
 requests of rows=1000, one on each shard. 
 This results in a total of rows*numShards unique keys being retrieved. 
 This behaviour gets worse as numShards grows.
 If the documents are uniformly distributed over the shards, the expected 
 number of documents per shard should be ~ rows/numShards. Obviously, there 
 might be extreme cases, when all of the top X documents are in a specific shard.
 I suggest adding an optional parameter, say approxResults=true, which decides 
 whether we should limit the rows in the shard requests to rows/numShards or 
 not. Moreover, we can add a numeric parameter which increases the limit, to 
 be more accurate.
 For example, the query {{approxResults=true&approxResults.factor=1.5}} will 
 retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and 
 rows=1000, each shard will return 15 documents.
 Furthermore, this can reduce the problem of deep paging, because the same 
 thing can be applied there. When start=100000 is requested, Solr creates shard 
 requests with start=0 and rows=START+ROWS. In the approximated approach, the 
 start parameter (in the shard requests) can be set to 100000/numShards. The 
 idea of the approxResults.factor creates some difficulties here, though.






Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Per Steffensen

On 18/11/14 14:24, Ferenczi, Jim | EURHQ wrote:
>>> Your third factoid: A high number of hits/shard, suggests that there is a
>>> possibility of all the final top-1000 hits to originate from a single shard.
> In fact if you ask for 1000 hits in a distributed SolrCloud, each shard has to
> retrieve 1000 hits to get the unique key of each match and send it back to the
> shard responsible for the merge.
Yes, at least if each shard has 1000 hits. It is when each shard has a 
lot of actual hits that this issue becomes a problem.

> This means that even if your data is fairly distributed among the 1000 shards,
> they still have to decompress 1000 documents during the first phase of the
> search.

Exactly!

> There are ways to avoid this, for instance you can check this JIRA where the
> idea is discussed:
> https://issues.apache.org/jira/browse/SOLR-5478
I guess our solution (described in my previous mail) is kind of an 
alternative solution to SOLR-5478.

> Bottom line is that if you have 1000 shards the GET_FIELDS stage should be fast
> (if your data is fairly distributed) but the GET_TOP_IDS is not.

Exactly!

> You could avoid a lot of decompression/reads by using the field cache to
> retrieve the unique key in the first stage.
We have so much data and relatively little RAM, so we cannot use the 
field-cache, because it requires an amount of memory linearly dependent 
on the number of docs in the store. We can never fulfill this requirement. 
Doc-values is a valid approach for us, but currently our id-field is 
unfortunately not doc-values - and it is not easy for us to just re-index 
all documents with id as doc-values. Besides that, our solution is 
orthogonal to a field-cache/doc-values solution, in the sense that one does 
not prevent the other, and if you do one of them you will still be able 
to benefit from doing the other one.

> Cheers,
> Jim

Thanks, Jim


Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Toke Eskildsen
On Tue, 2014-11-18 at 11:26 +0100, Per Steffensen wrote:
> It is likely (since we are not routing on anything that has to do with
> the content text-field) that the overall-top-1000 is fairly evenly
> distributed among the 1000 shards

Streaming in Heliosearch might work out of the box:
http://heliosearch.org/streaming-aggregation-for-solrcloud/#CloudSolrStream
Caveat: I haven't used streaming, so I can't say for sure and don't know
how/if it handles early termination, which would be a prerequisite for
speedup in your setup.

[Detailed description of solution]

[SOLR-5798]

> Hope you get the idea, and why it makes us perform much, much better?!

Yes, I got it. We discussed it a bit at the office and it seems like a
really fine idea, new to Solr. As Solr is often used for log processing
these days, the number of setups with many shards and non-trivial
request sizes is growing: Your solution would help others.

The obvious next step would be a JIRA. However, I know that you have had
very limited success there, even for simple patches. 

General JIRA-handling might be a relevant topic for another thread, but
I don't have the energy for that discussion right now.


Of course, the concrete speed-up factor is highly dependent on how long
it takes to resolve IDs. You state speeds of 10, 30, 60 minutes without
the patch and a factor 60 speedup. As I understand it, the real
difference is whether ~1000*#shards IDs are resolved or only 1000.
With 50 shards or 50.000 ID-lookups per machine, that puts your worst
case resolve-time at 50.000 IDs / (60 min * 60 s/min) ~= 13 IDs/s and
the best case (10 min total) at ~83 IDs/s per machine.

(guessing spinning drives here)

With a setup with faster ID-resolving, the benefits from your patch
might be too small for top-1000 to be really interesting, as ID-resolving
would not take up as much of the overall processing time. But it would
make it possible to scale that number up (top-10000 or above).

- Toke Eskildsen, State and University Library, Denmark






Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Manuel Le Normand
Limiting my answer to the #shards*rows issue, you can have a look at
https://issues.apache.org/jira/browse/SOLR-5611.

The odds that all the top docs are in the same shard of a uniformly distributed
index are negligible, so you can use this to request much fewer docs per
shard.
There's a nice discrete equation that gives you the shards.rows you should
request, depending on the #shards, rows, and confidence level that all the
top rows are returned (confidence=0.95 would mean 95% of the responses will
contain the exact top rows, as if rows and shards.rows were equal).

Our use case: 36 shards, 2000 rows, conf=99% -- shards.rows=49, which gave
us a good performance boost.
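
For readers who want a quick stand-in until the slides' equation is implemented, a cruder union-bound estimate can be computed directly. This is not Manuel's equation and is more conservative; it models each of the top `rows` docs landing uniformly and independently on a shard:

{code}
public class ShardsRows {
  // Smallest k such that P(any shard holds more than k of the top `rows` docs)
  // <= 1 - confidence, by a union bound over Binomial(rows, 1/shards) tails.
  static int shardsRows(int shards, int rows, double confidence) {
    double p = 1.0 / shards, q = 1.0 - p;
    double pmf = Math.pow(q, rows); // P(X = 0)
    double cdf = pmf;
    for (int k = 0; k < rows; k++) {
      if (shards * (1.0 - cdf) <= 1.0 - confidence) return k;
      pmf *= (double) (rows - k) / (k + 1) * (p / q); // P(X = k+1)
      cdf += pmf;
    }
    return rows;
  }

  public static void main(String[] args) {
    // 100 shards, rows=1000, 99% confidence: prints a value far below
    // rows=1000 (in the low twenties under this model).
    System.out.println(shardsRows(100, 1000, 0.99));
  }
}
{code}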

On Tue, Nov 18, 2014 at 3:24 PM, Ferenczi, Jim | EURHQ 
jim.feren...@mail.rakuten.com wrote:

   Your third factoid: A high number of hits/shard, suggests that there is
 a possibility of all the final top-1000 hits to originate from a single
 shard.
 In fact if you ask for 1000 hits in a distributed SolrCloud, each shard
 has to retrieve 1000 hits to get the unique key of each match and send it
 back to the shard responsible for the merge. This means that even if your
 data is fairly distributed among the 1000 shards, they still have to
 decompress 1000 documents during the first phase of the search. There are
 ways to avoid this, for instance you can check this JIRA where the idea is
 discussed:
 https://issues.apache.org/jira/browse/SOLR-5478
 Bottom line is that if you have 1000 shards the GET_FIELDS stage should be
 fast (if your data is fairly distributed) but the GET_TOP_IDS is not. You
 could avoid a lot of decompression/reads by using the field cache to
 retrieve the unique key in the first stage.

 Cheers,
 Jim



 
 From: Per Steffensen st...@liace.dk
 Sent: Tuesday, November 18, 2014 19:26
 To: dev@lucene.apache.org
 Subject: Re: Slow searching limited but high rows across many shards all
 with high hits

 On 17/11/14 18:47, Toke Eskildsen wrote:
  Per Steffensen [st...@liace.dk] wrote:
  I understand that the request is for rows * #shards IDs+score in
  total, but if you have presented your alternative, I have failed to
  see that.
 I deliberately did not present the solution we did, in order for you
 guys not to focus on whether or not this particular solution to the
 problem already has been implemented after 4.4.0 (the version of Apache
 Solr we currently base our version of Solr on). Guess the problem can
 be solved in numerous ways, so just wanted you to focus on whether or
 not it has been solved in some way (do not care which way)
Your third factoid: A high number of hits/shard, suggests that there
 is a possibility of all the final top-1000 hits to originate from a single
 shard.
 Im not sure what you are aiming at with this comment. But I can say that
 it is very very unlikely that the overall-top-1000 all originate from a
 single shard. It is likely (since we are not routing on anything that
 has to do with the content text-field) that the overall-top-1000 i
 fairly evenly distributed among the 1000 shards
  I was about to suggest collapsing to 2 or 3 months/shard, but that would
 be ruining a logistically nice setup.
 Yes, we are also considering options in that area, but we really would
 like not to have to go this way

 There are many additional reasons (besides the ones I mentioned in my
 previous mail). E.g. we are (maybe) about to introduce a bloom-filter at
 shard-level, which will help us improve indexing performance
 significantly. The bloom-filter helps us quickly say "a document with this
 particular id definitely does not exist" when doing optimistic locking
 (including version-lookup). First-iteration tests have shown that it can
 reduce the resources/time spent on indexing by up to 80%. Bloom-filter
 data does not merge very well.
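
 For readers who haven't seen the trick: a Bloom filter can answer "definitely
 not present" without touching the index, which is exactly what the
 optimistic-locking version-lookup needs on the common does-not-exist path.
 A toy sketch of the idea (not Per's implementation; Lucene's
 BloomFilteringPostingsFormat is an existing, related building block, though
 as noted its data does not merge well):

{code}
import java.util.BitSet;

/** Toy Bloom filter over document ids -- illustrative sketch only. */
public class IdBloomFilter {
  private final BitSet bits;
  private final int numBits;

  public IdBloomFilter(int numBits) {
    this.numBits = numBits;
    this.bits = new BitSet(numBits);
  }

  public void add(String id) {
    bits.set(hash1(id));
    bits.set(hash2(id));
  }

  /** false => the id definitely does not exist, so skip the version lookup. */
  public boolean mightContain(String id) {
    return bits.get(hash1(id)) && bits.get(hash2(id));
  }

  // two cheap, deliberately simple hash functions (a real filter would do better)
  private int hash1(String id) { return Math.floorMod(id.hashCode(), numBits); }
  private int hash2(String id) { return Math.floorMod(31 * id.hashCode() + 17, numBits); }
}
{code}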
 5-50 billion records/server? That seems very high, but after hearing
  about many different Solr setups at Lucene/Solr Revolution, I try to
  adopt a "sounds insane, but it's probably correct"-mindset.
 We are not in the business of ms response-times for thousands of searches
 per sec/min. We can accept response-times measured in seconds, and there
 are not thousands of searches performed per minute. We are in the business
 of being able to index enormous amounts of data per second, though. But
 this issue is about searches - we really do not like 10-30-60 min
 response-times on searches that ought to run much faster.
  Anyway, setup accepted, problem acknowledged, your possibly re-usable
  solution not understood.
 What we did in our solution is the following:

 Introduced the concept of a "distributed query algorithm", controlled by
 the request-param "dqa". We name the existing (default)
 query-algorithm (not knowing about SOLR-5768)
 "find-id-relevance_fetch-by-ids" (short alias "firfbi"), and we introduce
 a new alternative distributed query algorithm called
 "find-relevance_find-ids-limited-rows_fetch-by-ids" (short alias
 "frfilrfbi" :-) )
 * 

Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Per Steffensen

On 18/11/14 14:49, Manuel Le Normand wrote:

Limiting my answer to the #shards*rows issue, you can have a look at
https://issues.apache.org/jira/browse/SOLR-5611.
Thanks for pointing out SOLR-5611! I did not know about it, and 
knowledge about work and ideas in this area was what I wanted to achieve 
by this mail to the mailing-list. I haven't dived much into SOLR-5611, but 
it seems to me that it will allow you to ask each shard for fewer than 
1000 (if 1000 is the number in the outer super-request) rows in the 
get-id-score-sub-requests stage - asking each shard for 1000/#shards (maybe 
more?).


Our solution (as specified in a previous mail) will also ask each shard 
for fewer than 1000 rows in the get-id-score-sub-request stage. The 
difference is that it starts out calculating the exact rows-value to 
use for each shard, issuing very inexpensive score-only sub-requests. I 
think this approach is at least as nice as, if not nicer than, the 
SOLR-5611 approach.


Regards, Steff


Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Per Steffensen

On 18/11/14 14:41, Toke Eskildsen wrote:
Your solution would help others. 

I agree, and that, of course, would be great.
The obvious next step would be a JIRA. However, I know that you have 
had very limited success there, even for simple patches.

No shit :-) But hopefully better success in the future!
One of the goals of this mailing-thread was, besides getting a feeling 
for whether or not this issue has already been fixed, to get an idea of 
whether the community is interested in getting (and helping shape) the solution.
I can create a JIRA for sure, but I really do not want to merge our 
solution to branch_5x, trunk or whatever, and have the long discussions, 
if it is leading nowhere.
General JIRA-handling might be a relevant topic for another thread, 
but I don't have the energy for that discussion right now.

Me neither
Of course, the concrete speed-up factor is highly dependent on how 
long it takes to resolve IDs.
Yes! I cannot guarantee a speed-up factor of 60 or anything like it. I just 
tried to state what we have seen, on our concrete setup, with our amount 
of data and our data-distribution, on our hardware. There are lots of 
factors, for sure.



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Toke Eskildsen
On Tue, 2014-11-18 at 14:42 +0100, Per Steffensen wrote:

 Doc-values is a valid approach for us, but currently our id-field is
 unfortunately not doc-value - and it is not easy for us to just
 re-index all documents with id as doc-value.

At Lucene/Solr Revolution 2014 I presented that exact problem at Stump
the Chump. After stalling for a minute with obligatory derogatory
comments on our project design and a just as obligatory car analogy, he
pointed me in the direction of a filtered index reader and asked me to
code it and make it Open Source.

Thomas Egense and I plan to take a crack at it one of these days. If the
field is stored, it should be possible to make it DocValued by
optimizing the index with a custom reader.
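
For what it's worth, when the field is at least indexed (a unique key
normally is), one 5.x-era route I would expect to work is wrapping with
UninvertingReader (lucene-misc) and rewriting via addIndexes, which should
bake real DocValues into the new segments. A hedged sketch from memory of
the 5.x APIs - untested, and the stored-but-not-indexed case still needs
the custom filtered reader discussed above:

{code}
import java.io.IOException;
import java.util.Collections;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.uninverting.UninvertingReader;

public class DocValuesBaker {
  static void bakeIdDocValues(Directory oldDir, Directory newDir) throws IOException {
    // expose the indexed "id" field as synthetic SortedDocValues
    DirectoryReader wrapped = UninvertingReader.wrap(
        DirectoryReader.open(oldDir),
        Collections.singletonMap("id", UninvertingReader.Type.SORTED));
    IndexWriter writer = new IndexWriter(newDir, new IndexWriterConfig(new StandardAnalyzer()));
    writer.addIndexes(wrapped); // rewrite: the new index gets real DocValues for "id"
    writer.close();
    wrapped.close();
  }
}
{code}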

 Besides that, our solution is diagonal on a field-cache/doc-values
 solution in the way that one does not prevent the other, and if you do
 one of them you will still be able to benefit from doing the other
 one.

I noticed that. Multiplying solutions are awesome.

- Toke Eskildsen, State and University Library, Denmark



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Per Steffensen

On 18/11/14 15:52, Toke Eskildsen wrote:

Thomas Egense and I plan to take a crack at it one of these days. If the
field is stored, it should be possible to make it DocValued by
optimizing the index with a custom reader.
Yes, it ought to be, but AFAIK it currently is not. Looking forward to 
seeing the results of your work! Say hi to Thomas, BTW.



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216267#comment-14216267
 ] 

Tim Allison commented on LUCENE-5205:
-

Thank you for raising this, [~modassar].  The challenge is that the parser can 
use both " and ' to mark the beginnings and endings of SpanNear.  As an initial 
hack, I was hoping that users would backslash single quotes within phrases, but 
that puts too much burden on users.  I'll see if I can add a bit more smarts so 
that if the parser knows that it is in a " phrase, it will ignore ' and vice 
versa.  Are you using my github standalone jars?  Or, how are you using 
this?

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: "jakarta apache"
 * phrase with slop: "jakarta apache"~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require "in order" for phrases with slop with the \~ operator: 
 "jakarta apache"\~3
 * Can specify "not near": "fever bieber"!\~3,10 ::
 find "fever" but not if "bieber" appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]\~3 lucene\]\~4 :: 
 find "jakarta" within 3 words of "apache", and that hit has to be within 
 four words before "lucene"
 * Can also use \[\] for single level phrasal queries instead of "" as in: 
 \[jakarta apache\]
 * Can use "or" grouping clauses in phrasal queries: "apache (lucene solr)"\~3 
 :: find "apache" and then either "lucene" or "solr" within three words.
 * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance = 1, 
 prefix = 2)
 * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance 
 <= 2: jakarta~1 (OSA) vs jakarta~1 (Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6625) HttpClient callback in HttpSolrServer

2014-11-18 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216313#comment-14216313
 ] 

Per Steffensen commented on SOLR-6625:
--

bq.  One thing I wanted to avoid with this patch is putting authentication-type 
specific details in HttpSolrServer. SOLR-4470 has a little logic there that is 
basic-auth specific

Actually SOLR-4470 aims at introducing a framework for any authentication type, 
and then (for now) implementing basic-auth using this framework. It is prepared 
for adding new authentication types. See {{AuthCredentials}}, carrying any kind 
of {{AbstractAuthMethod}} - currently the only 
{{AbstractAuthMethod}} implementation is {{BasicHttpAuth}}. Adding a new 
authentication type should basically be about adding a new 
{{AbstractAuthMethod}} implementation. But sorry, I do not remember too many 
details. What I do know is that we have been using the SOLR-4470 solution 
in production for a long time, without any problems at all.

bq. As for the suggestion of using a BufferedHttpEntity rather than the OPTIONS 
approach I describe above, that certainly may be an improvement.

I do not know if it is an improvement compared to your approach. I just 
implemented it in a way that worked. Supporting non-preemptive authenticating 
POST-requests was not the main focus of SOLR-4470, so I just quickly did it in 
the way that I found it could be done - without considering performance or 
anything else.
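
For reference, the entity-buffering alternative is tiny with stock HttpClient:
BufferedHttpEntity reads the wrapped entity into memory so the request becomes
repeatable and survives the 401 negotiate/retry round-trip. A minimal sketch
(URL and params are placeholders, not the SOLR-4470 code):

{code}
import java.io.IOException;
import java.util.Arrays;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.BufferedHttpEntity;
import org.apache.http.message.BasicNameValuePair;

public class RepeatablePost {
  static HttpPost repeatablePost(String url) throws IOException {
    HttpPost post = new HttpPost(url);
    // buffering makes isRepeatable() true, so HttpClient can re-send the body
    // after an authentication challenge
    post.setEntity(new BufferedHttpEntity(
        new UrlEncodedFormEntity(Arrays.asList(new BasicNameValuePair("commit", "true")))));
    return post;
  }
}
{code}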

 HttpClient callback in HttpSolrServer
 -

 Key: SOLR-6625
 URL: https://issues.apache.org/jira/browse/SOLR-6625
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Attachments: SOLR-6625.patch, SOLR-6625.patch


 Some of our setups use Solr in a SPNego/kerberos setup (we've done this by 
 adding our own filters to the web.xml).  We have an issue in that SPNego 
 requires a negotiation step, but some HttpSolrServer requests are not 
 repeatable, notably the PUT/POST requests.  So, what happens is, 
 HttpSolrServer sends the requests, the server responds with a negotiation 
 request, and the request fails because the request is not repeatable.  We've 
 modified our code to send a repeatable request beforehand in these cases.
 It would be nicer if HttpSolrServer provided a pre/post callback when it was 
 making an httpclient request.  This would allow administrators to make 
 changes to the request for authentication purposes, and would allow users to 
 make per-request changes to the httpclient calls (i.e. modify httpclient 
 requestconfig to modify the timeout on a per-request basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216353#comment-14216353
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks [~talli...@apache.org] for your response. I am using it from lucene5205 
branch(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene5205/) 
integrated as patch to latest Lucene core jar.


 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-5861) CachingTokenFilter should use ArrayList not LinkedList

2014-11-18 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed LUCENE-5861.

   Resolution: Duplicate
Fix Version/s: 5.0

 CachingTokenFilter should use ArrayList not LinkedList
 --

 Key: LUCENE-5861
 URL: https://issues.apache.org/jira/browse/LUCENE-5861
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
 Fix For: 5.0


 CachingTokenFilter, to my surprise, puts each new AttributeSource.State onto 
 a LinkedList.  I think it should be an ArrayList.  On large fields that get 
 analyzed, there can be a ton of State objects to cache.
 I also observe that State is itself a linked list of other State objects.  
 Perhaps we could take this one step further and do parallel arrays of 
 AttributeImpl, thereby bypassing State.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues

2014-11-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216393#comment-14216393
 ] 

Yonik Seeley commented on SOLR-5944:


I was just chatting with Shalin while we were both at ApacheCon.  In addition 
to leader-replica reordering issues,
we also need to handle realtime-get in the single-node case.  The way to do 
this is to just add the update to the tlog like normal (with some indication that 
it's a partial update and doesn't contain all the fields).  When /get is 
invoked and we find an update from the in-memory tlog map for that document, we 
need to go through the same logic as a soft commit (open a new 
realtime-searcher and clear the tlog map), and then use the realtime-searcher 
to get the latest document.

Oh, and _version_ will need to use DocValues so it can be updated at the same 
time of course.

 Support updates of numeric DocValues
 

 Key: SOLR-5944
 URL: https://issues.apache.org/jira/browse/SOLR-5944
 Project: Solr
  Issue Type: New Feature
Reporter: Ishan Chattopadhyaya
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
 SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
 SOLR-5944.patch


 LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
 really nice to have Solr support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216415#comment-14216415
 ] 

Tim Allison commented on LUCENE-5205:
-

The permanent hang is surprising.  When I isolate the single-quote regex, I get 
a permanent hang in Java, but not in Perl.

{noformat}
  String s = "SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD";
  Matcher m = Pattern.compile("'((?:''|[^']+)+)'").matcher(s);
  while (m.find()) {
      System.out.println(m.start());
  }
  System.out.println("done");
{noformat}

{noformat}
my $s = "SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD";

while ($s =~ /'((?:''|[^']+)+)'/g) {
    print "here\n";
}

print "done\n";
{noformat}
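
For what it's worth, the Java behavior looks like classic catastrophic
backtracking: (?:''|[^']+)+ nests a variable-length repetition inside another
repetition, so when no closing ' exists the engine tries exponentially many
ways to split the tail (Perl evidently handles this case without blowup).
A sketch of one standard rewrite - making the inner step single-character,
which matches the same strings but backtracks linearly (my illustration, not
the actual fix):

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackDemo {
  public static void main(String[] args) {
    // deliberately short tail: the slow pattern roughly doubles in time per extra char
    String s = "SEARCH TOOL'S SOLUTION PROVIDER TE";
    time("'((?:''|[^']+)+)'", s); // nested variable-length +: exponential on failure
    time("'((?:''|[^'])+)'", s);  // single-char inner step: same language, linear
  }

  static void time(String regex, String s) {
    long start = System.currentTimeMillis();
    Matcher m = Pattern.compile(regex).matcher(s);
    while (m.find()) {
      System.out.println(m.start());
    }
    System.out.println(regex + " took " + (System.currentTimeMillis() - start) + " ms");
  }
}
{code}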

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6062) Index corruption from numeric DV updates

2014-11-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-6062:

Attachment: LUCENE-6062.patch

Updated patch with a dedicated test (keeping the old one as it was). The logic 
is the same as in the first patch, but I tried to clarify better what is going on. 
Additionally, I removed some extraneous parameters in some of the related 
methods.

 Index corruption from numeric DV updates
 

 Key: LUCENE-6062
 URL: https://issues.apache.org/jira/browse/LUCENE-6062
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.10.3, 5.0, Trunk

 Attachments: LUCENE-6062.patch, LUCENE-6062.patch


 I hit this while working on LUCENE-6005: when cutting over 
 TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled 
 additional docValues in the test, and hit this:
 {noformat}
 There was 1 failure:
 1) 
 testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
 java.io.FileNotFoundException: _1_Asserting_0.dvm in 
 dir=RAMDirectory@259847e5 
 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
   at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
   at 
 org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
   at 
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.init(Lucene50DocValuesProducer.java:130)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
   at 
 org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.init(PerFieldDocValuesFormat.java:267)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
   at 
 org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
   at 
 org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
   at 
 org.apache.lucene.index.SegmentDocValuesProducer.init(SegmentDocValuesProducer.java:63)
   at 
 org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
   at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:109)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
   at 
 org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
 {noformat}
 A one-line change to the existing test (on trunk) causes this corruption:
 {noformat}
 Index: 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
 ===
 --- 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (revision 1639580)
 +++ 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (working copy)
 @@ -750,6 +750,7 @@
   // second segment with no NDV
   doc = new Document();
   doc.add(new StringField("id", "doc1", Store.NO));
  +doc.add(new NumericDocValuesField("foo", 3));
   writer.addDocument(doc);
   doc = new Document();
   doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't 
  updated
 {noformat}
 For some reason, the base doc values for the 2nd segment are not being 
 written, but clearly should have been (to hold field "foo")... I'm not sure why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-6752) Buffer Cache allocate/lost should be exposed through JMX

2014-11-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-6752:
-

Assignee: Mark Miller

 Buffer Cache allocate/lost should be exposed through JMX
 

 Key: SOLR-6752
 URL: https://issues.apache.org/jira/browse/SOLR-6752
 Project: Solr
  Issue Type: Bug
Reporter: Mike Drob
Assignee: Mark Miller
  Labels: metrics
 Attachments: SOLR-6752.patch


 Currently, {{o.a.s.store.blockcache.Metrics}} has fields for tracking buffer 
 allocations and losses, but they are never updated nor exposed to a receiving 
 metrics system. We should do both. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6750) Solr adds RequestHandler SolrInfoMBeans twice to the JMX server.

2014-11-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-6750.
---
Resolution: Duplicate

 Solr adds RequestHandler SolrInfoMBeans twice to the JMX server.
 

 Key: SOLR-6750
 URL: https://issues.apache.org/jira/browse/SOLR-6750
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.0, Trunk


 I think we want to stop doing this for 5.
 It should be really cheap to enumerate and get stats for all of the 
 SolrInfoMBeans, but between this and SOLR-6747, you will overall call 
 getStatistics far too much.
 They are added twice because all request handlers are added using their path 
 as the key, and then whatever the SolrResourceLoader has created is added 
 using the default getName (the full class name) as the key.
 I think we should start only allowing an object to appear once in the bean 
 map in 5.0. The way the code currently works, the replication handler objects 
 would take precedence, which seems right to me.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?

2014-11-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216442#comment-14216442
 ] 

Mark Miller commented on SOLR-3774:
---

I duplicated this issue with SOLR-6750.

The way I solved it is to not let SolrResourceLoader.inform add any of the same 
objects that already exist, with a simple check of 
{{!infoRegistry.containsValue(bean)}}. I think it might be a better check than 
relying on names, because we don't really ever want to add the same object twice 
- especially considering SOLR-6586.

{code}
for (SolrInfoMBean bean : arr) {
  if (!infoRegistry.containsValue(bean)) {
    try {
      infoRegistry.put(bean.getName(), bean);
    } catch (Exception e) {
      log.warn("could not register MBean '" + bean.getName() + "'.", e);
    }
  }
}
{code}

 /admin/mbean returning duplicate search handlers with names that map to their 
 classes?
 --

 Key: SOLR-3774
 URL: https://issues.apache.org/jira/browse/SOLR-3774
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-3774.patch


 Offshoot of SOLR-3232...
 bq. Along with some valid entries with names equal to the request handler 
 names ("/get", "search", "/browse") it also turned up one with the name 
 org.apache.solr.handler.RealTimeGetHandler and another with the name 
 org.apache.solr.handler.component.SearchHandler
 ...seems that we may have a bug with request handlers getting registered 
 multiple times, once under their real name and once using their class?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6060) Remove IndexWriter.unLock

2014-11-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216453#comment-14216453
 ] 

Michael McCandless commented on LUCENE-6060:


Well, Solr still has the unlockOnStartup; I wasn't sure what to do with that so 
I left it for now and opened SOLR-6737.

Most Lucene apps shouldn't be using the legacy SimpleFSLockFactory, and if they 
are, 1) they must already be dealing with "remove lock on startup", 2) if 
they are doing so via IndexWriter.unlock, they will see the 
deprecation/compilation error on upgrade, dig in CHANGES, find this issue, and 
then have to do their own scary things: I think this is healthy.

I don't really like the deleteOnExit method.

 Remove IndexWriter.unLock
 -

 Key: LUCENE-6060
 URL: https://issues.apache.org/jira/browse/LUCENE-6060
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.10, 5.0, Trunk

 Attachments: LUCENE-6060.patch


 This method used to be necessary, when our locking impls were buggy, but it's 
 a godawful dangerous method: it invites index corruption.
 I think we should remove it.
 Apps that for some scary reason really need it can do their own thing...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6755) ClassCastException from CloudMLTQParserTest

2014-11-18 Thread Hoss Man (JIRA)
Hoss Man created SOLR-6755:
--

 Summary: ClassCastException from CloudMLTQParserTest
 Key: SOLR-6755
 URL: https://issues.apache.org/jira/browse/SOLR-6755
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Anshum Gupta


The seed doesn't reproduce for me, but the ClassCastException seems hinky and 
worth looking into...

{noformat}
   [junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=CloudMLTQParserTest 
-Dtests.method=testDistribSearch -Dtests.seed=3AE918BB008859A6 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=iw 
-Dtests.timezone=America/Indiana/Vincennes -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   50.7s J1 | CloudMLTQParserTest.testDistribSearch 
   [junit4] Throwable #1: java.lang.ClassCastException: java.lang.String 
cannot be cast to java.util.ArrayList
   [junit4]at 
__randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0)
   [junit4]at 
org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124)
   [junit4]at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
   [junit4]at java.lang.Thread.run(Thread.java:745)
{noformat}

http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText
Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true)
At revision 1640267




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_67) - Build # 11466 - Failure!

2014-11-18 Thread Chris Hostetter

I can't reproduce this, and i don't really understand it, but i know 
anshum was working on this very recently so i filed a jira for him so we 
don't lose track of it...

https://issues.apache.org/jira/browse/SOLR-6755

: Date: Tue, 18 Nov 2014 04:01:34 + (UTC)
: From: Policeman Jenkins Server jenk...@thetaphi.de
: Reply-To: dev@lucene.apache.org
: To: jbern...@apache.org, dev@lucene.apache.org
: Subject: [JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_67) - Build # 11466 -
:  Failure!
: 
: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/
: Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true)
: 
: 1 tests failed.
: REGRESSION:  org.apache.solr.search.mlt.CloudMLTQParserTest.testDistribSearch
: 
: Error Message:
: java.lang.String cannot be cast to java.util.ArrayList
: 
: Stack Trace:
: java.lang.ClassCastException: java.lang.String cannot be cast to 
java.util.ArrayList
:   at 
__randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0)
:   at 
org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124)
:   at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
:   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
:   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
:   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
:   at java.lang.reflect.Method.invoke(Method.java:606)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
:   at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
:   at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
:   at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
:   at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
:   at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
:   at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
:   at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
:   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
:   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
:   at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
:   at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
:   at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
:   at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
:   at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
:   at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
:   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
:   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
:   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
:   at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
:   at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
:   at 

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216477#comment-14216477
 ] 

Tim Allison commented on LUCENE-5205:
-

Ha, turns out the hang isn't permanent, you just need to be patient. ;)  The 
minimal code to reproduce this inefficiency:
{noformat}
String s = "'S SOLUTION a PROVIDER TESTABCD";
long start = new Date().getTime();
Matcher m = Pattern.compile("'(([^']+)+)'").matcher(s);
while (m.find()) {
    System.out.println(m.start());
}
System.out.println("elapsed: " + (new Date().getTime()-start));
{noformat}

When I ran this against strings of different lengths, I got these times (two 
runs per string).  Note the times roughly double with each added character - 
the signature of exponential backtracking.
||String||  MILLIS_RUN1||   MILLIS_RUN2||
|'S SOLUTION a PROVIDER TE| 937|933|
|'S SOLUTION a PROVIDER TES|1671|   1310|
|'S SOLUTION a PROVIDER TEST|   3165|   2643|
|'S SOLUTION a PROVIDER TESTA|  5165|   5227|
|'S SOLUTION a PROVIDER TESTAB| 9335|   9872|
|'S SOLUTION a PROVIDER TESTABC |19964| 18437|
|'S SOLUTION a PROVIDER TESTABCD|   39387|  35961|

I fixed the regex inefficiency on my github 
[site|https://github.com/tballison/lucene-addons].  I set that up for 
standalone addons that track with the latest stable builds.

 I'll respond to your other issues shortly.  Thank you [~modassar] for raising 
this issue!


 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt



[jira] [Created] (SOLR-6756) The cloud-dev scripts do not seem to work with the new example layout.

2014-11-18 Thread Mark Miller (JIRA)
Mark Miller created SOLR-6756:
-

 Summary: The cloud-dev scripts do not seem to work with the new 
example layout.
 Key: SOLR-6756
 URL: https://issues.apache.org/jira/browse/SOLR-6756
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: Trunk






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6757) SolrInfoMBean should be an abstract class rather than an interface.

2014-11-18 Thread Mark Miller (JIRA)
Mark Miller created SOLR-6757:
-

 Summary: SolrInfoMBean should be an abstract class rather than an 
interface.
 Key: SOLR-6757
 URL: https://issues.apache.org/jira/browse/SOLR-6757
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.0, Trunk


This will give us greater flexibility around adding things with back compat 
support in minor releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?

2014-11-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216493#comment-14216493
 ] 

Tomás Fernández Löbbe commented on SOLR-3774:
-

I think that makes sense

 /admin/mbean returning duplicate search handlers with names that map to their 
 classes?
 --

 Key: SOLR-3774
 URL: https://issues.apache.org/jira/browse/SOLR-3774
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-3774.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2014-11-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216504#comment-14216504
 ] 

Mark Miller commented on SOLR-4735:
---

bq.  feel free to gut what we have in Solr5 

We don't have a lot of time, but it would be great to solve SOLR-6586 - it 
really requires a different stats API to be sensible I think. It's a little 
tricky to make nice, but really the API calls for each individual attribute 
should be able to be calculated independently. Otherwise, there is just so much 
recalculation that it's hard to have everything be live and fast and even if 
you only want to fetch a single fast attribute, you will be penalized by the 
slowest.

If you currently use a tool to enumerate and look at each attribute for 
monitoring, because of the duplicate bean issue and SOLR-6586, you can check 
the size of a directory like 40 times or something crazy when it really only 
had to be checked once. There is an API mismatch.

 Improve Solr metrics reporting
 --

 Key: SOLR-4735
 URL: https://issues.apache.org/jira/browse/SOLR-4735
 Project: Solr
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch


 Following on from a discussion on the mailing list:
 http://search-lucene.com/m/IO0EI1qdyJF1/codahalesubj=Solr+metrics+in+Codahale+metrics+and+Graphite+
 It would be good to make Solr play more nicely with existing devops 
 monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
 moment is poll-only, either via JMX or through the admin stats page.  I'd 
 like to refactor things a bit to make this more pluggable.
 This patch is a start.  It adds a new interface, InstrumentedBean, which 
 extends SolrInfoMBean to return a 
 [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
 couple of MetricReporters (which basically just duplicate the JMX and admin 
 page reporting that's there at the moment, but which should be more 
 extensible).  The patch includes a change to RequestHandlerBase showing how 
 this could work.  The idea would be to eventually replace the getStatistics() 
 call on SolrInfoMBean with this instead.
 The next step would be to allow more MetricReporters to be defined in 
 solrconfig.xml.  The Metrics library comes with ganglia and graphite 
 reporting modules, and we can add contrib plugins for both of those.
 There's some more general cleanup that could be done around SolrInfoMBean 
 (we've got two plugin handlers at /mbeans and /plugins that basically do the 
 same thing, and the beans themselves have some weirdly inconsistent data on 
 them - getVersion() returns different things for different impls, and 
 getSource() seems pretty useless), but maybe that's for another issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)

2014-11-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216536#comment-14216536
 ] 

Michael McCandless commented on LUCENE-6061:


I think you could do this with PH using an appropriate tokenizer.

Ie, you'd have a custom tokenizer that tokenizes your markup into the 4 
different cases (so you are still indexing 4 different fields), but that 
tokenizer carefully sets the token offsets into the original text (which you'd 
store with no markup).

At search time, regardless of which of the 4 fields was used for searching, 
you'd then use the token offsets against the same original stored field.  You 
should be able to do this by overriding PostingsHighlighter.loadFieldValues... 
though maybe we could make this easier somehow, to say "when I highlight field 
X, load its content from field Y"...
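
A hedged sketch of that override (based on the 4.x/5.x PostingsHighlighter API; 
the field name "original_text" is a placeholder for the markup-free stored field):

{code}
import java.io.IOException;
import java.util.Arrays;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.postingshighlight.PostingsHighlighter;

public class OriginalTextHighlighter extends PostingsHighlighter {
  @Override
  protected String[][] loadFieldValues(IndexSearcher searcher, String[] fields,
                                       int[] docids, int maxLength) throws IOException {
    // whichever of the four indexed fields is being highlighted, load the text
    // to highlight against from the single markup-free stored field
    String[] substituted = new String[fields.length];
    Arrays.fill(substituted, "original_text");
    return super.loadFieldValues(searcher, substituted, docids, maxLength);
  }
}
{code}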

 Add Support for something different than Strings in Highlighting 
 (FastVectorHighlighter)
 

 Key: LUCENE-6061
 URL: https://issues.apache.org/jira/browse/LUCENE-6061
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search, modules/highlighter
Affects Versions: Trunk
Reporter: Martin Braun
Priority: Critical
  Labels: FastVectorHighlighter, Highlighter, Highlighting
 Fix For: 4.10.2, 5.0, Trunk


 In my application I need Highlighting and I stumbled upon the really neat 
 FastVectorHighlighter. One problem appeared though: it lacks a way to render 
 the highlights into something other than Strings, so I rearranged some of 
 the code to support that:
 https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java
 Is there a specific reason to only support String[] as a return type? If not, 
 I would be happy to write a new class that supports rendering into a generic 
 Type and rewire that into the existing class (or just do it as an addition 
 and leave the current class be).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_20) - Build # 4439 - Failure!

2014-11-18 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/
Java: 64bit/jdk1.8.0_20 -XX:-UseCompressedOops -XX:+UseSerialGC (asserts: true)

1 tests failed.
REGRESSION:  
org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates

Error Message:
unpaired high surrogate: d86c, followed by: e28f

Stack Trace:
java.lang.AssertionError: unpaired high surrogate: d86c, followed by: e28f
at 
__randomizedtesting.SeedInfo.seed([A2044F8C235991A:5660FE2D40DB7620]:0)
at 
org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:191)
at 
org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:136)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:403)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:352)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
at 
org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Tim Allison (JIRA)

[ 
 https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216571#comment-14216571
 ] 

Tim Allison commented on LUCENE-5205:
-

{quote}
field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
 +spanNear([field:search, spanNear([field:s, field:provider], 0, true), 
field:s, field:and, field:consulting, field:company], 0, true)
{quote}
Unfortunately, I can't think of a way around this.  In the SpanQueryParser, 
single quotes should be used to mark a token that should not be further parsed, 
i.e. '/files/a/b/c/path.html' should be treated as a string not a regex.  I 
toyed with requiring a space before the start ' and space after the ', but that 
seemed hacky.

If you escape your apostrophes, you should get the results you expect (this is 
with a whitespace analyzer, you may get different results with 
StandardAnalyzer):
{noformat} SEARCH TOOL\\'S SOLUTION PROVIDER\\'S TECHNOLOGY CO., LTD{noformat}
yields:f1:search f1:tool's f1:solution f1:provider's f1:technology f1:co., 
f1:ltd
{noformat}

{quote}q=field: (SEARCH TOOLS PROVIDER & CONSULTING COMPANY) Gets transformed 
to following:
 +spanNear([field:search, field:tools, field:provider, field:&, 
field:consulting, field:company], 0, true)
{quote}
I think this is fixed on github.  What Analyzer chain are you using?



 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: "jakarta apache"
 * phrase with slop: "jakarta apache"~3
 * default "or" clause: jakarta apache
 * grouping "or" clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require "in order" for phrases with slop with the \~ operator: 
 "jakarta apache"\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of " " as in: 
 \[jakarta apache\]
 * Can use "or" grouping clauses in phrasal queries: "apache (lucene solr)"\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance 
 <=2: (jakarta~1 (OSA) vs jakarta~1 (Levenshtein))
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.

[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Tim Allison (JIRA)

[ 
 https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216571#comment-14216571
 ] 

Tim Allison edited comment on LUCENE-5205 at 11/18/14 6:47 PM:
---

{quote}
field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
 +spanNear([field:search, spanNear([field:s, field:provider], 0, true), 
field:s, field:and, field:consulting, field:company], 0, true)
{quote}
Unfortunately, I can't think of a way around this.  In the SpanQueryParser, 
single quotes should be used to mark a token that should not be further parsed, 
i.e. '/files/a/b/c/path.html' should be treated as a string not a regex.  I 
toyed with requiring a space before the start ' and space after the ', but that 
seemed hacky.

If you escape your apostrophes, you should get the results you expect (this is 
with a whitespace analyzer, you may get different results with 
StandardAnalyzer):
{noformat} SEARCH TOOL\\'S SOLUTION PROVIDER\\'S TECHNOLOGY CO., LTD{noformat}
yields:
{noformat}
f1:search f1:tool's f1:solution f1:provider's f1:technology f1:co., f1:ltd
{noformat}

{quote}q=field: (SEARCH TOOLS PROVIDER & CONSULTING COMPANY) Gets transformed 
to following:
 +spanNear([field:search, field:tools, field:provider, field:&, 
field:consulting, field:company], 0, true)
{quote}
I think this is fixed on github.  What Analyzer chain are you using?




was (Author: talli...@mitre.org):
{quote}
field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
 +spanNear([field:search, spanNear([field:s, field:provider], 0, true), 
field:s, field:and, field:consulting, field:company], 0, true)
{quote}
Unfortunately, I can't think of a way around this.  In the SpanQueryParser, 
single quotes should be used to mark a token that should not be further parsed, 
i.e. '/files/a/b/c/path.html' should be treated as a string not a regex.  I 
toyed with requiring a space before the start ' and space after the ', but that 
seemed hacky.

If you escape your apostrophes, you should get the results you expect (this is 
with a whitespace analyzer, you may get different results with 
StandardAnalyzer):
{noformat} SEARCH TOOL\\'S SOLUTION PROVIDER\\'S TECHNOLOGY CO., LTD{noformat}
yields:f1:search f1:tool's f1:solution f1:provider's f1:technology f1:co., 
f1:ltd
{noformat}

{quote}q=field: (SEARCH TOOLS PROVIDER & CONSULTING COMPANY) Gets transformed 
to following:
 +spanNear([field:search, field:tools, field:provider, field:&, 
field:consulting, field:company], 0, true)
{quote}
I think this is fixed on github.  What Analyzer chain are you using?



 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: "jakarta apache"
 * phrase with slop: "jakarta apache"~3
 * default "or" clause: jakarta apache
 * grouping "or" clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require "in order" for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of " " as in: 
 

[jira] [Commented] (SOLR-6625) HttpClient callback in HttpSolrServer

2014-11-18 Thread Gregory Chanan (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216585#comment-14216585
 ] 

Gregory Chanan commented on SOLR-6625:
--

bq. Actually SOLR-4470 aims at introducing a framework for any 
authentication-type, and then (for now) implement basic-auth using this 
framework

Ah, I see, I misinterpreted the SOLR-4470 code in HttpSolrServer -- it uses 
BasicAuthCache and BasicScheme which I thought were in reference to basic auth, 
but they are really just default implementations.

What I'm really arguing -- and it's my fault I didn't make it clear with 
example code -- is that the authentication type may affect how you want the 
http requests to look, beyond just the credentials.  For example, I'm using an 
authentication filter based off of Hadoop's AuthenticationFilter 
(https://github.com/apache/hadoop/blob/7250b0bf914a55d0fa4802834de7f1909f1b0d6b/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationFilter.java).
  That filter does SPNego negotiation on the first request, but sets a cookie 
you can use to avoid the negotiation on subsequent requests.  So, I wouldn't 
want the SOLR-4470 implementation where I buffer up every request; I only 
want to do that on the first request to the server on the connection.

From seeing the SOLR-4470 code, though, it looks like I was thinking about 
this incorrectly.  Instead of the HttpClientCallback being a function of the 
HttpSolrServer, it's really a function of the AuthCredentials implementation.  
So, the default implementation would just be the 
credentialsButNonPreemptive/getHttpContextForRequest code you have in 
HttpSolrServer in SOLR-4470, but other AuthCredentials implementations could 
override.  Does that sound right to you, [~steff1193]?

bq. I do not know if it is an improvement compared to your approach. I just 
implemented in a way that worked. Supporting non-preemptive authenticating 
POST-requests was not the main focus of SOLR-4470, so I just quickly did it in 
the way that I found it could be done - without considering performance or 
anything

Cool, I'll investigate in another jira.

 HttpClient callback in HttpSolrServer
 -

 Key: SOLR-6625
 URL: https://issues.apache.org/jira/browse/SOLR-6625
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Attachments: SOLR-6625.patch, SOLR-6625.patch


 Some of our setups use Solr in a SPNego/kerberos setup (we've done this by 
 adding our own filters to the web.xml).  We have an issue in that SPNego 
 requires a negotiation step, but some HttpSolrServer requests are not 
 repeatable, notably the PUT/POST requests.  So, what happens is, 
 HttpSolrServer sends the requests, the server responds with a negotiation 
 request, and the request fails because the request is not repeatable.  We've 
 modified our code to send a repeatable request beforehand in these cases.
 It would be nicer if HttpSolrServer provided a pre/post callback when it was 
 making an httpclient request.  This would allow administrators to make 
 changes to the request for authentication purposes, and would allow users to 
 make per-request changes to the httpclient calls (i.e. modify httpclient 
 requestconfig to modify the timeout on a per-request basis).






Re: [JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.8.0_20) - Build # 4439 - Failure!

2014-11-18 Thread Robert Muir
I can't reproduce this, but its also not a random test. Just very
simple asserts.

I tried reproducing on linux with the master seed, same jvm version
and flags, no luck.

On Tue, Nov 18, 2014 at 1:32 PM, Policeman Jenkins Server
jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/
 Java: 64bit/jdk1.8.0_20 -XX:-UseCompressedOops -XX:+UseSerialGC (asserts: 
 true)

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates

 Error Message:
 unpaired high surrogate: d86c, followed by: e28f

 Stack Trace:
 java.lang.AssertionError: unpaired high surrogate: d86c, followed by: e28f
 at 
 __randomizedtesting.SeedInfo.seed([A2044F8C235991A:5660FE2D40DB7620]:0)
 at 
 org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:191)
 at 
 org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:136)
 at 
 org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:403)
 at 
 org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:352)
 at 
 org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
 at 
 org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 

[jira] [Updated] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?

2014-11-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3774:
--
Attachment: SOLR-3774.patch

 /admin/mbean returning duplicate search handlers with names that map to their 
 classes?
 --

 Key: SOLR-3774
 URL: https://issues.apache.org/jira/browse/SOLR-3774
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-3774.patch, SOLR-3774.patch


 Offshoot of SOLR-3232...
 bq. Along with some valid entries with names equal to the request handler 
 names (/get search /browse) it also turned up one with the name 
 org.apache.solr.handler.RealTimeGetHandler and another with the name 
 org.apache.solr.handler.component.SearchHandler
 ...seems that we may have a bug with request handlers getting registered 
 multiple times, once under their real name and once using their class?






[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?

2014-11-18 Thread Gregory Chanan (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216668#comment-14216668
 ] 

Gregory Chanan commented on SOLR-3774:
--

+1

 /admin/mbean returning duplicate search handlers with names that map to their 
 classes?
 --

 Key: SOLR-3774
 URL: https://issues.apache.org/jira/browse/SOLR-3774
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-3774.patch, SOLR-3774.patch


 Offshoot of SOLR-3232...
 bq. Along with some valid entries with names equal to the request handler 
 names (/get search /browse) it also turned up one with the name 
 org.apache.solr.handler.RealTimeGetHandler and another with the name 
 org.apache.solr.handler.component.SearchHandler
 ...seems that we may have a bug with request handlers getting registered 
 multiple times, once under their real name and once using their class?






[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest

2014-11-18 Thread Anshum Gupta (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216687#comment-14216687
 ] 

Anshum Gupta commented on SOLR-6755:


I can't seem to reproduce it even after multiple runs. I'm adding some safety 
checks in the test though and will commit a patch that handles this.

 ClassCastException from CloudMLTQParserTest
 ---

 Key: SOLR-6755
 URL: https://issues.apache.org/jira/browse/SOLR-6755
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Anshum Gupta

 The seed doesn't reproduce for me, but the ClassCastException seems hinky and 
 worth looking into...
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch 
 -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true 
 -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes 
 -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
[junit4] ERROR   50.7s J1 | CloudMLTQParserTest.testDistribSearch 
[junit4] Throwable #1: java.lang.ClassCastException: java.lang.String 
 cannot be cast to java.util.ArrayList
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0)
[junit4]  at 
 org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
[junit4]  at java.lang.Thread.run(Thread.java:745)
 {noformat}
 http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText
 Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true)
 At revision 1640267






[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216714#comment-14216714
 ] 

ASF subversion and git services commented on SOLR-6755:
---

Commit 1640416 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1640416 ]

SOLR-6755: Fix the test to always return 2 parsedqueries i.e. have more than 1 shard

 ClassCastException from CloudMLTQParserTest
 ---

 Key: SOLR-6755
 URL: https://issues.apache.org/jira/browse/SOLR-6755
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Anshum Gupta

 The seed doesn't reproduce for me, but the ClassCastException seems hinky and 
 worth looking into...
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch 
 -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true 
 -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes 
 -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
[junit4] ERROR   50.7s J1 | CloudMLTQParserTest.testDistribSearch 
[junit4] Throwable #1: java.lang.ClassCastException: java.lang.String 
 cannot be cast to java.util.ArrayList
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0)
[junit4]  at 
 org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
[junit4]  at java.lang.Thread.run(Thread.java:745)
 {noformat}
 http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText
 Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true)
 At revision 1640267






[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216716#comment-14216716
 ] 

ASF subversion and git services commented on SOLR-6755:
---

Commit 1640417 from [~anshumg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1640417 ]

SOLR-6755: Fix the test to always return 2 parsedqueries i.e. have more than 1 
shard (merge from trunk)

 ClassCastException from CloudMLTQParserTest
 ---

 Key: SOLR-6755
 URL: https://issues.apache.org/jira/browse/SOLR-6755
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Anshum Gupta

 The seed doesn't reproduce for me, but the ClassCastException seems hinky and 
 worth looking into...
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch 
 -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true 
 -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes 
 -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
[junit4] ERROR   50.7s J1 | CloudMLTQParserTest.testDistribSearch 
[junit4] Throwable #1: java.lang.ClassCastException: java.lang.String 
 cannot be cast to java.util.ArrayList
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0)
[junit4]  at 
 org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
[junit4]  at java.lang.Thread.run(Thread.java:745)
 {noformat}
 http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText
 Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true)
 At revision 1640267






[jira] [Commented] (SOLR-6755) ClassCastException from CloudMLTQParserTest

2014-11-18 Thread Anshum Gupta (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216719#comment-14216719
 ] 

Anshum Gupta commented on SOLR-6755:


This commit should fix the issue. Changed the test to always have 2 shards, 
i.e. never have 1 shard, which returns a String instead of an ArrayList<String> 
in the debug response.

 ClassCastException from CloudMLTQParserTest
 ---

 Key: SOLR-6755
 URL: https://issues.apache.org/jira/browse/SOLR-6755
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Anshum Gupta

 The seed doesn't reproduce for me, but the ClassCastException seems hinky and 
 worth looking into...
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=CloudMLTQParserTest -Dtests.method=testDistribSearch 
 -Dtests.seed=3AE918BB008859A6 -Dtests.multiplier=3 -Dtests.slow=true 
 -Dtests.locale=iw -Dtests.timezone=America/Indiana/Vincennes 
 -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
[junit4] ERROR   50.7s J1 | CloudMLTQParserTest.testDistribSearch 
[junit4] Throwable #1: java.lang.ClassCastException: java.lang.String 
 cannot be cast to java.util.ArrayList
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([3AE918BB008859A6:BB0F96A377D7399A]:0)
[junit4]  at 
 org.apache.solr.search.mlt.CloudMLTQParserTest.doTest(CloudMLTQParserTest.java:124)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
[junit4]  at java.lang.Thread.run(Thread.java:745)
 {noformat}
 http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11466/consoleText
 Java: 64bit/jdk1.7.0_67 -XX:-UseCompressedOops -XX:+UseG1GC (asserts: true)
 At revision 1640267






[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates

2014-11-18 Thread Michael McCandless (JIRA)

[ 
 https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216742#comment-14216742
 ] 

Michael McCandless commented on LUCENE-6062:


+1

 Index corruption from numeric DV updates
 

 Key: LUCENE-6062
 URL: https://issues.apache.org/jira/browse/LUCENE-6062
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.10.3, 5.0, Trunk

 Attachments: LUCENE-6062.patch, LUCENE-6062.patch


 I hit this while working on LUCENE-6005: when cutting over 
 TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled 
 additional docValues in the test, and hit this:
 {noformat}
 There was 1 failure:
 1) 
 testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
 java.io.FileNotFoundException: _1_Asserting_0.dvm in 
 dir=RAMDirectory@259847e5 
 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
   at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
   at 
 org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
   at 
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.<init>(Lucene50DocValuesProducer.java:130)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
   at 
 org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:267)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
   at 
 org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
   at 
 org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
   at 
 org.apache.lucene.index.SegmentDocValuesProducer.<init>(SegmentDocValuesProducer.java:63)
   at 
 org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
   at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:109)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
   at 
 org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
 {noformat}
 A one-line change to the existing test (on trunk) causes this corruption:
 {noformat}
 Index: 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
 ===
 --- 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (revision 1639580)
 +++ 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (working copy)
 @@ -750,6 +750,7 @@
  // second segment with no NDV
  doc = new Document();
  doc.add(new StringField("id", "doc1", Store.NO));
 +doc.add(new NumericDocValuesField("foo", 3));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't 
 updated
 {noformat}
 For some reason, the base doc values for the 2nd segment are not being 
 written, but clearly should have been (to hold field "foo")... I'm not sure why.






[jira] [Commented] (SOLR-6708) Smoke tester couldn't communicate with Solr started using 'bin/solr start'

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216750#comment-14216750
 ] 

ASF subversion and git services commented on SOLR-6708:
---

Commit 1640419 from [~thelabdude] in branch 'dev/trunk'
[ https://svn.apache.org/r1640419 ]

SOLR-6708: wrap the "kill existing Solr" command in a try/except block

 Smoke tester couldn't communicate with Solr started using 'bin/solr start'
 --

 Key: SOLR-6708
 URL: https://issues.apache.org/jira/browse/SOLR-6708
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0
Reporter: Steve Rowe
Assignee: Timothy Potter
 Attachments: solr-example.log


 The nightly-smoke target failed on ASF Jenkins 
 [https://builds.apache.org/job/Lucene-Solr-SmokeRelease-5.x/208/]: 
 {noformat}
[smoker]   unpack solr-5.0.0.tgz...
[smoker] verify JAR metadata/identity/no javax.* or java.* classes...
[smoker] unpack lucene-5.0.0.tgz...
[smoker]   **WARNING**: skipping check of 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar:
  it has javax.* classes
[smoker]   **WARNING**: skipping check of 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0/contrib/dataimporthandler-extras/lib/activation-1.1.1.jar:
  it has javax.* classes
[smoker] verify WAR metadata/contained JAR identity/no javax.* or 
 java.* classes...
[smoker] unpack lucene-5.0.0.tgz...
[smoker] copying unpacked distribution for Java 7 ...
[smoker] test solr example w/ Java 7...
[smoker]   start Solr instance 
 (log=/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0-java7/solr-example.log)...
[smoker]   startup done
[smoker] Failed to determine the port of a local Solr instance, cannot 
 create core!
[smoker]   test utf8...
[smoker] 
[smoker] command "sh ./exampledocs/test_utf8.sh 
 http://localhost:8983/solr/techproducts" failed:
[smoker] ERROR: Could not curl to Solr - is curl installed? Is Solr not 
 running?
[smoker] 
[smoker] 
[smoker]   stop server using: bin/solr stop -p 8983
[smoker] No process found for Solr node running on port 8983
[smoker] ***WARNING***: Solr instance didn't respond to SIGINT; using 
 SIGKILL now...
[smoker] ***WARNING***: Solr instance didn't respond to SIGKILL; 
 ignoring...
[smoker] Traceback (most recent call last):
[smoker]   File 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py,
  line 1526, in module
[smoker] main()
[smoker]   File 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py,
  line 1471, in main
[smoker] smokeTest(c.java, c.url, c.revision, c.version, c.tmp_dir, 
 c.is_signed, ' '.join(c.test_args))
[smoker]   File 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py,
  line 1515, in smokeTest
[smoker] unpackAndVerify(java, 'solr', tmpDir, artifact, svnRevision, 
 version, testArgs, baseURL)
[smoker]   File 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py,
  line 616, in unpackAndVerify
[smoker] verifyUnpacked(java, project, artifact, unpackPath, 
 svnRevision, version, testArgs, tmpDir, baseURL)
[smoker]   File 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py,
  line 783, in verifyUnpacked
[smoker] testSolrExample(java7UnpackPath, java.java7_home, False)
[smoker]   File 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py,
  line 888, in testSolrExample
[smoker] run('sh ./exampledocs/test_utf8.sh 
 http://localhost:8983/solr/techproducts', 'utf8.log')
[smoker]   File 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/dev-tools/scripts/smokeTestRelease.py,
  line 541, in run
[smoker] raise RuntimeError('command %s failed; see log file %s' % 
 (command, logPath))
[smoker] RuntimeError: command "sh ./exampledocs/test_utf8.sh 
 http://localhost:8983/solr/techproducts" failed; see log file 
 /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-5.x/lucene/build/smokeTestRelease/tmp/unpack/solr-5.0.0-java7/example/utf8.log
 BUILD FAILED
 

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Tim Allison (JIRA)

[ 
 https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216757#comment-14216757
 ] 

Tim Allison commented on LUCENE-5205:
-

[~paul.elsc...@xs4all.nl], I'm sorry for taking so long to get back to you.  I 
just merged trunk and made updates to my fork of the lucene5205 
[branch|https://github.com/tballison/lucene-solr/tree/lucene5205].  Let me know 
if that is of any use to you.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: "jakarta apache"
 * phrase with slop: "jakarta apache"~3
 * default "or" clause: jakarta apache
 * grouping "or" clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require "in order" for phrases with slop with the \~ operator: 
 "jakarta apache"\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of " " as in: 
 \[jakarta apache\]
 * Can use "or" grouping clauses in phrasal queries: "apache (lucene solr)"\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance 
 <=2: (jakarta~1 (OSA) vs jakarta~1 (Levenshtein))
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.






[JENKINS-MAVEN] Lucene-Solr-Maven-5.x #761: POMs out of sync

2014-11-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-5.x/761/

2 tests failed.
FAILED:  org.apache.solr.hadoop.MorphlineBasicMiniMRTest.testPathParts

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([99737C6C17DEC09]:0)


FAILED:  
org.apache.solr.hadoop.MorphlineBasicMiniMRTest.org.apache.solr.hadoop.MorphlineBasicMiniMRTest

Error Message:
Suite timeout exceeded (= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (= 720 msec).
at __randomizedtesting.SeedInfo.seed([99737C6C17DEC09]:0)




Build Log:
[...truncated 53887 lines...]
BUILD FAILED
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:548: 
The following error occurred while executing this line:
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:200: 
The following error occurred while executing this line:
: Java returned: 1

Total time: 415 minutes 34 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Created] (LUCENE-6064) throw exception during sort for misconfigured field

2014-11-18 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-6064:
---

 Summary: throw exception during sort for misconfigured field
 Key: LUCENE-6064
 URL: https://issues.apache.org/jira/browse/LUCENE-6064
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


If you sort on field X, and it has no docvalues, today it will silently treat 
it as all values missing. This can be very confusing since it just means 
nothing will happen at all.

But there is a distinction between "no docs happen to have a value for this 
field" and "field isn't configured correctly". The latter should get an 
exception, telling the user to index docvalues, or wrap the reader with 
UninvertingReader.







[jira] [Updated] (LUCENE-6064) throw exception during sort for misconfigured field

2014-11-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-6064:

Attachment: LUCENE-6064.patch

Attached is an initial patch: it's largish because this check found numerous 
test bugs.

 throw exception during sort for misconfigured field
 ---

 Key: LUCENE-6064
 URL: https://issues.apache.org/jira/browse/LUCENE-6064
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-6064.patch


 If you sort on field X, and it has no docvalues, today it will silently treat 
 it as all values missing. This can be very confusing since it just means 
 nothing will happen at all.
 But there is a distinction between "no docs happen to have a value for this 
 field" and "field isn't configured correctly". The latter should get an 
 exception, telling the user to index docvalues, or wrap the reader with 
 UninvertingReader.






[jira] [Commented] (SOLR-6732) Back-compat break for LIR state in 4.10.2

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216906#comment-14216906
 ] 

ASF subversion and git services commented on SOLR-6732:
---

Commit 1640432 from [~thelabdude] in branch 'dev/branches/lucene_solr_4_10'
[ https://svn.apache.org/r1640432 ]

SOLR-6732: fix back-compat issue with unit test to verify solution

 Back-compat break for LIR state in 4.10.2
 -

 Key: SOLR-6732
 URL: https://issues.apache.org/jira/browse/SOLR-6732
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.2
Reporter: Shalin Shekhar Mangar
Assignee: Timothy Potter
Priority: Blocker
 Fix For: 4.10.3

 Attachments: SOLR-6732.patch, SOLR-6732.patch


 We changed the LIR state to be kept as a map, but it is not back-compatible. 
 The problem is that we're checking for map or string after parsing JSON, but 
 if the key has "down" as a bare string then the JSON parsing will fail.
 This was introduced in SOLR-6511. This error will prevent anyone from 
 upgrading to 4.10.2
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201411.mbox/%3c54636ed2.8040...@cytainment.de%3E






[jira] [Commented] (SOLR-3774) /admin/mbean returning duplicate search handlers with names that map to their classes?

2014-11-18 Thread Shalin Shekhar Mangar (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216911#comment-14216911
 ] 

Shalin Shekhar Mangar commented on SOLR-3774:
-

+1

 /admin/mbean returning duplicate search handlers with names that map to their 
 classes?
 --

 Key: SOLR-3774
 URL: https://issues.apache.org/jira/browse/SOLR-3774
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-3774.patch, SOLR-3774.patch


 Offshoot of SOLR-3232...
 bq. Along with some valid entries with names equal to the request handler 
 names (/get search /browse) it also turned up one with the name 
 org.apache.solr.handler.RealTimeGetHandler and another with the name 
 org.apache.solr.handler.component.SearchHandler
 ...seems that we may have a bug with request handlers getting registered 
 multiple times, once under their real name and once using their class?






[jira] [Resolved] (SOLR-6732) Back-compat break for LIR state in 4.10.2

2014-11-18 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter resolved SOLR-6732.
--
Resolution: Fixed

 Back-compat break for LIR state in 4.10.2
 -

 Key: SOLR-6732
 URL: https://issues.apache.org/jira/browse/SOLR-6732
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.2
Reporter: Shalin Shekhar Mangar
Assignee: Timothy Potter
Priority: Blocker
 Fix For: 4.10.3

 Attachments: SOLR-6732.patch, SOLR-6732.patch


 We changed the LIR state to be kept as a map, but it is not back-compatible. 
 The problem is that we're checking for map or string after parsing JSON, but 
 if the key has "down" as a bare string then the JSON parsing will fail.
 This was introduced in SOLR-6511. This error will prevent anyone from 
 upgrading to 4.10.2
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201411.mbox/%3c54636ed2.8040...@cytainment.de%3E






[jira] [Commented] (SOLR-6732) Back-compat break for LIR state in 4.10.2

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216912#comment-14216912
 ] 

ASF subversion and git services commented on SOLR-6732:
---

Commit 1640434 from [~thelabdude] in branch 'dev/branches/lucene_solr_4_10'
[ https://svn.apache.org/r1640434 ]

SOLR-6732: mention in changes

 Back-compat break for LIR state in 4.10.2
 -

 Key: SOLR-6732
 URL: https://issues.apache.org/jira/browse/SOLR-6732
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.2
Reporter: Shalin Shekhar Mangar
Assignee: Timothy Potter
Priority: Blocker
 Fix For: 4.10.3

 Attachments: SOLR-6732.patch, SOLR-6732.patch


 We changed the LIR state to be kept as a map, but it is not back-compatible. 
 The problem is that we're checking for map or string after parsing JSON, but 
 if the key has "down" as a bare string then the JSON parsing will fail.
 This was introduced in SOLR-6511. This error will prevent anyone from 
 upgrading to 4.10.2
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201411.mbox/%3c54636ed2.8040...@cytainment.de%3E






[jira] [Assigned] (SOLR-6729) createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA

2014-11-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-6729:
-

Assignee: Mark Miller

 createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA
 

 Key: SOLR-6729
 URL: https://issues.apache.org/jira/browse/SOLR-6729
 Project: Solr
  Issue Type: Improvement
Reporter: Christine Poerschke
Assignee: Mark Miller
Priority: Minor

 The 'Replica placement strategy for solrcloud' SOLR-6220 ticket will allow 
 more sophisticated replica placement logic but in the meantime this simple 
 change here would allow more predictable locating of replicas via the 
 ordering of the createNodeSet list provided.






[jira] [Assigned] (SOLR-6086) Replica active during Warming

2014-11-18 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-6086:
---

Assignee: Shalin Shekhar Mangar

 Replica active during Warming
 -

 Key: SOLR-6086
 URL: https://issues.apache.org/jira/browse/SOLR-6086
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6.1, 4.8.1
Reporter: ludovic Boutros
Assignee: Shalin Shekhar Mangar
  Labels: difficulty-medium, impact-medium
 Attachments: SOLR-6086.patch, SOLR-6086.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 At least with Solr 4.6.1, replicas are considered active during the warming 
 process.
 This means that if you restart a replica or create a new one, queries will 
 be sent to this replica and the query will hang until the end of the warming 
 process (if cold searchers are not used).
 You cannot add or restart a node silently anymore.
 I think that the fact that the replica is active is not a bad thing.
 But the HttpShardHandler and the CloudSolrServer class should take the 
 warming process into account.
 Currently, I have developed a new, very simple component which checks that a 
 searcher is registered.
 I am also developing custom HttpShardHandler and CloudSolrServer classes 
 which will check the warming process in addition to the ACTIVE status in the 
 cluster state.
 This seems to be more a workaround than a solution, but that's all I can do in 
 this version.






[jira] [Commented] (LUCENE-6064) throw exception during sort for misconfigured field

2014-11-18 Thread Adrien Grand (JIRA)

[ 
 https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216974#comment-14216974
 ] 

Adrien Grand commented on LUCENE-6064:
--

+1

 throw exception during sort for misconfigured field
 ---

 Key: LUCENE-6064
 URL: https://issues.apache.org/jira/browse/LUCENE-6064
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-6064.patch


 If you sort on field X, and it has no docvalues, today it will silently treat 
 it as all values missing. This can be very confusing since it just means 
 nothing will happen at all.
 But there is a distinction between "no docs happen to have a value for this 
 field" and "field isn't configured correctly". The latter should get an 
 exception, telling the user to index docvalues, or wrap the reader with 
 UninvertingReader.






[jira] [Commented] (SOLR-6747) Add an optional caching option as a workaround for SOLR-6586.

2014-11-18 Thread Gregory Chanan (JIRA)

[ 
 https://issues.apache.org/jira/browse/SOLR-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216987#comment-14216987
 ] 

Gregory Chanan commented on SOLR-6747:
--

{code}
NamedList cachedStats = this.cachedDynamicStats;
NamedList stats;
if (useCachedStatsBetweenGetMBeanInfoCalls && cachedStats != null) {
  stats = cachedStats;
} else {
  stats = infoBean.getStatistics();
}
{code}

Small optimization, but maybe better to avoid reading the volatile value if 
useCachedStatsBetweenGetMBeanInfoCalls is false? I.e.:
{code}
NamedList stats = null;
if (useCachedStatsBetweenGetMBeanInfoCalls) {
  NamedList cachedStats = this.cachedDynamicStats;
  if (cachedStats != null) {
    stats = cachedStats;
  }
}
if (stats == null) {
  stats = infoBean.getStatistics();
}
{code}

Could optimize further by eliminating the conditional entirely when 
useCachedStatsBetweenGetMBeanInfoCalls is false, but perhaps it's not worth it.

Otherwise, looks good, +1.

 Add an optional caching option as a workaround for SOLR-6586.
 -

 Key: SOLR-6747
 URL: https://issues.apache.org/jira/browse/SOLR-6747
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.0, Trunk

 Attachments: SOLR-6747.patch









[jira] [Commented] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)

2014-11-18 Thread Martin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002
 ] 

Martin Braun commented on LUCENE-6061:
--

Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index), and that lets me do the highlighting as well, so I 
am not that dependent on the Highlighting API anymore.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.
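
To make the wish concrete, here is a sketch of the kind of generic hook being 
asked for (all names are invented here; this is not the linked class):

{code}
// A fragments builder parameterized on its output type T instead of being
// hard-wired to String[].
public interface GenericFragmentsBuilder<T> {
  /**
   * Render one highlighted fragment. start/end are the fragment bounds in the
   * stored field content; matchOffsets are the term hit offsets inside it.
   */
  T createFragment(String fieldContent, int start, int end, int[][] matchOffsets);
}

// A caller that only wants the plain offsets, with no String building at all:
GenericFragmentsBuilder<int[][]> offsetsOnly = new GenericFragmentsBuilder<int[][]>() {
  @Override
  public int[][] createFragment(String fieldContent, int start, int end,
      int[][] matchOffsets) {
    return matchOffsets;
  }
};
{code}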

 Add Support for something different than Strings in Highlighting 
 (FastVectorHighlighter)
 

 Key: LUCENE-6061
 URL: https://issues.apache.org/jira/browse/LUCENE-6061
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search, modules/highlighter
Affects Versions: Trunk
Reporter: Martin Braun
Priority: Critical
  Labels: FastVectorHighlighter, Highlighter, Highlighting
 Fix For: 4.10.2, 5.0, Trunk


 In my application I need Highlighting and I stumbled upon the really neat 
 FastVectorHighlighter. One problem appeared though. It lacks a way to render 
 the Highlights into something different than Strings, so I rearranged some of 
 the code to support that:
 https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java
 Is there a specific reason to only support String[] as a return type? If not, 
 I would be happy to write a new class that supports rendering into a generic 
 Type and rewire that into the existing class (or just do it as an addition 
 and leave the current class be).






[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)

2014-11-18 Thread Martin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002
 ] 

Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:04 PM:
-

Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets); like that I can 
implement a cool way of handling synonym searching as well), and that enables 
me to do highlighting as well (I think it's similar to the approach PH uses, 
but with storing the results into an index), so I am not that dependent on 
the Highlighting API anymore.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.


was (Author: s4ke):
Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index), and that lets me do the highlighting as well, so I 
am not that dependent on the Highlighting API anymore.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.

 Add Support for something different than Strings in Highlighting 
 (FastVectorHighlighter)
 

 Key: LUCENE-6061
 URL: https://issues.apache.org/jira/browse/LUCENE-6061
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search, modules/highlighter
Affects Versions: Trunk
Reporter: Martin Braun
Priority: Critical
  Labels: FastVectorHighlighter, Highlighter, Highlighting
 Fix For: 4.10.2, 5.0, Trunk


 In my application I need Highlighting and I stumbled upon the really neat 
 FastVectorHighlighter. One problem appeared though. It lacks a way to render 
 the Highlights into something different than Strings, so I rearranged some of 
 the code to support that:
 https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java
 Is there a specific reason to only support String[] as a return type? If not, 
 I would be happy to write a new class that supports rendering into a generic 
 Type and rewire that into the existing class (or just do it as an addition 
 and leave the current class be).






[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)

2014-11-18 Thread Martin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002
 ] 

Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:05 PM:
-

Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets), using the same 
analyzer chain that the complete documents use and extracting the attributes; 
like that I can implement a cool way of handling synonym searching as well), 
and that enables me to do highlighting as well (I think it's similar to the 
approach PH uses, but with storing the results into an index), so I am not 
that dependent on the Highlighting API anymore.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.


was (Author: s4ke):
Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets); like that I can 
implement a cool way of handling synonym searching as well), and that enables 
me to do highlighting as well (I think it's similar to the approach PH uses, 
but with storing the results into an index), so I am not that dependent on 
the Highlighting API anymore.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.
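
For readers following along, the "reindex the document's tokens" idea above 
can be sketched with the standard TokenStream API (the field name, analyzer, 
docText, and the indexTokenDoc helper are all invented):

{code}
// Walk a document's tokens with the same analyzer chain the documents use and
// pull out term text plus offsets, one token occurrence at a time.
try (TokenStream ts = analyzer.tokenStream("content", docText)) {
  CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
  OffsetAttribute offsets = ts.addAttribute(OffsetAttribute.class);
  ts.reset();
  while (ts.incrementToken()) {
    // index one document per token occurrence (hypothetical helper)
    indexTokenDoc(term.toString(), offsets.startOffset(), offsets.endOffset());
  }
  ts.end();
}
{code}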

 Add Support for something different than Strings in Highlighting 
 (FastVectorHighlighter)
 

 Key: LUCENE-6061
 URL: https://issues.apache.org/jira/browse/LUCENE-6061
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search, modules/highlighter
Affects Versions: Trunk
Reporter: Martin Braun
Priority: Critical
  Labels: FastVectorHighlighter, Highlighter, Highlighting
 Fix For: 4.10.2, 5.0, Trunk


 In my application I need Highlighting and I stumbled upon the really neat 
 FastVectorHighlighter. One problem appeared though. It lacks a way to render 
 the Highlights into something different than Strings, so I rearranged some of 
 the code to support that:
 https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java
 Is there a specific reason to only support String[] as a return type? If not, 
 I would be happy to write a new class that supports rendering into a generic 
 Type and rewire that into the existing class (or just do it as an addition 
 and leave the current class be).






[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)

2014-11-18 Thread Martin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002
 ] 

Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:08 PM:
-

Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets), using the same 
analyzer chain that the complete documents use and extracting the attributes; 
I think it's similar to the approach PH uses, but with storing the results 
into an index. Like that I can implement a cool way of handling synonym 
searching as well), and this enables me to do highlighting without needing 
one of the Highlighters in Lucene, so I am not that dependent on the 
Highlighting API anymore.

But I think I might need the Highlighter API some time in the near future, so 
I am keeping my _FastVectorHighlighterUtil_.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.


was (Author: s4ke):
Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets), using the same 
analyzer chain that the complete documents use and extracting the attributes; 
I think it's similar to the approach PH uses, but with storing the results 
into an index. Like that I can implement a cool way of handling synonym 
searching as well), and this enables me to do highlighting without needing 
one of the Highlighters in Lucene, so I am not that dependent on the 
Highlighting API anymore.

But I think I might need the Highlighter API some time in the near future, so 
I am keeping my _FastVectorHighlighterUtil_.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.

 Add Support for something different than Strings in Highlighting 
 (FastVectorHighlighter)
 

 Key: LUCENE-6061
 URL: https://issues.apache.org/jira/browse/LUCENE-6061
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search, modules/highlighter
Affects Versions: Trunk
Reporter: Martin Braun
Priority: Critical
  Labels: FastVectorHighlighter, Highlighter, Highlighting
 Fix For: 4.10.2, 5.0, Trunk


 In my application I need Highlighting and I stumbled upon the really neat 
 FastVectorHighlighter. One problem appeared though. It lacks a way to render 
 the Highlights into something different than Strings, so I rearranged some of 
 the code to support that:
 https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java
 Is there a specific reason to only support String[] as a return type? If not, 
 I would be happy to write a new class that supports rendering into a generic 
 Type and rewire that into the existing class (or just do it as an addition 
 and leave the current class be).






[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)

2014-11-18 Thread Martin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002
 ] 

Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:08 PM:
-

Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets), using the same 
analyzer chain that the complete documents use and extracting the attributes; 
I think it's similar to the approach PH uses, but with storing the results 
into an index. Like that I can implement a cool way of handling synonym 
searching as well), and this enables me to do highlighting without needing 
one of the Highlighters in Lucene, so I am not that dependent on the 
Highlighting API anymore.

But I think I might need the Highlighter API some time in the near future, so 
I am keeping my _FastVectorHighlighterUtil_.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.


was (Author: s4ke):
Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets), using the same 
analyzer chain that the complete documents use and extracting the attributes; 
like that I can implement a cool way of handling synonym searching as well), 
and that enables me to do highlighting as well (I think it's similar to the 
approach PH uses, but with storing the results into an index), so I am not 
that dependent on the Highlighting API anymore.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.

 Add Support for something different than Strings in Highlighting 
 (FastVectorHighlighter)
 

 Key: LUCENE-6061
 URL: https://issues.apache.org/jira/browse/LUCENE-6061
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search, modules/highlighter
Affects Versions: Trunk
Reporter: Martin Braun
Priority: Critical
  Labels: FastVectorHighlighter, Highlighter, Highlighting
 Fix For: 4.10.2, 5.0, Trunk


 In my application I need Highlighting and I stumbled upon the really neat 
 FastVectorHighlighter. One problem appeared though. It lacks a way to render 
 the Highlights into something different than Strings, so I rearranged some of 
 the code to support that:
 https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java
 Is there a specific reason to only support String[] as a return type? If not, 
 I would be happy to write a new class that supports rendering into a generic 
 Type and rewire that into the existing class (or just do it as an addition 
 and leave the current class be).






[jira] [Comment Edited] (LUCENE-6061) Add Support for something different than Strings in Highlighting (FastVectorHighlighter)

2014-11-18 Thread Martin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217002#comment-14217002
 ] 

Martin Braun edited comment on LUCENE-6061 at 11/18/14 11:09 PM:
-

Well, I am doing the synonym approach for other parts of my analysis already, 
but I think the FastVectorHighlighter approach is better, as it does exactly 
the "when I highlight field X, load its content from field Y" part; I just 
want it to be able to render into arbitrary objects (or, in my case, I just 
want the plain offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets), using the same 
analyzer chain that the complete documents use and extracting the attributes; 
I think it's similar to the approach PH uses, but with storing the results 
into an index. Like that I can implement a cool way of handling synonym 
searching as well), and this enables me to do highlighting without needing 
one of the Highlighters in Lucene, so I am not that dependent on the 
Highlighting API anymore.

But I think I might need the Highlighter API some time in the near future, so 
I am keeping my _FastVectorHighlighterUtil_.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.


was (Author: s4ke):
Well, I am doing the synonym approach for other parts already, but I think the 
FastVectorHighlighter approach is better, as it does exactly the "when I 
highlight field X, load its content from field Y" part; I just want it to be 
able to render into arbitrary objects (or, in my case, I just want the plain 
offsets).

I am currently working on a more sophisticated approach that lets me search 
for more information about one single token (I am reindexing the document's 
tokens into a new index with all their occurrences (offsets), using the same 
analyzer chain that the complete documents use and extracting the attributes; 
I think it's similar to the approach PH uses, but with storing the results 
into an index. Like that I can implement a cool way of handling synonym 
searching as well), and this enables me to do highlighting without needing 
one of the Highlighters in Lucene, so I am not that dependent on the 
Highlighting API anymore.

But I think I might need the Highlighter API some time in the near future, so 
I am keeping my _FastVectorHighlighterUtil_.

Generally, I just want to make the Highlighter API (I am talking about 
_FastVectorHighlighter_ here) easier to use and more intuitive than what I 
would need to do with the indexing trick.

 Add Support for something different than Strings in Highlighting 
 (FastVectorHighlighter)
 

 Key: LUCENE-6061
 URL: https://issues.apache.org/jira/browse/LUCENE-6061
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/search, modules/highlighter
Affects Versions: Trunk
Reporter: Martin Braun
Priority: Critical
  Labels: FastVectorHighlighter, Highlighter, Highlighting
 Fix For: 4.10.2, 5.0, Trunk


 In my application I need Highlighting and I stumbled upon the really neat 
 FastVectorHighlighter. One problem appeared though. It lacks a way to render 
 the Highlights into something different than Strings, so I rearranged some of 
 the code to support that:
 https://github.com/Hotware/LuceneBeanExtension/blob/master/src/main/java/de/hotware/lucene/extension/highlight/FVHighlighterUtil.java
 Is there a specific reason to only support String[] as a return type? If not, 
 I would be happy to write a new class that supports rendering into a generic 
 Type and rewire that into the existing class (or just do it as an addition 
 and leave the current class be).






[jira] [Commented] (SOLR-6729) createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA

2014-11-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217015#comment-14217015
 ] 

Mark Miller commented on SOLR-6729:
---

Looks good. We should probably add a simple test though.

 createNodeSet.shuffle=(true|false) support, createNodeSet for ADDREPLICA
 

 Key: SOLR-6729
 URL: https://issues.apache.org/jira/browse/SOLR-6729
 Project: Solr
  Issue Type: Improvement
Reporter: Christine Poerschke
Assignee: Mark Miller
Priority: Minor

 The 'Replica placement strategy for solrcloud' SOLR-6220 ticket will allow 
 more sophisticated replica placement logic, but in the meantime this simple 
 change would allow more predictable placement of replicas via the ordering 
 of the provided createNodeSet list.






[jira] [Commented] (LUCENE-6064) throw exception during sort for misconfigured field

2014-11-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217017#comment-14217017
 ] 

Michael McCandless commented on LUCENE-6064:


+1

 throw exception during sort for misconfigured field
 ---

 Key: LUCENE-6064
 URL: https://issues.apache.org/jira/browse/LUCENE-6064
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-6064.patch


 If you sort on field X, and it has no docvalues, today it will silently treat 
 it as if all values were missing. This can be very confusing, since it just 
 means nothing will happen at all.
 But there is a distinction between "no docs happen to have a value for this 
 field" and "the field isn't configured correctly". The latter should get an 
 exception telling the user to index docvalues, or to wrap the reader with 
 UninvertingReader.






[jira] [Commented] (SOLR-4792) stop shipping a war in trunk (6.0)

2014-11-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217025#comment-14217025
 ] 

Mark Miller commented on SOLR-4792:
---

bq. The issue title is wrong ... we WILL be shipping a .war in 5.x versions.

I think we just need to backport this to 5.x. No reason we have to wait until 
6.x.

 stop shipping a war in trunk (6.0)
 --

 Key: SOLR-4792
 URL: https://issues.apache.org/jira/browse/SOLR-4792
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: Trunk

 Attachments: SOLR-4792.patch


 see the vote on the developer list.
 This is the first step: if we stop shipping a war then we are free to do 
 anything we want. 






[jira] [Reopened] (SOLR-4792) stop shipping a war in trunk (6.0)

2014-11-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-4792:
---

 stop shipping a war in trunk (6.0)
 --

 Key: SOLR-4792
 URL: https://issues.apache.org/jira/browse/SOLR-4792
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Mark Miller
 Fix For: 5.0, Trunk

 Attachments: SOLR-4792.patch


 see the vote on the developer list.
 This is the first step: if we stop shipping a war then we are free to do 
 anything we want. 






[jira] [Updated] (SOLR-4792) stop shipping a war in trunk (6.0)

2014-11-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4792:
--
Fix Version/s: 5.0
 Assignee: Mark Miller  (was: Robert Muir)

 stop shipping a war in trunk (6.0)
 --

 Key: SOLR-4792
 URL: https://issues.apache.org/jira/browse/SOLR-4792
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Mark Miller
 Fix For: 5.0, Trunk

 Attachments: SOLR-4792.patch


 see the vote on the developer list.
 This is the first step: if we stop shipping a war then we are free to do 
 anything we want. 






[jira] [Commented] (SOLR-4792) stop shipping a war in trunk (6.0)

2014-11-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217031#comment-14217031
 ] 

Mark Miller commented on SOLR-4792:
---

[~andyetitmoves] brought this up at Lucene Revolution: 5.x could be a long 
release line, and this is a fairly simple change internally; it just takes a 
volunteer. Let's do it in 5.x as originally planned and voted on.

 stop shipping a war in trunk (6.0)
 --

 Key: SOLR-4792
 URL: https://issues.apache.org/jira/browse/SOLR-4792
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Mark Miller
 Fix For: 5.0, Trunk

 Attachments: SOLR-4792.patch


 see the vote on the developer list.
 This is the first step: if we stop shipping a war then we are free to do 
 anything we want. 






[jira] [Updated] (SOLR-4792) stop shipping a war in 5.0

2014-11-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4792:
--
Summary: stop shipping a war in 5.0  (was: stop shipping a war in trunk 
(6.0))

 stop shipping a war in 5.0
 --

 Key: SOLR-4792
 URL: https://issues.apache.org/jira/browse/SOLR-4792
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Mark Miller
 Fix For: 5.0, Trunk

 Attachments: SOLR-4792.patch


 see the vote on the developer list.
 This is the first step: if we stop shipping a war then we are free to do 
 anything we want. 






[jira] [Commented] (SOLR-6747) Add an optional caching option as a workaround for SOLR-6586.

2014-11-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217043#comment-14217043
 ] 

Mark Miller commented on SOLR-6747:
---

bq. small optimization, but maybe better to avoid reading the volatile value if 
useCachedStatsBetweenGetMBeanInfoCalls is false?

+1

 Add an optional caching option as a workaround for SOLR-6586.
 -

 Key: SOLR-6747
 URL: https://issues.apache.org/jira/browse/SOLR-6747
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.0, Trunk

 Attachments: SOLR-6747.patch









[jira] [Commented] (SOLR-2412) Multipath hierarchical faceting

2014-11-18 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217040#comment-14217040
 ] 

Toke Eskildsen commented on SOLR-2412:
--

Frankly, I am not sure it ever will. SOLR-2412 is huge, and it is a completely 
separate facet implementation, of which Solr already has too many. We are not 
currently using it at my organization, as we don't need hierarchical faceting 
and SOLR-5894 gives us a similar speed boost when using multiple facets.

I hope to add the hierarchical capabilities as an overlay to the existing Solr 
facet code at some point, but I really cannot say when, or whether, that will 
work out.

Sorry about that, and apologies for taking so long to come to that realization.

 Multipath hierarchical faceting
 ---

 Key: SOLR-2412
 URL: https://issues.apache.org/jira/browse/SOLR-2412
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 4.0
 Environment: Fast IO when huge hierarchies are used
Reporter: Toke Eskildsen
  Labels: contrib, patch
 Attachments: SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch, 
 SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch, SOLR-2412.patch


 Hierarchical faceting with slow startup, low memory overhead and fast 
 response. Distinguishing features as compared to SOLR-64 and SOLR-792 are
   * Multiple paths per document
   * Query-time analysis of the facet field; no special requirements for 
 indexing besides retaining separator characters in the terms used for faceting
   * Optional custom sorting of tag values
   * Recursive counting of references to tags at all levels of the output
 This is a shell around LUCENE-2369, making it work with the Solr API. The 
 underlying principle is to reference terms by their ordinals and create an 
 index-wide documents-to-tags map, augmented with a compressed representation 
 of hierarchical levels.






[jira] [Commented] (LUCENE-6063) Allow overriding ConcurrentMergeScheduler's denial-of-service protection

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217085#comment-14217085
 ] 

ASF subversion and git services commented on LUCENE-6063:
-

Commit 1640456 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1640456 ]

LUCENE-6063: allow overriding whether/how ConcurrentMergeScheduler stalls 
incoming threads when merges are falling behind

 Allow overriding ConcurrentMergeScheduler's denial-of-service protection
 

 Key: LUCENE-6063
 URL: https://issues.apache.org/jira/browse/LUCENE-6063
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, Trunk

 Attachments: LUCENE-6063.patch


 In LUCENE-5310 we explored improving CMS/SMS sharing/concurrency, but
 the issue never converged, so I want to break out one small part of
 it here: the ability to override CMS's default aggressive
 denial-of-service protection, where it forcefully stalls the incoming
 threads that are responsible for creating too many segments.
 More advanced applications can more gracefully handle the "too many
 merges" case by e.g. slowing down the incoming indexing rate at a higher
 level.





[jira] [Commented] (LUCENE-6063) Allow overriding ConcurrentMergeScheduler's denial-of-service protection

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217088#comment-14217088
 ] 

ASF subversion and git services commented on LUCENE-6063:
-

Commit 1640457 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1640457 ]

LUCENE-6063: allow overriding whether/how ConcurrentMergeScheduler stalls 
incoming threads when merges are falling behind

 Allow overriding ConcurrentMergeScheduler's denial-of-service protection
 

 Key: LUCENE-6063
 URL: https://issues.apache.org/jira/browse/LUCENE-6063
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, Trunk

 Attachments: LUCENE-6063.patch


 In LUCENE-5310 we explored improving CMS/SMS sharing/concurrency, but
 the issue never converged, so I want to break out one small part of
 it here: the ability to override CMS's default aggressive
 denial-of-service protection, where it forcefully stalls the incoming
 threads that are responsible for creating too many segments.
 More advanced applications can more gracefully handle the "too many
 merges" case by e.g. slowing down the incoming indexing rate at a higher
 level.





[jira] [Resolved] (LUCENE-6063) Allow overriding ConcurrentMergeScheduler's denial-of-service protection

2014-11-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-6063.

Resolution: Fixed

 Allow overriding ConcurrentMergeScheduler's denial-of-service protection
 

 Key: LUCENE-6063
 URL: https://issues.apache.org/jira/browse/LUCENE-6063
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, Trunk

 Attachments: LUCENE-6063.patch


 In LUCENE-5310 we explored improving CMS/SMS sharing/concurrency, but
 the issue never converged, so I want to break out one small part of
 it here: the ability to override CMS's default aggressive
 denial-of-service protection, where it forcefully stalls the incoming
 threads that are responsible for creating too many segments.
 More advanced applications can more gracefully handle the "too many
 merges" case by e.g. slowing down the incoming indexing rate at a higher
 level.






[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 678 - Failure

2014-11-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/678/

2 tests failed.
REGRESSION:  
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch

Error Message:
java.lang.NullPointerException 

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
java.lang.NullPointerException

at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:569)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:583)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 

Re: [JENKINS] Lucene-Solr-4.10-Linux (32bit/jdk1.9.0-ea-b34) - Build # 93 - Failure!

2014-11-18 Thread Robert Muir
This still looks like https://bugs.openjdk.java.net/browse/JDK-8038348,
only without asserts.

It might only still happen in 4.10.x; the codec pull API makes the
flush code look completely different.

On Sun, Nov 16, 2014 at 9:31 AM, Policeman Jenkins Server
jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.10-Linux/93/
 Java: 32bit/jdk1.9.0-ea-b34 -server -XX:+UseG1GC (asserts: false)

 2 tests failed.
 REGRESSION:  
 org.apache.lucene.codecs.simpletext.TestSimpleTextTermVectorsFormat.testRamBytesUsed

 Error Message:
 8196

 Stack Trace:
 java.lang.ArrayIndexOutOfBoundsException: 8196
 at 
 __randomizedtesting.SeedInfo.seed([868B4D2568A55A5E:74285F65A2DA4508]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.nextSlice(ByteSliceReader.java:109)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:76)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:122)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:454)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:80)
 at 
 org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:114)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:439)
 at 
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:510)
 at 
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:621)
 at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3227)
 at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3203)
 at 
 org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1774)
 at 
 org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1754)
 at 
 org.apache.lucene.index.BaseIndexFileFormatTestCase.testRamBytesUsed(BaseIndexFileFormatTestCase.java:228)
 at 
 org.apache.lucene.index.BaseTermVectorsFormatTestCase.testRamBytesUsed(BaseTermVectorsFormatTestCase.java:61)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
   

[jira] [Commented] (SOLR-6633) let /update/json/docs store the source json as well

2014-11-18 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217243#comment-14217243
 ] 

Alexandre Rafalovitch commented on SOLR-6633:
-

This is truly just storing the original document, right? And only returning 
the whole thing as well?

Because, in Elasticsearch, the *_src* field is actually used as a source for 
several operations. For example, it is the source for dynamic updates, as, by 
default, fields are not stored individually. And, I think, the *_src* field 
also gets re-written/re-created on update, again because it is actually used 
as a source of truth.

The second issue I wanted to raise is how this will interplay with 
UpdateRequestProcessors (ES does not really have those). I guess URPs will 
apply after the content of the field is captured, so the actual fields may 
look quite different from what's in the *_src*.

Finally, I am not clear on what this really means: ??all fields go into the 
'df'??. Do we mean there is a magic copyField or something?

I think we need a bit more specific use case here, rather than just an 
implementation/configuration. Especially since a similar-but-different 
implementation in Elasticsearch does not fully match Solr's setup.

 let /update/json/docs store the source json as well
 ---

 Key: SOLR-6633
 URL: https://issues.apache.org/jira/browse/SOLR-6633
 Project: Solr
  Issue Type: Bug
Reporter: Noble Paul
Assignee: Noble Paul
  Labels: EaseOfUse
 Fix For: 5.0, Trunk

 Attachments: SOLR-6633.patch, SOLR-6633.patch


 It is a common requirement to store the entire JSON as a field in Solr. 
 We can have an extra param srcField=field_name to specify the field name.
 The /update/json/docs handler is only useful when all the json fields are 
 predefined, or in schemaless mode.
 The better option would be to store the content in a store-only field and 
 index the data in another field in the other modes.
 The relevant section in solrconfig.xml:
 {code:xml}
 <initParams path="/update/json/docs">
   <lst name="defaults">
     <!-- this ensures that the entire json doc will be stored verbatim into one field -->
     <str name="srcField">_src</str>
     <!-- This means that the uniqueKeyField will be extracted from the fields and
          all fields go into the 'df' field. In this config df is already configured to be 'text' -->
     <str name="mapUniqueKeyOnly">true</str>
   </lst>
 </initParams>
 {code}






[jira] [Commented] (SOLR-6655) Improve SimplePostTool to easily specify target port/collection etc.

2014-11-18 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217250#comment-14217250
 ] 

Alexandre Rafalovitch commented on SOLR-6655:
-

[~janhoy]: spring.io may have a good base for a full-featured client with 
spring.data.solr, spring.shell and a bunch of other modules one could pull in. 
Might be a little *large* though :-)

 Improve SimplePostTool to easily specify target port/collection etc.
 

 Key: SOLR-6655
 URL: https://issues.apache.org/jira/browse/SOLR-6655
 Project: Solr
  Issue Type: Improvement
Reporter: Anshum Gupta
Assignee: Erik Hatcher
  Labels: difficulty-easy, impact-medium
 Fix For: 5.0, Trunk

 Attachments: SOLR-6655.patch


 Right now, the SimplePostTool has a single parameter 'url' that can be used 
 to send the request to a specific endpoint. It would make sense to allow 
 users to specify just the collection name, port etc. explicitly and 
 independently as separate parameters.






[jira] [Updated] (SOLR-6633) let /update/json/docs store the source json as well

2014-11-18 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-6633:
-
Description: 
It is a common requirement to store the entire JSON as a field in Solr. 

We can have an extra param srcField=field_name to specify the field name.

The /update/json/docs handler is only useful when all the json fields are 
predefined, or in schemaless mode.

The better option would be to store the content in a store-only field and 
index the data in another field in the other modes.

The relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means that the uniqueKeyField will be extracted from the fields and
         all fields go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
    <str name="df">text</str>
  </lst>
</initParams>
{code}

  was:
It is a common requirement to store the entire JSON as a field in Solr. 

We can have an extra param srcField=field_name to specify the field name.

The /update/json/docs handler is only useful when all the json fields are 
predefined, or in schemaless mode.

The better option would be to store the content in a store-only field and 
index the data in another field in the other modes.

The relevant section in solrconfig.xml:
{code:xml}
<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
    <str name="srcField">_src</str>
    <!-- This means that the uniqueKeyField will be extracted from the fields and
         all fields go into the 'df' field. In this config df is already configured to be 'text' -->
    <str name="mapUniqueKeyOnly">true</str>
  </lst>
</initParams>
{code}


 let /update/json/docs store the source json as well
 ---

 Key: SOLR-6633
 URL: https://issues.apache.org/jira/browse/SOLR-6633
 Project: Solr
  Issue Type: Bug
Reporter: Noble Paul
Assignee: Noble Paul
  Labels: EaseOfUse
 Fix For: 5.0, Trunk

 Attachments: SOLR-6633.patch, SOLR-6633.patch


 It is a common requirement to store the entire JSON as a field in Solr. 
 We can have an extra param srcField=field_name to specify the field name.
 The /update/json/docs handler is only useful when all the json fields are 
 predefined, or in schemaless mode.
 The better option would be to store the content in a store-only field and 
 index the data in another field in the other modes.
 The relevant section in solrconfig.xml:
 {code:xml}
 <initParams path="/update/json/docs">
   <lst name="defaults">
     <!-- this ensures that the entire json doc will be stored verbatim into one field -->
     <str name="srcField">_src</str>
     <!-- This means that the uniqueKeyField will be extracted from the fields and
          all fields go into the 'df' field. In this config df is already configured to be 'text' -->
     <str name="mapUniqueKeyOnly">true</str>
     <str name="df">text</str>
   </lst>
 </initParams>
 {code}






[jira] [Commented] (SOLR-6633) let /update/json/docs store the source json as well

2014-11-18 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217261#comment-14217261
 ] 

Noble Paul commented on SOLR-6633:
--

bq. Because, in Elasticsearch, the _src field is actually used as source for 
several operations.

This feature is not the same. It is a feature of the {{/update/json/docs}} 
request handler. We can't do it like ES because the same document can be 
updated using other commands as well.


bq. Finally, I am not clear on what this really means: all fields go into the 
'df'.

Solr is strongly typed, so to speak, which means we can't just put the content 
anywhere for searching. Because all components use df as the default search 
field, this component chooses to piggyback on the same field. The user can 
configure any other field as 'df' here. The next problem we need to address is 
that of the uniqueKey. The component must extract the uniqueKey field from the 
json itself, or it should create one. That is the purpose of the 
mapUniqueKeyOnly param.

We are not trying to be ES here. The use case is this: a user has a bunch of 
json documents and needs to index the data without configuring anything in the 
schema. The search result has to return some stored fields. Because Solr is 
strongly typed we can't store them in individual fields, so we must store the 
whole thing in one field, and it made sense to store it as json itself.
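
To make the use case concrete, here is a hypothetical example of the intended 
behavior under the config from the description (the document is invented, and 
the exact contents of the df field depend on the implementation):

{code}
// Input posted to /update/json/docs:
{ "id" : "1", "title" : "Solr", "tags" : ["search", "lucene"] }

// With srcField=_src, mapUniqueKeyOnly=true and df=text, the indexed document
// would look roughly like:
{
  "id"   : "1",                 // uniqueKey, extracted from the json
  "_src" : "{ \"id\" : \"1\", \"title\" : \"Solr\", \"tags\" : [\"search\", \"lucene\"] }",
  "text" : "1 Solr search lucene"   // all values searchable via the default field
}
{code}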



 let /update/json/docs store the source json as well
 ---

 Key: SOLR-6633
 URL: https://issues.apache.org/jira/browse/SOLR-6633
 Project: Solr
  Issue Type: Bug
Reporter: Noble Paul
Assignee: Noble Paul
  Labels: EaseOfUse
 Fix For: 5.0, Trunk

 Attachments: SOLR-6633.patch, SOLR-6633.patch


 It is a common requirement to store the entire JSON as a field in Solr.
 We can have an extra param srcField=field_name to specify the field name.
 The /update/json/docs handler is only useful when all the json fields are
 predefined or in schemaless mode.
 The better option would be to store the content in a store-only field and
 index the data in another field in the other modes.
 The relevant section in solrconfig.xml:
 {code:xml}
 <initParams path="/update/json/docs">
   <lst name="defaults">
     <!-- this ensures that the entire json doc will be stored verbatim into
          one field -->
     <str name="srcField">_src</str>
     <!-- This means the uniqueKeyField will be extracted from the fields and
          all fields go into the 'df' field. In this config df is already
          configured to be 'text' -->
     <str name="mapUniqueKeyOnly">true</str>
     <str name="df">text</str>
   </lst>
 </initParams>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217273#comment-14217273
 ] 

ASF subversion and git services commented on LUCENE-6062:
-

Commit 1640464 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1640464 ]

LUCENE-6062: throw exception instead of doing nothing, when sorting/grouping 
etc on misconfigured field

 Index corruption from numeric DV updates
 

 Key: LUCENE-6062
 URL: https://issues.apache.org/jira/browse/LUCENE-6062
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.10.3, 5.0, Trunk

 Attachments: LUCENE-6062.patch, LUCENE-6062.patch


 I hit this while working on LUCENE-6005: when cutting over
 TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled
 additional docValues in the test, and hit this:
 {noformat}
 There was 1 failure:
 1) 
 testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
 java.io.FileNotFoundException: _1_Asserting_0.dvm in 
 dir=RAMDirectory@259847e5 
 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
   at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
   at 
 org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
   at 
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.init(Lucene50DocValuesProducer.java:130)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
   at 
 org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.init(PerFieldDocValuesFormat.java:267)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
   at 
 org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
   at 
 org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
   at 
 org.apache.lucene.index.SegmentDocValuesProducer.init(SegmentDocValuesProducer.java:63)
   at 
 org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
   at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:109)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
   at 
 org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
 {noformat}
 A one-line change to the existing test (on trunk) causes this corruption:
 {noformat}
 Index: 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
 ===
 --- 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (revision 1639580)
 +++ 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (working copy)
 @@ -750,6 +750,7 @@
  // second segment with no NDV
  doc = new Document();
  doc.add(new StringField("id", "doc1", Store.NO));
 +doc.add(new NumericDocValuesField("foo", 3));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't
  updated
 {noformat}
 For some reason, the base doc values for the 2nd segment are not being
 written, but clearly should have been (to hold field "foo")... I'm not sure why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6658) SearchHandler should accept POST requests with JSON data in content stream for customized plug-in components

2014-11-18 Thread Mark Peng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217290#comment-14217290
 ] 

Mark Peng commented on SOLR-6658:
-

Is there a committer who could help with the patch for this issue? Or is there
any alternative way to solve it? It is very crucial for us.

Best regards,
Mark

 SearchHandler should accept POST requests with JSON data in content stream 
 for customized plug-in components
 

 Key: SOLR-6658
 URL: https://issues.apache.org/jira/browse/SOLR-6658
 Project: Solr
  Issue Type: Improvement
  Components: search, SearchComponents - other
Affects Versions: 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1, 4.9, 4.9.1, 4.10, 4.10.1
Reporter: Mark Peng
 Attachments: SOLR-6658.patch


 This issue relates to the following one:
 *Return HTTP error on POST requests with no Content-Type*
 [https://issues.apache.org/jira/browse/SOLR-5517]
 The original consideration of the above is to make sure that incoming POST 
 requests to SearchHandler have corresponding content-type specified. That is 
 quite reasonable, however, the following lines in the patch cause to reject 
 all POST requests with content stream data, which is not necessary to that 
 issue:
 {code}
 Index: solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java
 ===
 --- solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java   
 (revision 1546817)
 +++ solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java   
 (working copy)
 @@ -22,9 +22,11 @@
  import java.util.List;
  
  import org.apache.solr.common.SolrException;
 +import org.apache.solr.common.SolrException.ErrorCode;
  import org.apache.solr.common.params.CommonParams;
  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.common.params.ShardParams;
 +import org.apache.solr.common.util.ContentStream;
  import org.apache.solr.core.CloseHook;
  import org.apache.solr.core.PluginInfo;
  import org.apache.solr.core.SolrCore;
 @@ -165,6 +167,10 @@
{
  // int sleep = req.getParams().getInt("sleep",0);
  // if (sleep > 0) {log.error("SLEEPING for " + sleep);  Thread.sleep(sleep);}
 +if (req.getContentStreams() != null && req.getContentStreams().iterator().hasNext()) {
 +  throw new SolrException(ErrorCode.BAD_REQUEST, "Search requests cannot accept content streams");
 +}
 +
  ResponseBuilder rb = new ResponseBuilder(req, rsp, components);
  if (rb.requestInfo != null) {
rb.requestInfo.setResponseBuilder(rb);
 {code}
 We are using Solr 4.5.1 in our production services and are considering an
 upgrade to 4.9/5.0 to support more features. But due to this issue we have no
 chance to upgrade, because we have some important customized SearchComponent
 plug-ins that need to get POST data from SearchHandler for further
 processing.
 Therefore, we are asking whether it is possible to remove the content stream
 constraint shown above and to let SearchHandler accept POST requests with
 *Content-Type: application/json*, to allow further components to get the data.
 Thank you.
 Best regards,
 Mark Peng
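
For context, a minimal sketch of the kind of plug-in component affected, with a
made-up component name and only the content-stream read shown (the guard added
by SOLR-5517 short-circuits exactly this path):

{code:java}
import java.io.IOException;

import org.apache.commons.io.IOUtils;
import org.apache.solr.common.util.ContentStream;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Hypothetical custom component that wants the raw POSTed JSON body.
public class JsonBodyComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    Iterable<ContentStream> streams = rb.req.getContentStreams();
    if (streams == null) return;
    for (ContentStream stream : streams) {
      String body = IOUtils.toString(stream.getReader()); // raw POST data
      rb.rsp.add("postedBody", body);                     // further processing goes here
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {}

  @Override
  public String getDescription() {
    return "reads POSTed JSON from the content stream";
  }

  // required by SolrInfoMBean in 4.x; harmless on later versions
  public String getSource() { return null; }
}
{code}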



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6633) let /update/json/docs store the source json as well

2014-11-18 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217295#comment-14217295
 ] 

Alexandre Rafalovitch commented on SOLR-6633:
-

Is this somehow superseding the behavior in SOLR-6304 and 
http://lucidworks.com/blog/indexing-custom-json-data/ ? I mean the field 
extraction code can already do ID mapping by specifying an appropriate path, 
right? And for 'df', would you need to specify it as a param (like in
example 4 in the article)?

And I am still trying to wrap my head around the use case. I don't expect users 
not to want to configure *anything*. At least the dates would need to be 
parsed/detected. And, usually, after the initial dump, the users go back and 
start adding specific definitions field by field, type by type (and reindex). 
Is that part of this scenario as well? 

P.S. I know Solr cannot clone Elasticsearch. I was just making sure that we are 
not somehow missing Solr-specifics by assuming Elasticsearch-like behavior. 
Perhaps having the field also called *_all* was what confused me.
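
(If the answer to the 'df' question is yes, presumably the usual per-request
override of handler defaults would apply, something like:

{noformat}
curl 'http://localhost:8983/solr/collection1/update/json/docs?df=title_t&commit=true' \
  -H 'Content-Type: application/json' -d '{"id":"1","title_t":"hello"}'
{noformat}

but that is an assumption about this patch, not something verified against it.)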


 let /update/json/docs store the source json as well
 ---

 Key: SOLR-6633
 URL: https://issues.apache.org/jira/browse/SOLR-6633
 Project: Solr
  Issue Type: Bug
Reporter: Noble Paul
Assignee: Noble Paul
  Labels: EaseOfUse
 Fix For: 5.0, Trunk

 Attachments: SOLR-6633.patch, SOLR-6633.patch


 It is a common requirement to store the entire JSON as a field in Solr.
 We can have an extra param srcField=field_name to specify the field name.
 The /update/json/docs handler is only useful when all the json fields are
 predefined or in schemaless mode.
 The better option would be to store the content in a store-only field and
 index the data in another field in the other modes.
 The relevant section in solrconfig.xml:
 {code:xml}
 <initParams path="/update/json/docs">
   <lst name="defaults">
     <!-- this ensures that the entire json doc will be stored verbatim into
          one field -->
     <str name="srcField">_src</str>
     <!-- This means the uniqueKeyField will be extracted from the fields and
          all fields go into the 'df' field. In this config df is already
          configured to be 'text' -->
     <str name="mapUniqueKeyOnly">true</str>
     <str name="df">text</str>
   </lst>
 </initParams>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6064) throw exception during sort for misconfigured field

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217307#comment-14217307
 ] 

ASF subversion and git services commented on LUCENE-6064:
-

Commit 1640469 from [~rcmuir] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1640469 ]

LUCENE-6064: throw exception instead of doing nothing, when sorting/grouping 
etc on misconfigured field

 throw exception during sort for misconfigured field
 ---

 Key: LUCENE-6064
 URL: https://issues.apache.org/jira/browse/LUCENE-6064
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 5.0, Trunk

 Attachments: LUCENE-6064.patch


 If you sort on field X, and it has no docvalues, today it will silently treat 
 it as all values missing. This can be very confusing since it just means 
 nothing will happen at all.
 But there is a distinction between "no docs happen to have a value for this
 field" and "field isn't configured correctly". The latter should get an
 exception, telling the user to index docvalues, or wrap the reader with
 UninvertingReader.
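
For reference, the wrapping option looks roughly like this (UninvertingReader
lives in lucene-misc; the field name and type are illustrative):

{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.uninverting.UninvertingReader;

// inside some helper class: wrap a reader so that sorting on field "X"
// (indexed, but without docvalues) uninverts via the field cache instead
// of failing
static DirectoryReader wrapForSorting(DirectoryReader reader) throws IOException {
  Map<String, UninvertingReader.Type> mapping =
      Collections.singletonMap("X", UninvertingReader.Type.INTEGER);
  return UninvertingReader.wrap(reader, mapping);
}
{code}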



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-6064) throw exception during sort for misconfigured field

2014-11-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-6064.
-
   Resolution: Fixed
Fix Version/s: Trunk
   5.0

 throw exception during sort for misconfigured field
 ---

 Key: LUCENE-6064
 URL: https://issues.apache.org/jira/browse/LUCENE-6064
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 5.0, Trunk

 Attachments: LUCENE-6064.patch


 If you sort on field X, and it has no docvalues, today it will silently treat 
 it as all values missing. This can be very confusing since it just means 
 nothing will happen at all.
 But there is a distinction between "no docs happen to have a value for this
 field" and "field isn't configured correctly". The latter should get an
 exception, telling the user to index docvalues, or wrap the reader with
 UninvertingReader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates

2014-11-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217309#comment-14217309
 ] 

Robert Muir commented on LUCENE-6062:
-

I will first go back to 5.x, then see whether the test fails in 4.x and how 
feasible a backport is.

The code differs significantly here, so the problem may have been recently 
introduced.

 Index corruption from numeric DV updates
 

 Key: LUCENE-6062
 URL: https://issues.apache.org/jira/browse/LUCENE-6062
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.10.3, 5.0, Trunk

 Attachments: LUCENE-6062.patch, LUCENE-6062.patch


 I hit this while working on LUCENE-6005: when cutting over
 TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled
 additional docValues in the test, and hit this:
 {noformat}
 There was 1 failure:
 1) 
 testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
 java.io.FileNotFoundException: _1_Asserting_0.dvm in 
 dir=RAMDirectory@259847e5 
 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
   at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
   at 
 org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
   at 
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.init(Lucene50DocValuesProducer.java:130)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
   at 
 org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.init(PerFieldDocValuesFormat.java:267)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
   at 
 org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
   at 
 org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
   at 
 org.apache.lucene.index.SegmentDocValuesProducer.init(SegmentDocValuesProducer.java:63)
   at 
 org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
   at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:109)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
   at 
 org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
 {noformat}
 A one-line change to the existing test (on trunk) causes this corruption:
 {noformat}
 Index: 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
 ===
 --- 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (revision 1639580)
 +++ 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (working copy)
 @@ -750,6 +750,7 @@
  // second segment with no NDV
  doc = new Document();
  doc.add(new StringField("id", "doc1", Store.NO));
 +doc.add(new NumericDocValuesField("foo", 3));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't
  updated
 {noformat}
 For some reason, the base doc values for the 2nd segment are not being
 written, but clearly should have been (to hold field "foo")... I'm not sure why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 2215 - Still Failing

2014-11-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/2215/

2 tests failed.
REGRESSION:  org.apache.solr.SampleTest.testSimple

Error Message:
SolrCore 'collection1' is not available due to init failure: Error 
instantiating class: 'org.apache.lucene.util.LuceneTestCase$3'

Stack Trace:
org.apache.solr.common.SolrException: SolrCore 'collection1' is not available 
due to init failure: Error instantiating class: 
'org.apache.lucene.util.LuceneTestCase$3'
at 
__randomizedtesting.SeedInfo.seed([2E6E8F9ADADFEACF:16DDAB64FD2C3E1E]:0)
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:763)
at org.apache.solr.util.TestHarness.getCoreInc(TestHarness.java:219)
at org.apache.solr.util.TestHarness.update(TestHarness.java:235)
at 
org.apache.solr.util.BaseTestHarness.checkUpdateStatus(BaseTestHarness.java:282)
at 
org.apache.solr.util.BaseTestHarness.validateUpdate(BaseTestHarness.java:252)
at org.apache.solr.SolrTestCaseJ4.checkUpdateU(SolrTestCaseJ4.java:677)
at org.apache.solr.SolrTestCaseJ4.assertU(SolrTestCaseJ4.java:656)
at org.apache.solr.SampleTest.testSimple(SampleTest.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 

[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217328#comment-14217328
 ] 

ASF subversion and git services commented on LUCENE-6062:
-

Commit 1640471 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1640471 ]

LUCENE-6062: pass correct fieldinfos to dv producer when the segment has updates

 Index corruption from numeric DV updates
 

 Key: LUCENE-6062
 URL: https://issues.apache.org/jira/browse/LUCENE-6062
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.10.3, 5.0, Trunk

 Attachments: LUCENE-6062.patch, LUCENE-6062.patch


 I hit this while working on LUCENE-6005: when cutting over
 TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled
 additional docValues in the test, and hit this:
 {noformat}
 There was 1 failure:
 1) 
 testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
 java.io.FileNotFoundException: _1_Asserting_0.dvm in 
 dir=RAMDirectory@259847e5 
 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
   at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
   at 
 org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
   at 
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.init(Lucene50DocValuesProducer.java:130)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
   at 
 org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.init(PerFieldDocValuesFormat.java:267)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
   at 
 org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
   at 
 org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
   at 
 org.apache.lucene.index.SegmentDocValuesProducer.init(SegmentDocValuesProducer.java:63)
   at 
 org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
   at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:109)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
   at 
 org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
 {noformat}
 A one-line change to the existing test (on trunk) causes this corruption:
 {noformat}
 Index: 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
 ===
 --- 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (revision 1639580)
 +++ 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (working copy)
 @@ -750,6 +750,7 @@
  // second segment with no NDV
  doc = new Document();
  doc.add(new StringField("id", "doc1", Store.NO));
 +doc.add(new NumericDocValuesField("foo", 3));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't
  updated
 {noformat}
 For some reason, the base doc values for the 2nd segment are not being
 written, but clearly should have been (to hold field "foo")... I'm not sure why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5317) [PATCH] Concordance capability

2014-11-18 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated LUCENE-5317:

Attachment: lucene5317v1.patch

I merged in my local updates and I pushed these to my fork on github 
[link|https://github.com/tballison/lucene-solr].  I didn't have luck posting 
this to the review board.  When I tried to post it, I entered the base 
directory and was returned to the starting page without any error message.

 [PATCH] Concordance capability
 --

 Key: LUCENE-5317
 URL: https://issues.apache.org/jira/browse/LUCENE-5317
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.5
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5317.patch, concordance_v1.patch.gz, 
 lucene5317v1.patch


 This patch enables a Lucene-powered concordance search capability.
 Concordances are extremely useful for linguists, lawyers and other analysts 
 performing "analytic search" vs. traditional snippeting/document retrieval 
 tasks. By "analytic search", I mean that the user wants to browse every time 
 a term appears (or at least the top n) in a subset of documents and see the 
 words before and after.
 Concordance technology is far simpler and less interesting than IR relevance 
 models/methods, but it can be extremely useful for some use cases.
 Traditional concordance sort orders are available (sort on words before the 
 target, words after, target then words before and target then words after).
 Under the hood, this is running SpanQuery's getSpans() and reanalyzing to 
 obtain character offsets.  There is plenty of room for optimizations and 
 refactoring.
 Many thanks to my colleague, Jason Robinson, for input on the design of this 
 patch.
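
For those curious, the under-the-hood mechanism in 4.x-era API terms is roughly
the following (field and term are placeholders; the offset re-analysis itself
is omitted):

{code:java}
import java.io.IOException;
import java.util.HashMap;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermContext;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.Spans;

// Walk every match position of "term" in field "f"; the concordance code
// then re-analyzes the stored text around each position to recover the
// character offsets for the words-before/words-after window.
static void walkMatches(IndexReader reader) throws IOException {
  SpanTermQuery q = new SpanTermQuery(new Term("f", "term"));
  for (AtomicReaderContext ctx : reader.leaves()) {
    Spans spans = q.getSpans(ctx, ctx.reader().getLiveDocs(),
        new HashMap<Term, TermContext>());
    while (spans.next()) {
      int doc = spans.doc();      // segment-local doc id
      int start = spans.start();  // first matched position
      int end = spans.end();      // one past the last matched position
      // collect (doc, start, end) for sorting/windowing
    }
  }
}
{code}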



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5317) [PATCH] Concordance capability

2014-11-18 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217330#comment-14217330
 ] 

Tim Allison edited comment on LUCENE-5317 at 11/19/14 3:12 AM:
---

I merged in my local updates and I pushed these to my fork on github 
[link|https://github.com/tballison/lucene-solr]. 

 I didn't have luck posting this to the review board.  When I tried to post it, 
I entered the base directory and was returned to the starting page without any 
error message.  For the record, I'm sure that this is user error.


was (Author: talli...@mitre.org):
I merged in my local updates and I pushed these to my fork on github 
[link|https://github.com/tballison/lucene-solr].  I didn't have luck posting 
this to the review board.  When I tried to post it, I entered the base 
directory and was returned to the starting page without any error message.

 [PATCH] Concordance capability
 --

 Key: LUCENE-5317
 URL: https://issues.apache.org/jira/browse/LUCENE-5317
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.5
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5317.patch, concordance_v1.patch.gz, 
 lucene5317v1.patch


 This patch enables a Lucene-powered concordance search capability.
 Concordances are extremely useful for linguists, lawyers and other analysts 
 performing "analytic search" vs. traditional snippeting/document retrieval 
 tasks. By "analytic search", I mean that the user wants to browse every time 
 a term appears (or at least the top n) in a subset of documents and see the 
 words before and after.
 Concordance technology is far simpler and less interesting than IR relevance 
 models/methods, but it can be extremely useful for some use cases.
 Traditional concordance sort orders are available (sort on words before the 
 target, words after, target then words before and target then words after).
 Under the hood, this is running SpanQuery's getSpans() and reanalyzing to 
 obtain character offsets.  There is plenty of room for optimizations and 
 refactoring.
 Many thanks to my colleague, Jason Robinson, for input on the design of this 
 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6062) Index corruption from numeric DV updates

2014-11-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217335#comment-14217335
 ] 

ASF subversion and git services commented on LUCENE-6062:
-

Commit 1640472 from [~rcmuir] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1640472 ]

LUCENE-6062: pass correct fieldinfos to dv producer when the segment has updates

 Index corruption from numeric DV updates
 

 Key: LUCENE-6062
 URL: https://issues.apache.org/jira/browse/LUCENE-6062
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.10.3, 5.0, Trunk

 Attachments: LUCENE-6062.patch, LUCENE-6062.patch


 I hit this while working on LUCENE-6005: when cutting over
 TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled
 additional docValues in the test, and hit this:
 {noformat}
 There was 1 failure:
 1) 
 testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
 java.io.FileNotFoundException: _1_Asserting_0.dvm in 
 dir=RAMDirectory@259847e5 
 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
   at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
   at 
 org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
   at 
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.init(Lucene50DocValuesProducer.java:130)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
   at 
 org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.init(PerFieldDocValuesFormat.java:267)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
   at 
 org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
   at 
 org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
   at 
 org.apache.lucene.index.SegmentDocValuesProducer.init(SegmentDocValuesProducer.java:63)
   at 
 org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
   at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:109)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
   at 
 org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
 {noformat}
 A one-line change to the existing test (on trunk) causes this corruption:
 {noformat}
 Index: 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
 ===
 --- 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (revision 1639580)
 +++ 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (working copy)
 @@ -750,6 +750,7 @@
  // second segment with no NDV
  doc = new Document();
  doc.add(new StringField("id", "doc1", Store.NO));
 +doc.add(new NumericDocValuesField("foo", 3));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't
  updated
 {noformat}
 For some reason, the base doc values for the 2nd segment are not being
 written, but clearly should have been (to hold field "foo")... I'm not sure why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-6062) Index corruption from numeric DV updates

2014-11-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-6062.
-
   Resolution: Fixed
Fix Version/s: (was: 4.10.3)

The bug affects 4.10.x, but the fix would not be easy there. In 5.0, fieldinfos 
handling has been simplified considerably in this area, making it easy to pass 
the correct ones to the producers.

I think this is too risky to backport.

 Index corruption from numeric DV updates
 

 Key: LUCENE-6062
 URL: https://issues.apache.org/jira/browse/LUCENE-6062
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 5.0, Trunk

 Attachments: LUCENE-6062.patch, LUCENE-6062.patch


 I hit this while working on LUCENE-6005: when cutting over
 TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled
 additional docValues in the test, and hit this:
 {noformat}
 There was 1 failure:
 1) 
 testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
 java.io.FileNotFoundException: _1_Asserting_0.dvm in 
 dir=RAMDirectory@259847e5 
 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
   at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
   at 
 org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
   at 
 org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.init(Lucene50DocValuesProducer.java:130)
   at 
 org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
   at 
 org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.init(PerFieldDocValuesFormat.java:267)
   at 
 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
   at 
 org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
   at 
 org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
   at 
 org.apache.lucene.index.SegmentDocValuesProducer.init(SegmentDocValuesProducer.java:63)
   at 
 org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
   at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:109)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
   at 
 org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
   at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
   at 
 org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
   at 
 org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
 {noformat}
 A one-line change to the existing test (on trunk) causes this corruption:
 {noformat}
 Index: 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
 ===
 --- 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (revision 1639580)
 +++ 
 lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java 
 (working copy)
 @@ -750,6 +750,7 @@
  // second segment with no NDV
  doc = new Document();
  doc.add(new StringField("id", "doc1", Store.NO));
 +doc.add(new NumericDocValuesField("foo", 3));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't
  updated
 {noformat}
 For some reason, the base doc values for the 2nd segment are not being
 written, but clearly should have been (to hold field "foo")... I'm not sure why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


