[jira] [Created] (SOLR-6094) DIH deletedPkQuery doesn't allow placeholders in query
Ananda Verma created SOLR-6094:
--------------------------------------

Summary: DIH deletedPkQuery doesn't allow placeholders in query
Key: SOLR-6094
URL: https://issues.apache.org/jira/browse/SOLR-6094
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 4.3.1
Reporter: Ananda Verma
Priority: Blocker
Fix For: 4.7.3

When using the following deletedPkQuery in data-config.xml

{code}deletedPkQuery=SELECT id from ${schema.SCHEMA_NAME}.deleted_users where status = 'ACTIVE'{code}

it throws the following error:

{code}
20-May-2014 12:35:52 ERROR [org.apache.solr.handler.dataimport.DataImporter.doDeltaImport : 455] :: http-bio-8380-exec-6 :: Delta Import Failed
java.lang.AssertionError: Non-leaf nodes should be of type java.util.Map
	at org.apache.solr.handler.dataimport.VariableResolver.currentLevelMap(VariableResolver.java:235)
	at org.apache.solr.handler.dataimport.VariableResolver.resolve(VariableResolver.java:94)
	at org.apache.solr.handler.dataimport.VariableResolver.replaceTokens(VariableResolver.java:155)
	at org.apache.solr.handler.dataimport.ContextImpl.replaceTokens(ContextImpl.java:254)
	at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:84)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:267)
	at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:776)
	at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:764)
	at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:334)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
	at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
	at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:219)
	at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:333)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
{code}

whereas running

{code}deletedPkQuery=SELECT id from schema2.deleted_users where status = 'ACTIVE'{code}

works well.

--
This message was sent by Atlassian JIRA (v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5609) Don't let cores create slices/named replicas
[ https://issues.apache.org/jira/browse/SOLR-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-5609.
------------------------------
Resolution: Fixed
Fix Version/s: (was: 4.9) 4.8
Assignee: Noble Paul

Don't let cores create slices/named replicas
Key: SOLR-5609
URL: https://issues.apache.org/jira/browse/SOLR-5609
Project: Solr
Issue Type: Sub-task
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
Fix For: 4.8, 5.0
Attachments: SOLR-5609.patch, SOLR-5609.patch, SOLR-5609_5130.patch, SOLR-5609_5130.patch, SOLR-5609_5130.patch, SOLR-5609_5130.patch

In SolrCloud, it is possible for a core to come up on any node and register itself with an arbitrary slice/coreNodeName. This is a legacy requirement, and we would like to make it possible only for the Overseer to initiate the creation of slices/replicas.

We plan to introduce cluster-level properties at the top level, /cluster-props.json:

{code:javascript}
{
  noSliceOrReplicaByCores: true
}
{code}

If this property is set to true, cores won't be able to send STATE commands with an unknown slice/coreNodeName; those commands will fail at the Overseer. This is useful for SOLR-5310 / SOLR-5311, where a core/replica is deleted by a command but comes up later and tries to recreate the replica/slice.
[jira] [Resolved] (SOLR-5096) Introduce a new mode to force collection api usage
[ https://issues.apache.org/jira/browse/SOLR-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-5096.
------------------------------
Resolution: Duplicate

Introduce a new mode to force collection api usage
Key: SOLR-5096
URL: https://issues.apache.org/jira/browse/SOLR-5096
Project: Solr
Issue Type: Sub-task
Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Fix For: 4.9, 5.0
Attachments: SOLR-5096.patch

In SOLR-4808, a special mode to disallow bootstrap parameters was discussed so that cluster management is always done via the collection API. A tentative name was collectionApiMode. This issue is to find a better name and to implement the mode.
[jira] [Commented] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002928#comment-14002928 ]

Shai Erera commented on LUCENE-5680:
------------------------------------
Well, first, updateNumeric/BinaryDV() allows you to unset a value, and I think we should preserve that capability here. As for unsetting, this could be very useful, e.g. for an onsale boolean field or the saleprice etc. I think you proposed the unset capability while I was working on LUCENE-5189, but I cannot find the reference :). I agree we shouldn't bloat the API unnecessarily, and when I wrote the {{update(Term,Field...)}} version it looked very simple and tests even passed. So I think this direction is promising, but we should allow unsetting. Perhaps we can put a constant somewhere, {{NumericDocValuesField UNSET = ...}}? Maybe it can be on IndexWriter ... the thing is, we don't need it for e.g. Binary, since they take a BytesRef and, at least for now, allow passing a null value, but we can have a similar UNSET constant for binary too.

Allow updating multiple DocValues fields atomically
Key: LUCENE-5680
URL: https://issues.apache.org/jira/browse/LUCENE-5680
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5680.patch, LUCENE-5680.patch

This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we could allow updating several doc-values fields atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do; will post a patch shortly.
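The UNSET constant discussed above is a sentinel-value pattern; a minimal stdlib sketch of the idea, with a map standing in for a document's numeric doc-values fields (the UNSET name mirrors the proposal, everything else here is hypothetical and not Lucene's API):

```java
import java.util.HashMap;
import java.util.Map;

public class UnsetSentinelDemo {
    // Sentinel mirroring the proposed {{NumericDocValuesField UNSET = ...}} constant;
    // the concrete value is an arbitrary choice for this sketch.
    static final Long UNSET = Long.MIN_VALUE;

    // One update path handles both set and unset, so callers never need a
    // separate "remove" overload -- the point of the sentinel.
    static void update(Map<String, Long> doc, String field, Long value) {
        if (UNSET.equals(value)) {
            doc.remove(field);   // unset: clear the stored value
        } else {
            doc.put(field, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> doc = new HashMap<>();
        update(doc, "saleprice", 100L);
        System.out.println(doc.containsKey("saleprice")); // true
        update(doc, "saleprice", UNSET);
        System.out.println(doc.containsKey("saleprice")); // false
    }
}
```

A binary variant would not need the sentinel as long as null is an accepted "unset" value, which matches the comment's observation about BytesRef.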
[jira] [Comment Edited] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002928#comment-14002928 ]

Shai Erera edited comment on LUCENE-5680 at 5/20/14 8:32 AM:
-------------------------------------------------------------
Well, first, updateNumeric/BinaryDV() allows you to unset a value, and I think we should preserve that capability here. As for unsetting, this could be very useful, e.g. for a saleprice field as well as any other field which is transient. I think you proposed the unset capability while I was working on LUCENE-5189, but I cannot find the reference :). I agree we shouldn't bloat the API unnecessarily, and when I wrote the {{update(Term,Field...)}} version it looked very simple and tests even passed. So I think this direction is promising, but we should allow unsetting. Perhaps we can put a constant somewhere, {{NumericDocValuesField UNSET = ...}}? Maybe it can be on IndexWriter ... the thing is, we don't need it for e.g. Binary, since they take a BytesRef and, at least for now, allow passing a null value, but we can have a similar UNSET constant for binary too.

was (Author: shaie):
Well, first, updateNumeric/BinaryDV() allows you to unset a value, and I think we should preserve that capability here. As for unsetting, this could be very useful, e.g. for an onsale boolean field or the saleprice etc. I think you proposed the unset capability while I was working on LUCENE-5189, but I cannot find the reference :). I agree we shouldn't bloat the API unnecessarily, and when I wrote the {{update(Term,Field...)}} version it looked very simple and tests even passed. So I think this direction is promising, but we should allow unsetting. Perhaps we can put a constant somewhere, {{NumericDocValuesField UNSET = ...}}? Maybe it can be on IndexWriter ... the thing is, we don't need it for e.g. Binary, since they take a BytesRef and, at least for now, allow passing a null value, but we can have a similar UNSET constant for binary too.

Allow updating multiple DocValues fields atomically
Key: LUCENE-5680
URL: https://issues.apache.org/jira/browse/LUCENE-5680
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5680.patch, LUCENE-5680.patch

This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we could allow updating several doc-values fields atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do; will post a patch shortly.
[jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers
[ https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002963#comment-14002963 ]

Michael McCandless commented on LUCENE-5618:
--------------------------------------------
+1, thanks Shai.

DocValues updates send wrong fieldinfos to codec producers
Key: LUCENE-5618
URL: https://issues.apache.org/jira/browse/LUCENE-5618
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Assignee: Shai Erera
Priority: Blocker
Fix For: 4.9
Attachments: LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch

Spinoff from LUCENE-5616. See the example there: docvalues readers get a fieldinfos, but it doesn't contain the correct ones, so they have invalid field numbers at read time. This should really be fixed. Maybe a simple solution is to not write batches of fields in updates but just have only one field per gen? This removes many-many relationships and would make things easy to understand.
[jira] [Updated] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5679:
-------------------------------
Attachment: LUCENE-5679.patch

Patch removes the single-arg deleteDocuments Term/Query variants. Everything compiles and tests pass. I'll run jdocs tests too, though eclipse showed no additional errors about referencing those methods. Do you think it's OK to backport to 4x? The only concern I have is if apps upgrade to e.g. 4.9 by only dropping in the 4.9 jar, without compiling their code as well. I don't know if people still do that though ... :). Anyway, if we want to keep that, we can deprecate the methods in 4x.

Consolidate IndexWriter.deleteDocuments()
Key: LUCENE-5679
URL: https://issues.apache.org/jira/browse/LUCENE-5679
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Shai Erera
Attachments: LUCENE-5679.patch

Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments() variants.
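The consolidation leans on the fact that Java resolves a single argument against a varargs overload, so removing the single-arg variants stays source-compatible for callers that recompile; a stdlib sketch of just that mechanism (class and method names hypothetical, not the IndexWriter API):

```java
public class VarargsDemo {
    // Stand-in for the surviving varargs deleteDocuments(Term...) shape.
    static int delete(String... ids) {
        return ids.length; // pretend each id deletes one document
    }

    public static void main(String[] args) {
        // A former single-arg call site still compiles against the varargs form:
        System.out.println(delete("42"));     // 1
        System.out.println(delete("1", "2")); // 2
    }
}
```

Binary compatibility is the part recompiling fixes: a jar compiled against an old single-arg method would fail at link time, which is exactly the drop-in-the-4.9-jar concern raised in the comment.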
[jira] [Created] (SOLR-6095) SolrCloud cluster can end up without an overseer
Shalin Shekhar Mangar created SOLR-6095:
----------------------------------------

Summary: SolrCloud cluster can end up without an overseer
Key: SOLR-6095
URL: https://issues.apache.org/jira/browse/SOLR-6095
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.8
Reporter: Shalin Shekhar Mangar
Fix For: 4.9, 5.0

We have a large cluster running on ec2 which occasionally ends up without an overseer after a rolling restart. We always restart our overseer nodes last of all, otherwise we end up with a large number of shards that can't recover properly.

This cluster is running a custom branch forked from 4.8 and has SOLR-5473, SOLR-5495 and SOLR-5468 applied. We have a large number of small collections (120 collections, each with approx 5M docs) on 16 Solr nodes. We are also using the overseer roles feature to designate two specified nodes as overseers. However, I think the problem that we're seeing is not specific to the overseer roles feature.

As soon as the overseer was shut down, we saw the following on the node which was next in line to become the overseer:

{code}
2014-05-20 09:55:39,261 [main-EventThread] INFO solr.cloud.ElectionContext - I am going to be the leader ec2-xx.compute-1.amazonaws.com:8987_solr
2014-05-20 09:55:39,265 [main-EventThread] WARN solr.cloud.LeaderElector - org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /overseer_elect/leader
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:432)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:429)
	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:386)
	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:373)
	at org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:551)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:142)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:110)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:303)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
{code}

When the overseer leader node is gracefully shut down, we get the following in the logs:

{code}
2014-05-20 09:55:39,254 [Thread-63] ERROR solr.cloud.Overseer - Exception in Overseer main queue loop
org.apache.solr.common.SolrException: Could not load collection from ZK:sm12
	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778)
	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553)
	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246)
	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:237)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040)
	at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226)
	at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
	at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223)
	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:767)
	... 4 more
2014-05-20 09:55:39,254 [Thread-63] INFO solr.cloud.Overseer - Overseer Loop exiting : ec2-xx.compute-1.amazonaws.com:8986_solr
2014-05-20 09:55:39,256 [main-EventThread] WARN common.cloud.ZkStateReader - ZooKeeper watch triggered, but Solr cannot talk to ZK
2014-05-20 09:55:39,259 [ShutdownMonitor] INFO server.handler.ContextHandler - stopped o.e.j.w.WebAppContext{/solr,file:/vol0/cloud86/solr-webapp/webapp/},/vol0/cloud86/webapps/solr.war
{code}

Notice how the overseer kept running almost to the very end, i.e. until the jetty context stopped. On some runs, we got the following on the overseer leader node
[jira] [Updated] (SOLR-6085) Suggester crashes
[ https://issues.apache.org/jira/browse/SOLR-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Ferrández updated SOLR-6085:
----------------------------------
Affects Version/s: 4.8

Suggester crashes
Key: SOLR-6085
URL: https://issues.apache.org/jira/browse/SOLR-6085
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 4.7.1, 4.8
Reporter: Jorge Ferrández
Fix For: 4.7.3, 4.8.1, 4.9, 5.0

The AnalyzingInfixSuggester class fails when it is queried with a ß character (eszett) used in German, but it doesn't happen for all data or for all words containing this character. The exception reported is the following:

{code:java}
<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">18</int>
  </lst>
  <lst name="error">
    <str name="msg">String index out of range: 5</str>
    <str name="trace">java.lang.StringIndexOutOfBoundsException: String index out of range: 5
	at java.lang.String.substring(String.java:1907)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.addPrefixMatch(AnalyzingInfixSuggester.java:575)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.highlight(AnalyzingInfixSuggester.java:525)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.createResults(AnalyzingInfixSuggester.java:479)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:437)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:338)
	at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:181)
	at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:232)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:217)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)</str>
    <int name="code">500</int>
  </lst>
</response>
{code}

With this query: http://localhost:8983/solr/suggest_de?suggest.q=gieß (for gießen, which is actually in the data)

The problem seems to be that we use ASCIIFolding to unify ss and ß, which are both valid alternatives in German. Looking at
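The length mismatch behind that exception can be reproduced with plain java.lang.String; this is only a sketch of the mechanism (folding ß to ss makes the analyzed text longer than the surface text), not the suggester code:

```java
public class EszettFoldingDemo {
    public static void main(String[] args) {
        String surface = "gie\u00df";                      // surface form "gieß": 4 chars
        String folded = surface.replace("\u00df", "ss");   // folded form "giess": 5 chars
        System.out.println(surface.length() + " / " + folded.length()); // 4 / 5

        // Applying an offset measured on the folded text to the shorter surface
        // text overruns it, matching the reported "String index out of range: 5":
        try {
            surface.substring(folded.length());
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("StringIndexOutOfBoundsException");
        }
    }
}
```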
[jira] [Commented] (SOLR-6095) SolrCloud cluster can end up without an overseer
[ https://issues.apache.org/jira/browse/SOLR-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003046#comment-14003046 ]

Shalin Shekhar Mangar commented on SOLR-6095:
---------------------------------------------
I also opened SOLR-6091 but that didn't help.

SolrCloud cluster can end up without an overseer
Key: SOLR-6095
URL: https://issues.apache.org/jira/browse/SOLR-6095
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.8
Reporter: Shalin Shekhar Mangar
Fix For: 4.9, 5.0
[jira] [Resolved] (LUCENE-5154) ban tests from writing to CWD
[ https://issues.apache.org/jira/browse/LUCENE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-5154.
---------------------------------
Resolution: Duplicate

Incorporated into LUCENE-5650.

ban tests from writing to CWD
Key: LUCENE-5154
URL: https://issues.apache.org/jira/browse/LUCENE-5154
Project: Lucene - Core
Issue Type: Test
Reporter: Robert Muir
Assignee: Dawid Weiss
Attachments: LUCENE-5154.patch

Currently each forked jvm has cwd = tempDir = "." This provides some minimal protection against tests in different jvms interfering with each other, but we can do much better by splitting these concerns and setting cwd = "." and tempDir = "./temp".

Tests that write files to CWD can confuse IDE users because they can create dirty checkouts or other issues between different runs, and of course can interfere with other tests in the *same* jvm (there are other possible ways to do this too). So a test like this should fail with SecurityException, but currently does not:

{code}
public void testBogus() throws Exception {
  File file = new File("foo.txt");
  FileOutputStream os = new FileOutputStream(file);
  os.write(1);
  os.close();
}
{code}
[jira] [Assigned] (LUCENE-5154) ban tests from writing to CWD
[ https://issues.apache.org/jira/browse/LUCENE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned LUCENE-5154:
-----------------------------------
Assignee: Dawid Weiss

ban tests from writing to CWD
Key: LUCENE-5154
URL: https://issues.apache.org/jira/browse/LUCENE-5154
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003071#comment-14003071 ]

Dawid Weiss commented on LUCENE-5650:
-------------------------------------
I handled some of the velocity and DIH. Running the tests now.

createTempDir and associated functions no longer create java.io.tmpdir
Key: LUCENE-5650
URL: https://issues.apache.org/jira/browse/LUCENE-5650
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch

The recent refactoring of all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per-jvm working dir. However, {{getBaseTempDirForClass()}} now asserts that the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per-jvm cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
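The regression described above is the difference between "create on demand" and "assert it already exists"; a stdlib sketch of the two behaviors against a relative temp path (paths and names illustrative, not the test-framework code):

```java
import java.io.File;
import java.nio.file.Files;

public class TempDirBehaviorDemo {
    public static void main(String[] args) throws Exception {
        File cwd = Files.createTempDirectory("demo").toFile(); // stand-in for the per-jvm cwd
        File tmpDir = new File(cwd, "temp");                   // e.g. java.io.tmpdir=./temp

        // New behavior: asserting the configured dir already exists would fail here.
        System.out.println(tmpDir.isDirectory()); // false

        // Old TEMP_DIR behavior: create the dir if it did not exist.
        tmpDir.mkdirs();
        System.out.println(tmpDir.isDirectory()); // true
    }
}
```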
Survey on Project Conventions
Hello, my name is Martin Brandtner [1] and I’m a software engineering researcher at the University of Zurich, Switzerland. Together with Philipp Leitner [2], I currently work on an approach to detect violations of project conventions based on data from the source code repository, the issue tracker (e.g. Jira), and the build system (e.g. Jenkins). One example for such a project convention is: “You need to make sure that the commit message contains at least the name of the contributor and ideally a reference to the Bugzilla or JIRA issue where the patch was submitted.” [3] The idea is that our approach can detect violations of such a convention automatically and therefore support the development process. First of all we need conventions, and that’s why we ask you to take part in our survey. In the survey, we present five conventions and want you to rate their relevance in your Apache project. Everybody contributing to your Apache project can take part in this survey, because we also want to see if different roles may have different opinions about a convention. The survey is totally anonymous and it will take about 15 minutes to answer. We would be happy if you could fill out our survey under: http://ww3.unipark.de/uc/SEAL_Research/1abe/ before May 30, 2014. With the data collected in this survey we will implement convention violation detection in our tool called SQA-Timeline [4]. If you are interested in our work, contact us via email or provide your email address in the survey. Best regards, Martin and Philipp [1] http://www.ifi.uzh.ch/seal/people/brandtner.html [2] http://www.ifi.uzh.ch/seal/people/leitner.html [3] http://www.apache.org/dev/committers.html#applying-patches [4] https://www.youtube.com/watch?v=ZIsOODUapAE
[ANNOUNCE] Apache Lucene 4.8.1 released
May 2014, Apache Lucene™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html Lucene 4.8.1 includes 15 bug fixes. See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[ANNOUNCE] Apache Solr 4.8.1 released
May 2014, Apache Solr™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.8.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.8.1 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html Solr 4.8.1 includes 10 bug fixes, as well as Lucene 4.8.1 and its bug fixes. See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6096) Support Update and Delete on nested documents
Thomas Scheffler created SOLR-6096: -- Summary: Support Update and Delete on nested documents Key: SOLR-6096 URL: https://issues.apache.org/jira/browse/SOLR-6096 Project: Solr Issue Type: Improvement Affects Versions: 4.7.2 Reporter: Thomas Scheffler When using nested or child documents, update and delete operations on the root document should also affect the nested documents, as no child can exist without its parent :-) Example
{code:xml|title=First Import}
<doc>
  <field name="id">1</field>
  <field name="title">Article with author</field>
  <doc>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
</doc>
{code}
If I change my mind and the author was not named *John* but *_Jane_*:
{code:xml|title=Changed name of author of '1'}
<doc>
  <field name="id">1</field>
  <field name="title">Article with author</field>
  <doc>
    <field name="name">Smith, Jane</field>
    <field name="role">author</field>
  </doc>
</doc>
{code}
I would expect that John is not in the index anymore. Currently he is. There might also be the case that any subdocument is removed by an update:
{code:xml|title=Remove author}
<doc>
  <field name="id">1</field>
  <field name="title">Article without author</field>
</doc>
{code}
This should trigger a delete of all nested documents, too. The same way, all nested documents should be deleted if I delete the root document:
{code:xml|title=Deletion of '1'}
<delete>
  <id>1</id> <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
It is currently possible to do all of this on the client side by issuing an additional delete request before every update. It would be more efficient if this could be handled on the SOLR side. One would benefit from atomic updates. The biggest plus shows when using delete-by-query:
{code:xml|title=Deletion of '1' by query}
<delete>
  <query>title:*</query> <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
In that case one would not have to first query all matching documents and then issue deletes for those ids and every nested document.
[jira] [Commented] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003162#comment-14003162 ] Robert Muir commented on LUCENE-5680: - I don't think i proposed unsetting :) I don't see how it provides anything over an explicit 'onsale' and 'saleprice' that you update atomically here, which will work easier in general for the search APIs (ranking etc) So I really don't think this should drive the API at all, I'm still not seeing a use case. Allow updating multiple DocValues fields atomically --- Key: LUCENE-5680 URL: https://issues.apache.org/jira/browse/LUCENE-5680 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5680.patch, LUCENE-5680.patch This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we can allow updating several doc-values fields, atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do, will post a patch shortly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #632: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/632/ 1 tests failed. FAILED: org.apache.solr.cloud.HttpPartitionTest.testDistribSearch Error Message: No registered leader was found after waiting for 6ms , collection: c8n_1x3_lf slice: shard1 Stack Trace: org.apache.solr.common.SolrException: No registered leader was found after waiting for 6ms , collection: c8n_1x3_lf slice: shard1 at __randomizedtesting.SeedInfo.seed([CBB469512D717EC6:4A52E7495A2E1EFA]:0) at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:545) at org.apache.solr.cloud.HttpPartitionTest.testRf3WithLeaderFailover(HttpPartitionTest.java:348) at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:148) Build Log: [...truncated 54738 lines...] BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:490: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:182: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/extra-targets.xml:77: Java returned: 1 Total time: 151 minutes 35 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5468) Option to enforce a majority quorum approach to accepting updates in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003206#comment-14003206 ] ASF subversion and git services commented on SOLR-5468: --- Commit 1596234 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596234 ] SOLR-5468: Add wait loop to see replicas become active after restoring partitions; to address intermittent Jenkins test failures. Option to enforce a majority quorum approach to accepting updates in SolrCloud -- Key: SOLR-5468 URL: https://issues.apache.org/jira/browse/SOLR-5468 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.5 Environment: All Reporter: Timothy Potter Assignee: Timothy Potter Priority: Minor Attachments: SOLR-5468.patch, SOLR-5468.patch, SOLR-5468.patch I've been thinking about how SolrCloud deals with write-availability using in-sync replica sets, in which writes will continue to be accepted so long as there is at least one healthy node per shard. For a little background (and to verify my understanding of the process is correct), SolrCloud only considers active/healthy replicas when acknowledging a write. Specifically, when a shard leader accepts an update request, it forwards the request to all active/healthy replicas and only considers the write successful if all active/healthy replicas ack the write. Any down / gone replicas are not considered and will sync up with the leader when they come back online using peer sync or snapshot replication. For instance, if a shard has 3 nodes, A, B, C with A being the current leader, then writes to the shard will continue to succeed even if B and C are down. The issue is that if a shard leader continues to accept updates even if it loses all of its replicas, then we have acknowledged updates on only 1 node.
If that node, call it A, then fails and one of the previous replicas, call it B, comes back online before A does, then any writes that A accepted while the other replicas were offline are at risk of being lost. SolrCloud does provide a safe-guard mechanism for this problem with the leaderVoteWait setting, which puts any replicas that come back online before node A into a temporary wait state. If A comes back online within the wait period, then all is well as it will become the leader again and no writes will be lost. As a side note, sys admins definitely need to be made more aware of this situation as when I first encountered it in my cluster, I had no idea what it meant. My question is whether we want to consider an approach where SolrCloud will not accept writes unless there is a majority of replicas available to accept the write? For my example, under this approach, we wouldn't accept writes if both B and C failed, but would if only C did, leaving A and B online. Admittedly, this lowers the write-availability of the system, so may be something that should be tunable? From Mark M: Yeah, this is kind of like one of many little features that we have just not gotten to yet. I’ve always planned for a param that lets you say how many replicas an update must be verified on before responding success. Seems to make sense to fail that type of request early if you notice there are not enough replicas up to satisfy the param to begin with. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
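The majority rule in the example above (accept with A and B up, reject with only A) comes down to a simple count. A minimal sketch, with invented names, of the kind of early check being proposed:

```java
// Minimal sketch (not actual SolrCloud code): a leader would reject an
// update early unless a majority of the shard's replicas are active.
public class MajorityQuorum {
    /** True if activeReplicas forms a strict majority of replicationFactor. */
    public static boolean canAcceptUpdate(int activeReplicas, int replicationFactor) {
        return activeReplicas >= replicationFactor / 2 + 1;
    }
}
```

With replicationFactor=3: two nodes up (A and B) passes, while a lone leader (only A) is rejected, matching the scenario described above.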
[jira] [Created] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
Varun Thacker created LUCENE-5688: - Summary: NumericDocValues fields with sparse data can be compressed better Key: LUCENE-5688 URL: https://issues.apache.org/jira/browse/LUCENE-5688 Project: Lucene - Core Issue Type: Improvement Reporter: Varun Thacker Priority: Minor I ran into this problem where I had a dynamic field in Solr and indexed data into lots of fields. For each field only a few documents had actual values and for the remaining documents the default value ( 0 ) got indexed. Now when I merge segments, the index size jumps up. For example I have 10 segments - each with 1 DV field. When I merge segments into 1, that segment will contain all 10 DV fields with lots of 0s. This was the motivation behind trying to come up with a compression for a use case like this. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-5688: -- Attachment: LUCENE-5688.patch Here is a quick patch. Wanted to get some feedback on the approach. When I run the showIndexBloat method without the SPARSE_COMPRESSED changes, this is the size of the docValues data - {noformat} -rw-r--r-- 1 varun wheel 9.9M May 20 18:28 _a_Lucene45_0.dvd -rw-r--r-- 1 varun wheel 312B May 20 18:28 _a_Lucene45_0.dvm {noformat} With the SPARSE_COMPRESSED changes {noformat} -rw-r--r-- 1 varun wheel 2.7M May 20 18:51 _a_Lucene45_0.dvd -rw-r--r-- 1 varun wheel 352B May 20 18:51 _a_Lucene45_0.dvm {noformat}
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003220#comment-14003220 ] Robert Muir commented on LUCENE-5688: - I think this is a duplicate of LUCENE-4921? I guess the main thing is to differentiate between sparse data and thousands and thousands of fields, which usually hints at the problem not being in Lucene :)
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003226#comment-14003226 ] Robert Muir commented on LUCENE-5688: - Varun, I don't think we should make a long[] of size maxDoc in RAM here just to save some space on disk.
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003231#comment-14003231 ] Grant Ingersoll commented on LUCENE-5688: - bq. Varun, i dont think we should make a long[] of size maxDoc in ram here just to save some space on disk. In a large index, this can be quite significant, FWIW. Agreed on the long[] in RAM, but would be good to have a better way of controlling the on-disk behavior.
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003234#comment-14003234 ] Varun Thacker commented on LUCENE-5688: --- Completely overlooked LUCENE-4921. Should I mark this as a duplicate and post the same patch there? bq. Varun, i dont think we should make a long[] of size maxDoc in ram here just to save some space on disk. I felt the same way when I was writing it, but that was the easiest way to get a quick patch out. I will try to think of a better way to achieve this. Do you have any suggestions?
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003321#comment-14003321 ] Robert Muir commented on LUCENE-5688: - You otherwise hardly load anything in RAM, so it's extremely trappy to do this. As I mentioned, the obvious approach is O(log N), like Android's SparseArray. So array 1 is the increasing docIDs that have a value (can be a monotonic block reader); you can binary-search that to find your value in the real values. You have to decide how 'missing' should be represented. Currently it will be 1 bit per document as well. If it stays that way, you can check that first (which is the typical case) before binary searching. In all cases this has performance implications (slower access), and isn't specific to numerics (all DV fields could be sparse). So I think it's best to start outside of the default codec rather than trying to do it automatically. Not everyone will want the space-time tradeoff.
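The O(log N) layout described in the comment above (a sorted array of docIDs that have a value, binary-searched to index into the real values) can be sketched as follows. This is an illustration of the idea only, not the attached patch or any Lucene codec class.

```java
import java.util.Arrays;

// Sketch of a sparse numeric representation: only docs with a value are
// stored, as a sorted docID array plus a parallel values array. Lookup
// binary-searches the docID array; missing docs fall back to the default 0.
public class SparseNumericSketch {
    private final int[] docsWithValue; // sorted, ascending docIDs
    private final long[] values;       // values[i] belongs to docsWithValue[i]

    public SparseNumericSketch(int[] docsWithValue, long[] values) {
        this.docsWithValue = docsWithValue;
        this.values = values;
    }

    public long get(int docID) {
        int idx = Arrays.binarySearch(docsWithValue, docID);
        return idx >= 0 ? values[idx] : 0L; // missing -> default 0
    }
}
```

In a real codec the docID array could itself be a monotonic packed structure, and the per-document "missing" bits could be consulted first to skip the binary search in the common case.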
[jira] [Commented] (SOLR-6093) delete docs only in a spec shard within a collection
[ https://issues.apache.org/jira/browse/SOLR-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003332#comment-14003332 ] Erick Erickson commented on SOLR-6093: -- Hmmm.. Couple of questions to add to the mix: 1. How to handle multiple shards hosted on the same node? 2. How does adding distrib=false interact with this idea? delete docs only in a spec shard within a collection Key: SOLR-6093 URL: https://issues.apache.org/jira/browse/SOLR-6093 Project: Solr Issue Type: Improvement Components: contrib - Clustering Affects Versions: 4.6, 4.7 Reporter: YouPeng Yang Priority: Minor Suppose you want to delete docs only in a specific shard within a collection, using this command: [1] http://localhost:8082/solr/tv_201402/update?stream.body=<delete><query>BEGINTIME:[2014-03-01 00:00:00 TO *]</query></delete>&shards=tv_201402&commit=true As a result, all the docs over the whole collection whose BEGINTIME is larger than 2014-03-01 00:00:00 were deleted. After checking the source code (DistributedUpdateProcessor.doDeleteByQuery), it shows that the _route_ and shard.keys parameters can make this work, which indeed worked after I tried the command: [2] http://10.1.22.1:8082/solr/tv_201402/update?stream.body=<delete><query>BEGINTIME:[2014-03-01 00:00:00 TO 2014-03-02 00:00:00]</query></delete>&_route_=tv_201402&commit=true In the first request [1], I use the shards parameter hoping to delete docs only in tv_201402, while in the second request [2], it changes to the _route_ parameter. The purpose of filing this JIRA is that the behavior should be consistent: the shards parameter should also make the request non-distributed during updates, just as it does for searches. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003377#comment-14003377 ] Michael McCandless commented on LUCENE-5679: I think just drop them in 4.x. Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
Robert Muir created LUCENE-5689: --- Summary: FieldInfo.setDocValuesGen should not be public. Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir It's currently public and users can modify it. We made this class mostly immutable long ago: remember it's returned by the AtomicReader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003425#comment-14003425 ] Noble Paul commented on SOLR-5681: -- The REQUESTSTATUS would not tell me what was the output of the run? So, I miss that when async is used? Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
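The peekAfter() idea from the proposal above can be sketched with a plain in-memory list standing in for the ZK work queue; peekAfter and this whole class are hypothetical illustrations, not the real DistributedQueue API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the peekAfter() proposal: a queue-like structure
// that lets a dispatcher look past an in-flight head task without removing
// it, and later remove a completed task from any position.
public class PeekAfterSketch<T> {
    private final List<T> items = new ArrayList<>();

    public void offer(T item) { items.add(item); }

    /** Head of the queue, or null if empty (like peek()). */
    public T peek() { return items.isEmpty() ? null : items.get(0); }

    /** Element immediately after 'last', or null if there is none. */
    public T peekAfter(T last) {
        int i = items.indexOf(last);
        return (i >= 0 && i + 1 < items.size()) ? items.get(i + 1) : null;
    }

    /** Remove a completed task wherever it sits, not just at the head. */
    public boolean remove(T item) { return items.remove(item); }
}
```

A dispatcher thread would peek() the head, hand it to a worker, then peekAfter(head) to find the next mutually exclusive task while the first is still in flight, removing each task only on completion so a new Overseer can retry after a failure.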
[jira] [Resolved] (SOLR-6055) TestMiniSolrCloudCluster has data dir in test's CWD
[ https://issues.apache.org/jira/browse/SOLR-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst resolved SOLR-6055. -- Resolution: Duplicate Rolling back into LUCENE-5650. TestMiniSolrCloudCluster has data dir in test's CWD --- Key: SOLR-6055 URL: https://issues.apache.org/jira/browse/SOLR-6055 Project: Solr Issue Type: Bug Reporter: Ryan Ernst While investigating one of the test failures created when tightening test permissions to restrict write access to CWD (see LUCENE-5650), I've found {{TestMiniSolrCloudCluster}} is attempting to write transaction logs to {{$CWD/data/tlog}}. I've traced this down to two things which are happening: # The test uses {{RAMDirectoryFactory}}, which always returns true for {{isAbsolute}}. This causes the directory factory to *not* adjust the default relative path to bring it under the instance dir. # The {{UpdateLog}} creates its tlog file with the relative data dir. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.8-Linux (64bit/jdk1.8.0_05) - Build # 205 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.8-Linux/205/ Java: 64bit/jdk1.8.0_05 -XX:+UseCompressedOops -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings Error Message: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=74,endOffset=7 Stack Trace: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=74,endOffset=7 at __randomizedtesting.SeedInfo.seed([D784772204AB6B46:BDDFC8335DE54BB5]:0) at org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45) at org.apache.lucene.analysis.miscellaneous.HyphenatedWordsFilter.unhyphenate(HyphenatedWordsFilter.java:144) at org.apache.lucene.analysis.miscellaneous.HyphenatedWordsFilter.incrementToken(HyphenatedWordsFilter.java:97) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:701) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:612) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:511) at org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:944) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Updated] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst updated LUCENE-5650: --- Attachment: dih.patch Looks like we overlapped on fixing these. I like how you handle velocity better than me (I hacked through a way to set the log file). But I'm not sure I like the DIH change. I think it is bogus to default to the CWD if running without a core (which seems to only happen in tests?). I changed this to default to solr.solr.home, then set this to a temp dir in the abstract DIH test base (see attached patch). createTempDir and associated functions no longer create java.io.tmpdir -- Key: LUCENE-5650 URL: https://issues.apache.org/jira/browse/LUCENE-5650 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Ryan Ernst Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per jvm working dir. However, {{getBaseTempDirForClass()}} now does asserts that check the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per jvm cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
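The behavioral change discussed above (old {{LuceneTestCase.TEMP_DIR}} created a missing {{java.io.tmpdir}}; the new {{getBaseTempDirForClass()}} asserts it already exists and is writable) can be sketched as two policies. The class and method names below are illustrative only, not the actual test-framework code:

```java
import java.io.File;

// Toy illustration of the two temp-dir policies under discussion.
class TempDirPolicy {
    // Old behavior: create the configured tmpdir on demand.
    static File createIfMissing(String tmpdir) {
        File dir = new File(tmpdir);
        if (!dir.exists() && !dir.mkdirs()) {
            throw new RuntimeException("could not create " + dir);
        }
        return dir;
    }

    // New behavior: fail fast if the configured tmpdir is not an
    // existing, writable directory (what the asserts now check).
    static File requireExisting(String tmpdir) {
        File dir = new File(tmpdir);
        if (!dir.isDirectory() || !dir.canWrite()) {
            throw new AssertionError(tmpdir + " must be an existing writable directory");
        }
        return dir;
    }
}
```

With {{java.io.tmpdir}} set to {{./temp}}, the first policy silently creates the directory inside the per-JVM cwd, while the second refuses to run, which is exactly the regression being described.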
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003500#comment-14003500 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596296 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1596296 ] LUCENE-5679: remove the single-parameter deleteDocuments() versions, in favor of the vararg ones Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5689: Attachment: LUCENE-5689.patch here's a patch. also adds some safety to checkConsistency. We should probably add assert checkConsistency() to the package-private setter. FieldInfo.setDocValuesGen should not be public. --- Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5689.patch its currently public and users can modify it. We made this class mostly immutable long ago: remember its returned by the atomicreader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
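The suggestion above (make the setter package-private and add {{assert checkConsistency()}} to it) can be sketched with a toy class; this is not Lucene's actual FieldInfo, and the invariant shown is a stand-in:

```java
// Toy sketch of a package-private, assert-guarded doc-values-gen setter.
class FieldInfoSketch {
    private long dvGen = -1;

    // Stand-in for FieldInfo.checkConsistency(); the real method
    // verifies many more invariants.
    boolean checkConsistency() {
        return dvGen >= -1;
    }

    // Package-private: only index-internal code (e.g. the code applying
    // doc-values updates) may advance the generation, and the assert
    // catches inconsistent values in tests.
    void setDocValuesGen(long gen) {
        this.dvGen = gen;
        assert checkConsistency();
    }

    long getDocValuesGen() {
        return dvGen;
    }
}
```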
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003518#comment-14003518 ] Anshum Gupta commented on SOLR-5681: I had created SOLR-5886 for that. Will take that one up soon. Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
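The {{peekAfter()}} proposal above can be sketched with a plain list-backed queue. This is illustrative only; Solr's actual work queue is ZooKeeper-backed, and {{peekAfter}} does not exist yet:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed look-ahead queue: like peek(), peekAfter()
// does not remove anything; tasks are removed only on completion,
// possibly out of order.
class LookaheadQueue {
    private final List<String> tasks = new ArrayList<>();

    void offer(String task) { tasks.add(task); }

    String peek() { return tasks.isEmpty() ? null : tasks.get(0); }

    // Returns the task immediately following 'last', or null if none.
    // This lets the parent thread hand task N to a worker and still
    // inspect task N+1 while N is in flight.
    String peekAfter(String last) {
        int i = tasks.indexOf(last);
        return (i >= 0 && i + 1 < tasks.size()) ? tasks.get(i + 1) : null;
    }

    // Completion cleanup: delete a task that may not be at the head.
    void remove(String task) { tasks.remove(task); }
}
```

The awkward part the message points out survives in the sketch: removing an element that is not at the head is natural for a list but not for a queue, which is why the second proposal (a duplicate "volatile" queue) is also on the table.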
[jira] [Updated] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5680: --- Attachment: LUCENE-5680.patch Patch contains the {{Field...}} vs {{DocValuesUpdate...}} variants side-by-side, with support for unsetting a field's value. To support that with the {{Field...}} method, I added {{Field.setNumberValue(Number value)}} and a {{.createMissing}} static method to NumericDocValuesField. This now works. Now we have them side-by-side in a patch to review. Allow updating multiple DocValues fields atomically --- Key: LUCENE-5680 URL: https://issues.apache.org/jira/browse/LUCENE-5680 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we can allow updating several doc-values fields, atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do, will post a patch shortly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003556#comment-14003556 ] Shai Erera commented on LUCENE-5689: Looks good. So now FI.setDVGen is used only by ReaderAndUpdates, right? Perhaps it is possible to get rid of it entirely, by having RAU create a new FI instance, setting the new dvGen? FieldInfo.setDocValuesGen should not be public. --- Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5689.patch its currently public and users can modify it. We made this class mostly immutable long ago: remember its returned by the atomicreader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5679. Resolution: Fixed Fix Version/s: 5.0 4.9 Assignee: Shai Erera Committed to trunk and 4x. Thanks Mike! Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003530#comment-14003530 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596301 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596301 ] LUCENE-5679: remove the single-parameter deleteDocuments() versions, in favor of the vararg ones Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6097) Posting JSON with results in lost information
Kingston Duffie created SOLR-6097: - Summary: Posting JSON with results in lost information Key: SOLR-6097 URL: https://issues.apache.org/jira/browse/SOLR-6097 Project: Solr Issue Type: Bug Affects Versions: 4.7.2 Reporter: Kingston Duffie Post the following JSON to add a document: { "add" : { "commitWithin" : 5000, "doc" : { "id" : "12345", "body" : "a <b> c" } } } The body field is configured in the schema as: <field name="body" type="text_hive" indexed="true" stored="true" required="false" multiValued="false"/> and <fieldType name="text_hive" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> The problem is this: After submitting this post, if you go to the SOLR console and find this document, the stored body will be missing the contents between the less-than and greater-than symbols -- i.e., "a c". If you encode the body (i.e., "a &lt; b &gt; c"), it will show up with the < and > symbols. That is, it appears that SOLR is stripping out HTML tags even though we are not asking it to. Note that it is not only the storage but also indexing that is affected (as we originally found the issue because searching for "b" would not match this document.
I'm willing to believe that I'm doing something wrong, but I can't see anywhere in any spec that suggests that strings inside JSON need to be -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
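For reference, JSON itself never requires '<' or '>' to be escaped inside string values, so whatever strips them must be happening on the server side, not in the JSON layer. A quick JDK-only check of the escaping rules (only quote, backslash, and control characters below U+0020 must be escaped per the JSON spec):

```java
// Verifies that a body containing angle brackets is a legal JSON
// string value without any escaping.
public class JsonAngleBrackets {
    static boolean needsJsonEscaping(String s) {
        for (char c : s.toCharArray()) {
            // The only characters JSON requires to be escaped in strings.
            if (c == '"' || c == '\\' || c < 0x20) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsJsonEscaping("a <b> c")); // false
    }
}
```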
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003563#comment-14003563 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596304 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1596304 ] LUCENE-5679: leftover jdoc fix Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003564#comment-14003564 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596306 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596306 ] LUCENE-5679: leftover jdoc fix Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: svn commit: r1596301 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/CHANGES.txt lucene/core/ lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
Hi, this is a binary backwards break in 4.x, because the method signature, user's code was compiled against in previous versions, is removed for no reason. In 4.x I would keep the one-arg methods, but just let it delegate to the vararg version. The javadocs can stay the same. In fact this change requires to recompile your source-code (source-code compatibility is ensured) but does not provide binary compatibility. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: sh...@apache.org [mailto:sh...@apache.org] Sent: Tuesday, May 20, 2014 6:02 PM To: comm...@lucene.apache.org Subject: svn commit: r1596301 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/CHANGES.txt lucene/core/ lucene/core/src/java/org/apache/lucene/index/IndexWriter.java Author: shaie Date: Tue May 20 16:02:17 2014 New Revision: 1596301 URL: http://svn.apache.org/r1596301 Log: LUCENE-5679: remove the single-parameter deleteDocuments() versions, in favor of the vararg ones Modified: lucene/dev/branches/branch_4x/ (props changed) lucene/dev/branches/branch_4x/lucene/ (props changed) lucene/dev/branches/branch_4x/lucene/CHANGES.txt (contents, props changed) lucene/dev/branches/branch_4x/lucene/core/ (props changed) lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/i ndex/IndexWriter.java Modified: lucene/dev/branches/branch_4x/lucene/CHANGES.txt URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/CHA NGES.txt?rev=1596301r1=1596300r2=1596301view=diff == --- lucene/dev/branches/branch_4x/lucene/CHANGES.txt (original) +++ lucene/dev/branches/branch_4x/lucene/CHANGES.txt Tue May 20 16:02:17 +++ 2014 @@ -61,6 +61,10 @@ API Changes * LUCENE-5640: The Token class was deprecated. Since Lucene 2.9, TokenStreams are using Attributes, Token is no longer used. 
(Uwe Schindler, Robert Muir) +* LUCENE-5679: Consolidated IndexWriter.deleteDocuments(Term) and + IndexWriter.deleteDocuments(Query) with their varargs counterparts. + (Shai Erera) + Optimizations * LUCENE-5603: hunspell stemmer more efficiently strips prefixes Modified: lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/i ndex/IndexWriter.java URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/cor e/src/java/org/apache/lucene/index/IndexWriter.java?rev=1596301r1=159 6300r2=1596301view=diff == --- lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/i ndex/IndexWriter.java (original) +++ lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene +++ /index/IndexWriter.java Tue May 20 16:02:17 2014 @@ -77,8 +77,8 @@ import org.apache.lucene.util.ThreadInte and otherwise open the existing index./p pIn either case, documents are added with {@link #addDocument(Iterable) - addDocument} and removed with {@link #deleteDocuments(Term)} or {@link - #deleteDocuments(Query)}. A document can be updated with {@link + addDocument} and removed with {@link #deleteDocuments(Term...)} or + {@link #deleteDocuments(Query...)}. A document can be updated with + {@link #updateDocument(Term, Iterable) updateDocument} (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, {@link #close() close} should be called./p @@ - 1323,28 +1323,6 @@ public class IndexWriter implements Clos } } - /** - * Deletes the document(s) containing codeterm/code. - * - * pbNOTE/b: if this method hits an OutOfMemoryError - * you should immediately close the writer. 
See a - * href=#OOMEabove/a for details./p - * - * @param term the term to identify the documents to be deleted - * @throws CorruptIndexException if the index is corrupt - * @throws IOException if there is a low-level IO error - */ - public void deleteDocuments(Term term) throws IOException { -ensureOpen(); -try { - if (docWriter.deleteTerms(term)) { -processEvents(true, false); - } -} catch (OutOfMemoryError oom) { - handleOOM(oom, deleteDocuments(Term)); -} - } - /** Expert: attempts to delete by document ID, as long as * the provided reader is a near-real-time reader (from {@link * DirectoryReader#open(IndexWriter,boolean)}). If the @@ -1357,8 +1335,7 @@ public class IndexWriter implements Clos * bNOTE/b: this method can only delete documents * visible to the currently open NRT reader. If you need * to delete documents indexed after opening the NRT - * reader you must use the other deleteDocument methods - * (e.g., {@link #deleteDocuments(Term)}). */ + * reader you must use {@link
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003574#comment-14003574 ] Uwe Schindler commented on LUCENE-5679: --- See my comment on the mailing list about 4.x: {quote} Hi, this is a binary backwards break in 4.x, because the method signature, user's code was compiled against in previous versions, is removed for no reason. In 4.x I would keep the one-arg methods, but just let it delegate to the vararg version. The javadocs can stay the same. In fact this change requires to recompile your source-code (source-code compatibility is ensured) but does not provide binary compatibility. Uwe {quote} Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
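Uwe's suggested 4.x shim (keep the one-arg signature and let it delegate to the vararg one, preserving binary compatibility) can be sketched with a toy class; this is not the real IndexWriter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for IndexWriter showing the delegation pattern: code
// compiled against the old one-arg method keeps linking, because the
// signature is still present and simply forwards to the vararg version.
class WriterShim {
    final List<String> deleted = new ArrayList<>();

    // Vararg version kept going forward.
    public void deleteDocuments(String... terms) {
        deleted.addAll(Arrays.asList(terms));
    }

    // One-arg overload retained only for binary compatibility;
    // behavior is identical because it delegates.
    public void deleteDocuments(String term) {
        deleteDocuments(new String[] { term });
    }
}

public class CompatDemo {
    public static void main(String[] args) {
        WriterShim w = new WriterShim();
        w.deleteDocuments("id:1");          // resolves to the one-arg overload
        w.deleteDocuments("id:2", "id:3");  // resolves to the vararg overload
        System.out.println(w.deleted);      // [id:1, id:2, id:3]
    }
}
```

Note that Java overload resolution prefers the fixed-arity method, so existing single-argument call sites bind to the shim both at compile time and, crucially for 4.x users, at link time.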
[jira] [Commented] (SOLR-5886) Propagate more information in case of failed async tasks
[ https://issues.apache.org/jira/browse/SOLR-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003577#comment-14003577 ] Noble Paul commented on SOLR-5886: -- not just errors. All the output of the non-async command should be available for async as well Propagate more information in case of failed async tasks Key: SOLR-5886 URL: https://issues.apache.org/jira/browse/SOLR-5886 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta As of now, only the state of a pre-submitted task is returned in the response to the REQUESTSTATUS Collections API call. Pass more information, especially in case of a call erroring out. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5886) Propagate more information in case of failed async tasks
[ https://issues.apache.org/jira/browse/SOLR-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003579#comment-14003579 ] Anshum Gupta commented on SOLR-5886: sure, makes sense. Propagate more information in case of failed async tasks Key: SOLR-5886 URL: https://issues.apache.org/jira/browse/SOLR-5886 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta As of now, only the state of a pre-submitted task is returned in the response to the REQUESTSTATUS Collections API call. Pass more information, especially in case of a call erroring out. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003580#comment-14003580 ] Robert Muir commented on LUCENE-5689: - It maybe, i wasnt sure about the implications of that. I think we should first remove the 'public', because I do not know what will happen if someone invokes this setter on e.g. AtomicReader, but I'm guessing its not good :) FieldInfo.setDocValuesGen should not be public. --- Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5689.patch its currently public and users can modify it. We made this class mostly immutable long ago: remember its returned by the atomicreader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6095) SolrCloud cluster can end up without an overseer
[ https://issues.apache.org/jira/browse/SOLR-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003585#comment-14003585 ] Shalin Shekhar Mangar commented on SOLR-6095: - The problem that I could find is in LeaderElector.checkIfIamLeader where we have the following code: {code} if (seq <= intSeqs.get(0)) { // first we delete the node advertising the old leader in case the ephem is still there try { zkClient.delete(context.leaderPath, -1, true); } catch(Exception e) { // fine } runIamLeaderProcess(context, replacement); } {code} If for whatever reason, the zkClient.delete was unsuccessful, we just ignore and go ahead to runIamLeaderProcess(...) which leads to OverseerElectionContext.runLeaderProcess(...) where it tries to create the /overseer_elect/leader node: {code} zkClient.makePath(leaderPath, ZkStateReader.toJSON(myProps), CreateMode.EPHEMERAL, true); {code} This is where things go wrong. Because the /overseer_elect/leader node already existed, the zkClient.makePath fails and the node decides to give up because it thinks there is already a leader. It never tries to rejoin the election. Then once the ephemeral /overseer_elect/leader node goes away (after the previous overseer leader exits), the cluster is left with no leader. Shouldn't the node next in line to become a leader try again or rejoin the election instead of giving up? SolrCloud cluster can end up without an overseer Key: SOLR-6095 URL: https://issues.apache.org/jira/browse/SOLR-6095 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8 Reporter: Shalin Shekhar Mangar Fix For: 4.9, 5.0 We have a large cluster running on ec2 which occasionally ends up without an overseer after a rolling restart. We always restart our overseer nodes at the very last, otherwise we end up with a large number of shards that can't recover properly. This cluster is running a custom branch forked from 4.8 and has SOLR-5473, SOLR-5495 and SOLR-5468 applied.
We have a large number of small collections (120 collections each with approx 5M docs) on 16 Solr nodes. We are also using the overseer roles feature to designate two specified nodes as overseers. However, I think the problem that we're seeing is not specific to the overseer roles feature. As soon as the overseer was shutdown, we saw the following on the node which was next in line to become the overseer: {code} 2014-05-20 09:55:39,261 [main-EventThread] INFO solr.cloud.ElectionContext - I am going to be the leader ec2-xx.compute-1.amazonaws.com:8987_solr 2014-05-20 09:55:39,265 [main-EventThread] WARN solr.cloud.LeaderElector - org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /overseer_elect/leader at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:432) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:429) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:386) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:373) at org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:551) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:142) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:110) at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55) at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:303) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} When the overseer leader node is gracefully shutdown, we get the 
following in the logs: {code} 2014-05-20 09:55:39,254 [Thread-63] ERROR solr.cloud.Overseer - Exception in Overseer main queue loop org.apache.solr.common.SolrException: Could not load collection from ZK:sm12 at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778) at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553) at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246) at
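The "try again or rejoin the election instead of giving up" idea could look roughly like the following. {{ElectionSketch}} and its {{Zk}} interface are hypothetical simplifications, not Solr's LeaderElector or SolrZkClient:

```java
// Sketch: when creating the ephemeral leader node fails (e.g. the old
// leader's ephemeral node is still lingering), retry instead of
// permanently giving up on leadership.
class ElectionSketch {
    interface Zk {
        void createEphemeral(String path) throws Exception;
    }

    static boolean tryBecomeLeader(Zk zk, String leaderPath, int attempts)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                zk.createEphemeral(leaderPath);
                return true;           // we are now the leader
            } catch (Exception nodeExists) {
                Thread.sleep(100);     // stale ephemeral may still be there; retry
            }
        }
        return false;                  // caller should rejoin the election
    }
}
```

The real fix would also need to distinguish NodeExists from other ZooKeeper errors and to re-enter the election queue rather than loop forever, but the sketch shows the shape of the retry the comment asks for.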
[jira] [Created] (LUCENE-5690) expose sub-Terms from MultiTerms
Yonik Seeley created LUCENE-5690: Summary: expose sub-Terms from MultiTerms Key: LUCENE-5690 URL: https://issues.apache.org/jira/browse/LUCENE-5690 Project: Lucene - Core Issue Type: Improvement Reporter: Yonik Seeley MultiTermsEnum and MultiDocsEnum both expose their subs. It would be useful to do the same for MultiTerms. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003626#comment-14003626 ] Anshum Gupta commented on SOLR-5495: The CHANGES.txt entries for trunk and 4x are in different sections. trunk: 5.0 section 4x: 4.9 section Recovery strategy for leader partitioned from replica case. --- Key: SOLR-5495 URL: https://issues.apache.org/jira/browse/SOLR-5495 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Timothy Potter Attachments: SOLR-5495.patch, SOLR-5495.patch, SOLR-5495.patch We need to work out a strategy for the case of: Leader and replicas can still talk to ZooKeeper, Leader cannot talk to replica. We punted on this in initial design, but I'd like to get something in.
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003638#comment-14003638 ] ASF subversion and git services commented on SOLR-5495: --- Commit 1596315 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596315 ] SOLR-5495: Re-arrange location of SOLR-5495 and SOLR-5468 in CHANGES.txt
[jira] [Commented] (SOLR-5468) Option to enforce a majority quorum approach to accepting updates in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003639#comment-14003639 ] ASF subversion and git services commented on SOLR-5468: --- Commit 1596315 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596315 ] SOLR-5495: Re-arrange location of SOLR-5495 and SOLR-5468 in CHANGES.txt Option to enforce a majority quorum approach to accepting updates in SolrCloud -- Key: SOLR-5468 URL: https://issues.apache.org/jira/browse/SOLR-5468 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.5 Environment: All Reporter: Timothy Potter Assignee: Timothy Potter Priority: Minor Attachments: SOLR-5468.patch, SOLR-5468.patch, SOLR-5468.patch I've been thinking about how SolrCloud deals with write-availability using in-sync replica sets, in which writes will continue to be accepted so long as there is at least one healthy node per shard. For a little background (and to verify my understanding of the process is correct), SolrCloud only considers active/healthy replicas when acknowledging a write. Specifically, when a shard leader accepts an update request, it forwards the request to all active/healthy replicas and only considers the write successful if all active/healthy replicas ack the write. Any down / gone replicas are not considered and will sync up with the leader when they come back online using peer sync or snapshot replication. For instance, if a shard has 3 nodes, A, B, C with A being the current leader, then writes to the shard will continue to succeed even if B and C are down. The issue is that if a shard leader continues to accept updates even if it loses all of its replicas, then we have acknowledged updates on only 1 node. If that node, call it A, then fails and one of the previous replicas, call it B, comes back online before A does, then any writes that A accepted while the other replicas were offline are at risk of being lost.
SolrCloud does provide a safe-guard mechanism for this problem with the leaderVoteWait setting, which puts any replicas that come back online before node A into a temporary wait state. If A comes back online within the wait period, then all is well as it will become the leader again and no writes will be lost. As a side note, sys admins definitely need to be made more aware of this situation as when I first encountered it in my cluster, I had no idea what it meant. My question is whether we want to consider an approach where SolrCloud will not accept writes unless there is a majority of replicas available to accept the write? For my example, under this approach, we wouldn't accept writes if both B and C failed, but would if only C did, leaving A and B online. Admittedly, this lowers the write-availability of the system, so may be something that should be tunable? From Mark M: Yeah, this is kind of like one of many little features that we have just not gotten to yet. I’ve always planned for a param that lets you say how many replicas an update must be verified on before responding success. Seems to make sense to fail that type of request early if you notice there are not enough replicas up to satisfy the param to begin with.
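The "min replicas" parameter Mark describes could be sketched roughly as follows. This is a toy model, not Solr's actual API: `QuorumSketch`, `ReplicaState`, and `acceptWrite` are hypothetical names used only to illustrate failing a write early when too few replicas are up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "minimum active replicas" check for a shard.
// Not Solr's real classes; illustrative only.
public class QuorumSketch {
    public enum ReplicaState { ACTIVE, RECOVERING, DOWN }

    /**
     * Reject a write up front if fewer than minActiveReplicas of the
     * shard's replicas (leader included) are currently active.
     */
    public static boolean acceptWrite(List<ReplicaState> shardReplicas,
                                      int minActiveReplicas) {
        long active = shardReplicas.stream()
                .filter(s -> s == ReplicaState.ACTIVE)
                .count();
        return active >= minActiveReplicas;
    }

    public static void main(String[] args) {
        // Tim's example: shard with replicas A, B, C and a majority quorum of 2.
        List<ReplicaState> aAndBUp = Arrays.asList(
                ReplicaState.ACTIVE, ReplicaState.ACTIVE, ReplicaState.DOWN);
        List<ReplicaState> onlyAUp = Arrays.asList(
                ReplicaState.ACTIVE, ReplicaState.DOWN, ReplicaState.DOWN);
        System.out.println(acceptWrite(aAndBUp, 2)); // true: A and B are up
        System.out.println(acceptWrite(onlyAUp, 2)); // false: majority lost
    }
}
```

Making the threshold a request parameter, as Mark suggests, would let each client choose its own durability/availability trade-off.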
[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2894: --- Attachment: SOLR-2894.patch I haven't had a lot of time to review the updated patch in depth, but I did spend some time trying to improve TestCloudPivotFacet to resolve some of the nocommits -- but I'm still seeing failures... 1) I realized the depth check I was trying to do was bogus and commented it out (still need to purge the code - didn't want to muck with that until the rest of the test was passing more reliably) 2) the NPE I mentioned in QueryResponse.readPivots is still happening, but I realized that it has nothing to do with the datatype of the fields being pivoted on -- it only seemed that way because of the poor randomization of values getting put in the single valued string fields vs the multivalued fields in the old version of the test. The bug seems to pop up in _some_ cases where a pivot constraint has no sub-pivots. Normally this results in a NamedList with 3 keys (field,value,count) -- the 4th pivot key is only included if there is a list of at least 1 sub-pivot. But in some cases (I can't explain from looking at the code why) the server is responding back with a 4th entry using the key pivot but the value is null. We need to get to the bottom of this -- it's not clear if there is a bug preventing real sub-pivot constraints from being returned correctly, or if this is just a mistake in the code where it's putting null in the NamedList instead of not adding anything at all (in which case it might be tempting to make QueryResponse.readPivots smart enough to deal with it, but if we did that it would still be broken for older clients -- best to stick with the current API semantics). In the attached patch update, this seed will fail showing the null sub-pivots problem...
{noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=680E68425E7CA1BA -Dtests.slow=true -Dtests.locale=es_US -Dtests.timezone=Canada/Eastern -Dtests.file.encoding=UTF-8 [junit4] FAILURE 41.7s | TestCloudPivotFacet.testDistribSearch [junit4] Throwable #1: java.lang.AssertionError: Server sent back 'null' for sub pivots? [junit4]at __randomizedtesting.SeedInfo.seed([680E68425E7CA1BA:E9E8E65A2923C186]:0) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.readPivots(QueryResponse.java:383) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.extractFacetInfo(QueryResponse.java:363) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:148) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.init(QueryResponse.java:91) [junit4]at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) [junit4]at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:161) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145) {noformat} 3) Independent (I think) from the NPE issue, there is still something wonky with the refined counts when mincount is specified... Here for example is a seed that gets past the QueryResponse.readPivots, but then fails the numFound validation queries used to check the pivot counts...
{noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=F08A107C384690FC -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Jamaica -Dtests.file.encoding=UTF-8 [junit4] FAILURE 27.0s | TestCloudPivotFacet.testDistribSearch [junit4] Throwable #1: java.lang.AssertionError: {main({main(facet.pivot.mincount=9),extra({main(facet.limit=12),extra({main(facet.pivot=pivot_y_s%2Cpivot_x_s1),extra(facet=truefacet.pivot=pivot_x_s1%2Cpivot_x_s)})})}),extra(rows=0q=id%3A%5B*+TO+503%5D)} == pivot_y_s,pivot_x_s1: {params(rows=0),defaults({main(rows=0q=id%3A%5B*+TO+503%5D),extra(fq=%7B%21term+f%3Dpivot_y_s%7D)})} expected:9 but was:14 [junit4]at __randomizedtesting.SeedInfo.seed([F08A107C384690FC:716C9E644F19F0C0]:0) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:190) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145) [junit4]at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863) [junit4]at java.lang.Thread.run(Thread.java:744) [junit4] Caused by: java.lang.AssertionError: pivot_y_s,pivot_x_s1:
[jira] [Resolved] (SOLR-6097) Posting JSON with results in lost information
[ https://issues.apache.org/jira/browse/SOLR-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-6097. Resolution: Cannot Reproduce Cannot Reproduce Please post more details of your situation (including specifics on how exactly you are adding your data to Solr) to a new thread on the solr-user mailing list. In the event that more details about your usage helps uncover reliable/reproducible steps to recreate the problem, we can re-open the issue with an updated summary. Using the 4.7.2 example configs... {noformat} hossman@frisbee:~$ curl -X POST -H 'Content-Type: application/json' --data-binary '{"add":{"commitWithin":5000,"doc":{"id":"12345","body_s":"a <b> c"}}}' http://localhost:8983/solr/collection1/update {"responseHeader":{"status":0,"QTime":24}} hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:12345&wt=json&indent=true' { "responseHeader":{ "status":0, "QTime":1, "params":{ "indent":"true", "q":"id:12345", "wt":"json"}}, "response":{"numFound":1,"start":0,"docs":[ { "id":"12345", "body_s":"a <b> c", "_version_":1468642762402299904}] }} {noformat} Posting JSON with results in lost information - Key: SOLR-6097 URL: https://issues.apache.org/jira/browse/SOLR-6097 Project: Solr Issue Type: Bug Affects Versions: 4.7.2 Reporter: Kingston Duffie Post the following JSON to add a document: { "add" : { "commitWithin" : 5000, "doc" : { "id" : "12345", "body" : "a <b> c" } } } The body field is configured in the schema as: <field name="body" type="text_hive" indexed="true" stored="true" required="false" multiValued="false"/> and <fieldType name="text_hive" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/> </analyzer>
<analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> The problem is this: After submitting this post, if you go to the SOLR console and find this document, the stored body will be missing the contents between the less-than and greater-than symbols -- i.e., a c. If you encode the body (i.e., a &lt; b &gt; c), it will show up with the < and > symbols. That is, it appears that SOLR is stripping out HTML tags even though we are not asking it to. Note that it is not only the storage but also the indexing that is affected (as we originally found the issue because searching for b would not match this document). I'm willing to believe that I'm doing something wrong, but I can't see anywhere in any spec that suggests that strings inside JSON need to be
[jira] [Updated] (LUCENE-5690) expose sub-Terms from MultiTerms
[ https://issues.apache.org/jira/browse/LUCENE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-5690: - Attachment: LUCENE-5690.patch Trivial patch attached.
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC The 2 facet test failures both related to DocValues and reproduce reliably for me on my Linux box. Is it possible these are related to the UninvertingDirectoryReader changes? [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestRandomFaceting -Dtests.method=testRandomFaceting -Dtests.seed=EF686C57D2698AAE -Dtests.slow=true -Dtests.locale=ar_BH -Dtests.timezone=Europe/Oslo -Dtests.file.encoding=UTF-8 [junit4] ERROR 3.13s | TestRandomFaceting.testRandomFaceting [junit4] Throwable #1: org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.AssertionError [junit4]at __randomizedtesting.SeedInfo.seed([EF686C57D2698AAE:E2004C8287904211]:0) [junit4]at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:595) [junit4]at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:260) [junit4]at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:84) [junit4]at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:221) [junit4]at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [junit4]at org.apache.solr.core.SolrCore.execute(SolrCore.java:1964) [junit4]at org.apache.solr.util.TestHarness.query(TestHarness.java:295) [junit4]at org.apache.solr.util.TestHarness.query(TestHarness.java:278) [junit4]at org.apache.solr.TestRandomFaceting.doFacetTests(TestRandomFaceting.java:216) [junit4]at org.apache.solr.TestRandomFaceting.doFacetTests(TestRandomFaceting.java:138) [junit4]at org.apache.solr.TestRandomFaceting.testRandomFaceting(TestRandomFaceting.java:121) [junit4]at java.lang.Thread.run(Thread.java:745) [junit4] Caused by: java.lang.AssertionError [junit4]at org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:106) [junit4]at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:432) [junit4]at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:561) [junit4]at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:553) [junit4]at java.util.concurrent.FutureTask.run(FutureTask.java:266) [junit4]at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:507) [junit4]at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:577) [junit4]... 50 more [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFaceting -Dtests.method=testMultiThreadedFacets -Dtests.seed=EF686C57D2698AAE -Dtests.slow=true -Dtests.locale=ja -Dtests.timezone=Africa/Lusaka -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.30s | TestFaceting.testMultiThreadedFacets [junit4] Throwable #1: java.lang.AssertionError: expected same:org.apache.lucene.index.MultiDocValues$MultiSortedSetDocValues@1840511 was not:org.apache.lucene.index.MultiDocValues$MultiSortedSetDocValues@c88009 [junit4]at __randomizedtesting.SeedInfo.seed([EF686C57D2698AAE:C040D5DEE621F7BA]:0) [junit4]at org.apache.solr.request.TestFaceting.assertEquals(TestFaceting.java:958) [junit4]at org.apache.solr.request.TestFaceting.testMultiThreadedFacets(TestFaceting.java:928) [junit4]at java.lang.Thread.run(Thread.java:745) -Hoss http://www.lucidworks.com/
[jira] [Commented] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003723#comment-14003723 ] Michael McCandless commented on LUCENE-5670: Thanks Christian, the patch looks good to me! I'll commit soon. org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read.
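The idea in the description can be sketched outside Lucene as follows. All class names here (`OutputsSketch`, `ByteReader`, `FixedLongOutputs`) are illustrative stand-ins, not Lucene's real API: the point is that a skip method can advance past an unwanted output without allocating an object for it, while the default falls back to read-and-discard.

```java
// Minimal stand-in for a positional byte reader (not Lucene's DataInput).
final class ByteReader {
    private final byte[] bytes;
    private int pos;

    ByteReader(byte[] bytes) { this.bytes = bytes; }

    long readLong() { // fixed-width big-endian, 8 bytes
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[pos++] & 0xFFL);
        }
        return v;
    }

    void skip(int n) { pos += n; }
}

abstract class OutputsSketch<T> {
    abstract T read(ByteReader in);

    // Default preserves the old behavior: read the output and throw it away.
    void skipOutput(ByteReader in) {
        read(in);
    }
}

final class FixedLongOutputs extends OutputsSketch<Long> {
    @Override Long read(ByteReader in) {
        return in.readLong(); // boxes a Long on every call
    }

    // Fixed-width encoding: skipping is just advancing 8 bytes, no boxing.
    @Override void skipOutput(ByteReader in) {
        in.skip(8);
    }
}

public class FstSkipDemo {
    public static void main(String[] args) {
        byte[] twoOutputs = new byte[16];
        twoOutputs[15] = 42; // second encoded long is 42
        ByteReader in = new ByteReader(twoOutputs);
        FixedLongOutputs outputs = new FixedLongOutputs();
        outputs.skipOutput(in);               // advance past the first output
        System.out.println(outputs.read(in)); // 42
    }
}
```

For variable-width encodings the override would decode just enough structure to find the output's length, still without materializing the value.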
[jira] [Updated] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5670: --- Fix Version/s: 5.0 4.9
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
On Tue, May 20, 2014 at 1:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC The 2 facet test failures both related to DocValues and reproduce reliably for me on my Linux box. Is it possible these are related to the UninvertingDirectoryReader changes? I think it's a bug in the assert. I'll take a look in 5 minutes.
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003745#comment-14003745 ] Mikhail Khludnev commented on SOLR-6096: bq. I would expect that John is not in the index anymore. Currently he is. Given the [test|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/update/AddBlockUpdateTest.java#L142] I suppose that an update with children works fine if you specify {code}<add overwrite="true"><doc>...</doc></add>{code} To make it work you need to have a uniqueKey defined. Hmm.. I just checked it manually myself, with the tiny data from the blog. It works fine as expected - overwrite=true is implied by default. My guess is you either don't have a uniqueKey defined, or you forgot to commit after the update. Support Update and Delete on nested documents - Key: SOLR-6096 URL: https://issues.apache.org/jira/browse/SOLR-6096 Project: Solr Issue Type: Improvement Affects Versions: 4.7.2 Reporter: Thomas Scheffler Labels: blockjoin, nested When using nested or child documents, update and delete operations on the root document should also affect the nested documents, as no child can exist without its parent :-) Example {code:xml|title=First Import} <doc> <field name="id">1</field> <field name="title">Article with author</field> <doc> <field name="name">Smith, John</field> <field name="role">author</field> </doc> </doc> {code} If I change my mind and the author was not named *John* but *_Jane_*: {code:xml|title=Changed name of author of '1'} <doc> <field name="id">1</field> <field name="title">Article with author</field> <doc> <field name="name">Smith, Jane</field> <field name="role">author</field> </doc> </doc> {code} I would expect that John is not in the index anymore. Currently he is. There might also be the case that any subdocument is removed by an update: {code:xml|title=Remove author} <doc> <field name="id">1</field> <field name="title">Article without author</field> </doc> {code} This should affect a delete on all nested documents, too.
The same way, all nested documents should be deleted if I delete the root document: {code:xml|title=Deletion of '1'} <delete> <id>1</id> <!-- implying also <query>_root_:1</query> --> </delete> {code} It is currently possible to do all of this on the client side by issuing an additional request to delete the document before every update. It would be more efficient if this could be handled on the SOLR side. One would benefit from atomic updates. The biggest plus shows when using delete-by-query. {code:xml|title=Deletion of '1' by query} <delete> <query>title:*</query> <!-- implying also <query>_root_:1</query> --> </delete> {code} In that case one would not have to first query all documents and issue deletes for those ids and every nested document.
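The client-side workaround Thomas describes can be sketched as building plain XML update messages: delete the old parent and all its children via the internal `_root_` field, then re-post the new block and commit. The helper name below is hypothetical, and actually sending the requests to `/update` over HTTP is elided.

```java
// Sketch of the client-side "delete the block first" workaround.
// BlockUpdateWorkaround is an illustrative name, not a Solr class.
public class BlockUpdateWorkaround {
    /**
     * Delete message that removes the parent with this id together with
     * all of its nested child documents (children share the parent's
     * _root_ value in the index).
     */
    public static String deleteBlock(String id) {
        return "<delete><query>_root_:" + id + "</query></delete>";
    }

    public static void main(String[] args) {
        // For the "Changed name of author" example: send this first,
        // then re-send the <add> with the updated child, then commit.
        System.out.println(deleteBlock("1"));
    }
}
```

This keeps the index consistent at the cost of an extra round-trip per update, which is exactly the inefficiency the issue asks Solr to absorb server-side.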
[jira] [Updated] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4236: Attachment: LUCENE-4236.patch Patch just synced to trunk. Some of the previous stuff in it has been committed. It's still not ready to be committed: there is a lot of messy/bogus stuff I did. Will maybe make a branch and try to clean it up more. clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so it's applied even if we have optional and prohibited clauses that don't exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance).
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
I committed a fix. On Tue, May 20, 2014 at 1:58 PM, Robert Muir rcm...@gmail.com wrote: On Tue, May 20, 2014 at 1:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC The 2 facet test failures both related to DocValues and reproduce reliably for me on my Linux box. Is it possible these are related to the UninvertingDirectoryReader changes? I think its a bug in the assert. I'll take a look in 5 minutes.
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003767#comment-14003767 ] Mikhail Khludnev commented on SOLR-6096: bq. There might also be the case that any subdocument is removed by an update: right. that's what I raised in SOLR-5211. The problem is backward compatibility - that update wasn't treated as a block update before, and now Solr can't just imply that every update is a block update. However, we can introduce a special-purpose {{/blockupdate/}} handler with explicit block semantics for all the cases above. So far it can be healed, but in an ugly way, e.g. it needs to support a _children nuke_ update: {code} <doc> <field name="id">1</field> <field name="title">Article without author</field> <nodocs/> </doc> <doc> <field name="id">1</field> <field name="title">Article without author</field> <docs></docs> </doc> <doc childfree="true"> <field name="id">1</field> <field name="title">Article without author</field> </doc> {code} pick your favorite one! And more than that, the corresponding SolrJ SolrInputDocument amendments are expected to be even weirder.
[jira] [Commented] (LUCENE-5690) expose sub-Terms from MultiTerms
[ https://issues.apache.org/jira/browse/LUCENE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003803#comment-14003803 ] Uwe Schindler commented on LUCENE-5690: --- bq. MultiTermsEnum and MultiDocsEnum both expose their subs I don't see this in the code of MultiTermsEnum. Both have the arrays private and MultiTermsEnum has no getter (at least in trunk). MultiDocsEnum has a getter (not sure why). In MultiTermsEnum there is one public getter, but this one returns a package-private class, so it is unusable to the user (this is a bug) - it should be removed. This patch is fine if you make the methods return List<Terms> and List<ReaderSlice>, both with {{Collections.unmodifiableList(Arrays.asList(...))}}. But ReaderSlice is also a more or less private class (it's just public for cross-package access). What's the reason to have those public at all?
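The accessor shape Uwe suggests would look roughly like the following stand-alone sketch, where `String` stands in for Lucene's `Terms`: wrap the internal subs array in an unmodifiable list rather than handing out the array itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of an unmodifiable-list accessor over an internal array.
// MultiTermsSketch is an illustrative name, not Lucene's MultiTerms.
public class MultiTermsSketch {
    private final String[] subs; // stands in for the internal Terms[] array

    public MultiTermsSketch(String[] subs) {
        this.subs = subs;
    }

    /** Read-only view over the subs; callers cannot mutate the internals. */
    public List<String> getSubTerms() {
        return Collections.unmodifiableList(Arrays.asList(subs));
    }
}
```

`Arrays.asList` is a view (no copy), and the unmodifiable wrapper makes any `add`/`set` attempt throw `UnsupportedOperationException`, so exposing the subs costs nothing and leaks no mutability.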
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003804#comment-14003804 ] Mikhail Khludnev commented on SOLR-6096: please get me right, but I don't feel like sending {{query\_root\_:1/query}} is a much problem at comparison to {{id1/id}} {{deletequerytitle:*/query/delete}} can be fixed by nuking children right before removing parents explicitly {code} delete query{!child of=type_s:parent}title:*/query querytitle:*/query /delete {code} I share your concern that it not so cute as it could be. I've got one thought. what if you configure that darn \_root\_ field as aт uniqueKey? if it will work fine in all cases, but you wouldn't happy to specify such odd uniqueKey, we can create special DirectUpdateHandler2 which can use uniqueKey instead of \_root\_. WDYT? Support Update and Delete on nested documents - Key: SOLR-6096 URL: https://issues.apache.org/jira/browse/SOLR-6096 Project: Solr Issue Type: Improvement Affects Versions: 4.7.2 Reporter: Thomas Scheffler Labels: blockjoin, nested When using nested or child document. Update and delete operation on the root document should also affect the nested documents, as no child can exist without its parent :-) Example {code:xml|title=First Import} doc field name=id1/field field name=titleArticle with author/field doc field name=nameSmith, John/field field name=roleauthor/field /doc /doc {code} If I change my mind and the author was not named *John* but *_Jane_*: {code:xml|title=Changed name of author of '1'} doc field name=id1/field field name=titleArticle with author/field doc field name=nameSmith, Jane/field field name=roleauthor/field /doc /doc {code} I would expect that John is not in the index anymore. Currently he is. 
There might also be the case that a subdocument is removed by an update:
{code:xml|title=Remove author}
<doc>
  <field name="id">1</field>
  <field name="title">Article without author</field>
</doc>
{code}
This should trigger a delete on all nested documents, too. In the same way, all nested documents should be deleted if I delete the root document:
{code:xml|title=Deletion of '1'}
<delete>
  <id>1</id>
  <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
It is currently possible to do all this on the client side by issuing an additional delete request before every update, but it would be more efficient if this could be handled on the SOLR side. One would also benefit from atomic updates. The biggest plus shows when using delete-by-query:
{code:xml|title=Deletion of '1' by query}
<delete>
  <query>title:*</query>
  <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
In that case one would not have to first query for all matching documents and then issue deletes by their ids and for every nested document. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
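The two-step client-side workaround from the comment above — nuke the children via a block-join query, then remove the parents — can be sketched as a single delete request. This is a sketch, not tested against this schema: the field names (`type_s:parent`, `title`) come from the example in the comment and are assumptions.

```xml
<!-- hypothetical combined request: delete children first, then parents -->
<delete>
  <!-- all children of the matching parents -->
  <query>{!child of="type_s:parent"}title:*</query>
  <!-- the parents themselves -->
  <query>title:*</query>
</delete>
```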
[jira] [Created] (SOLR-6098) SOLR console displaying JSON does not escape text properly
Kingston Duffie created SOLR-6098: - Summary: SOLR console displaying JSON does not escape text properly Key: SOLR-6098 URL: https://issues.apache.org/jira/browse/SOLR-6098 Project: Solr Issue Type: Bug Reporter: Kingston Duffie Priority: Minor In the SOLR admin web console, when displaying the JSON response for a query, the text is not being HTML-escaped, so any text that happens to match HTML markup is processed as HTML. For example, enter <strike>hello</strike> in the q textbox and the responseHeader will contain: q: body:<strike>hello</strike>, where the hello portion is shown struck out. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, f...@bar.com, this will be completely missing (because the browser treats this as an invalid tag). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
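For reference, the kind of escaping the console needs to apply before injecting response text into the page can be sketched as follows. This is a minimal sketch in Java (the real admin UI is JavaScript, but the set of characters to escape is the same):

```java
// Minimal sketch of the HTML escaping a web console must apply before
// injecting arbitrary response text into the DOM.
public class HtmlEscapeSketch {
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break; // must come before the others
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
    public static void main(String[] args) {
        // without escaping, a browser would render this as struck-out text
        System.out.println(escapeHtml("body:<strike>hello</strike>"));
        // prints: body:&lt;strike&gt;hello&lt;/strike&gt;
    }
}
```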
[jira] [Commented] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003834#comment-14003834 ] ASF subversion and git services commented on LUCENE-5670: - Commit 1596368 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596368 ] LUCENE-5670: add skip/FinalOutput to FST Outputs org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
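The motivation for the skip method can be sketched with a toy output encoding — a sketch, not the actual Outputs API: read() must materialize an object for every output it passes, while skipping only needs to advance the read position.

```java
import java.nio.ByteBuffer;

// Sketch (hypothetical encoding, not the real Lucene FST format): each
// output is encoded as [length byte][payload bytes]. Reading allocates a
// byte[]; skipping just moves the position past the payload.
public class SkipOutputSketch {
    static byte[] readOutput(ByteBuffer in) {     // allocates an object
        byte[] out = new byte[in.get()];
        in.get(out);
        return out;
    }
    static void skipOutput(ByteBuffer in) {       // no allocation at all
        int len = in.get();                       // read the length header
        in.position(in.position() + len);         // jump over the payload
    }
    public static void main(String[] args) {
        // two encoded outputs: {10, 20} and {30}
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {2, 10, 20, 1, 30});
        skipOutput(buf);                  // pass the first output untouched
        byte[] second = readOutput(buf);  // materialize only the one we want
        System.out.println(second[0]);    // prints: 30
    }
}
```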
[jira] [Updated] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5691: Attachment: LUCENE-5691.patch patch with a test. We should backport to 4.x too DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
Robert Muir created LUCENE-5691: --- Summary: DocTermsOrds lookupTerm is wrong in some cases Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
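The END-handling fix can be illustrated with a standalone sketch over a sorted array — not the actual DocTermOrds code. lookupTerm() is built on a seekCeil()-style primitive: it returns the ord when the term is found and -insertionPoint-1 otherwise, and SeekStatus.END (the term sorts after every existing term, so there is no positioned term whose ord could be read) is exactly the branch that must be handled explicitly.

```java
import java.util.Arrays;

// Standalone sketch of the bug class described above. seekCeil() returns
// the index of the smallest term >= key, or -1 to stand in for
// SeekStatus.END. At END there is no positioned term, so lookupTerm()
// must encode "insert at the end" as -(numTerms)-1 itself.
public class LookupTermSketch {
    static int seekCeil(String[] sorted, String key) {
        int idx = Arrays.binarySearch(sorted, key);
        int pos = idx >= 0 ? idx : -idx - 1;
        return pos == sorted.length ? -1 : pos;    // -1 == SeekStatus.END
    }
    static long lookupTerm(String[] sorted, String key) {
        int pos = seekCeil(sorted, key);
        if (pos == -1) return -sorted.length - 1;  // END: the missing case
        if (sorted[pos].equals(key)) return pos;   // FOUND: the ord itself
        return -pos - 1;                           // NOT_FOUND: -insertionPoint-1
    }
    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"};
        System.out.println(lookupTerm(terms, "banana")); // prints: 1
        System.out.println(lookupTerm(terms, "berry"));  // prints: -3
        System.out.println(lookupTerm(terms, "zebra"));  // prints: -4 (END case)
    }
}
```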
[jira] [Commented] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003836#comment-14003836 ] Michael McCandless commented on LUCENE-5691: +1, nice catch! DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003840#comment-14003840 ] ASF subversion and git services commented on LUCENE-5670: - Commit 1596369 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1596369 ] LUCENE-5670: add skip/FinalOutput to FST Outputs org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003837#comment-14003837 ] Stefan Matheis (steffkes) commented on SOLR-6098: - You're not saying which release you're referring to? From your description, that sounds a bit like SOLR-5174, which got fixed with 4.5. Please let me know if that's your issue as well - in which case upgrading would already fix it and I'm going to close this as a duplicate - or if it's something else that needs to be taken care of. SOLR console displaying JSON does not escape text properly -- Key: SOLR-6098 URL: https://issues.apache.org/jira/browse/SOLR-6098 Project: Solr Issue Type: Bug Components: web gui Reporter: Kingston Duffie Priority: Minor In the SOLR admin web console, when displaying the JSON response for a query, the text is not being HTML-escaped, so any text that happens to match HTML markup is processed as HTML. For example, enter <strike>hello</strike> in the q textbox and the responseHeader will contain: q: body:<strike>hello</strike>, where the hello portion is shown struck out. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, f...@bar.com, this will be completely missing (because the browser treats this as an invalid tag). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-6098: Component/s: web gui SOLR console displaying JSON does not escape text properly -- Key: SOLR-6098 URL: https://issues.apache.org/jira/browse/SOLR-6098 Project: Solr Issue Type: Bug Components: web gui Reporter: Kingston Duffie Priority: Minor In the SOLR admin web console, when displaying JSON response for Query, the text is not being HTML escaped, so any text that happens to match HTML markup is being processed as HTML. For example, enter strikehello/strike in the q textbox and the responseHeader will contain: q: body:hello where the hello portion is shown using strikeout. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, f...@bar.com, this will be completely missing (because the browser treats this as an invalid tag). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-5670. Resolution: Fixed Thanks Christian! org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6097) Posting JSON with results in lost information
[ https://issues.apache.org/jira/browse/SOLR-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003845#comment-14003845 ] Stefan Matheis (steffkes) commented on SOLR-6097: - you didn't link those issues .. but since i saw SOLR-6098 after that one and already commented on that .. i guess they are related? Posting JSON with results in lost information - Key: SOLR-6097 URL: https://issues.apache.org/jira/browse/SOLR-6097 Project: Solr Issue Type: Bug Affects Versions: 4.7.2 Reporter: Kingston Duffie Post the following JSON to add a document:
{code}
{
  "add" : {
    "commitWithin" : 5000,
    "doc" : {
      "id" : "12345",
      "body" : "a <b> c"
    }
  }
}
{code}
The body field is configured in the schema as:
{code:xml}
<field name="body" type="text_hive" indexed="true" stored="true" required="false" multiValued="false"/>
{code}
and
{code:xml}
<fieldType name="text_hive" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}
The problem is this: After submitting this post, if you go to the SOLR console and find this document, the stored body will be missing the contents between the less-than and greater-than symbols -- i.e., "a c". If you encode the body (i.e., "a &lt; b &gt; c"), it will show up with < and > symbols. That is, it appears that SOLR is stripping out HTML tags even though we are not asking it to. 
Note that it is not only the storage but also the indexing that is affected (we originally found the issue because searching for b would not match this document). I'm willing to believe that I'm doing something wrong, but I can't see anywhere in any spec that suggests that strings inside JSON need to be HTML-escaped. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
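The reporter's reading of the spec is right: per RFC 8259, a JSON string only requires escaping of the quote, the backslash, and control characters below U+0020 — angle brackets are legal as-is, so a well-formed update may contain raw HTML markup. A minimal check of that rule:

```java
// Per RFC 8259, only '"', '\\' and control characters (below U+0020)
// need escaping inside a JSON string. '<' and '>' are legal unescaped,
// so a JSON parser has no reason to alter them.
public class JsonStringRules {
    static boolean needsEscaping(char c) {
        return c == '"' || c == '\\' || c < 0x20;
    }
    public static void main(String[] args) {
        System.out.println(needsEscaping('<'));  // prints: false
        System.out.println(needsEscaping('"'));  // prints: true
        System.out.println(needsEscaping('\n')); // prints: true
    }
}
```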
[jira] [Commented] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003851#comment-14003851 ] ASF subversion and git services commented on LUCENE-5691: - Commit 1596370 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1596370 ] LUCENE-5691: DocTermOrds lookupTerm is wrong in some cases DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5691. - Resolution: Fixed Fix Version/s: 5.0 4.9 DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003855#comment-14003855 ] ASF subversion and git services commented on LUCENE-5691: - Commit 1596371 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596371 ] LUCENE-5691: DocTermOrds lookupTerm is wrong in some cases DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003858#comment-14003858 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596372 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596372 ] LUCENE-4236: create branch to try to cleanup clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so its applied even if we have optional and prohibited clauses that dont exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
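The second bullet's equivalence — when optional.size() == minShouldMatch, all optional clauses are effectively mandatory — can be checked with a tiny sketch of minShouldMatch counting (a toy model, not BooleanQuery's scorer):

```java
// Toy model of minShouldMatch semantics: a document matches if at least
// minShouldMatch of the optional clauses hit. When minShouldMatch equals
// the clause count, a single miss fails the query, exactly like an AND.
public class MinShouldMatchSketch {
    static boolean matches(boolean[] optionalClauseHits, int minShouldMatch) {
        int hits = 0;
        for (boolean h : optionalClauseHits) if (h) hits++;
        return hits >= minShouldMatch;
    }
    public static void main(String[] args) {
        // minShouldMatch == optional.size(): behaves as a pure conjunction
        System.out.println(matches(new boolean[]{true, true, false}, 3)); // prints: false
        System.out.println(matches(new boolean[]{true, true, true}, 3));  // prints: true
    }
}
```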
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003860#comment-14003860 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596373 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596373 ] LUCENE-4236: commit current state clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so its applied even if we have optional and prohibited clauses that dont exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
: I committed a fix. For those trying to keep track at home... r1596346 fixed the assertSame failure in TestFaceting. I then pinged rmuir on IRC to ask him about the seemingly unrelated (code-not-test) assert failure in DocValuesFacets.getCounts which was causing TestRandomFaceting to fail using the same seed -- that apparently led him to file LUCENE-5691. Once that was committed both failures seem to be completely fixed. Thanks rmuir. : On Tue, May 20, 2014 at 1:58 PM, Robert Muir rcm...@gmail.com wrote: : On Tue, May 20, 2014 at 1:44 PM, Chris Hostetter : hossman_luc...@fucit.org wrote: : : : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC : : The 2 facet test failures both related to DocValues and reproduce reliably : for me on my Linux box. : : Is it possible these are related to the UninvertingDirectoryReader : changes? : : : : I think its a bug in the assert. I'll take a look in 5 minutes. : : - : To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: dev-h...@lucene.apache.org : : -Hoss http://www.lucidworks.com/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003888#comment-14003888 ] ASF subversion and git services commented on SOLR-5681: --- Commit 1596379 from [~anshumg] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596379 ] SOLR-5681: Make the OverseerCollectionProcessor multi-threaded, merge from trunk (r1596089) Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. 
* ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
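The proposed peekAfter() semantics can be sketched in-memory — a hypothetical API, not the actual ZK DistributedQueue: the parent thread looks past the in-flight head to dispatch the next mutually exclusive task, and elements are only removed on completion so a new Overseer can retry them.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "task funnel" proposal: peek() hands out the head,
// peekAfter(last) returns the element following 'last' without removing
// anything, and remove() is called only once a task has completed.
public class PeekAfterSketch {
    private final List<String> tasks = new ArrayList<>();
    void offer(String task) { tasks.add(task); }
    String peek() { return tasks.isEmpty() ? null : tasks.get(0); }
    String peekAfter(String last) {
        int i = tasks.indexOf(last);
        return (i >= 0 && i + 1 < tasks.size()) ? tasks.get(i + 1) : null;
    }
    void remove(String task) { tasks.remove(task); } // cleanup on completion
    public static void main(String[] args) {
        PeekAfterSketch q = new PeekAfterSketch();
        q.offer("splitshard-collection1");
        q.offer("create-collection2");
        String first = q.peek();            // dispatched to a worker thread
        String second = q.peekAfter(first); // look past the in-flight task
        System.out.println(second);         // prints: create-collection2
        q.remove(first);                    // removed only after completion
    }
}
```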
[jira] [Commented] (LUCENE-5690) expose sub-Terms from MultiTerms
[ https://issues.apache.org/jira/browse/LUCENE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003889#comment-14003889 ] Yonik Seeley commented on LUCENE-5690: -- bq. I don't see this in the code of MultiTermsEnum. see MultiTermsEnum.getMatchArray() bq. This patch is fine if you make the methods return List<Terms> and List<ReaderSlice> Like the other methods, MultiTermsEnum.getMatchArray() and MultiDocsEnum.getSubs, we shouldn't add the additional overhead of object creation just to inspect an object. These are expert-level APIs that are not on the base class (hence will never be used casually). bq. What's the reason to have those public at all? Sometimes better efficiency, sometimes more information. One example for this specific addition is that MultiTerms.size() always returns -1. If we look at the sub-terms we can at least see what the number of terms for each segment is. expose sub-Terms from MultiTerms Key: LUCENE-5690 URL: https://issues.apache.org/jira/browse/LUCENE-5690 Project: Lucene - Core Issue Type: Improvement Reporter: Yonik Seeley Attachments: LUCENE-5690.patch MultiTermsEnum and MultiDocsEnum both expose their subs. It would be useful to do the same for MultiTerms. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
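The motivating example — the merged view returns -1 from size() because terms may overlap between segments, but the subs still carry per-segment counts — can be sketched with hypothetical stand-in classes (not the Lucene API):

```java
// Stand-in classes sketching why exposing sub-Terms is useful: the merged
// view cannot report a total distinct-term count without deduplicating
// across segments, so it returns -1, but an expert caller can still sum
// the per-segment counts as an upper bound.
public class MultiTermsSketch {
    static class SegmentTerms {
        final long size;
        SegmentTerms(long size) { this.size = size; }
    }
    final SegmentTerms[] subs;
    MultiTermsSketch(SegmentTerms... subs) { this.subs = subs; }
    long size() { return -1; } // unknown: terms may overlap across segments
    SegmentTerms[] getSubTerms() { return subs; } // expert API: no copy, no allocation
    public static void main(String[] args) {
        MultiTermsSketch mt = new MultiTermsSketch(
            new SegmentTerms(1000), new SegmentTerms(250));
        long upperBound = 0;
        for (SegmentTerms t : mt.getSubTerms()) upperBound += t.size;
        System.out.println(mt.size());  // prints: -1
        System.out.println(upperBound); // prints: 1250 (upper bound on distinct terms)
    }
}
```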
[jira] [Resolved] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta resolved SOLR-5681. Resolution: Fixed Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. 
get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6099) Fix cleanup mechanism for a previous (failed) SPLITSHARD
Anshum Gupta created SOLR-6099: -- Summary: Fix cleanup mechanism for a previous (failed) SPLITSHARD Key: SOLR-6099 URL: https://issues.apache.org/jira/browse/SOLR-6099 Project: Solr Issue Type: Bug Reporter: Anshum Gupta Right now, SPLITSHARD tries to cleanup an under-construction/recovery shard using the deleteshard API if it already exists. DELETESHARD on the other hand will never delete a shard that's not INACTIVE/uses implicit routing. This would just raise an exception and never really delete a shard right now. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003983#comment-14003983 ] Shalin Shekhar Mangar commented on SOLR-6091: - Would it be a good idea to include the overseer leader node name for which the QUIT message is intended? I imagine that it'd help us catch wrong/extra QUITs and race conditions during testing? Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004008#comment-14004008 ] Jessica Cheng commented on SOLR-6091: - I think that's a good idea, and maybe we can go further and include the id in the /overseer/leader file (e.g. 91790253334528928-host:8983_solr-n_13) so that the Overseer would only quit if its ID actually matched it completely. It'd be like a CAS--do this operation only if the state I read to make this decision is still valid. Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004036#comment-14004036 ]

Shalin Shekhar Mangar commented on SOLR-6091:

bq. maybe we can go further and include the id in the /overseer/leader file (e.g. 91790253334528928-host:8983_solr-n_13) so that the Overseer would only quit if its ID actually matched it completely.

+1 I'll work up a patch and try it out.
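The compare-and-swap idea discussed in this thread can be sketched as follows. This is a Python illustration of the logic only; the helper names and ID format handling are assumptions, not Solr's actual overseer API.

```python
# Sketch of the CAS-style QUIT check: the overseer quits only if the full
# election ID recorded in /overseer/leader matches its own full ID exactly,
# not merely the node-name portion. Names are illustrative, not Solr's API.

def node_name(full_id: str) -> str:
    # "91790253334528928-host:8983_solr-n_13" -> "host:8983_solr"
    return full_id.split("-", 1)[1].rsplit("-n_", 1)[0]

def should_quit(leader_file_id: str, my_full_id: str) -> bool:
    return leader_file_id == my_full_id

leader = "91790253334528928-host:8983_solr-n_13"
stale = "91790253334528928-host:8983_solr-n_14"  # same node, later election

# A node-name-only comparison cannot distinguish the two sessions:
assert node_name(leader) == node_name(stale)
# A full-ID comparison can, so only the intended session quits:
assert should_quit(leader, leader)
assert not should_quit(leader, stale)
```

This mirrors the "do this operation only if the state I read is still valid" semantics: a QUIT issued against an old election ID is simply ignored by a re-elected overseer.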
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004172#comment-14004172 ]

ASF subversion and git services commented on LUCENE-5666:

Commit 1596429 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1596429 ]
LUCENE-5666: Fix idea project files

Add UninvertingReader
---
Key: LUCENE-5666
URL: https://issues.apache.org/jira/browse/LUCENE-5666
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 5.0
Attachments: LUCENE-5666.patch

Currently the fieldcache is not pluggable at all. It would be better if everything used the docvalues apis. This would allow people to customize the implementation, extend the classes with custom subclasses with additional stuff, etc etc. FieldCache can be accessed via the docvalues apis, using the FilterReader api.
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004229#comment-14004229 ]

ASF subversion and git services commented on LUCENE-4236:

Commit 1596440 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596440 ]
LUCENE-4236: try to make more palatable

clean up booleanquery conjunction optimizations a bit
---
Key: LUCENE-4236
URL: https://issues.apache.org/jira/browse/LUCENE-4236
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 4.9, 5.0
Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch

After LUCENE-3505, I want to do a slight cleanup:
* compute the term conjunctions optimization in scorer(), so it's applied even if we have optional and prohibited clauses that don't exist in the segment (e.g. return null)
* use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too.
* don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance).
[JENKINS] Lucene-Solr-SmokeRelease-4.x - Build # 165 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/165/

No tests ran.

Build Log:
[...truncated 53826 lines...]
prepare-release-no-sign:
    [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease
     [copy] Copying 431 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/lucene
     [copy] Copying 239 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr
     [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7
     [exec] NOTE: output encoding is US-ASCII
     [exec]
     [exec] Load release URL file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/...
     [exec]
     [exec] Test Lucene...
     [exec]   test basics...
     [exec]   get KEYS
     [exec]     0.1 MB in 0.01 sec (13.3 MB/sec)
     [exec]   check changes HTML...
     [exec]   download lucene-4.9.0-src.tgz...
     [exec]     27.5 MB in 0.04 sec (660.8 MB/sec)
     [exec]     verify md5/sha1 digests
     [exec]   download lucene-4.9.0.tgz...
     [exec]     61.4 MB in 0.09 sec (648.5 MB/sec)
     [exec]     verify md5/sha1 digests
     [exec]   download lucene-4.9.0.zip...
     [exec]     71.0 MB in 0.16 sec (444.6 MB/sec)
     [exec]     verify md5/sha1 digests
     [exec]   unpack lucene-4.9.0.tgz...
     [exec]     verify JAR metadata/identity/no javax.* or java.* classes...
     [exec]     test demo with 1.7...
     [exec]       got 5697 hits for query lucene
     [exec]     check Lucene's javadoc JAR
     [exec]   unpack lucene-4.9.0.zip...
     [exec]     verify JAR metadata/identity/no javax.* or java.* classes...
     [exec]     test demo with 1.7...
     [exec]       got 5697 hits for query lucene
     [exec]     check Lucene's javadoc JAR
     [exec]   unpack lucene-4.9.0-src.tgz...
     [exec]     make sure no JARs/WARs in src dist...
     [exec]     run ant validate
     [exec]     run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket -Dtests.disableHdfs=true'...
     [exec]     test demo with 1.7...
     [exec]       got 250 hits for query lucene
     [exec]     generate javadocs w/ Java 7...
     [exec]
     [exec] Crawl/parse...
     [exec]
     [exec] Verify...
     [exec]
     [exec] Test Solr...
     [exec]   test basics...
     [exec]   get KEYS
     [exec]     0.1 MB in 0.00 sec (53.0 MB/sec)
     [exec] Traceback (most recent call last):
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1347, in <module>
     [exec]     main()
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1291, in main
     [exec]     smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, testArgs)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1333, in smokeTest
     [exec]     checkSigs('solr', solrPath, version, tmpDir, isSigned)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 410, in checkSigs
     [exec]     testChanges(project, version, changesURL)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 458, in testChanges
     [exec]     checkChangesContent(s, version, changesURL, project, True)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 485, in checkChangesContent
     [exec]     raise RuntimeError('incorrect issue (_ instead of -) in %s: %s' % (name, m.group(1)))
     [exec] RuntimeError: incorrect issue (_ instead of -) in file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr/changes/Changes.html: SOLR_3671
     [exec]   check changes HTML...

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/build.xml:387: exec returned: 1

Total time: 53 minutes 57 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure
Sending email for trigger: Failure
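The failure above comes from a sanity check over CHANGES.html: issue keys must use a hyphen (SOLR-3671), not an underscore (SOLR_3671). A minimal sketch of that kind of check is below; the regex is an assumption for illustration, not the exact pattern in smokeTestRelease.py.

```python
import re

# Flag issue keys written with an underscore instead of a hyphen,
# e.g. "SOLR_3671" where "SOLR-3671" was intended. The project list
# and pattern are illustrative assumptions.
def find_bad_issue_refs(html: str) -> list:
    return re.findall(r'\b((?:SOLR|LUCENE)_\d+)\b', html)

assert find_bad_issue_refs('fixed in SOLR_3671 and LUCENE-4236') == ['SOLR_3671']
assert find_bad_issue_refs('SOLR-6091 only') == []
```

Running such a check during the smoke test catches typos in CHANGES entries before a release candidate goes out.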
[jira] [Commented] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004239#comment-14004239 ]

Kingston Duffie commented on SOLR-6098:

Yes. Sorry. We are using 4.4. I suspect this is the same issue and so this can be closed.

SOLR console displaying JSON does not escape text properly
---
Key: SOLR-6098
URL: https://issues.apache.org/jira/browse/SOLR-6098
Project: Solr
Issue Type: Bug
Components: web gui
Reporter: Kingston Duffie
Priority: Minor

In the SOLR admin web console, when displaying the JSON response for a query, the text is not HTML-escaped, so any text that happens to match HTML markup is processed as HTML. For example, enter <strike>hello</strike> in the q textbox and the responseHeader will contain: q: body:<strike>hello</strike> where the hello portion is shown using strikeout. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, <f...@bar.com>, this will be completely missing (because the browser treats this as an invalid tag).
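The fix the report implies is straightforward: escape every response value before injecting it into the page. The admin console itself is JavaScript; the sketch below just demonstrates the escaping step using Python's standard library.

```python
from html import escape

# Any value from the query response must be HTML-escaped before being
# placed into the page. Otherwise "<strike>hello</strike>" renders as
# struck-out text, and an angle-bracketed value like an email address
# silently disappears as a bogus tag.
raw = "body:<strike>hello</strike>"
safe = escape(raw)
assert safe == "body:&lt;strike&gt;hello&lt;/strike&gt;"
```

With the escaped string, the browser displays the literal markup characters instead of interpreting them.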
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004240#comment-14004240 ]

ASF subversion and git services commented on LUCENE-4236:

Commit 1596442 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596442 ]
LUCENE-4236: move all crazies into one place
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004300#comment-14004300 ]

Robert Muir commented on LUCENE-4236:

I tried to clean this up as much as I can... OR tasks look fine (with BS1 disabled on both checkouts):
{noformat}
         Task       QPS trunk  StdDev   QPS patch  StdDev    Pct diff
 OrNotHighLow   958.58 (2.1%)          988.37 (2.7%)    3.1% (  -1% -   8%)
   OrHighHigh    18.76 (8.2%)           19.83 (15.1%)   5.7% ( -16% -  31%)
    OrHighMed    43.12 (8.3%)           45.64 (15.3%)   5.9% ( -16% -  32%)
    OrHighLow    44.76 (9.1%)           47.92 (16.9%)   7.1% ( -17% -  36%)
 OrNotHighMed   168.34 (3.4%)          189.12 (4.3%)   12.3% (   4% -  20%)
OrNotHighHigh    45.16 (3.5%)           60.43 (12.2%)  33.8% (  17% -  51%)
OrHighNotHigh    27.07 (3.4%)           36.27 (12.6%)  34.0% (  17% -  51%)
 OrHighNotMed    79.81 (3.7%)          111.10 (15.3%)  39.2% (  19% -  60%)
 OrHighNotLow    96.78 (4.0%)          137.49 (17.1%)  42.1% (  20% -  65%)
{noformat}
It would also be nice to run the eg.filter.tasks but these are currently broken in luceneutil.
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004367#comment-14004367 ]

Dawid Weiss commented on LUCENE-5650:

Commit it to the branch, Ryan. I fixed it the way I understand how Solr's source code works (which is to say: vaguely familiar). I'm sure your patch is better. The build yesterday ended with this failure related to perm. denied:
{code}
[13:07:35.284] ERROR   0.00s J1 | TestFoldingMultitermExtrasQuery (suite)
Throwable #1: org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: access denied (java.io.FilePermission analysis-extras\solr\collection1\conf write)
{code}
and the following, I guess notorious offenders:
{code}
- org.apache.solr.spelling.suggest.TestAnalyzeInfixSuggestions (could not remove temp. files)
- TestBlendedInfixSuggestions (same thing)
- org.apache.solr.cloud.SyncSliceTest.testDistribSearch
  Throwable #1: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
- org.apache.solr.cloud.RecoveryZkTest.testDistribSearch
  Throwable #1: java.lang.AssertionError: shard1 is not consistent. Got 92 from https://127.0.0.1:54379/_koq/fr/collection1lastClient and got 60 from https://127.0.0.1:54410/_koq/fr/collection1
{code}
I won't be able to return to this today (maybe in the evening). Change my DIH fixes to yours on the branch -- I don't mind at all.

createTempDir and associated functions no longer create java.io.tmpdir
---
Key: LUCENE-5650
URL: https://issues.apache.org/jira/browse/LUCENE-5650
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch

The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before.
With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per-JVM working dir. However, {{getBaseTempDirForClass()}} now asserts that the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per-JVM cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004368#comment-14004368 ]

Dawid Weiss commented on LUCENE-5650:

One comment wrt the dih patch:
{code}
+System.clearProperty("solr.solr.home");
{code}
I think there is a restore-sys-props rule somewhere in the upper class that will take care of this. In Lucene there is no such rule, but in Solr so many properties get set (even from other software packages) that it didn't make sense to track them all manually. You'd have to check though, I may be wrong.
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004372#comment-14004372 ]

ASF subversion and git services commented on LUCENE-5650:

Commit 1596472 from [~dawidweiss] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1596472 ]
LUCENE-5650: applying Ryan's DIH patch.
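The "restore-sys-props rule" mentioned earlier in this thread works by snapshotting system properties before each test and restoring them afterwards, which is why an explicit clearProperty call may be redundant. A minimal sketch of that pattern, with a Python dict standing in for java.lang.System properties (names illustrative):

```python
import contextlib

# Snapshot-and-restore rule: whatever properties a test sets or clears,
# the original state comes back after the test, so no manual cleanup
# (e.g. clearing "solr.solr.home") is needed.
@contextlib.contextmanager
def restore_sys_props(props: dict):
    snapshot = dict(props)
    try:
        yield props
    finally:
        props.clear()
        props.update(snapshot)

props = {"java.io.tmpdir": "."}
with restore_sys_props(props) as p:
    p["solr.solr.home"] = "/tmp/solr"  # set inside the test
assert props == {"java.io.tmpdir": "."}  # restored afterwards
```

Tracking every property manually, as the comment notes, becomes impractical once many packages set properties; a blanket snapshot/restore sidesteps that entirely.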
[jira] [Created] (LUCENE-5692) Deprecate spatial DisjointSpatialFilter
David Smiley created LUCENE-5692:

Summary: Deprecate spatial DisjointSpatialFilter
Key: LUCENE-5692
URL: https://issues.apache.org/jira/browse/LUCENE-5692
Project: Lucene - Core
Issue Type: Bug
Components: modules/spatial
Affects Versions: 5.0
Reporter: David Smiley

The spatial predicate IsDisjointTo is almost the same as the inverse of Intersects, except that it shouldn't match documents without spatial data. In another sense it's as if the query shape were inverted. DisjointSpatialFilter is a utility filter that works (or worked, rather) by using the FieldCache to see which documents have spatial data (getDocsWithField()). Calculating that was probably very slow but it was at least cacheable. Since LUCENE-5666 (v5/trunk only), Rob replaced this to use DocValues. However for some SpatialStrategies (PrefixTree based) it wouldn't make any sense to use DocValues *just* so that at search time you could call getDocsWithField() when there's no other need for the un-inversion (e.g. no need to look up terms by document).

Perhaps an immediate fix is simply to revert the change made to DisjointSpatialFilter so that it uses the FieldCache again, if that works (though it's not public?). But stepping back a bit, this DisjointSpatialFilter is really something unfortunate that doesn't work as well as it could because it's not at the level of Solr or ES -- that is, there's no access to a filter cache. So I propose I simply remove it, and if a user wants to do this for real, they should index a boolean field marking whether there's spatial data and then combine that with a NOT and Intersects, in a straightforward way. Alternatively, some sort of inverting query shape could be developed, although it wouldn't work with the SpatialPrefixTree technique because there is no edge distinction -- the edge matches normally and notwithstanding changes to RPT algorithms it would also match the edge of an inverted shape.
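The proposed workaround — a boolean "has spatial data" field combined with NOT plus Intersects — reduces to simple set arithmetic over matching documents. A set-level sketch (doc IDs and field names are illustrative, not Lucene API):

```python
# IsDisjointTo(shape) == hasSpatial:true AND NOT Intersects(shape).
# Docs without spatial data must never match, which a plain NOT of
# Intersects alone would get wrong.
docs_with_spatial = {1, 2, 3, 4}   # docs where the boolean field is true
intersects_shape = {2, 4}          # docs matching Intersects(shape)
no_spatial = {5}                   # docs with no spatial data at all

disjoint = docs_with_spatial - intersects_shape
assert disjoint == {1, 3}
assert not (disjoint & no_spatial)  # non-spatial docs correctly excluded
```

In Lucene terms this would be a BooleanQuery with a MUST clause on the boolean field and a MUST_NOT clause on the Intersects query; the sketch only demonstrates why the MUST clause is required.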