[jira] [Created] (SOLR-5613) Upgrade Apache Commons Codec to version 1.9 in order to improve performance of BeiderMorseFilter

2014-01-07 Thread Thomas Champagne (JIRA)
Thomas Champagne created SOLR-5613:
----------------------------------

 Summary: Upgrade Apache Commons Codec to version 1.9 in order to 
improve performance of BeiderMorseFilter
 Key: SOLR-5613
 URL: https://issues.apache.org/jira/browse/SOLR-5613
 Project: Solr
  Issue Type: Improvement
  Components: Rules, Schema and Analysis, search
Affects Versions: 4.6, 4.5.1, 4.5, 4.4, 4.3.1, 4.3, 4.2.1, 4.2, 4.1, 4.0, 
3.6.2, 3.6.1, 3.6
Reporter: Thomas Champagne


In version 1.9 of the commons-codec project, there are a lot of optimizations in 
the Beider-Morse encoder, which is used by the BeiderMorseFilter in Solr. 
Do you think it is possible to upgrade this dependency?
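
For context, BeiderMorseFilter wraps the Beider-Morse encoder from commons-codec. A 
minimal sketch of exercising that encoder directly (commons-codec 1.x API; the input 
name is just an example):

{code:java}
import org.apache.commons.codec.EncoderException;
import org.apache.commons.codec.language.bm.BeiderMorseEncoder;

public class BeiderMorseDemo {
    public static void main(String[] args) throws EncoderException {
        BeiderMorseEncoder encoder = new BeiderMorseEncoder();
        // encode() returns a pipe-separated set of phonetic variants;
        // generating these variants is where the reported optimizations apply
        System.out.println(encoder.encode("Champagne"));
    }
}
{code}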






[jira] [Commented] (SOLR-5288) Delta import is calling applyTranformer() during deltaQuerry and causing ScriptException

2014-01-07 Thread Daniele Baldi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864057#comment-13864057
 ] 

Daniele Baldi commented on SOLR-5288:
-------------------------------------

Hi,
I found this error while experimenting with delta import using TemplateTransformer:

 WARN : TemplateTransformer : Unable to resolve variable: variableName 
while parsing expression: ${variableName}

This error is thrown because Solr tries to apply transformers to the deltaQuery, too. 
I also think transformation is not required for the deltaQuery. 

Thanks
Daniele

 Delta import is calling applyTranformer() during deltaQuerry and causing 
 ScriptException
 -------------------------------------------------------------------------

 Key: SOLR-5288
 URL: https://issues.apache.org/jira/browse/SOLR-5288
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.4
Reporter: Balaji Manoharan
Priority: Critical

 While experimenting with delta import, I was getting a ScriptException such as 
 'toString()' is not found on null.
 These are the queries that I am using:
 a) Query: SELECT PK_FIELD, JOIN_DATE, USER_NAME FROM USERS
 b) Delta Query: SELECT PK_FIELD FROM USERS WHERE LAST_MODIFIED_DATE > 
 '${dih.last_index_time}'
 c) Delta Import Query: SELECT PK_FIELD, JOIN_DATE, USER_NAME FROM USERS 
 WHERE PK_FIELD = '${dih.delta.PK_FIELD}'
 I have a script transformer as below:
 function dynamicData(){
   var joinDt = row.get('JOIN_DATE');
   var dtDisplay = joinDt.toString();  // e.g., to show that I am not doing a 
 null check, since JOIN_DATE is a not-null field
   ...
   ...
   return row;
 }
 <entity name="user" transformer="script:dynamicData" ...>
 ...
 </entity>
 Problem: While performing delta import, I was getting an exception from the Rhino 
 engine on the script line 'joinDt.toString()'.
 The exception trace is as follows:
 Caused by: javax.script.ScriptException: 
 sun.org.mozilla.javascript.internal.EcmaError: TypeError: Cannot call method 
 t
 oString of null (Unknown source#4) in Unknown source at line number 4
 at 
 com.sun.script.javascript.RhinoScriptEngine.invoke(RhinoScriptEngine.java:300)
 at 
 com.sun.script.javascript.RhinoScriptEngine.invokeFunction(RhinoScriptEngine.java:258)
 at 
 org.apache.solr.handler.dataimport.ScriptTransformer.transformRow(ScriptTransformer.java:56)
 ... 8 more
 Root Cause: Since I know join_date cannot be null, I explored the Solr 
 source code and noticed that applyTransformer() is called during deltaQuery, 
 and at that time join_date will not be available.
 Reference: EntityProcessorWrapper.nextModifiedRowKey()
 I think transformation is not required for deltaQuery, since it is mainly 
 designed to retrieve the primary keys of the modified rows. Further, the 
 output of deltaQuery is used only in another SQL query.
 Workaround:
 I just added a null check as below:
 function dynamicData(){
   var joinDt = row.get('JOIN_DATE');
   if(joinDt == null){
   return row;
   }
   ...
   ...
   return row;
 }
 I don't have much knowledge about Solr, so my suggestion could be 
 invalid when looking at the main use cases.
 Please validate my comments.
 Thanks
 Balaji
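
For illustration, the same defensive null check can also be written as a Java DIH 
transformer instead of a script. A minimal sketch (the transformRow signature comes 
from the DataImportHandler Transformer base class; the class and output field names 
are made up):

{code:java}
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class JoinDateTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object joinDt = row.get("JOIN_DATE");
    // Rows produced by deltaQuery carry only the primary key, so JOIN_DATE
    // is absent there; pass such rows through instead of dereferencing null.
    if (joinDt == null) {
      return row;
    }
    row.put("joinDateDisplay", joinDt.toString());
    return row;
  }
}
{code}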






[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2014-01-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864141#comment-13864141
 ] 

Markus Jelsma commented on SOLR-4260:
-------------------------------------

Ok, I followed all the great work here and in the related tickets, and yesterday I 
had the time to rebuild Solr and check for this issue. I didn't see it 
yesterday, but it is right in front of me again, using a fresh build from 
January 6th.

Leader has Num Docs: 379659
Replica has Num Docs: 379661

 Inconsistent numDocs between leader and replica
 -----------------------------------------------

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
 Fix For: 5.0, 4.7

 Attachments: 192.168.20.102-replica1.png, 
 192.168.20.104-replica2.png, clusterstate.png


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer, we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
 a small deviation in the number of documents. The leader and replica deviate 
 by roughly 10-20 documents, not more.
 Results hopping ranks in the result set for identical queries got my 
 attention: there were small IDF differences for exactly the same record, 
 causing the record to shift positions in the result set. During those tests no 
 records were indexed. Consecutive catch-all queries also return different 
 values for numDocs.
 We're running a 10-node test cluster with 10 shards and a replication factor 
 of two, and we frequently reindex using a fresh build from trunk. I hadn't seen 
 this issue for quite some time until a few days ago.






[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864147#comment-13864147
 ] 

Markus Jelsma commented on SOLR-5379:
-------------------------------------

How does this patch handle boosts?  Are the synonym and the original keywords 
boosted equally?

 Query-time multi-word synonym expansion
 ---------------------------------------

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by space, so it splits a 
 multi-word term into separate terms before feeding them to the synonym filter; the 
 synonym filter therefore can't recognize the multi-word term to do expansion.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have different 
 numbers of words.
 For the first, we can quote all multi-word synonyms in the user query so that the 
 Lucene query parser doesn't split them. There is a JIRA task related to this one: 
 https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of 
 SHOULD clauses containing multiple PhraseQuery objects in case the token stream 
 has a multi-word synonym.
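
As a rough sketch of that second point, the expansion could be built with the 
Lucene 4.x query API along these lines (the field name and synonym pair are 
invented for the example):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class SynonymExpansionSketch {
  // Expand "usa" with the multi-word synonym "united states of america" as a
  // BooleanQuery of SHOULD clauses; unlike MultiPhraseQuery, each alternative
  // here may contain a different number of words.
  public static Query expand() {
    BooleanQuery synonyms = new BooleanQuery();
    synonyms.add(new TermQuery(new Term("text", "usa")), Occur.SHOULD);
    PhraseQuery phrase = new PhraseQuery();
    for (String word : new String[] {"united", "states", "of", "america"}) {
      phrase.add(new Term("text", word));
    }
    synonyms.add(phrase, Occur.SHOULD);
    return synonyms;
  }
}
{code}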






[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864149#comment-13864149
 ] 

Ahmet Arslan commented on SOLR-5379:
------------------------------------

Assume the synonyms are {code}usa, united states of america{code}. What happens 
if I fire the following sloppy phrase query: *"president usa"~5* ?

 Query-time multi-word synonym expansion
 ---------------------------------------

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by space, so it splits a 
 multi-word term into separate terms before feeding them to the synonym filter; the 
 synonym filter therefore can't recognize the multi-word term to do expansion.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have different 
 numbers of words.
 For the first, we can quote all multi-word synonyms in the user query so that the 
 Lucene query parser doesn't split them. There is a JIRA task related to this one: 
 https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of 
 SHOULD clauses containing multiple PhraseQuery objects in case the token stream 
 has a multi-word synonym.






[jira] [Created] (SOLR-5614) Boost documents using map and query functions

2014-01-07 Thread Anca Kopetz (JIRA)
Anca Kopetz created SOLR-5614:
------------------------------

 Summary: Boost documents using map and query functions
 Key: SOLR-5614
 URL: https://issues.apache.org/jira/browse/SOLR-5614
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Anca Kopetz


We want to boost documents that contain specific search terms in their fields. 

We tried the following simplified query: 
http://localhost:8983/solr/collection1/select?q=ipod belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

And we get the following error: 
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'
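
Presumably the nested {!edismax} parser picks up the top-level bf parameter again, 
which itself references $qq, hence the recursion. If so, one thing to try (an 
untested sketch, not a confirmed fix) is blanking out bf in the local params of the 
nested query:

{code}
bf=map(query($qq),0,0,0,100.0)&qq={!edismax bf=''}power
{code}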

And the stacktrace :

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
at org.apache.solr.search.QParser.subQuery(QParser.java:200)
at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
at 

[jira] [Updated] (SOLR-5614) Boost documents using map and query functions

2014-01-07 Thread Anca Kopetz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anca Kopetz updated SOLR-5614:
------------------------------

Description: 
We want to boost documents that contain specific search terms in their fields. 

We tried the following simplified query: 
http://localhost:8983/solr/collection1/select?q=ipod%20belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

And we get the following error: 
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'

And the stacktrace :

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
at org.apache.solr.search.QParser.subQuery(QParser.java:200)
at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
at 


[jira] [Commented] (SOLR-5609) Don't let cores create slices/named replicas

2014-01-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864241#comment-13864241
 ] 

Noble Paul commented on SOLR-5609:
----------------------------------

It makes sense to have an omnibus property like "legacyCloudMode" rather than 
having specific properties for each behavior.


 Don't let cores create slices/named replicas
 ---------------------------------------------

 Key: SOLR-5609
 URL: https://issues.apache.org/jira/browse/SOLR-5609
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
 Fix For: 5.0, 4.7


 In SolrCloud, it is possible for a core to come up on any node and register 
 itself with an arbitrary slice/coreNodeName. This is a legacy requirement, and 
 we would like to make it possible only for the Overseer to initiate creation of 
 slices/replicas.
 We plan to introduce cluster-level properties at the top level:
 /cluster-props.json
 {code:javascript}
 {
 "noSliceOrReplicaByCores":true
 }
 {code}
 If this property is set to true, cores won't be able to send STATE commands 
 with an unknown slice/coreNodeName. Those commands will fail at the Overseer. This 
 is useful for SOLR-5310 / SOLR-5311, where a core/replica is deleted by a 
 command and then comes up later and tries to recreate the replica/slice.






[jira] [Updated] (SOLR-5476) Overseer Role for nodes

2014-01-07 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5476:
-----------------------------

Attachment: SOLR-5476.patch

 Overseer Role for nodes
 ---

 Key: SOLR-5476
 URL: https://issues.apache.org/jira/browse/SOLR-5476
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5476.patch, SOLR-5476.patch, SOLR-5476.patch, 
 SOLR-5476.patch


 In a very large cluster the Overseer is likely to be overloaded. If the same 
 node is serving a few other shards, it can lead to the Overseer getting slowed 
 down due to GC pauses, or simply too much work. If the cluster is really 
 large, it is possible to dedicate high-end hardware for Overseers.
 It works as a new collection admin command:
 command=addrole&role=overseer&node=192.168.1.5:8983_solr
 This results in the creation of an entry in /roles.json in ZK which would 
 look like the following:
 {code:javascript}
 {
 "overseer" : ["192.168.1.5:8983_solr"]
 }
 {code}
 If a node is designated for overseer, it gets preference over others when 
 overseer election takes place. If no designated servers are available, another 
 random node becomes the Overseer.
 Later on, if one of the designated nodes is brought up, it takes over the 
 Overseer role from the current Overseer to become the Overseer of the system.






[jira] [Created] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)
Ramkumar Aiyengar created SOLR-5615:
-----------------------------------

 Summary: Deadlock while trying to recover after a ZK session expiry
 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.5, 4.4
Reporter: Ramkumar Aiyengar


The sequence of events which might trigger this is as follows:

 - Leader of a shard, say OL, has a ZK expiry
 - The new leader, NL, starts the election process
 - NL, through Overseer, clears the current leader (OL) for the shard from the 
cluster state
 - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
 - OL marks itself down
 - OL sets up watches for cluster state, and then retrieves it (with no leader 
for this shard)
 - NL, through Overseer, updates cluster state to mark itself leader for the 
shard
 - OL tries to register itself as a replica, and waits till the cluster state 
is updated
   with the new leader from event thread
 - ZK sends a watch update to OL, but it is blocked on the event thread waiting 
for it.

Oops. This finally breaks out only when the attempt to register itself as a 
replica times out, after 20 mins.
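
To make the hang concrete, here is a minimal, self-contained sketch of the pattern 
(not Solr code): a handler running on the single event-delivery thread blocks 
waiting for a signal that only a later event on that same thread could deliver, so 
the wait can only end by timeout (shortened here from the 20 minutes in the report).

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class EventThreadDeadlockDemo {
  public static void main(String[] args) throws InterruptedException {
    final CountDownLatch leaderSeen = new CountDownLatch(1);
    Thread eventThread = new Thread(new Runnable() {
      public void run() {
        try {
          // The reconnect handler runs on the event thread and waits for a
          // "leader elected" update -- but that update would arrive as a
          // watch event on this very thread, which is blocked right here.
          boolean sawLeader = leaderSeen.await(2, TimeUnit.SECONDS);
          System.out.println("saw leader before timeout: " + sawLeader);
        } catch (InterruptedException ignored) {
        }
      }
    }, "main-EventThread");
    eventThread.start();
    eventThread.join();
    // The countDown() that would release the latch never runs: it is queued
    // behind the blocked handler on the same thread.
  }
}
{code}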






lucene-solr pull request: Allow ConnectionManager.process to run from multi...

2014-01-07 Thread andyetitmoves
GitHub user andyetitmoves opened a pull request:

https://github.com/apache/lucene-solr/pull/13

Allow ConnectionManager.process to run from multiple threads

One potential fix for SOLR-5615. I'm hardly sure whether this is the correct way 
to go about this, but it's a start, I guess.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andyetitmoves/lucene-solr 
on-recovery-deadlock-4x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/13.patch


commit ad7ac506bc614d43f391aaad7ab25d9b426421c4
Author: Ramkumar Aiyengar raiyen...@bloomberg.net
Date:   2014-01-07T11:57:25Z

Allow ConnectionManager.process to run from multiple threads







[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864250#comment-13864250
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-----------------------------------------

Submitted https://github.com/apache/lucene-solr/pull/13 for one possible 
solution, though I am not sure if this is the right way to go about this..

 Deadlock while trying to recover after a ZK session expiry
 ----------------------------------------------------------

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar

 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits till the cluster state 
 is updated
with the new leader from event thread
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out after trying to register itself as replica 
 times out after 20 mins.






[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

2014-01-07 Thread Remi Melisson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864328#comment-13864328
 ] 

Remi Melisson commented on LUCENE-5354:
---------------------------------------

Hi, any news about this feature?
Is there anything else I could do?

 Blended score in AnalyzingInfixSuggester
 ----------------------------------------

 Key: LUCENE-5354
 URL: https://issues.apache.org/jira/browse/LUCENE-5354
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Affects Versions: 4.4
Reporter: Remi Melisson
Priority: Minor
  Labels: suggester
 Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch


 I'm working on a custom suggester derived from the AnalyzingInfix. I require 
 what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) 
 to transform the suggestion weights depending on the position of the searched 
 term(s) in the text.
 Right now, I'm using an easy solution:
 If I want 10 suggestions, then I search against the current ordered index for 
 the first 100 results and transform the weights:
 bq. a) by using the term position in the text (found with TermVector and 
 DocsAndPositionsEnum)
 or
 bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
 searching
 and return the 10 most heavily weighted of the updated suggestions.
 Since we usually don't need to suggest so many things, the bigger search + 
 rescoring overhead is not so significant, but I agree that this is not the 
 most elegant solution.
 We could include this factor (here the position of the term) directly in 
 the index.
 So, I can contribute to this if you think it's worth adding.
 Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
 dedicated class?
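
For what it's worth, a minimal, self-contained sketch of option (a)'s shape: 
over-fetch candidates, discount each weight by the matched term's position, and 
keep the top N. The Suggestion type and the 1/(1+position) discount are stand-ins 
for illustration, not the AnalyzingInfixSuggester API:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class BlendedRescoreSketch {
  static class Suggestion {
    final String text;
    final long weight;
    Suggestion(String text, long weight) { this.text = text; this.weight = weight; }
  }

  // Rescore ~100 candidates down to n: the earlier the searched term occurs
  // in the suggestion text, the less its weight is discounted.
  static List<Suggestion> rescore(List<Suggestion> candidates, String term, int n) {
    List<Suggestion> rescored = new ArrayList<Suggestion>();
    for (Suggestion s : candidates) {
      int pos = s.text.toLowerCase().indexOf(term.toLowerCase());
      double factor = pos < 0 ? 0.0 : 1.0 / (1 + pos);
      rescored.add(new Suggestion(s.text, (long) (s.weight * factor)));
    }
    Collections.sort(rescored, new Comparator<Suggestion>() {
      public int compare(Suggestion a, Suggestion b) {
        return Long.compare(b.weight, a.weight); // heaviest first
      }
    });
    return rescored.subList(0, Math.min(n, rescored.size()));
  }
}
{code}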






[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
------------------------------

Attachment: SOLR-5615.patch

Not sure given the info, but the patch doesn't seem crazy to me. I've made a 
few adjustments in this patch.

 Deadlock while trying to recover after a ZK session expiry
 ----------------------------------------------------------

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits till the cluster state 
 is updated
with the new leader from event thread
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out after trying to register itself as replica 
 times out after 20 mins.






[jira] [Commented] (SOLR-5613) Upgrade Apache Commons Codec to version 1.9 in order to improve performance of BeiderMorseFilter

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864364#comment-13864364
 ] 

Shawn Heisey commented on SOLR-5613:
------------------------------------

I upgraded commons-codec to 1.9 on an up-to-date branch_4x checkout and found 
that all tests (both Lucene and Solr) passed.  This was on a Linux machine.  I 
wasn't too surprised by this.  I think we can accommodate this request easily.

Just for giggles, I went even further and upgraded all commons.apache.org 
components to the newest versions I could find via ivy.  All tests *still* 
passed.  This was on a Windows 8 machine.  With so many upgrades, I was really 
surprised it passed.

{code}
Index: lucene/ivy-versions.properties
===================================================================
--- lucene/ivy-versions.properties  (revision 1555313)
+++ lucene/ivy-versions.properties  (working copy)
@@ -19,16 +19,16 @@
 /com.ibm.icu/icu4j = 52.1
 /com.spatial4j/spatial4j = 0.3
 /com.sun.jersey/jersey-core = 1.16
-/commons-beanutils/commons-beanutils = 1.7.0
+/commons-beanutils/commons-beanutils = 1.9.0
 /commons-cli/commons-cli = 1.2
-/commons-codec/commons-codec = 1.7
+/commons-codec/commons-codec = 1.9
 /commons-collections/commons-collections = 3.2.1
-/commons-configuration/commons-configuration = 1.6
-/commons-digester/commons-digester = 2.0
-/commons-fileupload/commons-fileupload = 1.2.1
-/commons-io/commons-io = 2.1
+/commons-configuration/commons-configuration = 1.10
+/commons-digester/commons-digester = 2.1
+/commons-fileupload/commons-fileupload = 1.3
+/commons-io/commons-io = 2.4
 /commons-lang/commons-lang = 2.6
-/commons-logging/commons-logging = 1.1.1
+/commons-logging/commons-logging = 1.1.3
 /de.l3s.boilerpipe/boilerpipe = 1.1.0
 /dom4j/dom4j = 1.6.1
 /edu.ucar/netcdf = 4.2-min
{code}

I'm not advocating that we upgrade all the components at once, but it looks 
like we can indeed upgrade them all eventually.  I only ran the basic tests, so 
additional tests (nightly, weekly, etc) need to be done.


 Upgrade Apache Commons Codec to version 1.9 in order to improve performance 
 of BeiderMorseFilter
 ----------------------------------------------------------------------------

 Key: SOLR-5613
 URL: https://issues.apache.org/jira/browse/SOLR-5613
 Project: Solr
  Issue Type: Improvement
  Components: Rules, Schema and Analysis, search
Affects Versions: 3.6, 3.6.1, 3.6.2, 4.0, 4.1, 4.2, 4.2.1, 4.3, 4.3.1, 
 4.4, 4.5, 4.5.1, 4.6
Reporter: Thomas Champagne
  Labels: codec, commons, commons-codec, phonetic, search

 In version 1.9 of the commons-codec project, there are a lot of optimizations in 
 the Beider-Morse encoder, which is used by the BeiderMorseFilter in Solr. 
 Do you think it is possible to upgrade this dependency?






[jira] [Updated] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5463:
---------------------------

Description: 
I'd like to revisit a solution to the problem of deep paging in Solr, 
leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works at 
the Lucene level: require the clients to provide back a token indicating the 
sort values of the last document seen on the previous page.  This is similar 
to the cursor model I've seen in several other REST APIs that support 
pagination over large sets of results (notably the Twitter API and its 
since_id param), except that we'll want something that works with arbitrary 
multi-level sort criteria that can be either ascending or descending.

SOLR-1726 laid some initial groundwork here and was committed quite a while 
ago, but the key bit of argument parsing to leverage it was commented out due 
to some problems (see comments in that issue).  It's also somewhat out of date 
at this point: at the time it was committed, IndexSearcher only supported 
searchAfter for simple scores, not arbitrary field sorts; and the params added 
in SOLR-1726 suffer from this limitation as well.

---

I think it would make sense to start fresh with a new issue with a focus on 
ensuring that we have deep paging which:

* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode

{panel:title=Basic Usage}
* send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
** sort can be anything, but must include the uniqueKey field (as a tie 
breaker) 
** N can be any number you want per page
** start must be 0
** \* denotes you want to use a cursor starting at the beginning mark
* parse the response body and extract the (String) {{nextCursorMark}} value
* Replace the \* value in your initial request params with the 
{{nextCursorMark}} value from the response in the subsequent request
* repeat until the {{nextCursorMark}} value stops changing, or you have 
collected as many docs as you need
{panel}
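
A minimal SolrJ sketch of that loop (the cursorMark/nextCursorMark names come from 
the usage above; the core URL, sort fields, and page size are placeholders):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CursorWalk {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String cursorMark = "*"; // the beginning mark
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(100);
      q.setSort("timestamp", SolrQuery.ORDER.asc);
      q.addSort("id", SolrQuery.ORDER.asc); // uniqueKey tie-breaker is required
      q.set("cursorMark", cursorMark);
      QueryResponse rsp = solr.query(q);
      // ... process rsp.getResults() ...
      String next = (String) rsp.getResponse().get("nextCursorMark");
      if (cursorMark.equals(next)) {
        break; // an unchanged mark means no more pages
      }
      cursorMark = next;
    }
    solr.shutdown();
  }
}
{code}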





 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --------------------------------------------------------------------------

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0

 Attachments: SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
 at the Lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param), except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but 

[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-----------------------------------------

Here's a log trace from an actual occurrence; it might help in understanding the 
scenario above:

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got cluster state from ZK and successfully 
found the leader, so there is actually a leader at this point contrary to what 
it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down
{code}


 Deadlock while trying to recover after a ZK session expiry
 ----------------------------------------------------------

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for 

[jira] [Comment Edited] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar edited comment on SOLR-5615 at 1/7/14 5:02 PM:
-----------------------------------------------------------------

Here's a log trace from an actual occurrence; it might help in understanding the 
scenario above:

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got leader props from ZK successfully, so there 
is actually a leader at this point contrary to what it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down
{code}




[jira] [Created] (SOLR-5616) Make grouping code use response builder needDocList

2014-01-07 Thread Steven Bower (JIRA)
Steven Bower created SOLR-5616:
------------------------------

 Summary: Make grouping code use response builder needDocList
 Key: SOLR-5616
 URL: https://issues.apache.org/jira/browse/SOLR-5616
 Project: Solr
  Issue Type: Bug
Reporter: Steven Bower


Right now the grouping code does this to check if it needs to generate a 
docList for grouped results:

{code}
if (rb.doHighlights || rb.isDebug() || params.getBool(MoreLikeThisParams.MLT, 
false) ){
...
}
{code}

This is ugly because any new component that needs a docList from grouped 
results will need to modify QueryComponent to add a check to this if. Ideally 
this should just use the rb.isNeedDocList() flag...

Coincidentally, this boolean is currently never really used; for non-grouped 
results the docList always gets generated.






[jira] [Comment Edited] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar edited comment on SOLR-5615 at 1/7/14 5:04 PM:
-----------------------------------------------------------------

Here's a log trace from an actual occurrence; it might help in understanding the 
scenario above:

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got leader props from ZK successfully, so there 
is actually a leader at this point contrary to what it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down

// many more such disc events, and then the watches

2014-01-06 07:19:21,694 WARN [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:281] ZooKeeper watch triggered, but Solr cannot talk to ZK
2014-01-06 07:19:21,694 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:210] A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 112)
2014-01-06 07:19:21,694 WARN [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:234] ZooKeeper watch triggered, but Solr cannot talk to ZK

{code}



was (Author: andyetitmoves):
Here's some log trace which actually happened, might help understand the 
scenario above..

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover 

[jira] [Updated] (SOLR-5616) Make grouping code use response builder needDocList

2014-01-07 Thread Steven Bower (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Bower updated SOLR-5616:
---

Attachment: SOLR-5616.patch

Here is a patch that makes this change. It's against trunk but should easily 
apply to older versions. Ideally this would get into a 4.x release..

 Make grouping code use response builder needDocList
 ---

 Key: SOLR-5616
 URL: https://issues.apache.org/jira/browse/SOLR-5616
 Project: Solr
  Issue Type: Bug
Reporter: Steven Bower
 Attachments: SOLR-5616.patch


 Right now the grouping code does this to check if it needs to generate a 
 docList for grouped results:
 {code}
 if (rb.doHighlights || rb.isDebug() || params.getBool(MoreLikeThisParams.MLT, 
 false)) {
 ...
 }
 {code}
 This is ugly because any new component that needs a docList from grouped 
 results has to modify QueryComponent and add its own check to this if 
 statement. Ideally this should just use the rb.isNeedDocList() flag...
 Coincidentally, this boolean is currently never consulted at all; for 
 non-grouped results the docList always gets generated anyway.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864401#comment-13864401
 ] 

Mark Miller commented on SOLR-5615:
---

Thanks, perfect.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Nolan Lawson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864402#comment-13864402
 ] 

Nolan Lawson commented on SOLR-5379:


[~markus17]: They're boosted equally.  It was the subject of [a 
bug|https://github.com/healthonnet/hon-lucene-synonyms/issues/31].

[~iorixxx]: I just tested it out now.  I got:

{code}
(+(DisjunctionMaxQuery((text:"president usa"~5)) 
(((+DisjunctionMaxQuery((text:"president united states of 
america"~5)))/no_coord/no_coord // parsedQuery
+((text:"president usa"~5) ((+(text:"president united states of america"~5 
// parsedQuery.toString()
{code}

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for two reasons:
 - First, the Lucene query parser tokenizes the user query by space, so it 
 splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first one, we can quote all multi-word synonyms in the user query so 
 that the Lucene query parser doesn't split them. There is a JIRA task related 
 to this one: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses which contains multiple PhraseQuery instances, in case the token 
 stream has a multi-word synonym. (See the sketch below.)
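
For illustration, a minimal sketch of that second proposal against the Lucene 
4.x query API (the field name and the synonym pair are made-up examples, not 
taken from the attached patches):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;

// One PhraseQuery per synonym variant, combined as SHOULD clauses, instead of
// a single MultiPhraseQuery (which cannot mix variants of different lengths).
Query buildSynonymQuery() {
  PhraseQuery usa = new PhraseQuery();
  usa.add(new Term("text", "usa"));

  PhraseQuery expanded = new PhraseQuery();
  for (String word : new String[] {"united", "states", "of", "america"}) {
    expanded.add(new Term("text", word));
  }

  BooleanQuery synonyms = new BooleanQuery(true); // true: disable coord
  synonyms.add(usa, Occur.SHOULD);
  synonyms.add(expanded, Occur.SHOULD);
  return synonyms;
}
{code}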



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864434#comment-13864434
 ] 

Mark Miller commented on SOLR-5615:
---

Okay, now it's clearer to me. I think we need to run onReconnect in a 
background thread.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864446#comment-13864446
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

That, incidentally, was my first attempt at a fix! (Should have a diff..) 
However, onReconnect in any case runs in the event thread of the expired ZK 
session, which won't receive events after that, so it's effectively 
backgrounded? It should still work as a solution, I guess..

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864460#comment-13864460
 ] 

Mark Miller commented on SOLR-5615:
---

bq. However, onReconnect in any case runs in the event thread of the expired ZK 
session, which won't receive events after that, so it's effectively 
backgrounded?

But it holds the ConnectionManager's {{this}} lock while it runs, right? I 
think we just don't want to hold that lock while it runs.

I think the other changes are likely okay too - I'm playing around with a 
combination of the two.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Attachment: SOLR-5615.patch

Another rev.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Fix Version/s: 4.6.1
   4.7
   5.0
 Assignee: Mark Miller

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Iterating BinaryDocValues

2014-01-07 Thread Mikhail Khludnev
Joel,

I tried to hack it straightforwardly, but found no free gain there. The
only thing I can suggest is to try to reuse the bytes in
https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
Right now it allocates bytes every time, which besides GC pressure can also
hurt memory access locality. Could you try fixing the memory waste and
repeating the performance test?

Have a good hack!


On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein joels...@gmail.com wrote:


 Hi,

 I'm looking for a faster way to perform large scale docId -> bytesRef
 lookups for BinaryDocValues.

 I'm finding that I can't get the performance that I need from the random
 access seek in the BinaryDocValues interface.

 I'm wondering if sequentially scanning the docValues would be a faster
 approach. I have a BitSet of matching docs, so if I sequentially moved
 through the docValues I could test each one against that bitset.

 Wondering if that approach would be faster for bulk extracts and how
 tricky it would be to add an iterator to the BinaryDocValues interface?

 Thanks,
 Joel




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com
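
For reference, a minimal sketch of the sequential-scan idea from Joel's mail, 
written against the Lucene 4.x per-segment API (the field name, reader, and 
bitset are placeholders, not code from either mail):

{code}
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

void bulkExtract(AtomicReader reader, BitSet matches) throws IOException {
  BinaryDocValues values = reader.getBinaryDocValues("field");
  if (values == null) return; // field has no binary doc values
  BytesRef scratch = new BytesRef();
  // Walk matching docids in increasing order so pages are touched sequentially.
  for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
    values.get(doc, scratch); // fills scratch with this doc's bytes
    // ... consume scratch ...
  }
}
{code}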


[jira] [Resolved] (SOLR-5614) Boost documents using map and query functions

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5614.


Resolution: Invalid

please don't file a bug just because you've been waiting 24 hours for an answer 
to a question on the solr-user mailing list - sometimes it takes longer than 
that for people to answer.

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201312.mbox/%3c52c17579.30...@kelkoo.com%3E

 Boost documents using map and query functions
 -

 Key: SOLR-5614
 URL: https://issues.apache.org/jira/browse/SOLR-5614
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Anca Kopetz

 We want to boost documents that contain specific search terms in their fields. 
 We tried the following simplified query: 
 http://localhost:8983/solr/collection1/select?q=ipod 
 belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power
 And we get the following error: 
 org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
 'power'
 And the stack trace:
 ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
 org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
 Infinite Recursion detected parsing query 'power'
 at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at 
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at 
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at 
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
 parsing query 'power'
 at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
 at org.apache.solr.search.QParser.subQuery(QParser.java:200)
 at 
 org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
 at 
 org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
 at org.apache.solr.search.QParser.getQuery(QParser.java:142)
 

[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864475#comment-13864475
 ] 

Mark Miller commented on SOLR-5615:
---

Even with the other changes, I like the idea of using a background thread, 
because I don't think it's right that we do that whole reconnect process before 
we record that we are connected to ZK and get out of the connection manager. I 
really don't think that process should hold up the connection manager at all - 
it's only meant to trigger it.
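
Roughly, the idea under discussion looks like this (a paraphrase, not the 
attached patch; field and method names are simplified):

{code}
// Sketch: mark the connection live and release the ConnectionManager's
// monitor first, then run the heavy recovery work on its own thread so
// neither the monitor nor ZooKeeper's event thread is held while it runs.
synchronized (this) {
  connected = true;  // we are connected to ZK again
  notifyAll();       // wake anyone waiting on the connection state
}
new Thread(new Runnable() {
  @Override
  public void run() {
    onReconnect.command(); // mark down, register cores, join elections, ...
  }
}, "OnReconnect").start();
{code}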

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-5617:
--

 Summary: Default classloader restrictions may be too tight
 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
 Fix For: 5.0, 4.7


SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in ${solr.solr.home}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but 
${solr.solr.home} and its children should be about as trustworthy as 
instanceDir.

Ideally I'd like to have ${solr.solr.home}/lib trusted automatically, since it 
is searched automatically.  If I need to define a system property to make this 
happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864491#comment-13864491
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

Fair enough. Would that allow multiple onReconnect.command() invocations to 
run simultaneously, and is that fine? (I'm on mobile, so my reading of the 
patch could be wrong.) What if we were in the process of recovering when we 
were unfortunate enough to get a second expiry, thereby bringing all nodes 
down?

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864496#comment-13864496
 ] 

Mikhail Khludnev commented on SOLR-5244:


bq. 1) Add a special cache that speeds up the docId -> bytesRef lookup. This 
would be a segment level cache of the top N terms (by frequency) in the index. 
The cache would be a simple int to BytesRef hashmap, mapping the segment level 
ord to the bytesRef

that's exactly what you've got in FieldCache.DEFAULT.getTerms() for an indexed 
field without docvalues enabled. See FieldCacheImpl.BinaryDocValuesCache.

 Full Search Result Export
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5244.patch


 It would be great if Solr could efficiently export entire search result sets 
 without scoring or ranking documents. This would allow external systems to 
 perform rapid bulk imports from Solr. It also provides a possible platform 
 for exporting results to support distributed join scenarios within Solr.
 This ticket provides a patch that has two pluggable components:
 1) ExportQParserPlugin: which is a post filter that gathers a BitSet with 
 document results and does not delegate to ranking collectors. Instead it puts 
 the BitSet on the request context.
  2) BinaryExportWriter: is an output writer that iterates the BitSet and 
  writes the entire result as a binary stream. A header is provided at the 
  beginning of the stream so external clients can self-configure.
 Note:
 These two components will be sufficient for a non-distributed environment. 
 For distributed export a new Request handler will need to be developed.
 After applying the patch and building the dist or example, you can register 
 the components through the following changes to solrconfig.xml
  Register export contrib libraries:
  <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
   
  Register the export queryParser with the following line:
   
  <queryParser name="export" 
  class="org.apache.solr.export.ExportQParserPlugin"/>
   
  Register the xbin writer:
   
  <queryResponseWriter name="xbin" 
  class="org.apache.solr.export.BinaryExportWriter"/>
   
  The following query will perform the export:
  {code}
  http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
  {code}
 Initial patch supports export of four data-types:
 1) Single value trie int, long and float
 2) Binary doc values.
 The numerics are currently exported from the FieldCache and the Binary doc 
 values can be in memory or on disk.
 Since this is designed to export very large result sets efficiently, stored 
 fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Description: 
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
it is searched automatically.  If I need to define a system property to make 
this happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.

  was:
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in ${solr.solr.home}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but 
${solr.solr.home} and its children should be about as trustworthy as 
instanceDir.

Ideally I'd like to have ${solr.solr.home}/lib trusted automatically, since it 
is searched automatically.  If I need to define a system property to make this 
happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.


 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
 order to get those jars to work, I must turn off all SOLR-4882 safety 
 checking.
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
 it is searched automatically.  If I need to define a system property to make 
 this happen, I'm OK with that -- as long as I don't have to turn off the 
 safety checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864502#comment-13864502
 ] 

Mark Miller commented on SOLR-5615:
---

Yeah, I've been considered the same thing. My inclination was it was okay, but 
we may have to add something to cancel our leader election before joining the 
election to be sure.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864505#comment-13864505
 ] 

Shawn Heisey commented on SOLR-5617:


I will have to double-check, but I probably have the specifics of what required 
me to turn off the safety checking wrong.  It may have been configuration 
components gathered via xinclude, not jarfiles.  Either way, I am sure that 
everything is under the solr home.


 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
 order to get those jars to work, I must turn off all SOLR-4882 safety 
 checking.
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
 it is searched automatically.  If I need to define a system property to make 
 this happen, I'm OK with that -- as long as I don't have to turn off the 
 safety checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-5611) When documents are uniformly distributed over shards, enable returning approximated results in distributed query

2014-01-07 Thread Isaac Hebsh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isaac Hebsh closed SOLR-5611.
-

Resolution: Not A Problem

Oops. I missed the {{shards.rows}} parameter.

 When documents are uniformly distributed over shards, enable returning 
 approximated results in distributed query
 

 Key: SOLR-5611
 URL: https://issues.apache.org/jira/browse/SOLR-5611
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Isaac Hebsh
  Labels: distributed_search, shard, solrcloud
 Fix For: 4.7


 A query with rows=1000, which is sent to a collection of 100 shards (shard key 
 behaviour is the default - based on a hash of the unique key), will generate 
 100 requests of rows=1000, one on each shard.
 This results in a total of rows*numShards unique keys being retrieved. This 
 behaviour gets worse as numShards grows.
 If the documents are uniformly distributed over the shards, the expected 
 number of documents per shard should be ~ rows/numShards. Obviously, there 
 might be extreme cases, when all of the top X documents are in a specific 
 shard.
 I suggest adding an optional parameter, say approxResults=true, which decides 
 whether we should limit the rows in the shard requests to rows/numShards or 
 not. Moreover, we can add a numeric parameter which increases the limit, to 
 be more accurate.
 For example, the query {{approxResults=true&approxResults.factor=1.5}} will 
 retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and 
 rows=1000, each shard will return 15 documents.
 Furthermore, this can reduce the problem of deep paging, because the same 
 thing can be applied there: when start=10 is requested, Solr creates shard 
 requests with start=0 and rows=START+ROWS. In the approximated approach, the 
 start parameter (in the shard requests) can be set to 10/numShards. The idea 
 of the approxResults.factor creates some difficulties here, though.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5560) Enable LocalParams without escaping the query

2014-01-07 Thread Isaac Hebsh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864553#comment-13864553
 ] 

Isaac Hebsh commented on SOLR-5560:
---

Hi [~ryancutter], thank you very much!
I'm not familiar with parser states (thank god), so I can't review the patch.

What action should be performed in order to get this patch committed? (into 4.7?)

 Enable LocalParams without escaping the query
 -

 Key: SOLR-5560
 URL: https://issues.apache.org/jira/browse/SOLR-5560
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.6
Reporter: Isaac Hebsh
 Fix For: 4.7, 4.6.1

 Attachments: SOLR-5560.patch


 This query should be a legit syntax:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 currently it isn't, because the LocalParams can be specified on a single term 
 only.
 [~billnbell] thinks it is a bug.
 From the mailing list:
 {quote}
 We want to set a LocalParam on a nested query. When querying with the v inline 
 parameter, it works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""}
 the parsedquery_toString is
 +id:TERM1 +(text:term2 text:term3 text:"term4 term5")
 A query using the _query_ syntax also works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\""
 (the parsedquery is exactly the same).
 Obviously, there is the option of an external parameter ({... 
 v=$nestedq}&nestedq=...)
 This is a good solution, but it is not practical when having a lot of such 
 nested queries.
 BUT, when trying to put the nested query in place, it yields a syntax error:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 org.apache.solr.search.SyntaxError: Cannot parse '(TERM2'
 The previous options are less preferred because of the escaping that has to 
 be done on the nested query.
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Attachment: SOLR-5615.patch

Another rev that adds what I think is a decent change anyway - before joining 
an election, cancel any known previous election participation.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings

2014-01-07 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864579#comment-13864579
 ] 

Hoss Man commented on SOLR-5594:


* Aren't there other parser classes that will need similar changes? 
(PrefixQParserPlugin, SimpleQParserPlugin at a minimum i think)
* I think your new FieldType.getPrefixQuery method has a back compat break for 
any existing FieldTypes that people might be using because it now calls 
readableToIndexed ... that smells like it could break things for some 
FieldTypes ... but maybe i'm missing something?
* FieldType.getPrefixQuery has lots of bogus cut/pasted javadocs from 
getRangeQuery
* Can't your MyIndexedBinaryField just subclass BinaryField to reduce some 
code?  for that matter: is there any reason why we shouldn't just make 
BinaryField implement prefix queries in the way your MyIndexedBinaryField does?
* i'm not sure i understand why you need BinaryTokenStream for the test (see 
previous comment about just extending/improving BinaryField) but if so perhaps 
it should be moved from lucene/core to lucene/test-framework?

 Enable using extended field types with prefix queries for non-default encoded 
 strings
 -

 Key: SOLR-5594
 URL: https://issues.apache.org/jira/browse/SOLR-5594
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, Schema and Analysis
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta
Priority: Minor
 Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch


 Enable users to more easily use prefix queries with custom field types that 
 apply non-default encoding/decoding to query strings, e.g. having a custom 
 field work with base64 encoded query strings.
 Currently, the workaround for it is to have the override at getRewriteMethod 
 level. Perhaps having the prefixQuery also use the calling FieldType's 
 readableToIndexed method would work better.
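
For illustration, something along these lines (a sketch of the suggestion, not 
the attached patch; the getPrefixQuery signature and the readableToIndexed 
helper are assumed from Solr 4.x's FieldType):

{code}
// Decode the user-supplied prefix with the field type's own conversion before
// building the PrefixQuery, so e.g. base64-encoded input works out of the box.
@Override
public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) {
  BytesRef indexed = new BytesRef();
  readableToIndexed(termStr, indexed); // field-type specific decoding
  return new PrefixQuery(new Term(sf.getName(), indexed));
}
{code}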



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5361) FVH throws away some boosts

2014-01-07 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864583#comment-13864583
 ] 

Adrien Grand commented on LUCENE-5361:
--

Thanks Nik, your fix looks good! I don't think cloning the queries is an issue, 
it happens all the time when doing rewrites, and it's definitely better than 
modifying those queries in-place.

I'll commit it tomorrow if there is no objection.

 FVH throws away some boosts
 ---

 Key: LUCENE-5361
 URL: https://issues.apache.org/jira/browse/LUCENE-5361
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5361.patch


 The FVH's FieldQuery throws away some boosts when flattening queries, 
 including DisjunctionMaxQuery and BooleanQuery queries.   Fragments generated 
 against queries containing boosted boolean queries don't end up sorted 
 correctly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Pull requests versus JIRA

2014-01-07 Thread Benson Margulies
Further adventures in token streams have motivated me to play tech
writer some more.

Options:

1. just create github pull requests.
2. reopen prior jira
3. make new jira

preference?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Pull requests versus JIRA

2014-01-07 Thread Robert Muir
I think 1 or 3 is best.

The downside of 2 is just the confusion, since the other doc was good,
i dont think we have to reopen it.

i cant imagine anyone worried about having too many jiras with
documentation fixes!

On Tue, Jan 7, 2014 at 3:21 PM, Benson Margulies bimargul...@gmail.com wrote:
 Further adventures in token streams have motivated me to play tech
 writer some more.

 Options:

 1. just create github pull requests.
 2. reopen prior jira
 3. make new jira

 preference?

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Pull requests versus JIRA

2014-01-07 Thread Benson Margulies
OK. Hopefully this time I'll remember to watch my own JIRA so that I
don't ignore Uwe.

On Tue, Jan 7, 2014 at 3:24 PM, Robert Muir rcm...@gmail.com wrote:
 I think 1 or 3 is best.

 The downside of 2 is just the confusion, since the other doc was good,
 i dont think we have to reopen it.

 i cant imagine anyone worried about having too many jiras with
 documentation fixes!

 On Tue, Jan 7, 2014 at 3:21 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 Further adventures in token streams have motivated me to play tech
 writer some more.

 Options:

 1. just create github pull requests.
 2. reopen prior jira
 3. make new jira

 preference?

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread Lajos
I've followed this thread with interest, and although I'm (sadly) a 
lapsed Apache committer (not Lucene/Solr), I finally had to comment as 
I've just gone through the pain of learning git after many happy years 
with svn.


In my long experience in IT I've learned one incontrovertible fact: most 
times, the technical merits of one technology over another are not 
nearly as important as everyone thinks. It is really all about how WELL 
you use a given technology to get the job done. The stuff I do in git 
now, I could do in SVN, and vice versa. I'd wager I could do the same in 
CVS or even older technologies. It's like Ant versus Maven versus Gradle. 
I can do the same in each of these. Each has its own good and bad 
points. I'll stick with Ant and SVN to the end but hey, if a client 
works only with Gradle and Git and XYZ technology and has an 
intellectual investment there, I'm not gonna argue the point on 
technical merits.


That being said, I think the worst argument one could make about 
anything is that we should move to it because everyone else is. People 
will flock to fads as much as (I could argue: more than) to genuine 
technical improvements (anyone remember the 70s? 80s? 90s?). Git feels a 
bit faddish to me, and is definitely immature. I get some of the 
advantages, but I don't think I should have to be a gitk expert to use 
the damn software - it's over-engineered and actually opens up the door 
to more convoluted development processes.


Whether Git is a fad or not, the issue, as pointed out below, is 
supporting the way contributors work. The win-win situation would be to 
keep the core based on SVN but support git contributions (as I know 
someone else suggested). SVN is a technology that is stable and which 
all core committers know like the back of their hands - no sense in 
wasting time learning git when people are donating time and that time is 
better spent on JIRAs. What I don't know is how this GIT integration 
would work, but I'd hope it could be done.


Just to push home the point, I'll bet most of us who have been around a 
while have plenty of stories of IT shops moving from one technology to 
another ... and then in a few years to another ... and then to another - 
all because some manager got a burr up his rear or was wined and dined 
by a vendor. Why? Why hurt productivity for the sake of keep up with the 
times? How about setting an example of sticking with what works despite 
the made rush to github?


My €.02.

Lajos Moczar




On 06/01/2014 17:01, Robert Muir wrote:

On Sun, Jan 5, 2014 at 12:07 PM, Mark Miller markrmil...@gmail.com wrote:

My point here is not really to discuss the merits of Git VS SVN on a feature
/ interface basis. We might as well talk about MySQL vs Postgres.

Personally, I prefer GIT. It feels good when I use it. SVN feels like crap.
That doesn't make me want to move. I've used SVN for years with Lucene/Solr,
and like everyone, it's pretty much second nature.

The problem is the world is moving. It may not be clear to everyone yet, but
give it a bit more time and it will be.

Git already owns the open source world. It rivals SVN by most guesses in the
proprietary world. This is a strong hard trend. The same trend that saw SVN
eat CVS. I think clearly, a distributed version control system will
dominate. And clearly Git has won.

I'm not ready to call a vote, because I don't think it's critical we switch
yet. But I wanted to continue the discussion, as obviously, plenty of it
will be needed over time before we made such a switch.

It's not about one thing being better than the other. It's about using what
everyone else uses so you don't provide a barrier to contribution. It's
about the post I linked to when I started this thread.

I personally don't care about pull requests and Github. I don't think any of
its features are that great, other than it acts as a central repo. Git is
not good because of Github IMO. But Git and Github are eating the world.

Most of the patches I have processed now are made against Git. Jumping from
SVN to Git and back is very annoying IMO though. There are plenty of tools
and workflows for it and they all suck.

Anyway, as the trend continues, it will become even more obvious that
Lucene/Solr will start looking stale on SVN. We have enough image problems
in terms of being modern at Apache. We will need to manage the ones we can.

We should not choose the tools that simply make us fuzzy and comfortable.
We should choose the tools that are best for the project and future
contributions in the long term.

- Mark




The idea that this has anything to do with contributors is misleading.

Today contributors can use either SVN or GIT. They have their choice.
How can it be any better than that for contributors?

As demonstrated over the weekend, it's also possible today for
contributors to use an svn+jira or git+pull request workflow.

As i said earlier, why not spend our time trying to make it easier on
contributors and support 

Re: Iterating BinaryDocValues

2014-01-07 Thread Michael McCandless
Going sequentially should help, if the pages are not hot (in the OS's IO cache).

You can also use a different DVFormat, e.g. Direct, but this holds all
bytes in RAM.
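
For illustration, opting a single field into a RAM-resident format could look 
like this (a sketch assuming the "Direct" docvalues format from the codecs 
module and Lucene 4.6 APIs; the field name is a placeholder):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

IndexWriterConfig directConfig(Analyzer analyzer) {
  IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
  iwc.setCodec(new Lucene46Codec() {
    @Override
    public DocValuesFormat getDocValuesFormatForField(String field) {
      // Keep this field's doc values entirely in RAM; others use the default.
      return "myBinaryField".equals(field)
          ? DocValuesFormat.forName("Direct")
          : super.getDocValuesFormatForField(field);
    }
  });
  return iwc;
}
{code}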

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 Joel,

 I tried to hack it straightforwardly, but found no free gain there. The only
 thing I can suggest is to try to reuse the bytes in
 https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
 Right now it allocates bytes every time, which besides GC pressure can also
 hurt memory access locality. Could you try fixing the memory waste and
 repeating the performance test?

 Have a good hack!


 On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein joels...@gmail.com wrote:


 Hi,

  I'm looking for a faster way to perform large scale docId -> bytesRef
 lookups for BinaryDocValues.

 I'm finding that I can't get the performance that I need from the random
 access seek in the BinaryDocValues interface.

 I'm wondering if sequentially scanning the docValues would be a faster
 approach. I have a BitSet of matching docs, so if I sequentially moved
 through the docValues I could test each one against that bitset.

 Wondering if that approach would be faster for bulk extracts and how
 tricky it would be to add an iterator to the BinaryDocValues interface?

 Thanks,
 Joel




 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

2014-01-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864683#comment-13864683
 ] 

Michael McCandless commented on LUCENE-5354:


Woops, sorry, this fell below the event horizon of my TODO list.  I'll look at 
your new patch soon.

There is an existing performance test, LookupBenchmarkTest, but it's a bit 
tricky to run.  See the comment on LUCENE-5030: 
https://issues.apache.org/jira/browse/LUCENE-5030?focusedCommentId=13689155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13689155

 Blended score in AnalyzingInfixSuggester
 

 Key: LUCENE-5354
 URL: https://issues.apache.org/jira/browse/LUCENE-5354
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Affects Versions: 4.4
Reporter: Remi Melisson
Priority: Minor
  Labels: suggester
 Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch


 I'm working on a custom suggester derived from the AnalyzingInfix. I require 
 what is called a blended score (see the //TODO at line 399 in AnalyzingInfixSuggester) 
 to transform the suggestion weights depending on the position of the searched 
 term(s) in the text.
 Right now, I'm using an easy solution:
 If I want 10 suggestions, I search the current ordered index for 
 the first 100 results and transform the weight:
 bq. a) by using the term position in the text (found with TermVector and 
 DocsAndPositionsEnum)
 or
 bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
 searching
 and return the updated 10 most heavily weighted suggestions.
 Since we usually don't need to suggest so many things, the bigger search + 
 rescoring overhead is not so significant, but I agree that this is not the 
 most elegant solution.
 We could include this factor (here, the position of the term) directly in 
 the index.
 So, I can contribute this if you think it's worth adding.
 Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
 dedicated class?
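
A rough sketch of that oversample-and-rescore loop (illustrative only;
{{positionFactor()}} stands in for the TermVector/SpanQuery scoring described
above and is not an existing API, and the 4.x-era {{Lookup.lookup()}} signature
is assumed):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.search.suggest.Lookup;

// Oversample, blend each weight with a position-based factor, keep the top 10.
List<Lookup.LookupResult> blend(Lookup suggester, CharSequence query) throws IOException {
  List<Lookup.LookupResult> raw = suggester.lookup(query, false, 100); // oversample
  List<Lookup.LookupResult> blended = new ArrayList<Lookup.LookupResult>();
  for (Lookup.LookupResult r : raw) {
    long w = (long) (r.value * positionFactor(r.key, query)); // blend by term position
    blended.add(new Lookup.LookupResult(r.key, w));
  }
  Collections.sort(blended, new Comparator<Lookup.LookupResult>() {
    public int compare(Lookup.LookupResult a, Lookup.LookupResult b) {
      return Long.compare(b.value, a.value); // heaviest first
    }
  });
  return blended.subList(0, Math.min(10, blended.size()));
}

// Stub: a real implementation would derive this from TermVector positions
// or a SpanQuery score, as described in the issue.
double positionFactor(CharSequence suggestion, CharSequence query) {
  return 1.0;
}
{code}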



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864689#comment-13864689
 ] 

Joel Bernstein commented on SOLR-5244:
--

I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think it would perform with similar 
performance to the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by an hppc 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.
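
A minimal sketch of such a cache (assuming the hppc {{IntObjectOpenHashMap}}
API of that era; primitive int keys, no boxing on the read path):

{code}
import com.carrotsearch.hppc.IntObjectOpenHashMap;
import org.apache.lucene.util.BytesRef;

// Build the cache once; deep-copy on insert because the producer may
// reuse the bytes behind the BytesRef it hands back.
IntObjectOpenHashMap<BytesRef> cache = new IntObjectOpenHashMap<BytesRef>();

void put(int docId, BytesRef ref) {
  cache.put(docId, BytesRef.deepCopyOf(ref));
}

// Read path: a single open-addressed hash probe per docID.
BytesRef lookup(int docId) {
  return cache.get(docId);
}
{code}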

 Full Search Result Export
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5244.patch


 It would be great if Solr could efficiently export entire search result sets 
 without scoring or ranking documents. This would allow external systems to 
 perform rapid bulk imports from Solr. It also provides a possible platform 
 for exporting results to support distributed join scenarios within Solr.
 This ticket provides a patch that has two pluggable components:
 1) ExportQParserPlugin: a post filter that gathers a BitSet with 
 document results and does not delegate to ranking collectors. Instead it puts 
 the BitSet on the request context.
 2) BinaryExportWriter: an output writer that iterates the BitSet and writes 
 the entire result as a binary stream. A header is provided at the beginning 
 of the stream so external clients can self-configure.
 Note:
 These two components will be sufficient for a non-distributed environment. 
 For distributed export a new Request handler will need to be developed.
 After applying the patch and building the dist or example, you can register 
 the components through the following changes to solrconfig.xml
 Register export contrib libraries:
 <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
  
 Register the export queryParser with the following line:
  
 <queryParser name="export" 
 class="org.apache.solr.export.ExportQParserPlugin"/>
  
 Register the xbin writer:
  
 <queryResponseWriter name="xbin" 
 class="org.apache.solr.export.BinaryExportWriter"/>
  
 The following query will perform the export:
 {code}
 http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
 {code}
 Initial patch supports export of four data-types:
 1) Single value trie int, long and float
 2) Binary doc values.
 The numerics are currently exported from the FieldCache and the Binary doc 
 values can be in memory or on disk.
 Since this is designed to export very large result sets efficiently, stored 
 fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread Mark Miller
I don’t really buy the fad argument, but as I’ve said, I’m willing to wait a 
little longer for others to catch on. I try to follow the stats and reports 
and articles on this pretty closely.

As I mentioned early in the thread, by all appearances, the shift from SVN to 
Git looks much like the shift from CVS to SVN. This was not a fad change, nor 
is the next mass movement likely to be.

Just like no one starts a project on CVS anymore, we are almost already to the 
point where new projects start exclusively on Git - especially open source.

I’m happy to sit back and watch the trend continue though. The number of Git 
users on the committee and among the committers only grows every time the 
discussion comes up.

If this were 2009, 2010, 2011 … who knows, perhaps I would buy the fad 
argument. But it just doesn’t jibe in 2014.

- Mark

On Jan 7, 2014, at 3:33 PM, Lajos la...@protulae.com wrote:

 I've followed this thread with interest, and although I'm (sadly) a lapsed 
 Apache committer (not Lucene/Solr), I finally had to comment as I've just 
 gone through the pain of learning git after many happy years with svn.
 
 In my long experience in IT I've learned one incontrovertible fact: most 
 times, the technical merits of one technology over another are not nearly as 
 important as everyone thinks. It is really all about how WELL you use a given 
 technology to get the job done. The stuff I do in git now, I could do in SVN, 
 and vice versa. I'd wager I could do the same in CVS or even older 
 technologies. It's like Ant versus Maven versus Gradle. I can do the same in 
 each of these. Each has its own good and bad points. I'll stick with Ant 
 and SVN to the end but hey, if a client works only with Gradle and Git and 
 XYZ technology and has an intellectual investment there, I'm not gonna argue 
 the point on technical merits.
 
 That being said, I think the worst argument one could make about anything is 
 that we should move to it because everyone else is. People will flock to 
 fads as much as (I could argue: more than) to genuine technical improvements 
 (anyone remember the 70s? 80s? 90s?). Git feels a bit faddish to me, and is 
 definitely immature. I get some of the advantages, but I don't think I should 
 have to be a gitk expert to use the damn software - it's over-engineered and 
 actually opens up the door to more convoluted development processes.
 
 Whether Git is a fad or not, the issue, as pointed out below, is supporting 
 the way contributors work. The win-win situation would be to keep the core 
 based on SVN but support git contributions (as I know someone else 
 suggested). SVN is a technology that is stable and which all core committers 
 know like the back of their hands - no sense in wasting time learning git 
 when people are donating time and that time is better spent on JIRAs. What I 
 don't know is how this Git integration would work, but I'd hope it could be 
 done.
 
 Just to push home the point, I'll bet most of us who have been around a while 
 have plenty of stories of IT shops moving from one technology to another ... 
 and then in a few years to another ... and then to another - all because some 
 manager got a burr up his rear or was wined and dined by a vendor. Why? Why 
 hurt productivity for the sake of keeping up with the times? How about setting 
 an example of sticking with what works despite the mad rush to GitHub?
 
 My €.02.
 
 Lajos Moczar
 
 
 
 
 On 06/01/2014 17:01, Robert Muir wrote:
 On Sun, Jan 5, 2014 at 12:07 PM, Mark Miller markrmil...@gmail.com wrote:
 My point here is not really to discuss the merits of Git VS SVN on a feature
 / interface basis. We might as well talk about MySQL vs Postgres.
 
 Personally, I prefer Git. It feels good when I use it; SVN feels like crap.
 That alone doesn't make me want to move. I've used SVN for years with Lucene/Solr,
 and like everyone else, it's pretty much second nature.
 
 The problem is the world is moving. It may not be clear to everyone yet, but
 give it a bit more time and it will be.
 
 Git already owns the open source world, and by most estimates it rivals SVN in
 the proprietary world. This is a strong, hard trend, the same trend that saw SVN
 eat CVS. I think it's clear that a distributed version control system will
 dominate, and clearly Git has won.
 
 I'm not ready to call a vote, because I don't think it's critical we switch
 yet. But I wanted to continue the discussion, as obviously, plenty of it
 will be needed over time before we make such a switch.
 
 It's not about one thing being better than the other. It's about using what
 everyone else uses so you don't provide a barrier to contribution. It's
 about the post I linked to when I started this thread.
 
 I personally don't care about pull requests and GitHub. I don't think any of
 its features are that great, other than that it acts as a central repo. Git is
 not good because of GitHub, IMO. But Git and GitHub are eating the world.
 
 Most of the patches I have processed now are 

[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Description: 
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but what if you have common resources like included config files that are 
outside instanceDir but are still fully inside the solr home?

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
automatically.  If I need to define a system property to make this happen, I'm 
OK with that -- as long as I don't have to turn off the safety checking 
entirely.

  was:
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
it is searched automatically.  If I need to define a system property to make 
this happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.


 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but what if you have common resources like included config files that are 
 outside instanceDir but are still fully inside the solr home?
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
 automatically.  If I need to define a system property to make this happen, 
 I'm OK with that -- as long as I don't have to turn off the safety checking 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864505#comment-13864505
 ] 

Shawn Heisey edited comment on SOLR-5617 at 1/7/14 9:44 PM:


Here's a stacktrace from my attempted start on 4.6.0 without the option to 
allow unsafe resource loading.  The solr home is /index/solr4:

{noformat}
ERROR - 2014-01-07 14:37:05.493; org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: SolrCore 's1build' is not available 
due to init failure: Could not load config file 
/index/solr4/cores/s1_0/solrconfig.xml
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:825)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:293)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1476)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Could not load config file 
/index/solr4/cores/s1_0/solrconfig.xml
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:532)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:599)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:245)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; 
systemId: solrres:/solrconfig.xml; lineNumber: 7; columnNumber: 70; An include 
with href '../../../config/common/luceneMatchVersion.xml' failed, and no 
fallback element was found.
at org.apache.solr.core.Config.init(Config.java:148)
at org.apache.solr.core.Config.init(Config.java:86)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:129)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:529)
... 11 more
Caused by: org.xml.sax.SAXParseException; systemId: solrres:/solrconfig.xml; 
lineNumber: 7; columnNumber: 70; An include with href 
'../../../config/common/luceneMatchVersion.xml' failed, 

[jira] [Created] (LUCENE-5388) Eliminate construction over readers for Tokenizer

2014-01-07 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5388:


 Summary: Eliminate construction over readers for Tokenizer
 Key: LUCENE-5388
 URL: https://issues.apache.org/jira/browse/LUCENE-5388
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Benson Margulies


In the modern world, Tokenizers are intended to be reusable, with input 
supplied via #setReader. The constructors that take Reader are a vestige. Worse 
yet, they invite people to make mistakes in handling the reader that tangle 
them up with the state machine in Tokenizer. The sensible thing is to eliminate 
these ctors, and force setReader usage.
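
For reference, the reuse pattern being pushed toward looks roughly like this
(a sketch against the 4.x API; WhitespaceTokenizer and the empty consume loop
are just stand-ins):

{code}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// One Tokenizer instance, many inputs. The Reader passed to the
// constructor is the vestige this issue wants to remove.
void tokenizeAll(String[] docs) throws IOException {
  Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_46, new StringReader(""));
  for (String doc : docs) {
    tok.setReader(new StringReader(doc)); // the real way input arrives
    tok.reset();
    while (tok.incrementToken()) {
      // consume attributes here
    }
    tok.end();
    tok.close();
  }
}
{code}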




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5389) Even more doc for construction of TokenStream components

2014-01-07 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5389:


 Summary: Even more doc for construction of TokenStream components
 Key: LUCENE-5389
 URL: https://issues.apache.org/jira/browse/LUCENE-5389
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Benson Margulies


There are more useful things to tell would-be authors of tokenizers. Let's tell 
them.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-5170:
--

Attachment: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt

Adds recipDistance scoring, lat/long is one param.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
 SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864738#comment-13864738
 ] 

Jeff Wartes commented on SOLR-5170:
---

I've been using this patch with some minor tweaks and solr 4.3.1 in production 
for about six months now. Since I was applying it again against 4.6 this 
morning, I figured I should attach my tweaks, and mention it passes tests 
against 4.6.

This does NOT address the design issues David raises in the initial comment. 
The changes vs. the initial patch file allow it to be applied against a greater 
range of Solr versions, and bring it a little closer to feeling the same as 
geofilt's params.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864741#comment-13864741
 ] 

Shawn Heisey commented on SOLR-5617:


I have figured out a workaround.  I've got a config structure that heavily uses 
xinclude and symlinks.  By changing things around so that only the symlinks 
traverse upwards and xinclude only refers to local files, I no longer need to 
enable unsafe loading.
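
The shape of that change, roughly (paths and the symlink target below are
illustrative, taken from the stack trace earlier in this issue):

{noformat}
Before: the xinclude itself reaches outside instanceDir (blocked by SOLR-4882):
  <xi:include href="../../../config/common/luceneMatchVersion.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>

After: a symlink does the upward traversal, and the xinclude stays local:
  ln -s ../../../config/common <instanceDir>/common
  <xi:include href="common/luceneMatchVersion.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>
{noformat}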

I still think that it would be useful to fix this issue, but the urgency is 
gone.

 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but what if you have common resources like included config files that are 
 outside instanceDir but are still fully inside the solr home?
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
 automatically.  If I need to define a system property to make this happen, 
 I'm OK with that -- as long as I don't have to turn off the safety checking 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Priority: Minor  (was: Major)

 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
Priority: Minor
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but what if you have common resources like included config files that are 
 outside instanceDir but are still fully inside the solr home?
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
 automatically.  If I need to define a system property to make this happen, 
 I'm OK with that -- as long as I don't have to turn off the safety checking 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer

2014-01-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864742#comment-13864742
 ] 

Robert Muir commented on LUCENE-5388:
-

+1, it's really silly it's this way. I guess it's the right thing to do this for 
5.0 only: I wish we had done it for 4.0, but it is what it is.

It should be a rather large and noisy change, unfortunately. I can help; let me 
know.

 Eliminate construction over readers for Tokenizer
 -

 Key: LUCENE-5388
 URL: https://issues.apache.org/jira/browse/LUCENE-5388
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Benson Margulies

 In the modern world, Tokenizers are intended to be reusable, with input 
 supplied via #setReader. The constructors that take Reader are a vestige. 
 Worse yet, they invite people to make mistakes in handling the reader that 
 tangle them up with the state machine in Tokenizer. The sensible thing is to 
 eliminate these ctors, and force setReader usage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864689#comment-13864689
 ] 

Joel Bernstein edited comment on SOLR-5244 at 1/7/14 10:12 PM:
---

I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think it would have similar performance 
to the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by an hppc 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.


was (Author: joel.bernstein):
I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think it would perform with similar 
performance to the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by an hppc 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.

 Full Search Result Export
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5244.patch


 It would be great if Solr could efficiently export entire search result sets 
 without scoring or ranking documents. This would allow external systems to 
 perform rapid bulk imports from Solr. It also provides a possible platform 
 for exporting results to support distributed join scenarios within Solr.
 This ticket provides a patch that has two pluggable components:
 1) ExportQParserPlugin: a post filter that gathers a BitSet with 
 document results and does not delegate to ranking collectors. Instead it puts 
 the BitSet on the request context.
 2) BinaryExportWriter: an output writer that iterates the BitSet and writes 
 the entire result as a binary stream. A header is provided at the beginning 
 of the stream so external clients can self-configure.
 Note:
 These two components will be sufficient for a non-distributed environment. 
 For distributed export a new Request handler will need to be developed.
 After applying the patch and building the dist or example, you can register 
 the components through the following changes to solrconfig.xml
 Register export contrib libraries:
 <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
  
 Register the export queryParser with the following line:
  
 <queryParser name="export" 
 class="org.apache.solr.export.ExportQParserPlugin"/>
  
 Register the xbin writer:
  
 <queryResponseWriter name="xbin" 
 class="org.apache.solr.export.BinaryExportWriter"/>
  
 The following query will perform the export:
 {code}
 http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
 {code}
 Initial patch supports export of four data-types:
 1) Single value trie int, long and float
 2) Binary doc values.
 The numerics are currently exported from the FieldCache and the Binary doc 
 values can be in memory or on disk.
 Since this is designed to export very large result sets efficiently, stored 
 fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



oom in documentation-lint

2014-01-07 Thread Benson Margulies
Is there a recipe to avoid this?

-documentation-lint:
 [echo] checking for broken html...
[ivy:cachepath] downloading
http://repo1.maven.org/maven2/net/sf/jtidy/jtidy/r938/jtidy-r938.jar
...
[ivy:cachepath]
..
(244kB)
[ivy:cachepath] .. (0kB)
[ivy:cachepath] [SUCCESSFUL ] net.sf.jtidy#jtidy;r938!jtidy.jar (383ms)
[jtidy] Checking for broken html (such as invalid tags)...

BUILD FAILED
/Users/benson/asf/lucene-solr/build.xml:57: The following error
occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/build.xml:208: The following
error occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/build.xml:214: The following
error occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/common-build.xml:1851:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
at java.io.BufferedWriter.write(BufferedWriter.java:230)
at java.io.PrintWriter.write(PrintWriter.java:456)
at java.io.PrintWriter.write(PrintWriter.java:473)
at java.io.PrintWriter.print(PrintWriter.java:603)
at java.io.PrintWriter.println(PrintWriter.java:739)
at org.w3c.tidy.Report.printMessage(Report.java:754)
at org.w3c.tidy.Report.errorSummary(Report.java:1572)
at org.w3c.tidy.Tidy.parse(Tidy.java:608)
at org.w3c.tidy.Tidy.parse(Tidy.java:263)
at org.w3c.tidy.ant.JTidyTask.processFile(JTidyTask.java:457)
at org.w3c.tidy.ant.JTidyTask.executeSet(JTidyTask.java:420)
at org.w3c.tidy.ant.JTidyTask.execute(JTidyTask.java:364)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)

Total time: 3 minutes 35 seconds

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene-solr pull request: LUCENE-5389: more analysis advice.

2014-01-07 Thread benson-basis
GitHub user benson-basis opened a pull request:

https://github.com/apache/lucene-solr/pull/14

LUCENE-5389: more analysis advice.

Before we change the protocol for tokenizer construction,
let's get plenty of explanation of the existing one, in case
of a 4.7.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benson-basis/lucene-solr 
lucene-5389-more-analysis-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/14.patch


commit 1ddc14c97396183ac99fb9ee5a40bdc09b3994c5
Author: Benson Margulies ben...@basistech.com
Date:   2014-01-07T22:52:11Z

LUCENE-5389: more analysis advice.
Before we change the protocol for tokenizer construction,
let's get plenty of explanation of the existing one, in case
of a 4.7.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components

2014-01-07 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864825#comment-13864825
 ] 

Benson Margulies commented on LUCENE-5389:
--

https://github.com/apache/lucene-solr/pull/14



 Even more doc for construction of TokenStream components
 

 Key: LUCENE-5389
 URL: https://issues.apache.org/jira/browse/LUCENE-5389
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Benson Margulies

 There are more useful things to tell would-be authors of tokenizers. Let's 
 tell them.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: oom in documentation-lint

2014-01-07 Thread Robert Muir
The jtidy-macro we use is not very efficient. It just uses the
built-in JTidy task.

I think this is a real problem; last I checked it seemed impossible to
fix without writing a custom task to integrate with jtidy.

We could either disable it, or you could try setting a larger -Xmx in
ANT_OPTS as a workaround, but I do think we need to fix or disable
this.
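
For example (the heap size here is a guess; anything comfortably above the
JVM default should do):

export ANT_OPTS=-Xmx1g
ant documentation-lint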

On Tue, Jan 7, 2014 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote:
 Is there a recipe to avoid this?

 -documentation-lint:
  [echo] checking for broken html...
 [ivy:cachepath] downloading
 http://repo1.maven.org/maven2/net/sf/jtidy/jtidy/r938/jtidy-r938.jar
 ...
 [ivy:cachepath]
 ..
 (244kB)
 [ivy:cachepath] .. (0kB)
 [ivy:cachepath] [SUCCESSFUL ] net.sf.jtidy#jtidy;r938!jtidy.jar (383ms)
 [jtidy] Checking for broken html (such as invalid tags)...

 BUILD FAILED
 /Users/benson/asf/lucene-solr/build.xml:57: The following error
 occurred while executing this line:
 /Users/benson/asf/lucene-solr/lucene/build.xml:208: The following
 error occurred while executing this line:
 /Users/benson/asf/lucene-solr/lucene/build.xml:214: The following
 error occurred while executing this line:
 /Users/benson/asf/lucene-solr/lucene/common-build.xml:1851:
 java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
 at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
 at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
 at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
 at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
 at java.io.BufferedWriter.write(BufferedWriter.java:230)
 at java.io.PrintWriter.write(PrintWriter.java:456)
 at java.io.PrintWriter.write(PrintWriter.java:473)
 at java.io.PrintWriter.print(PrintWriter.java:603)
 at java.io.PrintWriter.println(PrintWriter.java:739)
 at org.w3c.tidy.Report.printMessage(Report.java:754)
 at org.w3c.tidy.Report.errorSummary(Report.java:1572)
 at org.w3c.tidy.Tidy.parse(Tidy.java:608)
 at org.w3c.tidy.Tidy.parse(Tidy.java:263)
 at org.w3c.tidy.ant.JTidyTask.processFile(JTidyTask.java:457)
 at org.w3c.tidy.ant.JTidyTask.executeSet(JTidyTask.java:420)
 at org.w3c.tidy.ant.JTidyTask.execute(JTidyTask.java:364)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
 at org.apache.tools.ant.Task.perform(Task.java:348)
 at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)

 Total time: 3 minutes 35 seconds

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-07 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864855#comment-13864855
 ] 

Anshum Gupta commented on SOLR-5477:


bq. in my experience, when implementing an async callback API like this, it can 
be handy to require the client to specify the magical...

Considering that we have a 1-n relationship between calls made by the client to 
the OCP and from the OCP to cores, we can't really use a client-generated id. We 
would in any case need multiple ids to be generated at the OCP-to-core call level.

 Async execution of OverseerCollectionProcessor tasks
 

 Key: SOLR-5477
 URL: https://issues.apache.org/jira/browse/SOLR-5477
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
 Attachments: SOLR-5477-CoreAdminStatus.patch


 Typical collection admin commands are long running and it is very common to 
 have the requests time out.  It is more of a problem if the cluster is 
 very large. Add an option to run these commands asynchronously:
 add an extra param async=true for all collection commands;
 the task is written to ZK and the caller is returned a task id. 
 A separate collection admin command will be added to poll the status of the 
 task:
 command=status&id=7657668909
 If id is not passed, all running async tasks should be listed.
 A separate queue is created to store in-process tasks. After the tasks are 
 completed the queue entry is removed. OverseerCollectionProcessor will perform 
 these tasks in multiple threads.
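
Put together, the flow proposed above would look roughly like this (a sketch;
command=status and id are the proposal's parameters, not a released API, and
SPLITSHARD is just an example of a long-running command):

{noformat}
# submit a long-running command asynchronously; a task id is returned
/admin/collections?action=SPLITSHARD&collection=c1&shard=shard1&async=true

# poll the status of that task
/admin/collections?command=status&id=7657668909

# omit the id to list all running async tasks
/admin/collections?command=status
{noformat}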



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1201 - Failure!

2014-01-07 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1201/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 9939 lines...]
   [junit4] JVM J0: stderr was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140107_235447_516.syserr
   [junit4]  JVM J0: stderr (verbatim) 
   [junit4] java(208,0x149d18000) malloc: *** error for object 0x149d06ad1: 
pointer being freed was not allocated
   [junit4] *** set a breakpoint in malloc_error_break to debug
   [junit4]  JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/java 
-XX:+UseCompressedOops -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=6B057318ACC0851A -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.disableHdfs=true -Dfile.encoding=ISO-8859-1 -classpath 

[jira] [Created] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5618:
--

 Summary: Reproducible failure from 
TestFiltering.testRandomFiltering
 Key: SOLR-5618
 URL: https://issues.apache.org/jira/browse/SOLR-5618
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man


uwe's jenkins found this in java8...

http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText

{noformat}
   [junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
-Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
-Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
   [junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
{!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
tag=t}-_query_:{!frange v=val_i l=1 u=1}]
   [junit4]at 
__randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
   [junit4]at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
{noformat}

The seed fails consistently for me on trunk using java7, and on 4x using both 
java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864911#comment-13864911
 ] 

Hoss Man commented on SOLR-5618:


Relevant log snippet from jenkins...

{noformat}
   [junit4]   2 558586 T3202 C2360 oasc.SolrCore.execute [collection1] 
webapp=null path=null 
params={q={!frange+v%3Dval_i+l%3D0+u%3D1+cost%3D139+tag%3Dt}&fq={!frange+v%3Dval_i+l%3D0+u%3D1}&fq={!+cost%3D92}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}&fq={!frange+v%3Dval_i+l%3D0+u%3D1+cache%3Dtrue+tag%3Dt}&fq={!+cache%3Dtrue+tag%3Dt}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}}
 hits=0 status=0 QTime=1 
   [junit4]   2 558586 T3202 oas.SolrTestCaseJ4.assertJQ ERROR query failed 
JSON validation. error=mismatch: '1'!='0' @ response/numFound
   [junit4]   2 expected =/response/numFound==1
   [junit4]   2 response = {
   [junit4]   2  responseHeader:{
   [junit4]   2status:0,
   [junit4]   2QTime:1},
   [junit4]   2  response:{numFound:0,start:0,docs:[]
   [junit4]   2  }}
   [junit4]   2
   [junit4]   2 request = 
q={!frange+v%3Dval_i+l%3D0+u%3D1+cost%3D139+tag%3Dt}&fq={!frange+v%3Dval_i+l%3D0+u%3D1}&fq={!+cost%3D92}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}&fq={!frange+v%3Dval_i+l%3D0+u%3D1+cache%3Dtrue+tag%3Dt}&fq={!+cache%3Dtrue+tag%3Dt}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}
   [junit4]   2 558587 T3202 oasc.SolrException.log ERROR 
java.lang.RuntimeException: mismatch: '1'!='0' @ response/numFound
   [junit4]   2at 
org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:732)
   [junit4]   2at 
org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:679)
   [junit4]   2at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:316)
...
   [junit4]   2 558588 T3202 oass.TestFiltering.testRandomFiltering ERROR 
FAILURE: iiter=11 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 
tag=t}, fq, {!frange v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange 
v=val_i l=1 u=1}, fq, {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! 
cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}]
   [junit4]   2 558588 T3202 oas.SolrTestCaseJ4.tearDown ###Ending 
testRandomFiltering
   [junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
-Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
-Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
   [junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
{!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
tag=t}-_query_:{!frange v=val_i l=1 u=1}]
   [junit4]at 
__randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
   [junit4]at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
   [junit4]at java.lang.Thread.run(Thread.java:744)
{noformat}



 Reproducible failure from TestFiltering.testRandomFiltering
 ---

 Key: SOLR-5618
 URL: https://issues.apache.org/jira/browse/SOLR-5618
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man

 uwe's jenkins found this in java8...
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
 -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
 -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
[junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
[junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
 v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
 {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
 tag=t}-_query_:{!frange v=val_i l=1 u=1}]
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
[junit4]  at 
 org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
 {noformat}
 The seed fails consistently for me on trunk using java7, and on 4x using both 
 java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread David Smiley (@MITRE.org)
+1, Mark.

Git isn't perfect; I sympathize with the annoyances pointed out by Rob et al. 
But I think we would be better off for it -- a net win considering the 
upsides. In the end I'd love to track changes via branches (which includes 
forks people make to add changes), not by attaching patch files to an 
issue tracker. The way we do things here sucks for collaboration, and it's a 
higher bar for people to get involved than it can and should be.

~ David


Mark Miller-3 wrote
 I don’t really buy the fad argument, but as I’ve said, I’m willing to wait
 a little longer for others to catch on. I try to follow the stats and
 reports and articles on this pretty closely.
 
 As I mentioned early in the thread, by all appearances, the shift from SVN
 to Git looks much like the shift from CVS to SVN. This was not a fad
 change, nor is the next mass movement likely to be.
 
 Just like no one starts a project on CVS anymore, we are almost already to
 the point where new projects start exclusively on Git - especially open
 source.
 
 I’m happy to sit back and watch the trend continue though. The number of
 Git users on the committee and among the committers only grows every time
 the discussion comes up.
 
 If this were 2009, 2010, 2011 … who knows, perhaps I would buy the fad
 argument. But it just doesn’t jibe in 2014.
 
 - Mark





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-Old-Git-Discussion-tp4109193p4110109.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5618:
---

Attachment: SOLR-5618.patch

This smells like a caching-related bug ... but I have no idea why/where.

The test does multiple iterations where in each iteration it builds an index of 
a random number of documents, each containing an incremented value for id and 
val_i -- the number of documents can range from 1 to 21, with the id and 
val_i fields starting at 0.  Then it generates a bunch of random requests 
consisting of random q and fq params.

This is what the failing request looks like...

{noformat}
q  = {!frange v=val_i l=0 u=1 cost=139 tag=t}
fq = {!frange v=val_i l=0 u=1}
fq = {! cost=92}-_query_:{!frange v=val_i l=1 u=1} 
fq = {!frange v=val_i l=0 u=1 cache=true tag=t}
fq = {! cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}
{noformat}

So basically: it will only ever match docs which have val_i==0 -- which, given 
how the index is built, means it should always match exactly 1 document: the 0th 
doc -- but in the failure message we can see that it doesn't match any docs.

(FWIW: adding some debugging indicates that in the iteration where this fails, 
the index only has 2 documents in it -- doc#0 and doc#1)

In the patch I'm attaching, I hacked the test to explicitly attempt the above 
query in every iteration, regardless of the number of docs in the index, 
immediately after building the index -- and that new assertion never fails.  But 
then, after it passes, the test continues on with the existing logic, generating 
a bunch of random requests and executing them -- and when it randomly generates 
the same query as above (which already succeeded in matching 1 doc against the 
current index), that query then fails to match any docs.

Which smells to me like some sort of filter-caching glitch ... right?

 Reproducible failure from TestFiltering.testRandomFiltering
 ---

 Key: SOLR-5618
 URL: https://issues.apache.org/jira/browse/SOLR-5618
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-5618.patch


 uwe's jenkins found this in java8...
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
 -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
 -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
[junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
[junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
 v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
 {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
 tag=t}-_query_:{!frange v=val_i l=1 u=1}]
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
[junit4]  at 
 org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
 {noformat}
 The seed fails consistently for me on trunk using java7, and on 4x using both 
 java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2553) Nested Field Collapsing

2014-01-07 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865012#comment-13865012
 ] 

Kranti Parisa commented on SOLR-2553:
-

I think we will also need to support the other grouping params, especially 
group.limit, so that users can restrict the results even with nested groups.
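
For instance, the kind of request this implies (illustrative; the nested
semantics for repeated group.field params are exactly what this issue
proposes, not current behavior):

{noformat}
/select?q=*:*&group=true&group.field=location&group.field=type&group.limit=5
{noformat}

where group.limit would cap the documents returned inside each innermost group.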

 Nested Field Collapsing
 ---

 Key: SOLR-2553
 URL: https://issues.apache.org/jira/browse/SOLR-2553
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Reporter: Martijn Laarman

 Currently specifying grouping on multiple fields returns multiple datasets. 
 It would be nice if Solr supported cascading / nested grouping by applying 
 the first group over the entire result set, the next over each group, and so 
 forth. 
 Even support limited to nesting grouping 2 levels deep would cover a lot 
 of use cases. 
 group.field=location&group.field=type
 -Location X
 ---Type 1
 -----documents
 ---Type 2
 -----documents
 -Location Y
 ---Type 1
 -----documents
 ---Type 2
 -----documents
 instead of 
 -Location X
 --documents
 -Location Y
 --documents
 -Type 1
 --documents
 -Type 2
 --documents



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5560) Enable LocalParams without escaping the query

2014-01-07 Thread Ryan Cutter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865053#comment-13865053
 ] 

Ryan Cutter commented on SOLR-5560:
---

I don't know; I assume a committer familiar with this area will take a look in 
the near future. I see other unassigned tickets with patches attached, so I'm 
sure there's a process.

 Enable LocalParams without escaping the query
 -

 Key: SOLR-5560
 URL: https://issues.apache.org/jira/browse/SOLR-5560
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.6
Reporter: Isaac Hebsh
 Fix For: 4.7, 4.6.1

 Attachments: SOLR-5560.patch


 This query should be legit syntax:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 Currently it isn't, because LocalParams can be specified on a single term 
 only.
 [~billnbell] thinks it is a bug.
 From the mailing list:
 {quote}
 We want to set a LocalParam on a nested query. When querying with the v inline 
 parameter, it works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""}
 the parsedquery_toString is
 +id:TERM1 +(text:term2 text:term3 text:"term4 term5")
 Querying using _query_ also works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\""
 (parsedquery is exactly the same).
 Obviously, there is the option of an external parameter ({... 
 v=$nestedq}&nestedq=...).
 This is a good solution, but it is not practical when having a lot of such 
 nested queries.
 BUT, when trying to put the nested query in place, it yields a syntax error:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 org.apache.solr.search.SyntaxError: Cannot parse '(TERM2'
 The previous options are less preferred because of the escaping that must be 
 applied to the nested query.
 {quote}
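
 For reference, a spelled-out version of the external-parameter workaround 
 mentioned above (this relies on Solr's existing parameter dereferencing; URL 
 encoding omitted for readability):
 {noformat}
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id
   &q=TERM1 AND {!lucene df=text v=$nestedq}
   &nestedq=TERM2 TERM3 "TERM4 TERM5"
 {noformat}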



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP

2014-01-07 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5610:
-

Description: 
Add a collection admin API for cluster-wide property management.
The new API would create an entry in the root as 
/cluster-props.json:
{code:javascript}
{
"prop":"val"
}
{code}

The API would work as

/command=clusterprop&name=propName&value=propVal

There will be a set of well-known properties which can be set or unset with 
this command.

  was:
Add a collection admin API for cluster wide property management
the new API would create an entry in the root as 
/cluster-props.json
{code:javascipt}
{
"prop":"val"
}

The API would work as

/command=clusterprop&name=propName&value=propVal

there will be a set of well-known properties which can be set or unset with 
this command


 Support cluster-wide properties with an API called CLUSTERPROP
 --

 Key: SOLR-5610
 URL: https://issues.apache.org/jira/browse/SOLR-5610
 Project: Solr
  Issue Type: Bug
Reporter: Noble Paul

 Add a collection admin API for cluster-wide property management.
 The new API would create an entry in the root as 
 /cluster-props.json:
 {code:javascript}
 {
 "prop":"val"
 }
 {code}
 The API would work as
 /command=clusterprop&name=propName&value=propVal
 There will be a set of well-known properties which can be set or unset with 
 this command.
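
 A sketch following the proposed pattern (urlScheme is a hypothetical example 
 of a well-known property; the endpoint shape is taken from the description 
 above):
 {noformat}
 /command=clusterprop&name=urlScheme&value=https

 resulting entry in /cluster-props.json:
 {"urlScheme":"https"}
 {noformat}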



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org