[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-27 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882621#comment-13882621
 ] 

Shalin Shekhar Mangar commented on SOLR-5477:
-

Thanks Anshum.

# Why is it called a taskQueue in CoreAdminHandler? There is no queueing 
happening here.
# Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It 
can simply be a Map<String, TaskObject>. The task object itself can contain a 
volatile status flag to indicate running/completed/failure.
# The CoreAdminHandler.addTask with limit=true just removes a random (first?) 
entry if the limit is reached. It should remove the oldest entry instead.
# OverseerCollectionProcessor.requestStatus returns response with “success” 
even if requestid is found in “running” or “failure” map
# The ‘migrate’ api doesn’t use async core admin requests
# In all places where synchronous calls have been replaced with 
waitForAsyncCallsToComplete calls, we need to ensure that the correct response 
messages are returned on failures. Right now, the waitForAsyncCallToComplete 
method returns silently on detecting failure.
# Although there is a provision to clear the overseer status maps by passing 
requestid=1, it is never actually called. When do you intend to call this api?
# I don’t understand why we need three different maps for 
running/completed/failure for overseer collection processor. My comment #2 
applies here too. We can store the status in the value bytes instead of keeping 
three different maps and moving the key around. What do you think?
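Suggestions #2 and #8 above can be sketched as a single map whose values carry their own status, instead of nested or triplicate maps. The class and method names below are hypothetical illustrations, not Solr's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of comments #2/#8: one Map<String, TaskObject> whose
// values carry a volatile status flag, instead of moving keys between
// running/completed/failure maps.
class TaskObject {
    enum Status { RUNNING, COMPLETED, FAILED }
    private volatile Status status = Status.RUNNING;  // safely visible to other threads
    void setStatus(Status s) { status = s; }
    Status getStatus() { return status; }
}

class TaskTracker {
    private final Map<String, TaskObject> tasks = new ConcurrentHashMap<>();

    void submit(String requestId) { tasks.put(requestId, new TaskObject()); }

    TaskObject.Status statusOf(String requestId) {
        TaskObject t = tasks.get(requestId);
        return t == null ? null : t.getStatus();
    }

    void complete(String requestId, boolean failed) {
        TaskObject t = tasks.get(requestId);
        if (t != null) t.setStatus(failed ? TaskObject.Status.FAILED
                                          : TaskObject.Status.COMPLETED);
    }
}
```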

 Async execution of OverseerCollectionProcessor tasks
 

 Key: SOLR-5477
 URL: https://issues.apache.org/jira/browse/SOLR-5477
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
 Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch


 Typical collection admin commands are long running and it is very common to 
 have the requests get timed out. It is more of a problem if the cluster is 
 very large. Add an option to run these commands asynchronously:
 add an extra param async=true for all collection commands;
 the task is written to ZK and the caller is returned a task id.
 A separate collection admin command will be added to poll the status of the 
 task:
 command=status&id=7657668909
 If id is not passed, all running async tasks should be listed.
 A separate queue is created to store in-process tasks. After the tasks are 
 completed the queue entry is removed. OverseerCollectionProcessor will perform 
 these tasks in multiple threads.
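The submit-then-poll flow described above can be sketched in plain Java (hypothetical names; the real implementation stores tasks in ZooKeeper rather than an in-memory map):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the async flow: submitting a task returns an id
// immediately; the caller polls for completion with a separate call.
class AsyncTaskRunner {
    private final ExecutorService pool = Executors.newFixedThreadPool(4); // "multiple threads"
    private final Map<Long, Future<?>> inProcess = new ConcurrentHashMap<>(); // stand-in for the ZK queue
    private final AtomicLong nextId = new AtomicLong();

    long submit(Runnable task) {
        long id = nextId.incrementAndGet();
        inProcess.put(id, pool.submit(task));
        return id;                    // caller gets the task id right away
    }

    String status(long id) {
        Future<?> f = inProcess.get(id);
        if (f == null) return "notfound";
        if (!f.isDone()) return "running";
        inProcess.remove(id);         // "after the tasks are completed the queue entry is removed"
        return "completed";
    }

    void shutdown() { pool.shutdown(); }
}
```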



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-27 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882627#comment-13882627
 ] 

Anshum Gupta commented on SOLR-5477:


bq. Why is it called a taskQueue in CoreAdminHandler? There is no queueing 
happening here.
Changed it. Had that change on my machine before you mentioned :)

bq. Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It 
can simply be a Map<String, TaskObject>.
{quote} I don’t understand why we need three different maps for 
running/completed/failure for overseer collection processor. My comment #2 
applies here too. We can store the status in the value bytes instead of keeping 
three different maps and moving the key around. What do you think? {quote}
It takes away the ability (or at least makes it too complicated) to limit the 
number of tasks in a particular state, e.g. storing only 50 completed tasks.

bq. The CoreAdminHandler.addTask with limit=true just removes a random (first?) 
entry if the limit is reached.
It removes the first element. It's a synchronized LinkedHashMap, so the 
iterator preserves insertion order and returns the first (i.e. oldest) element.
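The behavior described here, and the oldest-entry eviction Shalin asks for, both fall out of an insertion-ordered LinkedHashMap. A sketch using removeEldestEntry (hypothetical class, not the actual CoreAdminHandler code):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: in an insertion-ordered LinkedHashMap the "first" entry is also
// the oldest, and removeEldestEntry can enforce the size limit automatically
// instead of removing an entry by hand in addTask.
class BoundedTaskMap<K, V> {
    private static final int MAX_TRACKED = 50;  // e.g. keep only 50 completed tasks
    private final Map<K, V> map = Collections.synchronizedMap(
        new LinkedHashMap<K, V>(16, 0.75f, false) {  // false = insertion order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > MAX_TRACKED;  // evict the oldest entry past the limit
            }
        });

    void put(K k, V v) { map.put(k, v); }
    boolean contains(K k) { return map.containsKey(k); }
    int size() { return map.size(); }
}
```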

bq. OverseerCollectionProcessor.requestStatus returns response with “success” 
even if requestid is found in “running” or “failure” map
Success was supposed to mean that the task was found in a status map. It might 
actually make sense to change it. Thanks for the suggestion.

bq. Although there is a provision to clear the overseer status maps by passing 
requestid=1, it is never actually called. 
The intention is for the user to explicitly call the API. There's no concept of 
a map/queue in ZK that maintains insertion state.
You'd have to check it, order it, and then delete the apt one every time 
numChildren exceeds the limit. I thought it was best left to the user.

Will upload a patch with the following:
* Migrate API to also use the ASYNC CoreAdmin requests.
* Store the failed tasks information from CoreAdmin async calls in case of 
Collection API requests.
* Tests for 
** migratekey (and other calls) in ASYNC mode.
** Failing Collection API calls.





[jira] [Updated] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-27 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-5477:
---

Attachment: SOLR-5477.patch

Fixed the following:
* Changed the var name from Queue to Map.
* Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state></state>
  <msg></msg>
</status>
{code}





[jira] [Comment Edited] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-27 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882641#comment-13882641
 ] 

Anshum Gupta edited comment on SOLR-5477 at 1/27/14 8:53 AM:
-

Fixed the following:
* Changed the var name from Queue to Map.
* Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state>running|failed|completed|notfound</state>
  <msg>apt message</msg>
</status>
{code}



was (Author: anshumg):
Fixed the following:
* Changed the var name from Queue to Map.
* Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state></state>
  <msg></msg>
</status>
{code}





[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1088: POMs out of sync

2014-01-27 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1088/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.OverseerTest.testOverseerFailure

Error Message:
KeeperErrorCode = NodeExists for /collections/collection1/leaders/shard1

Stack Trace:
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists for /collections/collection1/leaders/shard1
at 
__randomizedtesting.SeedInfo.seed([2BCA93FBB0E2264:6B426CCA9ABCD45]:0)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:428)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:425)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:382)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:369)
at 
org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:112)
at 
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:156)
at 
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:289)
at 
org.apache.solr.cloud.OverseerTest$MockZKController.publishState(OverseerTest.java:153)
at 
org.apache.solr.cloud.OverseerTest.testOverseerFailure(OverseerTest.java:584)




Build Log:
[...truncated 52851 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:476: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:176: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/extra-targets.xml:77:
 Java returned: 1

Total time: 132 minutes 55 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




Re: Span Not Queries

2014-01-27 Thread Gopal Agarwal
Hi,

Any news on this?


On Fri, Jan 17, 2014 at 1:54 AM, Gopal Agarwal gopal.agarw...@gmail.com wrote:

 Sounds perfect. Hopefully one of the committer picks this up and adds this
 to 4.7.

 Will keep checking the updates...


 On Fri, Jan 17, 2014 at 1:17 AM, Allison, Timothy B. 
 talli...@mitre.org wrote:

  And don’t forget analysis! :)



 The code is non-trivial, and it will take a generous committer to help me
 get it into shape for committing.  Once I push my mods to jira (end of next
 week), you should be able to compile it and run it at least for dev/testing
 to confirm that it meets your needs.



 *From:* Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
 *Sent:* Thursday, January 16, 2014 1:21 PM
 *To:* dev@lucene.apache.org
 *Subject:* Re: Span Not Queries



 Thanks Tim. This exactly fits my requirements of recursion, SpanNot and
 ComplexParser combination with Boolean Parser.



 Since I would end up doing the exact same changes to my QueryParserBase
 class, I would be locked with the current version of SOLR for forseeable
 future.



 Can you comment on when is the possible release if it gets reviewed by
 next week?





 On Thu, Jan 16, 2014 at 11:06 PM, Allison, Timothy B. talli...@mitre.org
 wrote:

 Apologies for the self-promotion…LUCENE-5205 and its Solr cousin
 (SOLR-5410) might help.  I’m hoping to post updates to both by the end of
 next week.  Then, if a committer would be willing to review and add these
 to Lucene/Solr, you should be good to go.



 Take a look at the description for LUCENE-5205 and see if that capability
 will meet your needs.  Thank you.



   Best,



  Tim



 *From:* Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
 *Sent:* Thursday, January 16, 2014 4:10 AM
 *To:* dev@lucene.apache.org
 *Subject:* Fwd: Span Not Queries



 Please help me out with earlier query.



 In short:

 1. Can we change the QueryParser.jj file to identify the SpanNot query
 as a boolean clause?



 2. Can we use ComplexPhraseQuery Parser to support SpanOR and SpanNOT
 queries also?



 For further explanation, following are the examples.



 On Tue, Oct 15, 2013 at 11:27 PM, Ankit Kumar ankitthemight...@gmail.com
 wrote:

 *I have a business use case in which i need to use Span Not and
 other ordered proximity queries . And they can be nested upto any level
 A Boolean inside a ordered query or ordered query inside a Boolean
  . Currently i am thinking of changing the QueryParser.jj file to identify
 the SpanNot query and use Complex Phrase Query Parser of Lucene for
 parsing
 complex queries . Can you suggest better way of achieving this.*

 *Following are the list of additions that i need to do in SOLR.*

 *1. Span NOT Operator*  .

 2.Adding Recursive and Range Proximity

   *Recursive Proximity *is a proximity query within a proximity query

 Ex:   “ “income tax”~5   statement” ~4  The recursion can be up to any
 level.

 * Range Proximity*: Currently we can only define number as a range we
 want interval as a range .

 Ex: “profit income”~3,5,  “United America”~-5,4



 3. Complex  Queries

 A complex query is a query formed with a combination of Boolean operators
 or proximity queries or range queries or any possible combination of
 these.

 Ex:“(income AND tax) statement”~4

   “ “income tax”~4  (statement OR period) ”~3

   (“ income” SPAN NOT  “income tax” ) source ~3,5

  Can anyone suggest us some way of achieving these 3 functionalities in
 SOLR
  ???


 On Tue, Oct 15, 2013 at 10:15 PM, Jack Krupansky j...@basetechnology.com
 wrote:


  Nope. But the LucidWorks Search product query parser does support
 SpanNot
  if you use their BEFORE, AFTER, and NEAR span operators.
 
  See:

  http://docs.lucidworks.com/display/lweug/Proximity+Operations

 
  For example: George BEFORE:2 Bush NOT H to match George anything
 Bush,
  but not George H. W. Bush.
 
  What is your specific use case?
 
  -- Jack Krupansky
 
  -Original Message- From: Ankit Kumar
  Sent: Tuesday, October 15, 2013 3:58 AM
  To: solr-u...@lucene.apache.org
  Subject: Span Not Queries
 
 
  I need to add Span Not queries in Solr. There's a parser, Surround Query
  Parser; I went through this (

  http://lucene.472066.n3.nabble.com/Surround-query-parser-not-working-td4075066.html
 

  )
  to discover that surround query parser does not analyze text
 
  Does DisMaxQueryParser support SpanNot queries??
 











[jira] [Updated] (SOLR-5473) Make one state.json per collection

2014-01-27 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5473:
-

Attachment: SOLR-5473.patch

 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node






[jira] [Commented] (SOLR-2366) Facet Range Gaps

2014-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882719#comment-13882719
 ] 

Jan Høydahl commented on SOLR-2366:
---

bq. So for a facet.range.start=0, facet.range.end=1000, 
facet.range.gap=10,90,900 the labels would be as Jan suggests: [0 TO 10}, [10 
TO 100}, [100 TO 1000}.

[~tedsullivan], I am not in favor of a list of relative gaps, I think it is 
user unfriendly. That's why I suggested a new facet.range.spec or something 
like Hoss' facet.range.buckets. But if you for some reason wish to extend the 
gap parameter, I guess it needs to remain relative gaps since that is kind of 
implied in the wording?
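For illustration, expanding a relative gap list into absolute bucket edges is a small computation: start=0 with gaps 10,90,900 yields edges 0, 10, 100, 1000, i.e. the buckets [0 TO 10}, [10 TO 100}, [100 TO 1000} from the quote above. (A sketch, not proposed Solr code.)

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand a relative gap list (facet.range.gap=10,90,900) into
// absolute bucket boundaries, each gap being relative to the previous edge.
class RangeGaps {
    static List<Integer> boundaries(int start, int... gaps) {
        List<Integer> edges = new ArrayList<>();
        int edge = start;
        edges.add(edge);
        for (int gap : gaps) {
            edge += gap;      // relative gap: advance from the previous boundary
            edges.add(edge);
        }
        return edges;
    }
}
```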

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.
 (Original syntax proposal removed, see discussion for concrete syntax)






Re: Welcome Benson Margulies as Lucene/Solr committer!

2014-01-27 Thread Jan Høydahl
Welcome Benson!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. jan. 2014 kl. 22:40 skrev Michael McCandless luc...@mikemccandless.com:

 I'm pleased to announce that Benson Margulies has accepted to join our
 ranks as a committer.
 
 Benson has been involved in a number of Lucene/Solr issues over time
 (see 
 http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dd&sa1=allUsers&a2=Benson+Margulies
 ), most recently on debugging tricky analysis issues.
 
 Benson, it is tradition that you introduce yourself with a brief bio.
 I know you're heavily involved in other Apache projects already...
 
 Once your account is set up, you should then be able to add yourself
 to the who we are page on the website as well.
 
 Congratulations and welcome!
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 
 





[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread Dana Sava (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882723#comment-13882723
 ] 

Dana Sava commented on SOLR-4470:
-

Hello,
We are currently using SOLR 4.5.1 in our production environment and we tried to 
setup security on a SOLR cloud configuration. I have read all the 4470 issue 
activity and it will be very useful for us to be able to download the 
SOLR-4470_branch_4x_r1452629.patch already compiled from some place, until the 
4.7 version is released. Can somebody help me with this issue?
Thank you,
Dana





 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.7

 Attachments: SOLR-4470.patch, SOLR-4470.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r145.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for it to work 
 credentials need to be provided here also.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers. E.g. for search and update 
 request.
 But there are also internal requests
 * that only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc based on calls to the Collection API)
 * that do not in any way have relation to an outside super-request (e.g. 
 replica synching stuff)
 We would like to aim at a solution where original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we would aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work at a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.
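A minimal sketch of the Basic auth mechanics the description refers to, using only the JDK: this illustrates the `Authorization` header format that every internal node-to-node request would have to carry, and is not the patch's actual credential-forwarding code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: build the HTTP Basic auth header value that would accompany
// internal Solr requests. Illustration of the header format only.
class BasicAuth {
    static String header(String user, String password) {
        String token = Base64.getEncoder().encodeToString(
            (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }
}
```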






Re: [VOTE] Release Lucene/Solr 4.6.1 RC4

2014-01-27 Thread Simon Willnauer
I guess this vote passed!

On Sat, Jan 25, 2014 at 1:15 AM, Andi Vajda va...@osafoundation.org wrote:

 On Thu, 23 Jan 2014, Mark Miller wrote:

  Sorry - watch out for that link - I'm seeing the text correctly, but the
 underlying link incorrectly when I look at it in my send folder. The evils
 of html mail I guess.


 +1

 PyLucene built from branch_4x's rev 1560866 passes all its tests.

 Andi..


 To be sure you have the right artifacts, make sure you are looking at the
 following location:

 http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/

 - Mark

 On Jan 23, 2014, at 9:57 PM, Mark Miller markrmil...@gmail.com wrote:

  Here we go, hopefully for the last time now... thanks everyone for bearing
 with us.


 Please vote to release the following artifacts:

 http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/

 Here is my +1.

 SUCCESS! [0:56:37.409716]

 --
 - Mark







Is there any way to lucene index incremented indexing or updated

2014-01-27 Thread mugeesh
I have built an index of around 1 TB of data. The problem is that I want to
update or add more data to my Lucene database. Is there a way to add to or
re-index the Lucene DB? Please give me some suggestions.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-any-way-to-lucene-index-incremented-indexing-or-updated-tp4113691.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Is there any way to lucene index incremented indexing or updated

2014-01-27 Thread Michael McCandless
Could you re-ask this on java-u...@lucene.apache.org?  This list is
for making changes to Lucene/Solr's source code ... thanks.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 27, 2014 at 6:15 AM, mugeesh muge...@hitechpeople.in wrote:
 I had made index around 1 TB data. the problem is that i want to update or
 add more data in my lucene database . is there any way to add or re-index
 lucene Db ..Please give me some suggestion.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Is-there-any-way-to-lucene-index-incremented-indexing-or-updated-tp4113691.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[jira] [Created] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters

2014-01-27 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5418:
--

 Summary: Don't use .advance on costly (e.g. distance range facets) 
filters
 Key: LUCENE-5418
 URL: https://issues.apache.org/jira/browse/LUCENE-5418
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.7


If you use a distance filter today (see 
http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html 
), then drill down on one of those ranges, under the hood Lucene is using 
.advance on the Filter, which is very costly because we end up computing 
distance on (possibly many) hits that don't match the query.

It's better performance to find the hits matching the Query first, and then 
check the filter.

FilteredQuery can already do this today, when you use its 
QUERY_FIRST_FILTER_STRATEGY.  This essentially accomplishes the same thing as 
Solr's post filters (I think?) but with a far simpler/better/less code 
approach.
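Against the Lucene 4.x API, the FilteredQuery approach mentioned above looks roughly like this (a sketch; `baseQuery`, `distanceFilter`, and `searcher` are placeholders standing in for the user's own query, costly filter, and IndexSearcher):

```java
// Sketch against the Lucene 4.x API: ask FilteredQuery to run the query
// first and only then consult the (costly) filter on matching hits,
// instead of calling .advance on the filter.
Query q = new FilteredQuery(baseQuery, distanceFilter,
                            FilteredQuery.QUERY_FIRST_FILTER_STRATEGY);
TopDocs hits = searcher.search(q, 10);
```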

E.g., I believe ElasticSearch uses this API when it applies costly filters.

Longish term, I think Query/Filter ought to know itself that it's expensive, 
and in cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. 
ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's passed 
to IndexSearcher.search, we should also be smart here and not call .advance 
on such clauses.  But that'd be a biggish change ... so for today the 
workaround is the user must carefully construct the FilteredQuery themselves.

In the mean time, as another workaround, I want to fix DrillSideways so that 
when you drill down on such filters it doesn't use .advance; this should give a 
good speedup for the normal path API usage with a costly filter.

I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I 
plan to merge this back to trunk / 4.7.






Re: Liberating DirectPostingFormat from Codec

2014-01-27 Thread Benson Margulies
What do we have for a benchmark framework that is used to
justify/qualify speed-related things? One way forward would be to see
what a quantified measurement shows from the idea I have in mind, and
use that to facilitate deciding if this belongs in the tree.

On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com wrote:
 Keeping things in memory and not re-reading them from disk is what
 really sang the song for us. Even if the initial read-in was more
 costly due to decompression, the long-term amortized benefit of not
 re-reading would still be a big winner.


 On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote:
 well the Directory layer likely isn't what makes DirectPF faster for
 you. It's probably the fact it does no compression at all...


 On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com
 wrote:

 On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote:
  That would be Directory :)

 Oh,  how embarrassing. I could have written a custom directory to begin
 with.

 Would a Directory class for this purpose be an interesting patch, in
 that case? I'm not discontented about building a Directory into our
 application, but it seems like I might not be the only person to find
 this useful.

 
 
  On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies
  bimargul...@gmail.com
  wrote:
 
  I've had very gratifying results using the DirectPostingFormat to
  speed up queries when I had a read-only index with plenty of memory.
  The only downside was the need to specify it within the Codec, and
  thus write it into the index.
 
  Ever since, I've wondered if we could change things to introduce the
  same goodness without building it into the codec.
 
  Very roughly, I'm imagining an option in the IndexReader to provide an
  object that can surround the codec that is called for in the stored
  format.
 
  Is this an old question? Is it worth sketching a patch?
 
 
 







Re: Liberating DirectPostingFormat from Codec

2014-01-27 Thread Michael McCandless
Hi Benson,

I use the code from luceneutil
(https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I
run those scripts nightly for the nightly benchmarks:
http://people.apache.org/~mikemccand/lucenebench

But, that's the Wikipedia corpus, and has no real queries, and the
scripts are quite challenging to get working ... if you have access to
more realistic corpus + queries, even if you can't share it, those
results are also interesting to share.

I think it would be neat if an app could retroactively pick DirectPF
at search time, or more generally pass search-time parameters when
initializing codec components (I think there was a discussion about
this at some point but I can't remember what the use case was).
Today, any and all choices must be written into the index and cannot
be changed at search time, which is somewhat silly/restrictive for
DirectPF since it can wrap any other PF and act as simply a fast
cache on top of the postings.
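To make the "fast cache on top of the postings" idea concrete, here is a hypothetical plain-Java sketch. The `PostingsSource` interface and all names below are invented for illustration and are not Lucene APIs: a wrapper eagerly materializes a delegate's postings in memory once, so later lookups never re-read or decompress.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: "PostingsSource" is an invented stand-in for a
// postings format, not a Lucene API.
public class DirectCacheSketch {

    interface PostingsSource {
        List<Integer> postings(String term); // term -> doc IDs
        Iterable<String> terms();
    }

    // Backed directly by a map; plays the role of the slower delegate.
    static PostingsSource fromMap(Map<String, List<Integer>> m) {
        return new PostingsSource() {
            public List<Integer> postings(String term) { return m.get(term); }
            public Iterable<String> terms() { return m.keySet(); }
        };
    }

    // Wraps any delegate and loads all postings up front, the way DirectPF
    // can wrap another postings format and act as an in-memory cache.
    static PostingsSource direct(PostingsSource delegate) {
        Map<String, List<Integer>> cache = new HashMap<>();
        for (String t : delegate.terms()) {
            cache.put(t, delegate.postings(t)); // pay the read cost once
        }
        return fromMap(cache);
    }
}
```

The point of the decorator shape is that the wrapped source can be anything; the caller only pays the one-time load cost at open time.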


Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies bimargul...@gmail.com wrote:
 What do we have for a benchmark framework that is used to
 justify/qualify speed-related things? One way forward would be to see
 what a quantified measurement shows from the idea I have in mind, and
 use that to facilitate deciding if this belongs in the tree.

 On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 Keeping things in memory and not re-reading them from disk is what
 really sang the song for us. Even if the initial read-in was more
 costly due to decompression, the long-term amortized benefit of not
 re-reading would still be a big winner.


 On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote:
 Well, the Directory layer likely isn't what makes DirectPF faster for
 you. It's probably the fact that it does no compression at all...


 On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com
 wrote:

 On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote:
  That would be Directory :)

 Oh,  how embarrassing. I could have written a custom directory to begin
 with.

 Would a Directory class for this purpose be an interesting patch, in
 that case? I'm not discontented about building a Directory into our
 application, but it seems like I might not be the only person to find
 this useful.

 
 
  On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies
  bimargul...@gmail.com
  wrote:
 
  I've had very gratifying results using the DirectPostingFormat to
  speed up queries when I had a read-only index with plenty of memory.
  The only downside was the need to specify it within the Codec, and
  thus write it into the index.
 
  Ever since, I've wondered if we could change things to introduce the
  same goodness without building it into the codec.
 
  Very roughly, I'm imagining an option in the IndexReader to provide an
  object that can surround the codec that is called for in the stored
  format.
 
  Is this an old question? Is it worth sketching a patch?
 
 
 









Re: Liberating DirectPostingFormat from Codec

2014-01-27 Thread Benson Margulies
On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Hi Benson,

 I use the code from luceneutil
 (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I
 run those scripts nightly for the nightly benchmarks:
 http://people.apache.org/~mikemccand/lucenebench

 But, that's the Wikipedia corpus, and has no real queries, and the
 scripts are quite challenging to get working ... if you have access to
 more realistic corpus + queries, even if you can't share it, those
 results are also interesting to share.

 I think it would be neat if an app could retroactively pick DirectPF
 at search time, or more generally pass search-time parameters when
 initializing codec components (I think there was a discussion about
 this at some point but I can't remember what the use case was).
 Today, any and all choices must be written into the index and cannot
 be changed at search time, which is somewhat silly/restrictive for
 DirectPF since it can wrap any other PF and act as simply a fast
 cache on top of the postings.

Well, that's where I thought I was starting: an API into the reader
that allows DirectPF to be injected as a wrapper around others. I
haven't had time to follow Rob's bread-crumb trail to see if this is
straightforward by customizing Directory -- though it occurs to me
that we have many directories, and it would be useful to be able to do
this regardless.

I may be able to share a data set, I'll check into that today.




 Mike McCandless

 http://blog.mikemccandless.com


 On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies bimargul...@gmail.com 
 wrote:
 What do we have for a benchmark framework that is used to
 justify/qualify speed-related things? One way forward would be to see
 what a quantified measurement shows from the idea I have in mind, and
 use that to facilitate deciding if this belongs in the tree.

 On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 Keeping things in memory and not re-reading them from disk is what
 really sang the song for us. Even if the initial read-in was more
 costly due to decompression, the long-term amortized benefit of not
 re-reading would still be a big winner.


 On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote:
 Well, the Directory layer likely isn't what makes DirectPF faster for
 you. It's probably the fact that it does no compression at all...


 On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com
 wrote:

 On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote:
  That would be Directory :)

 Oh,  how embarrassing. I could have written a custom directory to begin
 with.

 Would a Directory class for this purpose be an interesting patch, in
 that case? I'm not discontented about building a Directory into our
 application, but it seems like I might not be the only person to find
 this useful.

 
 
  On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies
  bimargul...@gmail.com
  wrote:
 
  I've had very gratifying results using the DirectPostingFormat to
  speed up queries when I had a read-only index with plenty of memory.
  The only downside was the need to specify it within the Codec, and
  thus write it into the index.
 
  Ever since, I've wondered if we could change things to introduce the
  same goodness without building it into the codec.
 
  Very roughly, I'm imagining an option in the IndexReader to provide an
  object that can surround the codec that is called for in the stored
  format.
 
  Is this an old question? Is it worth sketching a patch?
 
 
 











Re: Welcome Benson Margulies as Lucene/Solr committer!

2014-01-27 Thread Stefan Matheis
Welcome Benson :)  


On Monday, January 27, 2014 at 10:57 AM, Jan Høydahl wrote:

 Welcome Benson!
  
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
  
 On 25 Jan 2014, at 22:40, Michael McCandless luc...@mikemccandless.com wrote:
  
  I'm pleased to announce that Benson Margulies has accepted to join our
  ranks as a committer.
   
  Benson has been involved in a number of Lucene/Solr issues over time
  (see 
  http://jirasearch.mikemccandless.com/search.py?index=jirachg=ddsa1=allUsersa2=Benson+Margulies
  ), most recently on debugging tricky analysis issues.
   
  Benson, it is tradition that you introduce yourself with a brief bio.
  I know you're heavily involved in other Apache projects already...
   
  Once your account is set up, you should then be able to add yourself
  to the who we are page on the website as well.
   
  Congratulations and welcome!
   
  Mike McCandless
   
  http://blog.mikemccandless.com
   
   
  
  
  
  
  




Re: Liberating DirectPostingFormat from Codec

2014-01-27 Thread Michael McCandless
On Mon, Jan 27, 2014 at 7:23 AM, Benson Margulies bimargul...@gmail.com wrote:
 On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Hi Benson,

 I use the code from luceneutil
 (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I
 run those scripts nightly for the nightly benchmarks:
 http://people.apache.org/~mikemccand/lucenebench

 But, that's the Wikipedia corpus, and has no real queries, and the
 scripts are quite challenging to get working ... if you have access to
 more realistic corpus + queries, even if you can't share it, those
 results are also interesting to share.

 I think it would be neat if an app could retroactively pick DirectPF
 at search time, or more generally pass search-time parameters when
 initializing codec components (I think there was a discussion about
 this at some point but I can't remember what the use case was).
 Today, any and all choices must be written into the index and cannot
 be changed at search time, which is somewhat silly/restrictive for
 DirectPF since it can wrap any other PF and act as simply a fast
 cache on top of the postings.

 Well, that's where I thought I was starting: an API into the reader
 that allows DirectPF to be injected as a wrapper around others. I
 haven't had time to follow Rob's bread-crumb trail to see if this is
 straightforward by customizing Directory -- though it occurs to me
 that we have many directories, and it would be useful to be able to do
 this regardless.

I'm not sure how a custom Directory applies here ... maybe Rob can clarify?

 I may be able to share a data set, I'll check into that today.

Cool!

Mike McCandless

http://blog.mikemccandless.com




Re: Welcome Areek Zillur as Lucene/Solr committer!

2014-01-27 Thread Stefan Matheis
Welcome Areek :) 


On Tuesday, January 21, 2014 at 8:26 PM, Robert Muir wrote:

 I'm pleased to announce that Areek Zillur has accepted to join our ranks as a 
 committer.
 
 Areek has been improving suggester support in Lucene and Solr, including a 
 revamped Solr component slated for the 4.7 release. [1]
 
 Areek, it is tradition that you introduce yourself with a brief bio.
 
 Once your account is set up, you should then be able to add yourself to the 
 who we are page on the website as well.
 
 Congratulations and welcome!
 
 [1] https://issues.apache.org/jira/browse/SOLR-5378
 



[jira] [Updated] (SOLR-4787) Join Contrib

2014-01-27 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Attachment: SOLR-4787.patch

Resolved a memory leak when the bjoin is used with cache autowarming.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 than the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will set up a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registered in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the 
 solr-joins-4.*-.jar in the WEB-INF/lib directory of the solr web application. 
 This will 
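The hash-set join described above can be illustrated with a plain-Java sketch (class and method names below are hypothetical, not the actual hjoin code): collect the long join keys returned by the fromIndex query into a HashSet, then keep only main-query documents whose "to" key is a member of that set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch of a hash-set join; not the actual hjoin implementation.
public class HashSetJoinSketch {

    // Keys produced by the "from" query (e.g. id_i values from collection2).
    // Sizing the set from the key list up front avoids resizing, which is
    // what the "size" local parameter is for.
    static Set<Long> buildFromKeySet(List<Long> fromQueryKeys) {
        return new HashSet<>(fromQueryKeys);
    }

    // Filter main-query results: keep docs whose join key is in the set.
    static List<Long> join(List<Long> mainQueryKeys, Set<Long> fromKeys) {
        return mainQueryKeys.stream()
                .filter(fromKeys::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<Long> fromKeys = buildFromKeySet(Arrays.asList(10L, 20L, 30L));
        List<Long> joined = join(Arrays.asList(5L, 10L, 25L, 30L), fromKeys);
        System.out.println(joined); // [10, 30]
    }
}
```

This also shows why the hjoin trades memory for speed: membership tests are O(1), but the whole from-side key set must fit in the heap.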

Jetty version should go in CHANGES.TXT

2014-01-27 Thread Jan Høydahl
Hi,

I'd argue that Jetty can be considered a major component of Solr, so I suggest 
we add the Jetty version under the "Versions of Major Components" section in 
Solr's CHANGES.TXT.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com





Re: [VOTE] Release Lucene/Solr 4.6.1 RC4

2014-01-27 Thread Mark Miller
Thanks everyone for voting. It’s been 72 hours, the vote has passed.

- Mark

http://about.me/markrmiller

On Jan 23, 2014, at 9:57 PM, Mark Miller markrmil...@gmail.com wrote:

 Here we go, hopefully for the last time now… thanks everyone for bearing with 
 us.
 
 Please vote to release the following artifacts:
 
 http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/
 
 Here is my +1.
 
 SUCCESS! [0:56:37.409716]
 
 --
 - Mark





[jira] [Updated] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.

2014-01-27 Thread Dorin Oltean (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dorin Oltean updated SOLR-5669:
---

Description: 
When I do the following query:
/select?q=\ujb

I get 
{quote}
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
{quote}

To make it work I have to put another '\' in front of the query
{quote}\\ujb{quote}
which in fact leads to a different query in Solr.

I use edismax qparser.

  was:
When I do the following query:
/select?q=\ujb

I get 
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
code 400

To make it work i have to put in fornt of the query nother '\'
\\ujb
wich in fact leads to a different query in solr.

I use edismax qparser.


 queries containing \u  return error: Truncated unicode escape sequence.
 -

 Key: SOLR-5669
 URL: https://issues.apache.org/jira/browse/SOLR-5669
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.4
Reporter: Dorin Oltean
Priority: Minor

 When I do the following query:
 /select?q=\ujb
 I get 
 {quote}
 org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
 sequence: j,
 {quote}
 To make it work I have to put another '\' in front of the query
 {quote}\\ujb{quote}
 which in fact leads to a different query in Solr.
 I use edismax qparser.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

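Both error messages in this issue fall out of standard \uXXXX escape handling: after \u the parser requires exactly four hex digits, so the 'j' in \ujb trips the non-hex check, and input that ends too early trips the "Truncated" check. A minimal illustrative sketch of such a check (not the actual Solr/Lucene parser code, and the input strings are invented examples):

```java
// Minimal sketch of \uXXXX escape validation, illustrating why q=\ujb fails.
// This is not the actual Solr/Lucene query parser code.
public class UnicodeEscapeSketch {

    // Returns the code point for a uXXXX sequence starting at offset i
    // (i points at the 'u'); throws on non-hex or truncated input.
    static char parseUnicodeEscape(String s, int i) {
        if (i + 4 >= s.length()) {
            throw new IllegalArgumentException("Truncated unicode escape sequence.");
        }
        int value = 0;
        for (int k = 1; k <= 4; k++) {
            int d = Character.digit(s.charAt(i + k), 16);
            if (d < 0) {
                throw new IllegalArgumentException(
                    "Non-hex character in Unicode escape sequence: " + s.charAt(i + k));
            }
            value = (value << 4) | d;  // accumulate one hex digit
        }
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(parseUnicodeEscape("u0041", 0)); // four hex digits -> A
        try {
            parseUnicodeEscape("ujbxx", 0); // mirrors q=\ujb...: 'j' is not hex
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Doubling the backslash (\\u) makes the parser treat the sequence as a literal backslash plus 'u' instead of an escape, which is why it parses but produces a different query.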



[jira] [Created] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.

2014-01-27 Thread Dorin Oltean (JIRA)
Dorin Oltean created SOLR-5669:
--

 Summary: queries containing \u  return error: Truncated unicode 
escape sequence.
 Key: SOLR-5669
 URL: https://issues.apache.org/jira/browse/SOLR-5669
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.4
Reporter: Dorin Oltean
Priority: Minor


When I do the following query:
/select?q=\ujb

I get 
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
code 400

To make it work I have to put another '\' in front of the query
\\ujb
which in fact leads to a different query in Solr.

I use edismax qparser.






[jira] [Updated] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.

2014-01-27 Thread Dorin Oltean (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dorin Oltean updated SOLR-5669:
---

Description: 
When I do the following query:
/select?q=\ujb

I get 
{quote}
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
{quote}

To make it work I have to put another '\' in front of the query
{noformat}\\ujb{noformat}
which in fact leads to a different query in Solr.

I use edismax qparser.

  was:
When I do the following query:
/select?q=\ujb

I get 
{quote}
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
{quote}

To make it work i have to put in fornt of the query nother '\'
{quote}{noformat}\\ujb{noformat}{quote}
wich in fact leads to a different query in solr.

I use edismax qparser.


 queries containing \u  return error: Truncated unicode escape sequence.
 -

 Key: SOLR-5669
 URL: https://issues.apache.org/jira/browse/SOLR-5669
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.4
Reporter: Dorin Oltean
Priority: Minor

 When I do the following query:
 /select?q=\ujb
 I get 
 {quote}
 org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
 sequence: j,
 {quote}
 To make it work I have to put another '\' in front of the query
 {noformat}\\ujb{noformat}
 which in fact leads to a different query in Solr.
 I use edismax qparser.






[jira] [Updated] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.

2014-01-27 Thread Dorin Oltean (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dorin Oltean updated SOLR-5669:
---

Description: 
When I do the following query:
/select?q=\ujb

I get 
{quote}
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
{quote}

To make it work I have to put another '\' in front of the query
{quote}{noformat}\\ujb{noformat}{quote}
which in fact leads to a different query in Solr.

I use edismax qparser.

  was:
When I do the following query:
/select?q=\ujb

I get 
{quote}
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
{quote}

To make it work i have to put in fornt of the query nother '\'
{quote}\ \ujb{quote}
wich in fact leads to a different query in solr.

I use edismax qparser.


 queries containing \u  return error: Truncated unicode escape sequence.
 -

 Key: SOLR-5669
 URL: https://issues.apache.org/jira/browse/SOLR-5669
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.4
Reporter: Dorin Oltean
Priority: Minor

 When I do the following query:
 /select?q=\ujb
 I get 
 {quote}
 org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
 sequence: j,
 {quote}
 To make it work I have to put another '\' in front of the query
 {quote}{noformat}\\ujb{noformat}{quote}
 which in fact leads to a different query in Solr.
 I use edismax qparser.






[jira] [Updated] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.

2014-01-27 Thread Dorin Oltean (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dorin Oltean updated SOLR-5669:
---

Description: 
When I do the following query:
/select?q=\ujb

I get 
{quote}
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
{quote}

To make it work I have to put another '\' in front of the query
{quote}\ \ujb{quote}
which in fact leads to a different query in Solr.

I use edismax qparser.

  was:
When I do the following query:
/select?q=\ujb

I get 
{quote}
org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
sequence: j,
{quote}

To make it work i have to put in fornt of the query nother '\'
{quote}\\ujb{quote}
wich in fact leads to a different query in solr.

I use edismax qparser.


 queries containing \u  return error: Truncated unicode escape sequence.
 -

 Key: SOLR-5669
 URL: https://issues.apache.org/jira/browse/SOLR-5669
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.4
Reporter: Dorin Oltean
Priority: Minor

 When I do the following query:
 /select?q=\ujb
 I get 
 {quote}
 org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape 
 sequence: j,
 {quote}
 To make it work I have to put another '\' in front of the query
 {quote}\ \ujb{quote}
 which in fact leads to a different query in Solr.
 I use edismax qparser.






[jira] [Created] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Per Steffensen (JIRA)
Per Steffensen created SOLR-5670:


 Summary: _version_ either indexed OR docvalue
 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen


As far as I can see there is no good reason to require that the _version_ 
field be indexed if it is docvalued. So I guess a rule saying _version_ has 
to be either indexed or docvalued (allowed to be both) would be OK.






[jira] [Updated] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-5670:
-

Attachment: SOLR-5670.patch

Simple patch attached. No tests of it added, but I have seen it working 
locally.

 _version_ either indexed OR docvalue
 

 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: solr, solrcloud, version
 Attachments: SOLR-5670.patch


 As far as I can see there is no good reason to require that the _version_ 
 field be indexed if it is docvalued. So I guess a rule saying _version_ has 
 to be either indexed or docvalued (allowed to be both) would be OK.






[jira] [Comment Edited] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882811#comment-13882811
 ] 

Per Steffensen edited comment on SOLR-5670 at 1/27/14 3:38 PM:
---

Simple patch attached. No tests of it added, but I have seen it working 
locally. 4.4.0 test-suite is green with this change. Do not know if branch_4x 
test-suite is.


was (Author: steff1193):
Simple patch attached. No testes of it added, but I have seen it working 
locally.

 _version_ either indexed OR docvalue
 

 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: solr, solrcloud, version
 Attachments: SOLR-5670.patch, SOLR-5670.patch


 As far as I can see there is no good reason to require that the _version_ 
 field be indexed if it is docvalued. So I guess a rule saying _version_ has 
 to be either indexed or docvalued (allowed to be both) would be OK.






[jira] [Updated] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5670:
---

Attachment: SOLR-5670.patch

From a design perspective, I can't claim to know whether this is an acceptable 
patch or not.  Consistency in configs across multiple users and multiple 
versions does have some value, which is a very minor argument against this 
change.

Is there any benchmark data? If docValues provides better performance for 
_version_ than indexed when it is used for its intended purpose, it might be 
worth changing the example config ... but people should know that if they *do* 
change the config on this field, they will have to completely reindex.

This patch is functionally identical to the previous one, it just modifies an 
error message.  I didn't check to see what branch Per's patch was created on, 
but it did apply cleanly to branch_4x.  This patch is against that branch.

 _version_ either indexed OR docvalue
 

 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: solr, solrcloud, version
 Attachments: SOLR-5670.patch, SOLR-5670.patch


 As far as I can see there is no good reason to require that the _version_ 
 field be indexed if it is docvalued. So I guess a rule saying _version_ has 
 to be either indexed or docvalued (allowed to be both) would be OK.






[jira] [Comment Edited] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882899#comment-13882899
 ] 

Shawn Heisey edited comment on SOLR-5670 at 1/27/14 3:41 PM:
-

From a design perspective, I can't claim to know whether this is an acceptable 
patch or not.  Consistency in configs across multiple users and multiple 
versions does have some value, which is a very minor argument against this 
change.

Is there any benchmark data? If docValues provides better performance for 
\_version\_ than indexed when it is used for its intended purpose, it might be 
worth changing the example config ... but people should know that if they *do* 
change the config on this field, they will have to completely reindex.

This patch is functionally identical to the previous one, it just modifies an 
error message.  I didn't check to see what branch Per's patch was created on, 
but it did apply cleanly to branch_4x.  This patch is against that branch.


was (Author: elyograg):
From a design perspective, I can't claim to know whether this is an acceptable 
patch or not.  Consistency in configs across multiple users and multiple 
versions does have some value, which is a very minor argument against this 
change.

Is there any benchmark data? If docValues provides better performance for 
_version_ than indexed when it is used for its intended purpose, it might be 
worth changing the example config ... but people should know that if they *do* 
change the config on this field, they will have to completely reindex.

This patch is functionally identical to the previous one, it just modifies an 
error message.  I didn't check to see what branch Per's patch was created on, 
but it did apply cleanly to branch_4x.  This patch is against that branch.

 _version_ either indexed OR docvalue
 

 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: solr, solrcloud, version
 Attachments: SOLR-5670.patch, SOLR-5670.patch


 As far as I can see there is no good reason to require that _version_ field 
 has to be indexed if it is docvalued. So I guess it will be ok with a rule 
 saying _version_ has to be either indexed or docvalue (allowed to be both).






Re: [jira] [Updated] (SOLR-4787) Join Contrib

2014-01-27 Thread Kranti Parisa
does this also apply to the hjoin?


Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, Jan 27, 2014 at 7:27 AM, Joel Bernstein (JIRA) j...@apache.org wrote:


  [
 https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

 Joel Bernstein updated SOLR-4787:
 -

 Attachment: SOLR-4787.patch

 Resolved a memory leak when the bjoin is used with cache autowarming.

  Join Contrib
  
 
  Key: SOLR-4787
  URL: https://issues.apache.org/jira/browse/SOLR-4787
  Project: Solr
   Issue Type: New Feature
   Components: search
 Affects Versions: 4.2.1
 Reporter: Joel Bernstein
 Priority: Minor
  Fix For: 4.7
 
  Attachments: SOLR-4787-deadlock-fix.patch,
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4797-hjoin-multivaluekeys-trunk.patch
 
 
  This contrib provides a place where different join implementations can
 be contributed to Solr. This contrib currently includes 3 join
 implementations. The initial patch was generated from the Solr 4.3 tag.
 Because of changes in the FieldCache API this patch will only build with
 Solr 4.2 or above.
  *HashSetJoinQParserPlugin aka hjoin*
  The hjoin provides a join implementation that filters results in one
 core based on the results of a search in another core. This is similar in
 functionality to the JoinQParserPlugin but the implementation differs in a
 couple of important ways.
  The first way is that the hjoin is designed to work with int and long
 join keys only. So, in order to use hjoin, int or long join keys must be
 included in both the to and from core.
  The second difference is that the hjoin builds memory structures that
 are used to quickly connect the join keys. So, the hjoin will need more
 memory than the JoinQParserPlugin to perform the join.
  The main advantage of the hjoin is that it can scale to join millions of
 keys between cores and provide sub-second response time. The hjoin should
 work well with up to two million results from the fromIndex and tens of
 millions of results from the main query.
  The hjoin supports the following features:
  1) Both lucene query and PostFilter implementations. A *cost* > 99
 will turn on the PostFilter. The PostFilter will typically outperform the
 Lucene query when the main query results have been narrowed down.
  2) With the lucene query implementation there is an option to build the
 filter with threads. This can greatly improve the performance of the query
 if the main query index is very large. The threads parameter turns on
 threading. For example *threads=6* will use 6 threads to build the filter.
 This will setup a fixed threadpool with six threads to handle all hjoin
 requests. Once the threadpool is created the hjoin will always use it to
 build the filter. Threading does not come into play with the PostFilter.
  3) The *size* local parameter can be used to set the initial size of the
 hashset used to perform the join. If this is set above the number of
 results from the fromIndex then you can avoid hashset resizing, which
 improves performance.
  4) Nested filter queries. The local parameter fq can be used to nest a
 filter query within the join. The nested fq will filter the results of the
 join query. This can point to another join to support nested joins.
  5) Full caching support for the lucene query implementation. The
 filterCache and queryResultCache should work properly even with deep
 nesting of joins. Only the queryResultCache comes into play with the
 PostFilter implementation because PostFilters are not cacheable in the
 filterCache.
  The syntax of the hjoin is similar to the JoinQParserPlugin except that
 the plugin is referenced by the string hjoin rather than join.
  fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6
 fq=$qq\}user:customer1&qq=group:5
  The example filter query above will search the fromIndex (collection2)
 for user:customer1 applying the local fq parameter to filter the results.
 The lucene filter query will be built using 6 threads. This query will
 generate a list of values from the from field that will be used to filter
 the main query. Only records from the main query, where the to field is
 present in the from list will be included in the results.
  The solrconfig.xml in the main query core must contain the reference to
 the hjoin.
  <queryParser name="hjoin"
 class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
  And the join contrib lib jars must be registered in the solrconfig.xml.
   <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
  After issuing the ant dist command from inside 

[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added

2014-01-27 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882849#comment-13882849
 ] 

Erik Hatcher commented on SOLR-5658:


[~markmil...@gmail.com] Is this ticket complete as of Solr 4.6.1?  Just 
wondering if it can be closed.  Thanks!

 commitWithin does not reflect the new documents added
 -

 Key: SOLR-5658
 URL: https://issues.apache.org/jira/browse/SOLR-5658
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6, 5.0
Reporter: Varun Thacker
Assignee: Mark Miller
Priority: Critical
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5658.patch, SOLR-5658.patch


 I start 4 nodes using the setup mentioned on - 
 https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
  
 I added a document using - 
 curl "http://localhost:8983/solr/update?commitWithin=1" -H "Content-Type: 
 text/xml" --data-binary '<add><doc><field 
 name="id">testdoc</field></doc></add>'
 In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit 
 with openSearcher=false
 In Solr 4.6.x there is only one hard commit with 
 openSearcher=false
  
 So even after 10 seconds, queries on none of the shards reflect the added 
 document. 
 This was also reported on the solr-user list ( 
 http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html
  )
 Here are the relevant logs 
 Logs from Solr 4.5.1
 Node 1:
 {code}
 420021 [qtp619011445-12] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] 
 webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45
 {code}
  
 Node 2:
 {code}
 119896 [qtp1608701025-10] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] 
 webapp=/solr path=/update 
 params={distrib.from=http://192.168.1.103:8983/solr/collection1/update/&update.distrib=TOLEADER&wt=javabin&version=2}
  {add=[testdoc (1458003295513608192)]} 0 348
 129648 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – start 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 129679 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.search.SolrIndexSearcher  – Opening Searcher@e174f70 main
 129680 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – end_commit_flush
 129681 [searcherExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 QuerySenderListener sending requests to Searcher@e174f70 
 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)}
 129681 [searcherExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 QuerySenderListener done.
 129681 [searcherExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 [collection1] Registered new searcher Searcher@e174f70 
 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)}
 134648 [commitScheduler-7-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – start 
 commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 134658 [commitScheduler-7-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 SolrDeletionPolicy.onCommit: commits: num=2
   
 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; 
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3}
   
 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; 
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4}
 134658 [commitScheduler-7-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 newest commit generation = 4
 134660 [commitScheduler-7-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – end_commit_flush
  {code}
  
 Node 3:
  
 Node 4:
 {code}
 374545 [qtp1608701025-16] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] 
 webapp=/solr path=/update 
 params={distrib.from=http://192.168.1.103:7574/solr/collection1/update/&update.distrib=FROMLEADER&wt=javabin&version=2}
  {add=[testdoc (1458002133233172480)]} 0 20
 384545 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – start 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 384552 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.search.SolrIndexSearcher  – Opening Searcher@36137e08 main
 384553 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – end_commit_flush
 384553 

Re: [jira] [Updated] (SOLR-4787) Join Contrib

2014-01-27 Thread Joel Bernstein
Kranti,

The memory leak in the bjoin dealt with the multi-value field joins.
Specifically how the new UninvertedIntField cache was used in the bjoin. In
a quick review of the hjoin I'm not seeing the same issue but it would be
good to confirm through testing.

Joel

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Jan 27, 2014 at 10:06 AM, Kranti Parisa kranti.par...@gmail.com wrote:

 does this also apply to the hjoin?


 Thanks,
 Kranti K. Parisa
 http://www.linkedin.com/in/krantiparisa



 On Mon, Jan 27, 2014 at 7:27 AM, Joel Bernstein (JIRA) j...@apache.org wrote:


  [
 https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

 Joel Bernstein updated SOLR-4787:
 -

 Attachment: SOLR-4787.patch

 Resolved a memory leak when the bjoin is used with cache autowarming.

  Join Contrib
  
 
  Key: SOLR-4787
  URL: https://issues.apache.org/jira/browse/SOLR-4787
  Project: Solr
   Issue Type: New Feature
   Components: search
 Affects Versions: 4.2.1
 Reporter: Joel Bernstein
 Priority: Minor
  Fix For: 4.7
 
  Attachments: SOLR-4787-deadlock-fix.patch,
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4797-hjoin-multivaluekeys-trunk.patch
 
 
  This contrib provides a place where different join implementations can
 be contributed to Solr. This contrib currently includes 3 join
 implementations. The initial patch was generated from the Solr 4.3 tag.
 Because of changes in the FieldCache API this patch will only build with
 Solr 4.2 or above.
  *HashSetJoinQParserPlugin aka hjoin*
  The hjoin provides a join implementation that filters results in one
 core based on the results of a search in another core. This is similar in
 functionality to the JoinQParserPlugin but the implementation differs in a
 couple of important ways.
  The first way is that the hjoin is designed to work with int and long
 join keys only. So, in order to use hjoin, int or long join keys must be
 included in both the to and from core.
  The second difference is that the hjoin builds memory structures that
 are used to quickly connect the join keys. So, the hjoin will need more
 memory than the JoinQParserPlugin to perform the join.
  The main advantage of the hjoin is that it can scale to join millions
 of keys between cores and provide sub-second response time. The hjoin
 should work well with up to two million results from the fromIndex and tens
 of millions of results from the main query.
  The hjoin supports the following features:
  1) Both lucene query and PostFilter implementations. A *cost* > 99
 will turn on the PostFilter. The PostFilter will typically outperform the
 Lucene query when the main query results have been narrowed down.
  2) With the lucene query implementation there is an option to build the
 filter with threads. This can greatly improve the performance of the query
 if the main query index is very large. The threads parameter turns on
 threading. For example *threads=6* will use 6 threads to build the filter.
 This will setup a fixed threadpool with six threads to handle all hjoin
 requests. Once the threadpool is created the hjoin will always use it to
 build the filter. Threading does not come into play with the PostFilter.
  3) The *size* local parameter can be used to set the initial size of
 the hashset used to perform the join. If this is set above the number of
 results from the fromIndex then you can avoid hashset resizing, which
 improves performance.
  4) Nested filter queries. The local parameter fq can be used to nest
 a filter query within the join. The nested fq will filter the results of
 the join query. This can point to another join to support nested joins.
  5) Full caching support for the lucene query implementation. The
 filterCache and queryResultCache should work properly even with deep
 nesting of joins. Only the queryResultCache comes into play with the
 PostFilter implementation because PostFilters are not cacheable in the
 filterCache.
  The syntax of the hjoin is similar to the JoinQParserPlugin except that
 the plugin is referenced by the string hjoin rather than join.
  fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6
 fq=$qq\}user:customer1&qq=group:5
  The example filter query above will search the fromIndex (collection2)
 for user:customer1 applying the local fq parameter to filter the results.
 The lucene filter query will be built using 6 threads. This query will
 generate a list of values from the from field that will be used to filter
 the main query. Only records from the main query, where the to field is
 

[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected

2014-01-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882884#comment-13882884
 ] 

ASF subversion and git services commented on SOLR-5671:
---

Commit 1561711 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1561711 ]

SOLR-5671: increase logging to try and track down test failure (merged trunk 
r1561709)

 Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than 
 expected 
 ---

 Key: SOLR-5671
 URL: https://issues.apache.org/jira/browse/SOLR-5671
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Rowe

 Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small 
 number of indexed docs and retrieved one fewer doc than the number of indexed 
 docs.  Both of these failures were on trunk on Windows:
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/  
 I've also seen this twice on trunk on my OS X laptop (out of 875 trials).
 None of the seeds have reproduced for me.






Re: Welcome Benson Margulies as Lucene/Solr committer!

2014-01-27 Thread Kranti Parisa
Congratulations Benson!

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, Jan 27, 2014 at 9:32 AM, Alan Woodward a...@flax.co.uk wrote:

 Congratulations and welcome, Benson!

  Alan Woodward
 www.flax.co.uk


 On 26 Jan 2014, at 17:43, Shawn Heisey wrote:

 On 1/25/2014 2:40 PM, Michael McCandless wrote:

 I'm pleased to announce that Benson Margulies has accepted to join our

 ranks as a committer.


 Benson has been involved in a number of Lucene/Solr issues over time

 (see
 http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dds&a1=allUsers&a2=Benson+Margulies

 ), most recently on debugging tricky analysis issues.


 Congratulations and welcome!  One more to try and keep me in line.







[jira] [Commented] (LUCENE-5414) suggest module should not depend on expression module

2014-01-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882879#comment-13882879
 ] 

ASF subversion and git services commented on LUCENE-5414:
-

Commit 1561708 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1561708 ]

LUCENE-5414: intellij config (merged trunk r1561707)

 suggest module should not depend on expression module
 -

 Key: LUCENE-5414
 URL: https://issues.apache.org/jira/browse/LUCENE-5414
 Project: Lucene - Core
  Issue Type: Wish
Affects Versions: 4.6, 5.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch, 
 LUCENE-5414.patch


 Currently our suggest module depends on the expression module just because 
 the DocumentExpressionDictionary provides some util ctor to pass in an 
 expression directly. That is a lot of dependency for little value IMO and 
 pulls in lots of JARs. DocumentExpressionDictionary should only take a 
 ValueSource instead.






Re: Welcome Benson Margulies as Lucene/Solr committer!

2014-01-27 Thread Alan Woodward
Congratulations and welcome, Benson!

Alan Woodward
www.flax.co.uk


On 26 Jan 2014, at 17:43, Shawn Heisey wrote:

 On 1/25/2014 2:40 PM, Michael McCandless wrote:
 I'm pleased to announce that Benson Margulies has accepted to join our
 ranks as a committer.
 
 Benson has been involved in a number of Lucene/Solr issues over time
 (see 
 http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dds&a1=allUsers&a2=Benson+Margulies
 ), most recently on debugging tricky analysis issues.
 
 Congratulations and welcome!  One more to try and keep me in line.
 
 
 



[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected

2014-01-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882881#comment-13882881
 ] 

ASF subversion and git services commented on SOLR-5671:
---

Commit 1561709 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1561709 ]

SOLR-5671: increase logging to try and track down test failure

 Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than 
 expected 
 ---

 Key: SOLR-5671
 URL: https://issues.apache.org/jira/browse/SOLR-5671
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Rowe

 Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small 
 number of indexed docs and retrieved one fewer doc than the number of indexed 
 docs.  Both of these failures were on trunk on Windows:
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/  
 I've also seen this twice on trunk on my OS X laptop (out of 875 trials).
 None of the seeds have reproduced for me.






Re: Jetty version should go in CHANGES.TXT

2014-01-27 Thread Koji Sekiguchi

+1

koji
--
http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html

(14/01/27 21:44), Jan Høydahl wrote:

Hi,

I'd argue that Jetty can be said to be a major component of Solr, so I suggest 
we add the Jetty version under the section "Versions of Major Components" in 
Solr's CHANGES.TXT?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com





[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-5652:
-

Description: 
Several times now, Uwe's jenkins has encountered a "walk already seen ..." 
assertion failure from DistribCursorPagingTest that I've been unable to fathom, 
let alone reproduce (although sarowe was able to trigger a similar, 
non-reproducible seed, failure on his machine)

Using this as a tracking issue to try and make sense of it.

Summary of things noticed so far:
* So far only seen on http://jenkins.thetaphi.de & sarowe's mac
* So far seen on MacOSX and Linux
* So far seen on branch 4x and trunk
* So far seen on Java6, Java7, and Java8
* fails occurred in first block of randomized testing: 
** we've indexed a small number of randomized docs
** we're explicitly looping over every field and sorting in both directions
* fails were sorting on one of the \*_dv_last or \*_dv_first fields 
(docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
** for desc sorts, sort on same field asc has worked fine just before this 
(fields are in arbitrary order, but asc always tried before desc)
** sorting on some other random fields has sometimes been tried before this and 
worked

(specifics of each failure seen in the wild recorded in comments)

  was:
Twice now, Uwe's jenkins has encountered a "walk already seen ..." assertion 
failure from DistribCursorPagingTest that I've been unable to fathom, let alone 
reproduce (although sarowe was able to trigger a similar, non-reproducible 
seed, failure on his machine)

Using this as a tracking issue to try and make sense of it.

Summary of things noticed so far:
* So far only seen on http://jenkins.thetaphi.de & sarowe's mac
* So far seen on MacOSX and Linux
* So far seen on branch 4x and trunk
* So far seen on Java6, Java7, and Java8
* fails occurred in first block of randomized testing: 
** we've indexed a small number of randomized docs
** we're explicitly looping over every field and sorting in both directions
* fails were sorting on one of the \*_dv_last or \*_dv_first fields 
(docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
** for desc sorts, sort on same field asc has worked fine just before this 
(fields are in arbitrary order, but asc always tried before desc)
** sorting on some other random fields has sometimes been tried before this and 
worked

(specifics of each failure seen in the wild recorded in comments)


 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt


 Several times now, Uwe's jenkins has encountered a "walk already seen ..." 
 assertion failure from DistribCursorPagingTest that I've been unable to 
 fathom, let alone reproduce (although sarowe was able to trigger a similar, 
 non-reproducible seed, failure on his machine)
 Using this as a tracking issue to try and make sense of it.
 Summary of things noticed so far:
 * So far only seen on http://jenkins.thetaphi.de & sarowe's mac
 * So far seen on MacOSX and Linux
 * So far seen on branch 4x and trunk
 * So far seen on Java6, Java7, and Java8
 * fails occurred in first block of randomized testing: 
 ** we've indexed a small number of randomized docs
 ** we're explicitly looping over every field and sorting in both directions
 * fails were sorting on one of the \*_dv_last or \*_dv_first fields 
 (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
 ** for desc sorts, sort on same field asc has worked fine just before this 
 (fields are in arbitrary order, but asc always tried before desc)
 ** sorting on some other random fields has sometimes been tried before this 
 and worked
 (specifics of each failure seen in the wild recorded in comments)






Re: [jira] [Updated] (SOLR-4787) Join Contrib

2014-01-27 Thread Kranti Parisa
Thanks Joel. I shall look into that.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, Jan 27, 2014 at 10:19 AM, Joel Bernstein joels...@gmail.com wrote:

 Kranti,

 The memory leak in the bjoin dealt with the multi-value field joins.
 Specifically how the new UninvertedIntField cache was used in the bjoin. In
 a quick review of the hjoin I'm not seeing the same issue but it would be
 good to confirm through testing.

 Joel

 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Jan 27, 2014 at 10:06 AM, Kranti Parisa 
 kranti.par...@gmail.com wrote:

 does this also apply to the hjoin?


 Thanks,
 Kranti K. Parisa
 http://www.linkedin.com/in/krantiparisa



 On Mon, Jan 27, 2014 at 7:27 AM, Joel Bernstein (JIRA) 
 j...@apache.org wrote:


  [
 https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

 Joel Bernstein updated SOLR-4787:
 -

 Attachment: SOLR-4787.patch

 Resolved a memory leak when the bjoin is used with cache autowarming.

  Join Contrib
  
 
  Key: SOLR-4787
  URL: https://issues.apache.org/jira/browse/SOLR-4787
  Project: Solr
   Issue Type: New Feature
   Components: search
 Affects Versions: 4.2.1
 Reporter: Joel Bernstein
 Priority: Minor
  Fix For: 4.7
 
  Attachments: SOLR-4787-deadlock-fix.patch,
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
 SOLR-4797-hjoin-multivaluekeys-trunk.patch
 
 
  This contrib provides a place where different join implementations can
 be contributed to Solr. This contrib currently includes 3 join
 implementations. The initial patch was generated from the Solr 4.3 tag.
 Because of changes in the FieldCache API this patch will only build with
 Solr 4.2 or above.
  *HashSetJoinQParserPlugin aka hjoin*
  The hjoin provides a join implementation that filters results in one
 core based on the results of a search in another core. This is similar in
 functionality to the JoinQParserPlugin but the implementation differs in a
 couple of important ways.
  The first way is that the hjoin is designed to work with int and long
 join keys only. So, in order to use hjoin, int or long join keys must be
 included in both the to and from core.
  The second difference is that the hjoin builds memory structures that
 are used to quickly connect the join keys. So, the hjoin will need more
 memory than the JoinQParserPlugin to perform the join.
  The main advantage of the hjoin is that it can scale to join millions
 of keys between cores and provide sub-second response time. The hjoin
 should work well with up to two million results from the fromIndex and tens
 of millions of results from the main query.
  The hjoin supports the following features:
  1) Both lucene query and PostFilter implementations. A *cost* > 99
 will turn on the PostFilter. The PostFilter will typically outperform the
 Lucene query when the main query results have been narrowed down.
  2) With the lucene query implementation there is an option to build
 the filter with threads. This can greatly improve the performance of the
 query if the main query index is very large. The threads parameter turns
 on threading. For example *threads=6* will use 6 threads to build the
 filter. This will setup a fixed threadpool with six threads to handle all
 hjoin requests. Once the threadpool is created the hjoin will always use it
 to build the filter. Threading does not come into play with the PostFilter.
  3) The *size* local parameter can be used to set the initial size of
 the hashset used to perform the join. If this is set above the number of
 results from the fromIndex then you can avoid hashset resizing, which
 improves performance.
  4) Nested filter queries. The local parameter fq can be used to nest
 a filter query within the join. The nested fq will filter the results of
 the join query. This can point to another join to support nested joins.
  5) Full caching support for the lucene query implementation. The
 filterCache and queryResultCache should work properly even with deep
 nesting of joins. Only the queryResultCache comes into play with the
 PostFilter implementation because PostFilters are not cacheable in the
 filterCache.
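 The hash-join idea described above can be sketched in plain Java (a
hypothetical illustration, not the plugin's implementation; class and
method names are mine): the fromIndex join keys go into a pre-sized
hashset, and the main-query keys are then filtered by membership.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HashJoinSketch {
    // Build a hashset of the "from" side join keys, then keep only the
    // "to" side keys that appear in it. Pre-sizing the set past the
    // expected key count avoids rehashing (what the *size* parameter does).
    static long[] hashJoin(long[] fromKeys, long[] toKeys) {
        Set<Long> keys = new HashSet<>(fromKeys.length * 2);
        for (long k : fromKeys) {
            keys.add(k);
        }
        return Arrays.stream(toKeys).filter(keys::contains).toArray();
    }

    public static void main(String[] args) {
        long[] joined = hashJoin(new long[]{1, 2, 3}, new long[]{2, 3, 4});
        System.out.println(Arrays.toString(joined)); // [2, 3]
    }
}
```

 The real plugin builds the set from docvalues and wraps the membership
test in a Lucene Filter/PostFilter, but the memory/speed trade-off is the
same: one hashset entry per fromIndex key.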
 The syntax of the hjoin is similar to the JoinQParserPlugin except
that the plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6
fq=$qq\}user:customer1&qq=group:5
  The example filter query above will search the fromIndex (collection2)
 for user:customer1 applying the local fq parameter to filter the results.
 The lucene filter query 

[jira] [Commented] (LUCENE-5416) Performance of a FixedBitSet variant that uses Long.numberOfTrailingZeros()

2014-01-27 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882919#comment-13882919
 ] 

Paul Elschot commented on LUCENE-5416:
--

The last benchmark output is here: 
https://github.com/PaulElschot/lucene-solr/commit/772b55ad3c3d94752b37aa81b2e96cb50b321cf6
 ,
see from line 313 in that output; the comparisons and loads are given as log10 
numbers.

In short:
- for advance() this is a factor of 1.7 to 4 times faster, and
- for nextDoc() this is up to 2.5 times faster, but for load factors higher 
than about 0.25 it is up to about 5 times slower.
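The numberOfTrailingZeros technique being benchmarked can be sketched in plain
Java (a hypothetical illustration, not the patch itself): instead of a byte
index, each word's set bits are walked by reading the lowest set bit's
position and clearing it.

```java
import java.util.ArrayList;
import java.util.List;

public class BitIterSketch {
    // For each 64-bit word, repeatedly take the lowest set bit's position
    // via Long.numberOfTrailingZeros, then clear that bit with w &= w - 1.
    static List<Integer> setBits(long[] words) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            long w = words[i];
            while (w != 0) {
                out.add((i << 6) + Long.numberOfTrailingZeros(w));
                w &= w - 1; // clear the lowest set bit
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // bits 0, 1 and 3 set in word 0; bit 0 set in word 1 (= doc 64)
        System.out.println(setBits(new long[]{0b1011L, 1L})); // [0, 1, 3, 64]
    }
}
```

On most JVMs numberOfTrailingZeros compiles to a single instruction, which is
why it wins for sparse sets (advance), while denser sets favor simpler
per-word scans, consistent with the load-factor numbers above.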

 Performance of a FixedBitSet variant that uses Long.numberOfTrailingZeros()
 ---

 Key: LUCENE-5416
 URL: https://issues.apache.org/jira/browse/LUCENE-5416
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.0
Reporter: Paul Elschot
Priority: Minor
 Fix For: 5.0


 On my machine the current byte index used in OpenBitSetIterator is slower 
 than Long.numberOfTrailingZeros() for advance().
 The pull request contains the code for benchmarking this taken from an early 
 stage of DocBlocksIterator.
 In case the benchmark shows improvements on more machines, well, we know what 
 to do...



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5414) suggest module should not depend on expression module

2014-01-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882873#comment-13882873
 ] 

ASF subversion and git services commented on LUCENE-5414:
-

Commit 1561707 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1561707 ]

LUCENE-5414: intellij config

 suggest module should not depend on expression module
 -

 Key: LUCENE-5414
 URL: https://issues.apache.org/jira/browse/LUCENE-5414
 Project: Lucene - Core
  Issue Type: Wish
Affects Versions: 4.6, 5.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch, 
 LUCENE-5414.patch


 Currently our suggest module depends on the expression module just because 
 the DocumentExpressionDictionary provides some util ctor to pass in an 
 expression directly. That is a lot of dependency for little value IMO and 
 pulls in lots of JARs. DocumentExpressionDictionary should only take a 
 ValueSource instead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected

2014-01-27 Thread Steve Rowe (JIRA)
Steve Rowe created SOLR-5671:


 Summary: Heisenbug #2 in DistribCursorPagingTest: full walk 
returns one fewer doc than expected 
 Key: SOLR-5671
 URL: https://issues.apache.org/jira/browse/SOLR-5671
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Rowe


Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small 
number of indexed docs and retrieved one fewer doc than the number of indexed 
docs.  Both of these failures were on trunk on Windows:

http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/  

I've also seen this twice on trunk on my OS X laptop (out of 875 trials).

None of the seeds have reproduced for me.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected

2014-01-27 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882885#comment-13882885
 ] 

Steve Rowe commented on SOLR-5671:
--

I committed a change to DistribCursorPagingTest that will print the details of 
the indexed doc(s) not returned by deep paging.

 Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than 
 expected 
 ---

 Key: SOLR-5671
 URL: https://issues.apache.org/jira/browse/SOLR-5671
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Rowe

 Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small 
 number of indexed docs and retrieved one fewer doc than the number of indexed 
 docs.  Both of these failures were on trunk on Windows:
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/  
 I've also seen this twice on trunk on my OS X laptop (out of 875 trials).
 None of the seeds have reproduced for me.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2366) Facet Range Gaps

2014-01-27 Thread Ted Sullivan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882890#comment-13882890
 ] 

Ted Sullivan commented on SOLR-2366:


Right. I'm following [~shalinmangar]'s suggestion to split out your/Hoss's 
facet.range.spec / facet.sequence idea as a separate issue. I don't think of 
this as extending the gap parameter - I am just providing more explicit 
information in the response as to what gaps you actually get (as per your 
suggestion of Sept/2011) - similar to what you would get if you implemented 
this using facet.query. Looking at the current code, it is pretty easy to add 
the range information to the response (right now the response labels are just 
the gap starts). This may be user-unfriendly as you say, but I would argue that 
it is more friendly than what we have right now - it is certainly more 
developer-friendly because it provides better feedback. There is a lot of 
interest in this feature (it has been advertised on the SimpleFacetsParameter 
Wiki for some time now) as evidenced by earlier comments in this thread. My 
original desire was just to make (the patch) usable for those that want to use 
it by upgrading Grant's original patch so that it would work with the new(?) 
modular class organization. The work required to spiff up the facet.range.gap 
response is not large. I haven't touched the facet.range.spec/buckets approach 
but that would seem to require more effort.

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.
 (Original syntax proposal removed, see discussion for concrete syntax)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882933#comment-13882933
 ] 

Per Steffensen commented on SOLR-5670:
--

bq. Is there any benchmark data? If docValues provides better performance for 
_version_ than indexed

I do not think it will in most cases.
* Indexed: When you want to get the _version_ for a particular doc-no (found by 
id), you can make a lookup in FieldCache holding the reversed term-index - this 
is in memory and constant time. If you have a very rapidly changing data-set 
(so that FieldCache-entries will be invalidated often due to merging) you might 
get better performance (response-time) with doc-values - but not in general, I 
think.
* DocValues: You will read the _version_ from doc-values, which is not 
necessarily in memory

We are prepared to take a small performance hit, to avoid having all that data 
in FieldCache. In general we do not allow putting anything in FieldCache, 
because we have so many documents that it always creates issues with too much 
memory usage. The problem with FieldCache is that it is all or nothing - for 
good reason! - and we just cannot live with it.

We haven't made the change on _version_ (going from indexed to doc-values) in 
production yet. We will do some performance testing on it first, and depending 
on how much we decide to do, I can get back with some numbers.

bq. when it is used for its intended purpose, it might be worth changing the 
example config 

I do not think you should do that. Using FieldCache is probably the best 
default. But writing something somewhere about the option of using doc-values 
instead of indexed, and when that is a good idea, would be nice.

bq. ... but people should know that if they do change the config on this field, 
they will have to completely reindex.

Or just start using it from now on in new collections. We create a new 
collection every month and keep a history of data by keeping the latest 24 
collections. One of many reasons for doing this, is that it provides us the 
option of changing indexing-strategy etc every month. For us re-indexing is 
completely out of the question - we have billions and billions of records in 
Solr and re-indexing them all in a fairly short service-window is not possible. 
Therefore we built this new-collection-every-month thingy in order to have some 
flexibility.

bq. This patch is functionally identical to the previous one, it just modifies 
an error message.

Nicely spotted

bq. I didn't check to see what branch Per's patch was created on, but it did 
apply cleanly to branch_4x.

It was branch_4x

 _version_ either indexed OR docvalue
 

 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: solr, solrcloud, version
 Attachments: SOLR-5670.patch, SOLR-5670.patch


 As far as I can see there is no good reason to require that _version_ field 
 has to be indexed if it is docvalued. So I guess it will be ok with a rule 
 saying _version_ has to be either indexed or docvalue (allowed to be both).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Welcome Benson Margulies as Lucene/Solr committer!

2014-01-27 Thread Ryan Ernst
Welcome!
On Jan 25, 2014 1:41 PM, Michael McCandless luc...@mikemccandless.com
wrote:

 I'm pleased to announce that Benson Margulies has accepted to join our
 ranks as a committer.

 Benson has been involved in a number of Lucene/Solr issues over time
 (see
 http://jirasearch.mikemccandless.com/search.py?index=jirachg=ddsa1=allUsersa2=Benson+Margulies
 ), most recently on debugging tricky analysis issues.

 Benson, it is tradition that you introduce yourself with a brief bio.
 I know you're heavily involved in other Apache projects already...

 Once your account is set up, you should then be able to add yourself
 to the who we are page on the website as well.

 Congratulations and welcome!

 Mike McCandless

 http://blog.mikemccandless.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882832#comment-13882832
 ] 

Per Steffensen commented on SOLR-4470:
--

bq. We are currently using SOLR 4.5.1 in our production environment and we 
tried to setup security on a SOLR cloud configuration.

Container managed authentication and authorization I presume?

bq. I have read all the 4470 issue activity and it will be very useful for us 
to be able to download the SOLR-4470_branch_4x_r1452629.patch already compiled 
from some place, until the 4.7 version is released.

I guess you are looking at Fix Version/s: 4.7 on this issue, and expect that 
this means that the fix will be in 4.7. I do not believe it will - 
unfortunately. So if you want the feature, you need to change the patch 
yourself to fit the version of Solr you are using, or you can download code for 
Solr 4.4 plus numerous improvements (including SOLR-4470) here: 
https://github.com/steff1193/lucene-solr. You will have to build a Solr 
distribution yourself - and maven artifacts if you need those
* Building distribution from source
{code}
checkout
cd solr
ant -Dversion=4.4.0.myversion clean create-package
{code}
* Building and deploying artifacts is a little more complicated. Let me know if 
you need that.

*Please note* that https://github.com/steff1193/lucene-solr is only a place 
where we keep our version of Lucene/Solr, including the changes we have made 
which have not yet been committed to Apache Solr. You are free to use it, 
but there is no guarantee that there will ever be a version based on an Apache 
Solr version higher than 4.4. It is very likely that there will be, but no 
guarantee, and you never know when it will happen. Of course it is all open 
source, so if you really want you can fork it yourself.

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.7

 Attachments: SOLR-4470.patch, SOLR-4470.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r145.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for it to work 
 credentials need to be provided here also.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers. E.g. for search and update 
 request.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc based on calls to the Collection API)
 * that do not in any way have relation to an outside super-request (e.g. 
 replica synching stuff)
 We would like to aim at a solution where original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we would aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest)
 We will work at a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5658) commitWithin does not reflect the new documents added

2014-01-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5658.
---

Resolution: Fixed

 commitWithin does not reflect the new documents added
 -

 Key: SOLR-5658
 URL: https://issues.apache.org/jira/browse/SOLR-5658
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6, 5.0
Reporter: Varun Thacker
Assignee: Mark Miller
Priority: Critical
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5658.patch, SOLR-5658.patch


 I start 4 nodes using the setup mentioned on - 
 https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
  
 I added a document using - 
 curl "http://localhost:8983/solr/update?commitWithin=1" -H "Content-Type: 
 text/xml" --data-binary '<add><doc><field 
 name="id">testdoc</field></doc></add>'
 In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit 
 with openSearcher=false
 In Solr 4.6.x there is only one hard commit with 
 openSearcher=false
  
 So even after 10 seconds, queries on none of the shards reflect the added 
 document. 
 This was also reported on the solr-user list ( 
 http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html
  )
 Here are the relevant logs 
 Logs from Solr 4.5.1
 Node 1:
 {code}
 420021 [qtp619011445-12] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] 
 webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45
 {code}
  
 Node 2:
 {code}
 119896 [qtp1608701025-10] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] 
 webapp=/solr path=/update 
 params={distrib.from=http://192.168.1.103:8983/solr/collection1/&update.distrib=TOLEADER&wt=javabin&version=2}
  {add=[testdoc (1458003295513608192)]} 0 348
 129648 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – start 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 129679 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.search.SolrIndexSearcher  – Opening Searcher@e174f70 main
 129680 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – end_commit_flush
 129681 [searcherExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 QuerySenderListener sending requests to Searcher@e174f70 
 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)}
 129681 [searcherExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 QuerySenderListener done.
 129681 [searcherExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 [collection1] Registered new searcher Searcher@e174f70 
 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)}
 134648 [commitScheduler-7-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – start 
 commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 134658 [commitScheduler-7-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 SolrDeletionPolicy.onCommit: commits: num=2
   
 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; 
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3}
   
 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; 
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4}
 134658 [commitScheduler-7-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 newest commit generation = 4
 134660 [commitScheduler-7-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – end_commit_flush
  {code}
  
 Node 3:
  
 Node 4:
 {code}
 374545 [qtp1608701025-16] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] 
 webapp=/solr path=/update 
 params={distrib.from=http://192.168.1.103:7574/solr/collection1/&update.distrib=FROMLEADER&wt=javabin&version=2}
  {add=[testdoc (1458002133233172480)]} 0 20
 384545 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – start 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 384552 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.search.SolrIndexSearcher  – Opening Searcher@36137e08 main
 384553 [commitScheduler-8-thread-1] INFO  
 org.apache.solr.update.UpdateHandler  – end_commit_flush
 384553 [searcherExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  – 
 QuerySenderListener sending requests to Searcher@36137e08 
 

[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-5652:
-

Description: 
Twice now, Uwe's jenkins has encountered a "walk already seen ..." assertion 
failure from DistribCursorPagingTest that I've been unable to fathom, let alone 
reproduce (although sarowe was able to trigger a similar, non-reproducible-seed 
failure on his machine)

Using this as a tracking issue to try and make sense of it.

Summary of things noticed so far:
* So far only seen on http://jenkins.thetaphi.de & sarowe's mac
* So far seen on MacOSX and Linux
* So far seen on branch 4x and trunk
* So far seen on Java6, Java7, and Java8
* failures occurred in the first block of randomized testing: 
** we've indexed a small number of randomized docs
** we're explicitly looping over every field and sorting in both directions
* fails were sorting on one of the \*_dv_last or \*_dv_first fields 
(docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
** for desc sorts, sort on same field asc has worked fine just before this 
(fields are in arbitrary order, but asc always tried before desc)
** sorting on some other random fields has sometimes been tried before this and 
worked

(specifics of each failure seen in the wild recorded in comments)

  was:
Twice now, Uwe's jenkins has encountered a "walk already seen ..." assertion 
failure from DistribCursorPagingTest that I've been unable to fathom, let alone 
reproduce (although sarowe was able to trigger a similar, non-reproducible-seed 
failure on his machine)

Using this as a tracking issue to try and make sense of it.

Summary of things noticed so far (in 3 failures):
* So far only seen on http://jenkins.thetaphi.de & sarowe's mac
* So far only seen on MacOSX
* So far only seen on branch 4x
* So far seen on both Java6 and Java7
* failures occurred in the first block of randomized testing: 
** we've indexed a small number of randomized docs
** we're explicitly looping over every field and sorting in both directions
* fails were both when doing a desc sorting on one of the \*_dv_last or 
\*_dv_first fields (docValues=true, either sortMissingLast=true OR 
sortMissingFirst=true) 
** sort on same field asc has always worked fine just before this (fields are 
in arbitrary order, but asc always tried before desc)
** sorting on some other random fields has sometimes been tried before this and 
worked


(specifics of each failure seen in the wild recorded in comments)


Updated summary

 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt


 Twice now, Uwe's jenkins has encountered a "walk already seen ..." assertion 
 failure from DistribCursorPagingTest that I've been unable to fathom, let 
 alone reproduce (although sarowe was able to trigger a similar, 
 non-reproducible-seed failure on his machine)
 Using this as a tracking issue to try and make sense of it.
 Summary of things noticed so far:
 * So far only seen on http://jenkins.thetaphi.de & sarowe's mac
 * So far seen on MacOSX and Linux
 * So far seen on branch 4x and trunk
 * So far seen on Java6, Java7, and Java8
 * failures occurred in the first block of randomized testing: 
 ** we've indexed a small number of randomized docs
 ** we're explicitly looping over every field and sorting in both directions
 * fails were sorting on one of the \*_dv_last or \*_dv_first fields 
 (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
 ** for desc sorts, sort on same field asc has worked fine just before this 
 (fields are in arbitrary order, but asc always tried before desc)
 ** sorting on some other random fields has sometimes been tried before this 
 and worked
 (specifics of each failure seen in the wild recorded in comments)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-797) Construct EmbeddedSolrServer response without serializing/parsing

2014-01-27 Thread Gregg Donovan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882944#comment-13882944
 ] 

Gregg Donovan commented on SOLR-797:


I'm interested in this as well. We had a custom API that was similar to the 
attached patch. When we switched to EmbeddedSolrServer we noticed an increase 
in time spent deserializing the Solr response, memory allocated, and GC 
spikiness.

One issue with the current EmbeddedSolrServer code is that it starts with a 
ByteArrayOutputStream of 32 bytes and repeatedly resizes it to fit the results. 
We have large responses and we notice the GC hit. We experimented with a 
ThreadLocal<ByteBuffer>, but avoiding serializing and parsing altogether for 
EmbeddedSolrServer seems like an even better idea.

If there's interest, we'd be happy to revive/update/test this patch.
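The resize cost being described can be modeled in a few lines of Java (a
hypothetical sketch assuming simple capacity doubling, which approximates
ByteArrayOutputStream's growth; names are mine):

```java
public class BufferGrowthSketch {
    // Simplified model of a grow-by-doubling buffer: count how many array
    // copies a buffer starting at `initial` bytes needs before it can hold
    // `needed` bytes. (ByteArrayOutputStream grows roughly this way.)
    static int resizeCount(int needed, int initial) {
        int cap = initial;
        int copies = 0;
        while (cap < needed) {
            cap <<= 1;
            copies++;
        }
        return copies;
    }

    public static void main(String[] args) {
        // A ~1 MB response over the 32-byte default costs ~15 array copies
        // (and the intermediate garbage); pre-sizing to 1 MB costs none.
        System.out.println(resizeCount(1_000_000, 32));      // 15
        System.out.println(resizeCount(1_000_000, 1 << 20)); // 0
    }
}
```

Each copy also re-touches every byte written so far, which is where the GC
and CPU hit on large responses comes from.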

 Construct EmbeddedSolrServer response without serializing/parsing
 -

 Key: SOLR-797
 URL: https://issues.apache.org/jira/browse/SOLR-797
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
Reporter: Jonathan Lee
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-797.patch, SOLR-797.patch


 Currently, the EmbeddedSolrServer serializes the response and reparses in 
 order to create the final NamedList response.  From the comment in 
 EmbeddedSolrServer.java, the goal is to:
 * convert the response directly into a named list



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882958#comment-13882958
 ] 

Steve Rowe commented on SOLR-5652:
--

bq. It looks to me like there are two problems here: 1) the same doc is showing 
up on different pages when deep paging; and 2) missing docvalue docs are sorted 
incorrectly.

I think I understand problem #2: non-multi-valued numeric and string fields are 
created (by TrieField's and StrField's createFields() methods) as 
NumericDocValuesField-s and SortedDocValuesField-s, respectively, and these 
require each doc to have a value, which apparently defaults to zero for 
NumericDocValuesField-s and the empty string for SortedDocValuesField-s.

Here are the declarations for the field types that have this problem in 
DistribCursorPagingTest (from schema-sorts.xml):

{code:xml}
<fieldtype name="str_dv_last" class="solr.StrField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="str_dv_first" class="solr.StrField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>

<fieldtype name="int_dv_last" class="solr.TrieIntField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="int_dv_first" class="solr.TrieIntField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>

<fieldtype name="long_dv_last" class="solr.TrieLongField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="long_dv_first" class="solr.TrieLongField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>

<fieldtype name="float_dv_last" class="solr.TrieFloatField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="float_dv_first" class="solr.TrieFloatField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>

<fieldtype name="double_dv_last" class="solr.TrieDoubleField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="double_dv_first" class="solr.TrieDoubleField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
{code}

I think that the above declarations should be disallowed by Solr, because they 
contain docValues="true" plus sortMissingLast|First="true"; the user is asking 
for a particular sorting behavior for missing values, when there never will be 
missing values.

Also, the Solr Ref Guide 
[says|https://cwiki.apache.org/confluence/display/solr/DocValues] about 
docvalue fields: "If this type is used, the field must be either required or 
have a default value, meaning every document must have a value for this field." 
However, neither the above field types nor the fields using them are required 
or have a default specified.  Maybe this should be enforced by schema parsing?
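A plain-Java illustration (hypothetical, not Solr code) of why a missing value
silently defaulting to zero conflicts with sortMissingLast on an ascending
sort:

```java
import java.util.Arrays;
import java.util.Comparator;

public class MissingValueSortSketch {
    public static void main(String[] args) {
        // Field values for three docs; null marks a doc with no value.
        Integer[] docs = {5, null, 3};

        // What sortMissingLast=true promises for an ascending sort:
        Integer[] missingLast = docs.clone();
        Arrays.sort(missingLast,
                Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
        System.out.println(Arrays.toString(missingLast)); // [3, 5, null]

        // What happens when the missing value silently became 0:
        int[] missingAsZero = {5, 0, 3};
        Arrays.sort(missingAsZero);
        // The "missing" doc now sorts first, not last.
        System.out.println(Arrays.toString(missingAsZero)); // [0, 3, 5]
    }
}
```

The two orders diverge on where the valueless doc lands, which is exactly the
kind of inconsistency a cursor-paging walk would trip over.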

 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt


 Several times now, Uwe's jenkins has encountered a "walk already seen ..." 
 assertion failure from DistribCursorPagingTest that I've been unable to 
 fathom, let alone reproduce (although sarowe was able to trigger a similar, 
 non-reproducible-seed failure on his machine)
 Using this as a tracking issue to try and make sense of it.
 Summary of things noticed so far:
 * So far only seen on http://jenkins.thetaphi.de & sarowe's mac
 * So far seen on MacOSX and Linux
 * So far seen on branch 4x and trunk
 * So far seen on Java6, Java7, and Java8
 * fails occurred in the first block of randomized testing: 
 ** we've indexed a small number of randomized docs
 ** we're explicitly looping over every field and sorting in both directions
 * fails were sorting on one of the \*_dv_last or \*_dv_first fields 
 (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
 ** for desc sorts, sort on same field asc has worked fine just before this 
 (fields are in arbitrary order, but asc always tried before desc)
 ** sorting on some other random fields has sometimes been tried before this 
 and worked
 (specifics of each failure seen in the wild recorded in comments)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5666) Using the hdfs write cache can result in appearance of corrupted index.

2014-01-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882963#comment-13882963
 ] 

ASF subversion and git services commented on SOLR-5666:
---

Commit 1561751 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1561751 ]

SOLR-5666: Using the hdfs write cache can result in appearance of corrupted 
index.

 Using the hdfs write cache can result in appearance of corrupted index.
 ---

 Key: SOLR-5666
 URL: https://issues.apache.org/jira/browse/SOLR-5666
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.0, 4.7









[jira] [Commented] (SOLR-5666) Using the hdfs write cache can result in appearance of corrupted index.

2014-01-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882964#comment-13882964
 ] 

ASF subversion and git services commented on SOLR-5666:
---

Commit 1561752 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1561752 ]

SOLR-5666: Using the hdfs write cache can result in appearance of corrupted 
index.










[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882976#comment-13882976
 ] 

Yonik Seeley commented on SOLR-5652:


bq. NumericDocValuesField-s and SortedDocValuesField-s, respectively, and these 
require each doc to have a value, 

Although that used to be true, it should no longer be the case: LUCENE-5178

Now one thing that does look a little fishy to me that might cause a 
problem is how things like IntComparator deal with missing values...
they simply substitute in MAX_INT or MIN_INT when the value is missing.

If the tests here are generating random values, you might try taking out 
MAX_numeric_type and MIN_numeric_type and see if it makes a difference?
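
To illustrate the ambiguity that substitution creates (a toy sketch, not 
Lucene's actual comparator code; SentinelDemo and sortKey are made-up names):

```java
public class SentinelDemo {
    /** Mimics the substitution described above: a missing (null) value
     *  is replaced by Integer.MAX_VALUE before comparing, so a document
     *  that genuinely stores MAX_VALUE sorts identically to one with no
     *  value at all -- the comparator cannot tell the two apart. */
    static int sortKey(Integer value) {
        return value == null ? Integer.MAX_VALUE : value;
    }
}
```

That collision is exactly why removing MAX/MIN values from the randomized 
data, as suggested, could make a difference.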





[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread David Webster (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883010#comment-13883010
 ] 

David Webster commented on SOLR-4470:
-

I have to admit I find the content of this issue disturbing coming from 
such a major Open Source project as Solr.  I came here looking for a viable 
security solution that did not involve segmenting off the system or otherwise 
using IPsec and other IP-address-centric forms of security.  For products to 
ever be considered truly Enterprise-worthy solutions, they simply must address 
security internally.  This product does not, and even worse, the core Dev team 
seems intent on NEVER doing so!

As the lead Java architect for Distributed Systems Engineering at a fortune 100 
company, security is my single most important concern.  I don't care how fast a 
product is, or how many slick features it has, if it isn't secure, at the core, 
it is worthless as an Enterprise solution (at least for any Enterprise that 
gives a whit about REAL security).  Solr is doomed to remain a lab experiment 
in any serious Enterprise implementation where security is more than an 
afterthought.

I like Solr.  I like what it does and how it does it.  However, its lack of 
internal security hooks is a complete show stopper for use at my firm. So my 
choices are to internalize the code, using this patch as our starting point, 
and have our own Solr-like engine, or move on to something like ElasticSearch 
which actually cares about real security at the node to node level.

Also, Mavenize the damned thing!  Modern projects still use Ant?  I haven't 
opened a build.xml script in half a decade or more

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.7

 Attachments: SOLR-4470.patch, SOLR-4470.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r145.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for those to work 
 credentials need to be provided as well.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers. E.g. for search and update 
 request.
 But there are also internal requests
 * that only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc based on calls to the Collection API)
 * that do not in any way have relation to an outside super-request (e.g. 
 replica synching stuff)
 We would like to aim at a solution where the original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we would aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest)
 We will work at a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.






[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883023#comment-13883023
 ] 

Hoss Man commented on SOLR-5652:


bq. Although that used to be true, it should no longer be the case: LUCENE-5178

Right, see also: SOLR-5165 & SOLR-5222

On IRC, i drew sarowe's attention to these issues and DocValuesMissingTest and 
he pointed out that DocValuesMissingTest uses the following...

bq. @SuppressCodecs({"Lucene40", "Lucene41", "Lucene42"}) // old formats cannot 
represent missing values

...so this may be the smoking gun to explain what's going wrong here, since we 
don't do anything like this in the cursor tests. (yet ... i'm going to fix that 
now)





[jira] [Updated] (LUCENE-4072) CharFilter that Unicode-normalizes input

2014-01-27 Thread David Goldfarb (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Goldfarb updated LUCENE-4072:
---

Attachment: 4072.patch

Attaching a new patch - testCuriousString still fails. 

You're right about readInputToBuffer. I think we also have to stop only on 
normalization boundaries. I see two options:
use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward)
or
use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) and use mark() and reset().

{noformat}
  private int readInputToBuffer() throws IOException {
final int len = input.read(tmpBuffer);
if (len == -1) {
  inputFinished = true;
  return 0;
}
inputBuffer.append(tmpBuffer, 0, len);
if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && 
!Character.isHighSurrogate(tmpBuffer[len-1])) {
return len;
} else return len + readInputToBuffer();
  }
{noformat}
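
The stop-only-on-a-boundary idea can be approximated with just the JDK (a 
heuristic sketch, not ICU's actual Normalizer2 boundary logic; BoundarySketch 
and its methods are invented names): keep everything up to the last position 
whose character starts a fresh unit, so normalizing the kept prefix cannot 
change once more input arrives.

```java
public class BoundarySketch {
    /** Heuristic stand-in for a normalization-boundary test: position i
     *  is a safe cut point if the char at i is neither a combining mark
     *  (which would attach to the previous char) nor a low surrogate
     *  (which would split a surrogate pair). */
    static boolean safeCutBefore(char c) {
        return !Character.isLowSurrogate(c)
            && Character.getType(c) != Character.NON_SPACING_MARK;
    }

    /** Length of the largest prefix of s that ends at a safe cut point. */
    static int safePrefix(String s) {
        for (int i = s.length() - 1; i > 0; i--) {
            if (safeCutBefore(s.charAt(i))) return i;
        }
        return 0; // no safe interior cut; keep buffering
    }
}
```

For example, in "ae\u0301" the accent U+0301 still combines with the 'e', so 
only "a" could be handed downstream safely.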

 CharFilter that Unicode-normalizes input
 

 Key: LUCENE-4072
 URL: https://issues.apache.org/jira/browse/LUCENE-4072
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Ippei UKAI
 Attachments: 4072.patch, DebugCode.txt, LUCENE-4072.patch, 
 LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, 
 LUCENE-4072.patch, ippeiukai-ICUNormalizer2CharFilter-4752cad.zip


 I'd like to contribute a CharFilter that Unicode-normalizes input with ICU4J.
 The benefit of having this process as CharFilter is that tokenizer can work 
 on normalised text while offset-correction ensuring fast vector highlighter 
 and other offset-dependent features do not break.
 The implementation is available at following repository:
 https://github.com/ippeiukai/ICUNormalizer2CharFilter
 Unfortunately this is my unpaid side-project and cannot spend much time to 
 merge my work to Lucene to make appropriate patch. I'd appreciate it if 
 anyone could give it a go. I'm happy to relicense it to whatever that meets 
 your needs.






[jira] [Comment Edited] (LUCENE-4072) CharFilter that Unicode-normalizes input

2014-01-27 Thread David Goldfarb (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883032#comment-13883032
 ] 

David Goldfarb edited comment on LUCENE-4072 at 1/27/14 6:10 PM:
-

Attaching a new patch - testCuriousString still fails. 

You're right about readInputToBuffer. I think we also have to stop only on 
normalization boundaries. I see two options:
use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward)
or
use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) and use mark() and reset().

{noformat}
  private int readInputToBuffer() throws IOException {
final int len = input.read(tmpBuffer);
if (len == -1) {
  inputFinished = true;
  return 0;
}
inputBuffer.append(tmpBuffer, 0, len);
if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && 
!Character.isHighSurrogate(tmpBuffer[len-1])) {
return len;
} else return len + readInputToBuffer();
  }
{noformat}

\[edit\]
And the len >= 2 clause wasn't meant to be part of the patch, ignore that.
{noformat}
if (normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && 
!Character.isHighSurrogate(tmpBuffer[len-1])) {
return len;
} else return len + readInputToBuffer();
{noformat}






[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883044#comment-13883044
 ] 

Hoss Man commented on SOLR-5652:


To clarify one thing: steve mentioned that it seems like there are two 
problems...

bq. It looks to me like there are two problems here: 1) the same doc is showing 
up on different pages when deep paging; and 2) missing docvalue docs are sorted 
incorrectly.

As far as #2 goes, now that we log every doc on every page, i can confirm that 
when i try some of these failed seeds (for example steve's #129 log), i also see 
the incorrect ordering even though the test passes for me -- so #2 is almost 
certainly the codec issue.

that still leaves the question about #1, and why it isn't completely 
reproducible -- but that may just be an artifact of #2 (ie: if these codecs 
have non-deterministic behavior when trying to access missing values, there 
could be arbitrary data in a reused bytebuffer)




[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883048#comment-13883048
 ] 

Hoss Man commented on SOLR-5652:


bq. Also, the Solr Ref Guide says about docvalue fields...

fixed.




[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883053#comment-13883053
 ] 

Shawn Heisey commented on SOLR-4470:


bq. This product does not, and even worse, the core Dev team seems intent on 
NEVER doing so!

I don't know that we *never* intend on adding security.  We face a major 
problem with doing so at this time, though:  We have absolutely no idea what 
servlet container the user is going to use for running the solr war.  The 
example includes jetty, but aside from a few small edits in the stock config 
file, it is unmodified.  Solr has no control over the server-side HTTP layer 
right now, so anything we try to do will almost certainly be wrong as soon as 
the user changes containers or decides to modify their container config.

Solr 5.0 will not ship as a .war file.  The work hasn't yet been done that will 
turn it into an actual application, but it will be done before 5.0 gets 
released.  Once Solr is a real application that owns and fully controls the 
HTTP layer, security will not be such a nightmare.  You mention ElasticSearch 
and its ability to deal with security.  ES is already a standalone application, 
which means they can do a lot of things that Solr currently can't.  It's a 
legitimate complaint with Solr, one that we are trying to rectify.

bq. Also, Mavenize the damned thing! Modern projects still use Ant? I haven't 
opened a build.xml script in half a decade or more

I can't say anything about maven vs. ant.  I don't have enough experience with 
either.





[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread David Webster (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883085#comment-13883085
 ] 

David Webster commented on SOLR-4470:
-

Thanks, for the update, Shawn.  The move to a stand-alone implementation should 
be a good one, with hope that a robust security implementation will be at the 
very top of the priority list.  Not sure what the timeline for that is, but 
I've got a fairly short one for laying down the foundation of our Enterprise 
Search by 3rd Qtr.  That will have to pass IA muster (mainstream Solr does 
not), which still leaves me in a bit of a quandary as to how to proceed.  I don't 
want the added TCO of maintaining our own search engine, but cannot wait around 
very long for viable solutions to surface, either.  I'm either going to have to 
implement this patch branch, or move on to other engine choices...

I know JBoss, JBPM specifically, used to be ant based but they've gone full 
Maven now.  This is the first big Open Source project I've run across in some 
time that still uses Ant.  Not many devs on our staff can still read a 
build.xml file anymore...and those that can would rather not...




[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-01-27 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883114#comment-13883114
 ] 

Hoss Man commented on SOLR-5463:


bq. Some further thoughts: ...

Yonik: no disagreement from me, but since what we've got so far has already 
been committed and backported to 4x, i think it would make sense to track your 
enhancement ideas in new issues for tracking purposes (unless you think you can 
help bang these out before 4.7).


 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.7

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP based API similar to how IndexSearcher.searchAfter works 
 at the lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param) except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial ground work here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}
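The walk described in the panel above can be sketched as a small client loop. The fetch_page function below is a hypothetical stand-in for an HTTP request to Solr's /select handler; it pages over an in-memory doc list instead of a real index, but the cursor protocol (start at "*", resend nextCursorMark, stop when it stops changing) is the same:

```python
def fetch_page(cursor_mark, rows=2):
    """Hypothetical stand-in for a Solr /select request with
    sort=id asc&start=0&rows=N&cursorMark=<mark>.
    Pages over a small in-memory doc list for illustration."""
    docs = [{"id": i} for i in range(5)]
    start = int(cursor_mark) if cursor_mark != "*" else 0
    page = docs[start:start + rows]
    # a real Solr returns an opaque token; a plain offset stands in here
    next_mark = str(start + len(page)) if page else cursor_mark
    return {"docs": page, "nextCursorMark": next_mark}

def walk_all():
    collected, cursor = [], "*"   # "*" = start-of-results cursor
    while True:
        resp = fetch_page(cursor)
        collected.extend(resp["docs"])
        # stop when the cursor mark stops changing
        if resp["nextCursorMark"] == cursor:
            return collected
        cursor = resp["nextCursorMark"]
```

Note the client never passes start > 0; position is carried entirely by the cursor mark, which is what makes this safe for deep paging.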



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component

2014-01-27 Thread Steven Bower (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883117#comment-13883117
 ] 

Steven Bower commented on SOLR-5488:


I finally got a linux box at home to repro this issue (well, at least a similar 
one). I think the issue is in how it identifies individual components of a 
query so that they are not duplicated throughout the query execution; I think 
it's just associating the wrong stats collectors with query components. I've 
narrowed it down to that, but I'm not quite sure exactly where this is or why 
it is so ephemeral.

 Fix up test failures for Analytics Component
 

 Key: SOLR-5488
 URL: https://issues.apache.org/jira/browse/SOLR-5488
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0, 4.7
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, 
 SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors


 The analytics component has a few test failures, perhaps 
 environment-dependent. This is just to collect the test fixes in one place 
 for convenience when we merge back into 4.x



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene-solr pull request: Adds logParamsList parameter to support reduced l...

2014-01-27 Thread cpoerschke
GitHub user cpoerschke opened a pull request:

https://github.com/apache/lucene-solr/pull/23

Adds logParamsList parameter to support reduced logging.

For https://issues.apache.org/jira/browse/SOLR-5672, add logParamsList 
parameter to support reduced logging.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr 
branch_4x-fewer-params-logged

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/23.patch


commit e6f82c935d5f8ee6b225be41b5a6615833fc3029
Author: Christine Poerschke cpoersc...@bloomberg.net
Date:   2014-01-24T13:17:44Z

Adds logParamsList parameter to support reduced logging.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5672) add logParamsList parameter to support reduced logging

2014-01-27 Thread Christine Poerschke (JIRA)
Christine Poerschke created SOLR-5672:
-

 Summary: add logParamsList parameter to support reduced logging
 Key: SOLR-5672
 URL: https://issues.apache.org/jira/browse/SOLR-5672
 Project: Solr
  Issue Type: Improvement
Reporter: Christine Poerschke


The use case we have is that logging full requests in each shard is just 'too 
much' but at the same time we wish to be able to tie together requests across 
shards. In certain circumstances we also wish to fully log some requests.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5672) add logParamsList parameter to support reduced logging

2014-01-27 Thread Christine Poerschke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883146#comment-13883146
 ] 

Christine Poerschke commented on SOLR-5672:
---

The change https://github.com/apache/lucene-solr/pull/23 adds a new parameter.

If it is missing, behaviour remains as it is now. If it is supplied, the 
following use cases are possible:
{code}
...logParamsList= # don't log any parameters
...logParamsList=q,fq # log only the q and fq parameters
{code}
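The proposed behaviour can be sketched as a parameter filter. The function name and the None-means-parameter-absent convention below are illustrative, not the actual Solr implementation:

```python
def filter_logged_params(params, log_params_list=None):
    """Sketch of the proposed logParamsList semantics:
    - log_params_list is None   -> log everything (current behaviour)
    - log_params_list == ""     -> log no parameters
    - log_params_list == "q,fq" -> log only the q and fq parameters
    """
    if log_params_list is None:
        return dict(params)
    wanted = {p for p in log_params_list.split(",") if p}
    return {k: v for k, v in params.items() if k in wanted}
```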


 add logParamsList parameter to support reduced logging
 --

 Key: SOLR-5672
 URL: https://issues.apache.org/jira/browse/SOLR-5672
 Project: Solr
  Issue Type: Improvement
Reporter: Christine Poerschke

 The use case we have is that logging full requests in each shard is just 'too 
 much' but at the same time we wish to be able to tie together requests across 
 shards. In certain circumstances we also wish to fully log some requests.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-01-27 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883171#comment-13883171
 ] 

Shalin Shekhar Mangar commented on SOLR-5473:
-

Some comments on the latest patch:

# AbstractFullDistribZkTestBase has a useExternalCollection() which is 
hard-coded to false. Why? Can we randomize using external collections in the 
base test to have better test coverage?
# ClusterState.getCollections has a todo which says “fix later JUnit is 
failing”. Which test is failing?
# What is _stateVer_ used for? I guess it is for SOLR-5474 and not this issue?
# This patch has only whitespace-related changes to CloudSolrServer.
# There is wrong formatting and incorrect spacing in the new code, such as 
Overseer.createCollection and the new methods in ClusterState. You should 
re-format all the new/modified code blocks.
# There was one forbidden-api check failure where the new String(byte[]) 
constructor is used in a log message. Run ant check-forbidden-apis from inside 
the solr directory.
# There are three javadoc errors (run ant precommit):
{code}
[ecj-lint] 1. ERROR in 
/Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java
 (at line 199)
 [ecj-lint] /** @deprecated
 [ecj-lint]  ^^
 [ecj-lint] Javadoc: Description expected after @deprecated
 [ecj-lint] --
 [ecj-lint] 2. ERROR in 
/Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java
 (at line 297)
 [ecj-lint] * @deprecated
 [ecj-lint]^^
 [ecj-lint] Javadoc: Description expected after @deprecated
 [ecj-lint] --
 [ecj-lint] --
 [ecj-lint] 3. ERROR in 
/Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java
 (at line 759)
 [ecj-lint] * @param coll
 [ecj-lint]  
 [ecj-lint] Javadoc: Description expected after this reference
 [ecj-lint] --
 [ecj-lint] 3 problems (3 errors)
{code}

 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5672) add logParamsList parameter to support reduced logging

2014-01-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883169#comment-13883169
 ] 

Mark Miller commented on SOLR-5672:
---

+1

 add logParamsList parameter to support reduced logging
 --

 Key: SOLR-5672
 URL: https://issues.apache.org/jira/browse/SOLR-5672
 Project: Solr
  Issue Type: Improvement
Reporter: Christine Poerschke

 The use case we have is that logging full requests in each shard is just 'too 
 much' but at the same time we wish to be able to tie together requests across 
 shards. In certain circumstances we also wish to fully log some requests.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-01-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883195#comment-13883195
 ] 

Mark Miller commented on SOLR-5473:
---

I'm fairly busy in the short term - going out of town for a few days. But I 
intend to review this as well.

 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



REMINDER: Call For Papers: ApacheCon North America 2014 -- ends Feb 1st

2014-01-27 Thread Chris Hostetter


(Note: cross posted, please keep any replies to general@lucene)

Quick reminder that the CFP for ApacheCon (Denver) ends on Saturday...

http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp


Ladies and Gentlemen, start writing your proposals. The Call For Papers 
for ApacheCon North America 2014 is now open, and is open until February 
1st, 2014. Note that we are on a very short timeline this year, so don't 
assume that we'll extend the CFP, just because we've done so every time 
before.




-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883288#comment-13883288
 ] 

Shawn Heisey commented on SOLR-5670:


Reducing heap requirements by not requiring data to go into the FieldCache is a 
major win for huge indexes.  Garbage collection can be a major source of 
performance issues even when it's superbly tuned, and I doubt that my tuning 
parameters are perfect.


 _version_ either indexed OR docvalue
 

 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: solr, solrcloud, version
 Attachments: SOLR-5670.patch, SOLR-5670.patch


 As far as I can see there is no good reason to require that the _version_ 
 field has to be indexed if it is docvalued. So I guess it will be ok with a 
 rule saying _version_ has to be either indexed or docvalued (allowed to be both).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883290#comment-13883290
 ] 

Per Steffensen commented on SOLR-4470:
--

bq.  This product does not, and even worse, the core Dev team seems intent on 
NEVER doing so!

At least most of them, yes. It is really a shame.

bq. As the lead Java architect for Distributed Systems Engineering at a fortune 
100 company, security is my single most important concern

As the tech lead on the largest REAL SolrCloud installation on the planet, I 
agree :-) I believe I can say that we have the largest installation in the 
world for two reasons:
* Upgrading from one version of SolrCloud to the next does not seem to be very 
important in this product. At least it is hard to do, and there seems to be no 
testing of it when a new release 4.y comes out - no testing that you can 
actually upgrade to it from 4.x. This makes me believe that no-one, or at 
least only a few, have installations so big that just installing 4.y and 
storing/indexing all data from the old 4.x installation from scratch is not an 
option. If others actually had to do upgrades where this is not possible, lots 
of complaints would pop up - and they don't.
* Our biggest system stores and indexes 1-2 billion documents per day and has 
2 years of history. That is about 1000 billion documents in Solr at any time, 
with 1-2 billion going in every day (and 30-60 billion going out every month). 
To be able to run such a system we needed to do numerous optimizations; in 
general, without optimizations you will never get such a big system working. I 
do not see much talk around here about optimizations of that kind - probably 
because people have not run into the problems yet.

bq. I like Solr. I like what it does and how it does it.

Me too. On that front it actually has numerous advantages over e.g. 
ElasticSearch. We used ES to begin with, and we liked it, but for political 
reasons we were not allowed to keep using it, and we turned to find an 
alternative. At that point in time SolrCloud (4.x) was only in its startup 
phase (a year before 4.0 was released), but we believed so much in the idea 
behind it that we decided to go for it.

bq. However, it's lack of internal security hooks is a complete show stopper 
for use at my firm

For us, too. That is why we made our own fix for it - provided as a patch here 
and also available at https://github.com/steff1193/lucene-solr

bq. Using this patch as our starting point

I am happy to hear that. Please feel free to contact me if you have any 
problems making it work or understanding what it does. I might also be able to 
provide a few tips on making it extra secure :-)

bq. and have our own Solr-like engine

We made the same decision years ago. We have had our own version of Solr in our 
own VCS for years. Just recently I put the code on 
https://github.com/steff1193/lucene-solr. No releases (incl. maven artifacts) 
yet, but that will come soon. Until then you will have to build it yourself 
from source.

bq. Also, Mavenize the damned thing! Modern projects still use Ant? I haven't 
opened a build.xml script in half a decade or more

Already done:
{code}
ant [-Dversion=$VERSION] get-maven-poms
{code}
This will build the Maven structure in the maven-build folder.
E.g. if you use Eclipse:
{code}
ant eclipse
{code}
Then in Eclipse right-click the root folder, choose Import... and Existing 
Maven Project, and import all the Maven pom.xml files from the maven-build 
folder.

bq. We have absolutely no idea what servlet container the user is going to use 
for running the solr war.

It isn't important for this issue. Protecting HTTP endpoints with 
authentication and authorization is standardized in the servlet spec. All 
web containers have to live up to that standard (to be certified). The only 
place where the standardization is not very clear is how to install a realm 
(the thingy knowing about user credentials and roles), but all containers have 
plenty of documentation on how to do it.

It is very important to understand that this issue, and the patch I provided, 
will work for any web container. This issue is not about enforcing the 
protection - let the web container do that. This issue and the patch are ONLY 
about enabling Solr to send credentials in its Solr-node-to-Solr-node requests, 
so that things keep working if/when you make the obvious security decision and 
make use of the security features provided to you for free by the container.
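At the HTTP level, "sending credentials" amounts to attaching a Basic Authorization header to each internal request. The helper below is a minimal illustration of that mechanism only - it is not code from the patch, which wires credentials into Solr's HTTP client:

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization header a Solr node could attach to its
    node-to-node requests (illustrative sketch; RFC 7617 Basic scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The container then validates the header against its configured realm; Solr itself never has to implement the enforcement side.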

bq. Solr has no control over the server-side HTTP layer right now, so anything 
we try to do will almost certainly be wrong as soon as the user changes 
containers or decides to modify their container config.

NO!

bq. Solr 5.0 will not ship as a .war file

Bad idea. This is one of the points where Solr made a better decision than ES.

bq.  Once Solr is a real application that owns and fully controls the HTTP 
layer, 

[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883290#comment-13883290
 ] 

Per Steffensen edited comment on SOLR-4470 at 1/27/14 9:09 PM:
---


[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5652:
---

Attachment: SOLR-5652.codec.skip.dv.patch

Rather than just using @SuppressCodecs in this test, here's a patch that 
checks whether the codec supports docValues with sort-missing, and if not, 
skips those fields -- but the other fields are still checked.

You can see it working by comparing the log messages (showing the fields 
tested) between things like...

{noformat}
ant test -Dtestcase=DistribCursorPagingTest -Dtests.codec=Lucene40
   vs
ant test -Dtestcase=DistribCursorPagingTest -Dtests.codec=Lucene45
{noformat}

Before I commit this though, I really want to add explicit sanity checking 
that the docs are in the expected order, so we can see a definitive and 
consistent failure from the problem this tries to prevent ... I'm going to 
work on that this afternoon.

(I also want to add docValues fields to the test schema that don't use either 
sortMissingLast _or_ sortMissingFirst and just rely on the default behavior 
... not sure why I didn't think to include that in the first place.)
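The field-skipping approach can be sketched as follows. The codec names in the blacklist come from the discussion on this issue; the function itself is an illustrative sketch, not the actual test code:

```python
# Codecs that (per this issue's discussion) cannot represent missing docValues.
CODECS_WITHOUT_MISSING_DV = {"Lucene40", "Lucene41", "Lucene42"}

def fields_to_test(all_fields, codec):
    """Keep every field, but drop the *_dv_first / *_dv_last fields when the
    codec cannot represent missing docValues - so the rest of the randomized
    sort testing still runs under old codecs."""
    if codec not in CODECS_WITHOUT_MISSING_DV:
        return list(all_fields)
    return [f for f in all_fields
            if not (f.endswith("_dv_first") or f.endswith("_dv_last"))]
```

Logging the returned list per run is what makes the Lucene40-vs-Lucene45 comparison above visible.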


 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt


 Several times now, Uwe's jenkins has encountered a walk already seen ... 
 assertion failure from DistribCursorPagingTest that I've been unable to 
 fathom, let alone reproduce (although sarowe was able to trigger a similar, 
 non-reproducible seed, failure on his machine)
 Using this as a tracking issue to try and make sense of it.
 Summary of things noticed so far:
 * So far only seen on http://jenkins.thetaphi.de  sarowe's mac
 * So far seen on MacOSX and Linux
 * So far seen on branch 4x and trunk
 * So far seen on Java6, Java7, and Java8
 * failures occurred in the first block of randomized testing: 
 ** we've indexed a small number of randomized docs
 ** we're explicitly looping over every field and sorting in both directions
 * failures were sorting on one of the \*_dv_last or \*_dv_first fields 
 (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
 ** for desc sorts, sort on same field asc has worked fine just before this 
 (fields are in arbitrary order, but asc always tried before desc)
 ** sorting on some other random fields has sometimes been tried before this 
 and worked
 (specifics of each failure seen in the wild recorded in comments)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected

2014-01-27 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-5671:
-

Description: 
Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small 
number of indexed docs and retrieved one fewer doc than the number of indexed 
docs.  Both of these failures were on trunk on Windows:

http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/  

I've also seen this twice on trunk on my OS X laptop (out of 875 trials).

None of the seeds have reproduced for me.

All the failures were using either Lucene41 or Lucene42 codec

  was:
Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small 
number of indexed docs and retrieved one fewer doc than the number of indexed 
docs.  Both of these failures were on trunk on Windows:

http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/  

I've also seen this twice on trunk on my OS X laptop (out of 875 trials).

None of the seeds have reproduced for me.


 Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than 
 expected 
 ---

 Key: SOLR-5671
 URL: https://issues.apache.org/jira/browse/SOLR-5671
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Rowe

 Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small 
 number of indexed docs and retrieved one fewer doc than the number of indexed 
 docs.  Both of these failures were on trunk on Windows:
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/  
 I've also seen this twice on trunk on my OS X laptop (out of 875 trials).
 None of the seeds have reproduced for me.
 All the failures were using either Lucene41 or Lucene42 codec



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883317#comment-13883317
 ] 

Robert Muir commented on SOLR-5652:
---

{quote}
On IRC, i drew sarowe's attention to these issues and DocValuesMissingTest and 
he pointed out that DocValuesMissingTest uses the following...

@SuppressCodecs({Lucene40, Lucene41, Lucene42}) // old formats cannot 
represent missing values

...so this may be the smoking gun to explain what's going wrong here, since we 
don't do anything like this in the cursor tests. (yet ... i'm going to fix that 
now)
{quote}

Dammit, I feel pretty terrible. You guys have been debugging this thing for a 
long time, and I've been trying to stay up to date on the issue, but not once 
did I even think about this...



 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt


 Several times now, Uwe's jenkins has encountered a walk already seen ... 
 assertion failure from DistribCursorPagingTest that I've been unable to 
 fathom, let alone reproduce (although sarowe was able to trigger a similar, 
 non-reproducible seed, failure on his machine)
 Using this as a tracking issue to try and make sense of it.
 Summary of things noticed so far:
 * So far only seen on http://jenkins.thetaphi.de  sarowe's mac
 * So far seen on MacOSX and Linux
 * So far seen on branch 4x and trunk
 * So far seen on Java6, Java7, and Java8
 * failures occurred in the first block of randomized testing: 
 ** we've indexed a small number of randomized docs
 ** we're explicitly looping over every field and sorting in both directions
 * failures were sorting on one of the \*_dv_last or \*_dv_first fields 
 (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) 
 ** for desc sorts, sort on same field asc has worked fine just before this 
 (fields are in arbitrary order, but asc always tried before desc)
 ** sorting on some other random fields has sometimes been tried before this 
 and worked
 (specifics of each failure seen in the wild recorded in comments)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5670) _version_ either indexed OR docvalue

2014-01-27 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-5670.


   Resolution: Fixed
Fix Version/s: 4.7
   5.0

Committed.  Thanks!

 _version_ either indexed OR docvalue
 

 Key: SOLR-5670
 URL: https://issues.apache.org/jira/browse/SOLR-5670
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: solr, solrcloud, version
 Fix For: 5.0, 4.7

 Attachments: SOLR-5670.patch, SOLR-5670.patch


 As far as I can see there is no good reason to require that _version_ field 
 has to be indexed if it is docvalued. So I guess it will be ok with a rule 
 saying _version_ has to be either indexed or docvalue (allowed to be both).






[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883402#comment-13883402
 ] 

Steve Rowe commented on SOLR-5652:
--

bq. rather then just use SupressCodec in this test, here's a patch that checks 
to see if the codec supports docvalues with sort missing, and if not then it 
skips those fields – but the other fields are still checked.

+1, looks good, though on trunk Lucene3x and Appending can be removed from 
the blacklist in LTC.defaultCodecSupportsMissingDocValues().  I see these 
elsewhere on trunk (Solr tests only), though, so maybe they're not just 
vestiges?

 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt








[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883426#comment-13883426
 ] 

Mark Miller commented on SOLR-4470:
---

The bulk of this patch was not that contentious. The rest seemed to mostly be 
hashed out. The missing piece has been a committer with the skill and time to 
put it in, take responsibility for it, and support it. 

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.7

 Attachments: SOLR-4470.patch, SOLR-4470.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r145.patch


 We want to protect any HTTP resource (url). We want to require credentials no 
 matter what kind of HTTP request you make to a Solr node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr nodes 
 also make internal requests to other Solr nodes, and for it to work, 
 credentials need to be provided there also.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers, e.g. for search and update 
 requests.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered by outside requests (e.g. 
 shard creation/deletion/etc. based on calls to the Collection API)
 * that do not in any way relate to an outside super-request (e.g. 
 replica syncing stuff)
 We would like to aim at a solution where the original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we would aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work at a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.






[jira] [Commented] (LUCENE-5376) Add a demo search server

2014-01-27 Thread Arcadius Ahouansou (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883452#comment-13883452
 ] 

Arcadius Ahouansou commented on LUCENE-5376:


Hello.

I checked out this branch and ran "ant clean package-zip" in the lucene 
directory. The build was successful and many artefacts were created, including:

- lucene-xml-query-demo.war
- lucene-demo-5.0-SNAPSHOT.jar
- lucene-server-5.0-SNAPSHOT.jar

I dropped the war into a fresh jetty 9 install and jetty was not happy (see 
stacktrace below).

My questions are:
- How do the demo and the new server package fit together?
- How do I run the demo?

Thanks.

{code}
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:297)
at org.eclipse.jetty.start.Main.start(Main.java:724)
at org.eclipse.jetty.start.Main.main(Main.java:103)
2014-01-27 22:21:36.288:WARN:lucene-xml-query-demo:main: unavailable
javax.servlet.UnavailableException: 
org.apache.lucene.xmlparser.webdemo.FormBasedXmlQueryDemo
at org.eclipse.jetty.servlet.BaseHolder.doStart(BaseHolder.java:102)
at 
org.eclipse.jetty.servlet.ServletHolder.doStart(ServletHolder.java:294)
{code}

 Add a demo search server
 

 Key: LUCENE-5376
 URL: https://issues.apache.org/jira/browse/LUCENE-5376
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: lucene-demo-server.tgz


 I think it'd be useful to have a demo search server for Lucene.
 Rather than being fully featured, like Solr, it would be minimal, just 
 wrapping the existing Lucene modules to show how you can make use of these 
 features in a server setting.
 The purpose is to demonstrate how one can build a minimal search server on 
 top of APIs like SearcherManager, SearcherLifetimeManager, etc.
 This is also useful for finding rough edges / issues in Lucene's APIs that 
 make building a server unnecessarily hard.
 I don't think it should have back-compatibility promises (except Lucene's 
 index back compatibility), so it's free to improve as Lucene's APIs change.
 As a starting point, I'll post what I built for the "eating your own dog 
 food" search app for Lucene's & Solr's jira issues, 
 http://jirasearch.mikemccandless.com (blog: 
 http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It 
 uses Netty to expose basic indexing & searching APIs via JSON, but it's very 
 rough (lots of nocommits).






maven build issues with non-numeric custom version

2014-01-27 Thread Ryan McKinley
From:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/dev-tools/maven/README.maven

It says we can get a custom build number using:

ant -Dversion=my-special-version get-maven-poms


but this fails with:

BUILD FAILED

/Users/ryan/workspace/apache/lucene_4x/build.xml:141: The following error
occurred while executing this line:

/Users/ryan/workspace/apache/lucene_4x/lucene/common-build.xml:1578: The
following error occurred while executing this line:

/Users/ryan/workspace/apache/lucene_4x/lucene/tools/custom-tasks.xml:122:
Malformed module dependency from
'lucene-analyzers-phonetic.internal.test.dependencies':
'lucene/build/analysis/common/lucene-analyzers-common-my-special-version.jar'



Using a numeric version number, things work OK.


Any ideas?


ryan


[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883459#comment-13883459
 ] 

Jan Høydahl commented on SOLR-4470:
---

I started the port to trunk along with some other changes last summer, but did 
not get to finalize it within the time available. I also realized 
I need some help moving along, as I'm quite a novice on servlet security.

Implementing this patch for 5.0 and 4.x would still be worth the effort, should 
we choose to replace the container with Netty or something else, since most of 
the internal inter-node communication will stay the same - is that correct?

When I dived into this last time around, the intent was to commit a working impl 
to trunk first, let it bake for a few weeks (perhaps with the test framework 
randomizing security on/off) and then backport. This is best practice for big 
changes, and this patch is HUGE. So here is one committer willing to 
contribute, but I need some help from someone willing to take a look at 
https://github.com/cominvent/lucene-solr/tree/SOLR-4470 and find out 
what 1% is missing for it to work, and then get it up to date with current 
trunk...

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.7

 Attachments: SOLR-4470.patch, SOLR-4470.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r145.patch








lucene-solr pull request: Lucene 5092 pull 1

2014-01-27 Thread PaulElschot
GitHub user PaulElschot opened a pull request:

https://github.com/apache/lucene-solr/pull/24

Lucene 5092 pull 1

DocBlocksIterator extends DocIdSetIterator.
FixedBitSetDBI and EliasFanoDocIdSet implement DocBlocksIterator.
The join module ToParent/ToChild queries use DocBlocksIterator instead of 
FixedBitSet.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PaulElschot/lucene-solr LUCENE-5092-pull-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/24.patch


commit 0b4c85b1b30426f34f65a03c32bb2618e1d03f99
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-19T19:31:14Z

Ignore *.*~ and *.jar files

commit 9a3c80013219b986340cd5a470fb30d20d35504a
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-19T20:35:54Z

Add first version of DocBlockIterator

commit 77341eed771facde8cf89bc85c99fe0ccd6bd257
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-19T20:53:00Z

OpenBitSetIterator extends DocBlockIterator, advanceToJustBefore() not yet 
implemented.

commit d920b8e6f2fbf39da42a5eff19301c4ca92647c6
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-19T21:46:48Z

Initial implementation of OpenBitSetIterator.advanceToJustBefore()

commit ebff7763d31518989882909da56e0b9be22a4f89
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-19T21:57:38Z

The OpenBitSetIterator constructor not using an OpenBitSet can not easily 
be deleted

commit 4166b0e4fa44b10f7c25158a811ff8593d540957
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-19T22:16:30Z

More detailed plan

commit 807f98db323ee78454d6bb7d76a9d40d89e8126b
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-20T19:11:17Z

Rename to DocBlocksIterator

commit 7ea28b0443e62d4e02458943a06cd97a9c8ad843
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-20T19:17:09Z

Rename to class DocBlocksIterator

commit 42e4bbc18769f7f91a6dfd730cc5d7d51582cb6c
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-20T19:52:21Z

Adapted ToParentBlockJoinQuery to use DocBlocksIterator directly from FBS, 
tests pass

commit 3d7819bc9e3b8754e6f882e60a0920800ba09954
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-20T21:19:53Z

Remove some commented code

commit 4b2a7a4a529810dbf742958463c3f9327444f3b1
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-20T22:26:27Z

Getting closer with ToChildBJQ

commit 24032392ede9b8b2997152f4f6aec3af03a6e550
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-21T15:16:21Z

Merge branch 'trunk' into docblocksiter

commit 8fde265979ba8913045a3f9cd87a15482739cc43
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-21T16:49:48Z

Always set OpenBitSet attribute in OpenBitSetIterator

commit b7627dd4f41aff421af6d9a0781fcc13fe668995
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-21T16:51:06Z

Added a test for advanceToJustBefore in BaseDocIdSetTestCase, 
TestFixedBitSet fails

commit f1966ae5b4f375c7451ff083288e409a0b41b9ef
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-21T21:14:11Z

Previous test seed passes, next one fails

commit c198cd8b6b06187c65477f088dad918974721099
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-22T23:49:52Z

Added OpenBitSetDocBlocksIterator

commit c29094ceba3bec8773e51c17fe3c80abab5ae526
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-22T23:53:00Z

Merge branch 'trunk' of https://github.com/apache/lucene-solr into 
docblocksiter

commit 7f7d8901bb396b82a0e874ca1f3c4264806fcd8e
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T20:37:49Z

Improve ignoring lib directories

commit e8abc6f30060ac10de886b6fcc225d561e4758b5
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T21:20:13Z

Added FixedBitSetDBI, tests pass.
FixedBitSet.java from trunk, made some private things protected.

commit f78dca9bdf2b79fe3fbb7b80898fb88420891418
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T21:30:05Z

Remove some unused imports

commit 273a7e80767252f9748878878b0e9d742d2df669
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T21:33:17Z

Remove commented println lines

commit 3f93aa8d76422844d141fc2070a236e780e577f8
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T23:24:09Z

Add TestDocIdSetBenchMark.java. Note: no APL 2.0

commit 3ca778ffee79cc9bd549e4b0dd37e00f16ba6320
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T23:26:06Z

Add assert message

commit 50f0175fda3637b88e982f285021921c69fe4dff
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T23:26:22Z

Correct comment

commit d07201d00dada7d3c4bde33471dac3accdb9b1e8
Author: Paul Elschot paul.j.elsc...@gmail.com
Date:   2014-01-23T23:26:52Z

Remove 

[jira] [Commented] (LUCENE-5092) join: don't expect all filters to be FixedBitSet instances

2014-01-27 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883518#comment-13883518
 ] 

Paul Elschot commented on LUCENE-5092:
--

I have opened this pull request:
https://github.com/apache/lucene-solr/pull/24
In case a patch is preferred, please let me know.

In the pull request:
DocBlocksIterator extends DocIdSetIterator.
FixedBitSetDBI and EliasFanoDocIdSet implement DocBlocksIterator, so 
EliasFanoDocIdSet could also be used for joins.
The join module ToParent/ToChild queries use DocBlocksIterator instead of 
FixedBitSet.
In the join module, FixedBitSetCachingWrapperFilter.java is replaced by 
DocBlocksCachingWrapperFilter which uses FixedBitSetDBI for now.

LUCENE-5416 is open for FixedBitSetDBI.
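To illustrate the technique this issue revolves around: the join module finds the first child of a parent by walking a parent bitset backwards with prevSetBit. The sketch below is not Lucene code; SimpleBitSet and firstChild are illustrative names, with prevSetBit implemented the same way a fixed-width long[] bitset typically does it.

```java
// Minimal sketch (assumed names, not Lucene's classes) of the prevSetBit-based
// backward walk that block-join queries rely on: docs are indexed child-first,
// so the first child of a parent is one past the previous parent bit.
public class BlockJoinSketch {
    static final class SimpleBitSet {
        private final long[] bits;
        SimpleBitSet(int numBits) { bits = new long[(numBits + 63) >>> 6]; }
        void set(int i) { bits[i >>> 6] |= 1L << (i & 63); }
        /** Highest set bit at or below index, or -1 if none. */
        int prevSetBit(int index) {
            int i = index >> 6;
            final int subIndex = index & 0x3f;
            long word = bits[i] << (63 - subIndex); // drop bits above index
            if (word != 0) {
                return (i << 6) + subIndex - Long.numberOfLeadingZeros(word);
            }
            while (--i >= 0) {
                if (bits[i] != 0) {
                    return (i << 6) + 63 - Long.numberOfLeadingZeros(bits[i]);
                }
            }
            return -1;
        }
    }

    /** First child doc of the block ending at parentDoc. */
    static int firstChild(SimpleBitSet parents, int parentDoc) {
        // prevSetBit returns -1 when there is no earlier parent, giving child 0.
        return parents.prevSetBit(parentDoc - 1) + 1;
    }

    public static void main(String[] args) {
        SimpleBitSet parents = new SimpleBitSet(16);
        parents.set(3); // docs 0-2 are its children
        parents.set(7); // docs 4-6 are its children
        System.out.println(firstChild(parents, 3)); // 0
        System.out.println(firstChild(parents, 7)); // 4
    }
}
```

An iterator that cannot do this backward step is what the issue proposes dumping into a set that can, e.g. a FixedBitSet.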



 join: don't expect all filters to be FixedBitSet instances
 --

 Key: LUCENE-5092
 URL: https://issues.apache.org/jira/browse/LUCENE-5092
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5092.patch


 The join module throws exceptions when the parents filter isn't a 
 FixedBitSet. The reason is that the join module relies on prevSetBit to find 
 the first child document given a parent ID.
 As suggested by Uwe and Paul Elschot on LUCENE-5081, we could fix it by 
 exposing methods in the iterators to iterate backwards. When the join module 
 gets an iterator which isn't able to iterate backwards, it would just need to 
 dump its content into another DocIdSet that supports backward iteration, 
 FixedBitSet for example.






[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2014-01-27 Thread Vassil Velichkov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883528#comment-13883528
 ] 

Vassil Velichkov commented on SOLR-2242:


I really hope that this issue will be resolved in SOLR 4.7...Fingers crossed :-)

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0-ALPHA
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch, 
 SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 Parameters:
 facet.numTerms or f.field.facet.numTerms = true (default is false) - turn 
 on distinct counting of terms
 facet.field - the field to count the terms
 It creates a new section in the facet section...
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=true&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=false&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=true&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_counts">
   <lst name="facet_queries"/>
   <lst name="facet_fields">...</lst>
   <lst name="facet_numTerms">
     <lst name="localhost:8983/solr">
       <int name="price">14</int>
     </lst>
     <lst name="localhost:8080/solr">
       <int name="price">14</int>
     </lst>
   </lst>
   <lst name="facet_dates"/>
   <lst name="facet_ranges"/>
 </lst>
 OR with no sharding -
 <lst name="facet_numTerms">
   <int name="price">14</int>
 </lst>
 {code} 
 Several people use this to get the group.field count (the # of groups).






[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5652:
---

Attachment: SOLR-5652.nocommit.patch

Ok, this new patch has the following...
* new {{\*_dv}} fields in the schema for all the various types w/o using any of 
the sort missing options
* tweaked the simple testing in both the single node and distrib test so that:
** one doc is missing an int value
** we randomly pick either int or int_dv as a field to use in explicit sorts
*** currently a nocommit in place to force this to be int_dv
** we explicitly sort on all 3 missing sub-variants (no suffix, _first, _last) 
and check the doc order exactly matches our expectations
* includes everything from SOLR-5652.codec.skip.dv.patch...
** ...but there is a nocommit bypassing the codec check so docvalues are always 
used.

With this patch, and these nocommits, it's pretty trivial to reliably reproduce 
failing seeds that pop up when running...

{code}
ant test  -Dtests.class=\*Cursor\* -Dtests.codec=Lucene40
{code}

...and likewise, my limited testing so far hasn't seen any failures when running 
this patch with the Lucene45 codec...

{code}
ant test  -Dtests.class=\*Cursor\* -Dtests.codec=Lucene45
{code}


 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, 
 SOLR-5652.nocommit.patch, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt








[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...

2014-01-27 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5652:
---

Attachment: SOLR-5652.patch

Patch I think is committable - same as SOLR-5652.nocommit.patch but with the 
nocommits removed, and the (in hindsight) obvious change needed to my new 
int-sort-field randomization so that, when the codec's docvalues support can't 
handle missing values, we use the non-docvalues version of that field for the 
explicit checks of \*_last and \*_first sorting.

I'm currently bash-loop hammering on this patch -- would appreciate it if 
others could try the same.
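For anyone who wants to join in, a "bash loop hammering" run is just repeating the test command until it fails, so intermittent seeds surface. RUN_CMD below is a placeholder; the real invocation would be an ant test line like the ones earlier in this thread.

```shell
# Sketch of a repeat-until-failure test loop. RUN_CMD is a placeholder;
# substitute the real invocation, e.g.:
#   ant test -Dtests.class=\*Cursor\* -Dtests.codec=Lucene45
RUN_CMD=true
for i in $(seq 1 20); do
  if ! $RUN_CMD; then
    echo "failed on iteration $i"
    break
  fi
done
echo "done hammering"
```

With a flaky test, the loop stops at the first failing iteration, which you can then re-run with the printed seed.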

 Heisenbug in DistribCursorPagingTest: walk already seen ...
 -

 Key: SOLR-5652
 URL: https://issues.apache.org/jira/browse/SOLR-5652
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, 
 SOLR-5652.nocommit.patch, SOLR-5652.patch, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, 
 jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt








[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2014-01-27 Thread David Webster (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883656#comment-13883656
 ] 

David Webster commented on SOLR-4470:
-

Again, appreciate the input; it looks like the issue is at least alive.  We are 
meeting Friday on this issue to plot our strategy.  I am getting familiar with 
the specifics of the issue, and am coming to realize the type of HTTP container 
is largely irrelevant, so long as it is a spec-compliant servlet container (as 
Tomcat and Jetty are).

I do not particularly agree with the need for a container, however.  We are 
gradually moving away from pre-packaged containers ourselves, instead moving 
towards framework tools like Spring Web and Grizzly2.  We write all our own 
JAAS LoginModules today and have a deep bench when it comes to managing service 
side security, be those servlet (RESTful/HTTP), JMS, or anything else. There 
are pluses and minuses in whether or not to use standard containers or roll 
your own Servlet implementation.  Another discussion for another day 

We have hit the same issue that is present in Solr in our own RESTful service 
implementations when making them secure.  We have a maturing RESTful/HTTP 
security standard, and that requires our REST client code to do very specific 
things when making downstream requests to secure services that expect a very 
specific secured request.  For instance, I can add a valve to Tomcat to have it 
check for a user's SiteMinder cookie and then validate it with a call to a 
Policy server.  I could also implement a secret key (a Kerberos-type thing).  I 
can implement that capability on the service side via a JAAS LoginModule and 
Tomcat Valve configuration without digging into Tomcat core code.  But on the 
client side I have to write actual core code to place the SiteMinder 
token/secret-key encryption, etc. in a cookie or header, etc., and send it 
downstream. 

I imagine the same must be true in the SolrCloud.  I can lock down the receiver 
side via configuration and standard Container plugins, but it's the sender side 
that we can do nothing about without some core code modification that would 
allow us to send whatever security artifacts downstream we deem appropriate.  
My main fear is performance within the cloud during the sharding processes.
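The "sender side" work being discussed here boils down to attaching credentials to outbound node-to-node requests. As a minimal sketch (not Solr code; the class and method names are illustrative), building the Basic auth header an internal request would carry takes only the JDK:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the client-side piece: constructing the Basic auth
// Authorization header that a downstream (internal) HTTP request would carry.
// Nothing here is a Solr API; it only shows the header-building step.
public class InternalAuthHeader {
    /** Builds the value of the Authorization header for Basic auth. */
    static String basicAuthValue(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A downstream request would set this on its connection, e.g.:
        //   conn.setRequestProperty("Authorization", basicAuthValue(u, p));
        System.out.println(basicAuthValue("user", "pass"));
    }
}
```

The hard part the thread describes is not this step but plumbing it through every internal request path (and deciding which credentials to forward), which is where core code changes come in.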



 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.7

 Attachments: SOLR-4470.patch, SOLR-4470.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r145.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for it to work 
 credentials need to be provided here also.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers, e.g. for search and update 
 requests.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc. based on calls to the Collection API)
 * that do not in any way have a relation to an outside super-request (e.g. 
 replica syncing stuff)
 We would like to aim at a solution where the original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we would aim at only supporting basic HTTP auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work at a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.
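The forward-or-fallback rule described above can be sketched in a few lines: per-request credentials win when the sub-request is synchronously rooted in an outside request, and the configured internal credentials apply otherwise. The Credentials and CredentialsResolver names are illustrative, not the actual API of any attached patch.

```java
import java.util.Optional;

/** Illustrative sketch of the credential-selection rule from this issue. */
public class CredentialsResolver {

    /** Credentials as a simple user/password pair. */
    public static final class Credentials {
        public final String user;
        public final String password;
        public Credentials(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    private final Credentials internalCredentials;

    public CredentialsResolver(Credentials internalCredentials) {
        this.internalCredentials = internalCredentials;
    }

    /**
     * Forward the original request's credentials when present (a synchronously
     * triggered sub-request); fall back to the configured internal credentials
     * for asynchronous/non-rooted requests.
     */
    public Credentials resolve(Optional<Credentials> fromOriginalRequest) {
        return fromOriginalRequest.orElse(internalCredentials);
    }
}
```

The framework aspect mentioned in the issue would amount to making Credentials an interface, so that digest or other auth schemes could plug in without refactoring callers.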



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5409) ToParentBlockJoinCollector.getTopGroups returns empty Groups

2014-01-27 Thread Peng Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peng Cheng updated LUCENE-5409:
---

Attachment: local_history.patch

patch file

 ToParentBlockJoinCollector.getTopGroups returns empty Groups
 

 Key: LUCENE-5409
 URL: https://issues.apache.org/jira/browse/LUCENE-5409
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.6
 Environment: Ubuntu 12.04
Reporter: Peng Cheng
Assignee: Michael McCandless
Priority: Critical
 Fix For: 4.7

 Attachments: local_history.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 A bug is observed to cause unstable results to be returned by the getTopGroups 
 method of the ToParentBlockJoinCollector class.
 In the scorer generation stage, the ToParentBlockJoinCollector will 
 automatically rewrite all the associated ToParentBlockJoinQuery instances (and 
 their subqueries) and save them into its in-memory lookup table, namely 
 joinQueryID (see the enroll() method for details). Unfortunately, in the 
 getTopGroups method, the new ToParentBlockJoinQuery parameter is not 
 rewritten (at least users are not expected to do so). When the new one is 
 searched for in the old lookup table (considering the impact of rewrite() on 
 hashCode()), the lookup will largely fail and eventually end up with a 
 topGroups collection consisting of only empty groups (their hitCounts are 
 guaranteed to be zero).
 An easy fix would be to rewrite the original BlockJoinQuery before invoking 
 the getTopGroups method. However, the computational cost of this is not optimal. 
 A better but slightly more complex solution would be to save the unrewritten 
 queries into the lookup table.
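The failure mode can be reproduced in miniature with plain JDK types: a lookup table keyed by a query object breaks when rewrite() produces a structurally different object with a different hashCode(), and the lookup succeeds again once the caller's query is rewritten first. FakeQuery below is a stand-in for ToParentBlockJoinQuery, not Lucene code.

```java
import java.util.HashMap;
import java.util.Map;

/** Miniature reproduction of the lookup failure described in this issue. */
public class RewriteLookupDemo {

    /** Stand-in for a query whose rewrite() changes its equals()/hashCode(). */
    static final class FakeQuery {
        final String repr;
        FakeQuery(String repr) { this.repr = repr; }
        FakeQuery rewrite() { return new FakeQuery(repr + "#rewritten"); }
        @Override public boolean equals(Object o) {
            return o instanceof FakeQuery && ((FakeQuery) o).repr.equals(repr);
        }
        @Override public int hashCode() { return repr.hashCode(); }
    }

    public static void main(String[] args) {
        FakeQuery original = new FakeQuery("join(child:foo)");

        // The collector enrolls the *rewritten* query in its lookup table.
        Map<FakeQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(original.rewrite(), 0);

        // Looking up the unrewritten query misses, which is what produces
        // the empty groups from getTopGroups.
        assert joinQueryID.get(original) == null;

        // Rewriting before the lookup (the "easy fix") finds the entry again.
        assert joinQueryID.get(original.rewrite()) == 0;
    }
}
```

The "better" fix from the issue corresponds to keying joinQueryID by the unrewritten query instead, so callers never need to rewrite before the lookup.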






[jira] [Commented] (LUCENE-5409) ToParentBlockJoinCollector.getTopGroups returns empty Groups

2014-01-27 Thread Peng Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883693#comment-13883693
 ] 

Peng Cheng commented on LUCENE-5409:


Finally got your test case: it only appears at larger scale, which is really 
excruciating as I'm not a software architect.

To run the failing test case, please apply the attached patch or manually copy 
the unit test function into testBlockJoin.java.

 ToParentBlockJoinCollector.getTopGroups returns empty Groups
 

 Key: LUCENE-5409
 URL: https://issues.apache.org/jira/browse/LUCENE-5409
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.6
 Environment: Ubuntu 12.04
Reporter: Peng Cheng
Assignee: Michael McCandless
Priority: Critical
 Fix For: 4.7

 Attachments: local_history.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 A bug is observed to cause unstable results to be returned by the getTopGroups 
 method of the ToParentBlockJoinCollector class.
 In the scorer generation stage, the ToParentBlockJoinCollector will 
 automatically rewrite all the associated ToParentBlockJoinQuery instances (and 
 their subqueries) and save them into its in-memory lookup table, namely 
 joinQueryID (see the enroll() method for details). Unfortunately, in the 
 getTopGroups method, the new ToParentBlockJoinQuery parameter is not 
 rewritten (at least users are not expected to do so). When the new one is 
 searched for in the old lookup table (considering the impact of rewrite() on 
 hashCode()), the lookup will largely fail and eventually end up with a 
 topGroups collection consisting of only empty groups (their hitCounts are 
 guaranteed to be zero).
 An easy fix would be to rewrite the original BlockJoinQuery before invoking 
 the getTopGroups method. However, the computational cost of this is not optimal. 
 A better but slightly more complex solution would be to save the unrewritten 
 queries into the lookup table.






[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_51) - Build # 9162 - Still Failing!

2014-01-27 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9162/
Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:49255 within 3 ms

Stack Trace:
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:49255 within 3 ms
	at __randomizedtesting.SeedInfo.seed([563CD1B6D5724F09:D7DA5FAEA22D2F35]:0)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:147)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:98)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:93)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:84)
	at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
	at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
	at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
	at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:198)
	at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:80)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at