AW: release votes

2014-04-24 Thread Thomas Koch
Hi Andi,

I don't agree that it is unimportant to make PyLucene releases. Without a
ready-to-run software package, the hurdles to using PyLucene are raised. It is
already not quite simple (for beginners) to install PyLucene on the various
platforms. Having a packaged release that is tested by some users provides a
benefit to the community, in my opinion.

However I can understand your arguments - there has been little feedback on
your release announcements on the list recently. On the other hand there are
frequent discussions about PyLucene on the list so I don't think the
interest has declined. Did you check the number of downloads of the PyLucene
distributions (if this is possible at all - due to the distributed releases
on the apache mirrors ...)? This would be a more accurate indicator from my
point of view.

I must also admit that I never understood the voting process in detail -
i.e. who the PMC members are and what impact the votes of non-PMC users have.
Maybe some more transparency and another call to action would help to
raise awareness in the community.

Just my thoughts...


regards,
Thomas 
--
OrbiTeam Software GmbH & Co. KG
http://www.orbiteam.de


 -----Original Message-----
 From: Andi Vajda [mailto:va...@apache.org]
 Sent: Thursday, April 24, 2014 02:28
 To: pylucene-dev@lucene.apache.org
 Subject: release votes
 
 
   Hi all,
 
 Given the tiny amount of interest the PyLucene releases create, maybe it's
 become unimportant to actually make PyLucene releases?
 
 The release votes have had an increasingly difficult time garnering the
 three required PMC votes to pass. Non-PMC users are also eerily quiet.
 
 Maybe the time has come to switch to a different model:
 
   - when a Lucene release happens, a PyLucene branch gets created with all
 the necessary changes to build successfully and pass all tests against
 this Lucene release
   - users interested in PyLucene check out that branch
   - done
 
   - no more releases, no more votes
 
 JCC can continue to be released to PyPI independently as it is today.
 That doesn't require any voting anyway (?).
 
 What do readers of this list think?
 
 Andi..



Re: AW: release votes

2014-04-24 Thread Andi Vajda


On Thu, 24 Apr 2014, Thomas Koch wrote:


I don't agree that it is unimportant to make PyLucene releases. Without a
ready-to-run software package, the hurdles to using PyLucene are raised. It is
already not quite simple (for beginners) to install PyLucene on the various
platforms. Having a packaged release that is tested by some users provides a
benefit to the community, in my opinion.


I agree with you that making releases is important. However, when votes are 
called to actually make them, it's been hard to get voters to respond.


Anyone can vote. Anyone with an interest should vote. Three PMC votes are 
required to make a release happen, though. But any vote for or against is 
important, PMC or not. Lately, it's been hard to get the TWO extra PMC votes 
needed to make a release happen (since mine is cast when I cut the release 
candidate). I think this is in part _because_ no one else is showing an 
interest in the release and casting a vote either.



However I can understand your arguments - there has been little feedback on
your release announcements on the list recently. On the other hand there are
frequent discussions about PyLucene on the list so I don't think the
interest has declined. Did you check the number of downloads of the PyLucene
distributions (if this is possible at all - due to the distributed releases
on the apache mirrors ...)? This would be a more accurate indicator from my
point of view.


I have no idea about the number of downloads of PyLucene. JCC, however, has 
gotten over 2700 downloads in the past month:

  https://pypi.python.org/pypi/JCC/2.19


I must also admit that I never understood the voting process in detail -
i.e. who the PMC members are and what impact the votes of non-PMC users have.
Maybe some more transparency and another call to action would help to
raise awareness in the community.


There are at least three classes in the Apache meritocracy:
  - users, developers, contributors but not committers
  - committers, i.e. developers who can commit patches to the project
  - PMC members, i.e. project committers who sit on the PMC (project
management committee)
For more information, please see:
  https://www.apache.org/foundation/how-it-works.html

By the rules guiding the release of Apache projects, three PMC votes are 
necessary to release a tarball to the world.

The list of Lucene committers is visible here:
  http://lucene.apache.org/whoweare.html
Scroll down that list for the PMC membership.

Andi..



Just my thoughts...


regards,
Thomas
--
OrbiTeam Software GmbH & Co. KG
http://www.orbiteam.de



-----Original Message-----
From: Andi Vajda [mailto:va...@apache.org]
Sent: Thursday, April 24, 2014 02:28
To: pylucene-dev@lucene.apache.org
Subject: release votes


  Hi all,

Given the tiny amount of interest the PyLucene releases create, maybe it's
become unimportant to actually make PyLucene releases?

The release votes have had an increasingly difficult time garnering the
three required PMC votes to pass. Non-PMC users are also eerily quiet.

Maybe the time has come to switch to a different model:

  - when a Lucene release happens, a PyLucene branch gets created with all
the necessary changes to build successfully and pass all tests against
this Lucene release
  - users interested in PyLucene check out that branch
  - done

  - no more releases, no more votes

JCC can continue to be released to PyPI independently as it is today.
That doesn't require any voting anyway (?).

What do readers of this list think?

Andi..




Re: release votes

2014-04-24 Thread Aric Coady
On Apr 24, 2014, at 11:40 AM, Andi Vajda va...@apache.org wrote:
 On Thu, 24 Apr 2014, Thomas Koch wrote:
 I don't agree that it is unimportant to make PyLucene releases. Without a
 ready-to-run software package, the hurdles to using PyLucene are raised. It is
 already not quite simple (for beginners) to install PyLucene on the various
 platforms. Having a packaged release that is tested by some users provides a
 benefit to the community, in my opinion.
 
 I agree with you that making releases is important. However, when votes are 
 called to actually make them, it's been hard to get voters to respond.
 
 Anyone can vote. Anyone with an interest should vote. Three PMC votes are 
 required to make a release happen, though. But any vote for or against is 
 important, PMC or not. Lately, it's been hard to get the TWO extra PMC votes 
 needed to make a release happen (since mine is cast when I cut the release 
 candidate). I think this is in part _because_ no one else is showing an 
 interest in the release and casting a vote either.

Oh, well I for one had no idea votes from the community at large were 
encouraged.  In that case…

+1.  I tested 4.7.2 against my downstream project.  No issues.

 However I can understand your arguments - there has been little feedback on
 your release announcements on the list recently. On the other hand there are
 frequent discussions about PyLucene on the list so I don't think the
 interest has declined. Did you check the number of downloads of the PyLucene
 distributions (if this is possible at all - due to the distributed releases
 on the apache mirrors ...)? This would be a more accurate indicator from my
 point of view.
 
 I have no idea about the number of downloads of PyLucene. JCC, however, has 
 gotten over 2700 downloads in the past month:
  https://pypi.python.org/pypi/JCC/2.19
 
 I must also admit that I never understood the voting process in detail -
 i.e. who the PMC members are and what impact the votes of non-PMC users have.
 Maybe some more transparency and another call to action would help to
 raise awareness in the community.
 
 There are at least three classes in the Apache meritocracy:
  - users, developers, contributors but not committers
  - committers, i.e. developers who can commit patches to the project
  - PMC members, i.e. project committers who sit on the PMC (project
management committee)
 For more information, please see:
  https://www.apache.org/foundation/how-it-works.html
 
 By the rules guiding the release of Apache projects, three PMC votes are 
 necessary to release a tarball to the world.
 The list of Lucene committers is visible here:
  http://lucene.apache.org/whoweare.html
 Scroll down that list for the PMC membership.
 
 Andi..



Re: AW: release votes

2014-04-24 Thread Robert Muir
On Thu, Apr 24, 2014 at 2:40 PM, Andi Vajda va...@apache.org wrote:

 I agree with you that making releases is important. However, when votes are
 called to actually make them, it's been hard to get voters to respond.

 Anyone can vote. Anyone with an interest should vote. Three PMC votes are
 required to make a release happen, though. But any vote for or against is
 important, PMC or not. Lately, it's been hard to get the TWO extra PMC votes
 needed to make a release happen (since mine is cast when I cut the release
 candidate). I think this is in part _because_ no one else is showing an
 interest in the release and casting a vote either.

I don't think that's necessarily the case. For me (as someone who tries
to vote for PyLucene releases), the problem was a combination of two
things, as I did try to actually test it over the weekend:

1. being on travel, meaning stuck with a Mac OS X computer.
2. the release candidate not compiling on my Mac OS X computer, because
something tries to apply -mno-fused-madd when compiling; apparently
this is a common issue with Python and Mavericks?

Two things that may have nothing to do with PyLucene, but were pretty
annoying, especially for a non-Python developer :)

I am happy to try it on my Linux machine tonight!


Re: AW: release votes

2014-04-24 Thread Andi Vajda

 On Apr 24, 2014, at 15:44, Robert Muir rcm...@gmail.com wrote:
 
 On Thu, Apr 24, 2014 at 2:40 PM, Andi Vajda va...@apache.org wrote:
 
 I agree with you that making releases is important. However, when votes are
 called to actually make them, it's been hard to get voters to respond.
 
 Anyone can vote. Anyone with an interest should vote. Three PMC votes are
 required to make a release happen, though. But any vote for or against is
 important, PMC or not. Lately, it's been hard to get the TWO extra PMC votes
 needed to make a release happen (since mine is cast when I cut the release
 candidate). I think this is in part _because_ no one else is showing an
 interest in the release and casting a vote either.
 
 I don't think that's necessarily the case. For me (as someone who tries
 to vote for PyLucene releases), the problem was a combination of two
 things, as I did try to actually test it over the weekend:

I sure didn't mean to blame you (or Mike) as you both usually provide the two 
extra PMC votes needed.
No, I'm just trying to gauge what to do given the dwindling interest of the 
other readers of this list and the lack of interest of the 25 other PMC members.
Doing releases is work and so is testing a build for a release vote. If the 
process can be streamlined, I'm all for it.

 1. being on travel, meaning stuck with a Mac OS X computer.
 2. the release candidate not compiling on my Mac OS X computer, because
 something tries to apply -mno-fused-madd when compiling; apparently
 this is a common issue with Python and Mavericks?

To build any Python extension such as PyLucene and JCC, you need to use the same 
compiler that was used to build Python.
On the Mac, it seems that many people are running a gcc-built Python but then use 
clang when building extensions. This is most likely because Apple switched 
compilers somewhere along the way. 
The simplest route is to build Python from source.
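
For example, a quick way to check which compiler a given Python was built with 
(a minimal sketch using only the standard library; nothing PyLucene-specific):

  import sysconfig
  # CC is recorded when Python itself is built; extensions should use the same compiler
  print(sysconfig.get_config_var('CC'))

If that prints gcc but clang is what actually runs when you build the 
extension, you have the mismatch described above.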

Andi..

 
 Two things that may have nothing to do with PyLucene, but were pretty
 annoying, especially for a non-Python developer :)
 
 I am happy to try it on my Linux machine tonight!


Re: [VOTE] Release PyLucene 4.7.2-1

2014-04-24 Thread Andi Vajda


This vote has passed!
Thank you to all, PMC or not PMC, who cast a vote.

Andi..

On Tue, 15 Apr 2014, Andi Vajda wrote:



The PyLucene 4.7.2-1 release tracking today's release of Apache Lucene 4.7.2 
is ready.


A release candidate is available from:
http://people.apache.org/~vajda/staging_area/

A list of changes in this release can be seen at:
http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_7/CHANGES

PyLucene 4.7.2 is built with JCC 2.19 included in these release artifacts:
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES

A list of Lucene Java changes can be seen at:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_7_2/lucene/CHANGES.txt

Please vote to release these artifacts as PyLucene 4.7.2-1.

Thanks !

Andi..

ps: the KEYS file for PyLucene release signing is at:
http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
http://people.apache.org/~vajda/staging_area/KEYS

pps: here is my +1



[jira] [Commented] (LUCENE-5628) SpecialOperations.getFiniteStrings should not recurse

2014-04-24 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979419#comment-13979419
 ] 

Simon Willnauer commented on LUCENE-5628:
-

I think it's worth doing the optimization here. A couple of comments:

 * can we put the exit condition into the while block instead of at the end 
with a break? I think it can just be while (string.length > 0)
 * looking at the impl of State I think we can just use an identity hashset or 
maybe even an array, since the IDs are within known bounds, to check the 
pathStates? You could even just use a bitset and mark the state ID as visited? 
Hmm, now that I wrote it I see your comment :) I will leave it here for 
discussion.
 * Somewhat unrelated, but I think the State implementation has a problem since 
it doesn't override equals, although it should since it has a hashCode impl. I wonder 
if we should either remove the hashCode or add equals, just for consistency?
 * should we rather throw IllegalState than IllegalArgument? :D 
 * just for readability it might be good to s/strings/finiteStrings/ - I had a 
hard time seeing when you do things on the string vs. strings
 * is this a leftover? == // a.getNumberedStates();
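
(For illustration, a minimal sketch - in Python, not the actual Java patch - of 
the general shape under discussion: the recursion replaced by an explicit stack 
plus an on-path identity set for cycle detection. The State/transition layout 
here is invented for the example.)

{code}
# Hypothetical sketch: states have .accept (bool) and .transitions (list of (label, dest)).
def finite_strings(initial):
    results = []
    path = []                                    # labels along the current path
    on_path = {id(initial)}                      # identity set, for cycle detection
    stack = [(initial, iter(initial.transitions))]
    if initial.accept:
        results.append(tuple(path))
    while stack:
        state, transitions = stack[-1]
        step = next(transitions, None)
        if step is None:                         # state exhausted: backtrack
            stack.pop()
            on_path.discard(id(state))
            if path:
                path.pop()
            continue
        label, dest = step
        if id(dest) in on_path:
            raise ValueError("input automaton must be finite (acyclic)")
        path.append(label)
        if dest.accept:
            results.append(tuple(path))
        on_path.add(id(dest))
        stack.append((dest, iter(dest.transitions)))
    return results
{code}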




 SpecialOperations.getFiniteStrings should not recurse
 -

 Key: LUCENE-5628
 URL: https://issues.apache.org/jira/browse/LUCENE-5628
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5628.patch


 Today it consumes one Java stack frame per transition, which when used by 
 AnalyzingSuggester is per character in each token.  This can lead to stack 
 overflows if you have a long suggestion.






[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979420#comment-13979420
 ] 

Elran Dvir commented on SOLR-2894:
--

Brett, thanks for your response.

"Having a mincount of -1 for the shards is correct. The reason is that while a 
given shard may have a count lower than mincount for a given term, the 
aggregate total count for that value when combined with the other shards 
could exceed the mincount, so we do need to know about it. For example, 
consider a mincount of 10. If we have 3 shards with a count of 5 for a term 
of 'Boston', we would still need to know about these because the total 
count would be 15, and would be higher than the mincount."
If a mincount of 1 is asked for a field, couldn't it be more efficient? Is a 
mincount of -1 necessary in this case?
"I would expect the skipRefinementAtThisLevel to be false for the top level 
pivot facet, and true for each other level. Are you seeing otherwise?"
No. You are right.
"If you were to set a facet.limit of 10 for all levels of the pivot, what is 
the memory usage like?"
The memory usage in this case is about 200 MB.

Thanks again.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.






[jira] [Commented] (LUCENE-5628) SpecialOperations.getFiniteStrings should not recurse

2014-04-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979445#comment-13979445
 ] 

Robert Muir commented on LUCENE-5628:
-

Can we reduce the number of lines of code in the new method? It's not even 
comparable to the current code. How much of the LOC is the cycle detection? 
Given the cost, this may not be worth it. This is expert shit and the user can 
add an assert isFinite to their code. How much of the LOC is code optimization? 
Can the old code please be added to AutomatonTestUtil as slowXXX and compared 
against the new one with random automata?


 SpecialOperations.getFiniteStrings should not recurse
 -

 Key: LUCENE-5628
 URL: https://issues.apache.org/jira/browse/LUCENE-5628
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5628.patch


 Today it consumes one Java stack frame per transition, which when used by 
 AnalyzingSuggester is per character in each token.  This can lead to stack 
 overflows if you have a long suggestion.






[jira] [Commented] (LUCENE-5628) SpecialOperations.getFiniteStrings should not recurse

2014-04-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979452#comment-13979452
 ] 

Robert Muir commented on LUCENE-5628:
-

If we want to baby the user, and I'm not sure what user we have in mind here, 
just invoke isFinite. I don't like the code dup nor the precedent that 
unrelated code needs to deal with this.

This thing needs to get much shorter and simpler to go in. Can we make it 
slower to achieve that? I would make it 10 times slower if it removed 1/2 the 
code... without hesitation.

 SpecialOperations.getFiniteStrings should not recurse
 -

 Key: LUCENE-5628
 URL: https://issues.apache.org/jira/browse/LUCENE-5628
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5628.patch


 Today it consumes one Java stack frame per transition, which when used by 
 AnalyzingSuggester is per character in each token.  This can lead to stack 
 overflows if you have a long suggestion.






[jira] [Commented] (LUCENE-5622) Fail tests if they print, and tests.verbose is not set

2014-04-24 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979493#comment-13979493
 ] 

Dawid Weiss commented on LUCENE-5622:
-

While annotating tests that do sysouts I came to the conclusion that it 
shouldn't be an all-or-nothing threshold. It should be much like the memory 
leak detector -- some sysouts per suite should be fine (say, 1kb), then it 
should start failing and suggest changing some of the sysouts to if (VERBOSE) 
or raising the limit by annotating the suite with a higher threshold.

This would make sense in that we could enable those checks by default without 
additional Jenkins jobs, special properties, etc. What do you think?
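
(As a language-neutral illustration of the threshold idea - a Python sketch, not 
the actual Java test-runner code - the mechanism is essentially a counting 
wrapper around stdout that fails once a per-suite output budget is exceeded:)

{code}
import io, sys

class LimitedStream(io.TextIOBase):
    """Counts characters written and fails once a per-suite budget is exceeded."""
    def __init__(self, delegate, limit=1024):
        self.delegate, self.limit, self.written = delegate, limit, 0
    def write(self, text):
        self.written += len(text)
        if self.written > self.limit:
            raise AssertionError("suite wrote more than %d chars; guard noisy "
                                 "output with if (VERBOSE)" % self.limit)
        return self.delegate.write(text)

sys.stdout = LimitedStream(sys.stdout)   # installed for the duration of a suite
{code}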

 Fail tests if they print, and tests.verbose is not set
 --

 Key: LUCENE-5622
 URL: https://issues.apache.org/jira/browse/LUCENE-5622
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-5622.patch, LUCENE-5622.patch, LUCENE-5622.patch, 
 LUCENE-5622.patch


 Some tests print so much stuff they are now undebuggable (see LUCENE-5612).
 I think it's bad that the testrunner hides this stuff; we used to stay on top 
 of it. Instead, when tests.verbose is false, we should install printstreams 
 (system.out/err) that fail the test instantly because they are noisy. 
 This will ensure that our tests don't go out of control.






Re: release votes

2014-04-24 Thread Michael McCandless
+1

Mike McCandless

http://blog.mikemccandless.com


On Wed, Apr 23, 2014 at 8:27 PM, Andi Vajda va...@apache.org wrote:

  Hi all,

 Given the tiny amount of interest the PyLucene releases create, maybe it's
 become unimportant to actually make PyLucene releases?

 The release votes have had an increasingly difficult time garnering the
 three required PMC votes to pass. Non-PMC users are also eerily quiet.

 Maybe the time has come to switch to a different model:

  - when a Lucene release happens, a PyLucene branch gets created with all
the necessary changes to build successfully and pass all tests against
this Lucene release
  - users interested in PyLucene check out that branch
  - done

  - no more releases, no more votes

 JCC can continue to be released to PyPI independently as it is today.
 That doesn't require any voting anyway (?).

 What do readers of this list think?

 Andi..


[jira] [Commented] (LUCENE-5622) Fail tests if they print, and tests.verbose is not set

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979503#comment-13979503
 ] 

ASF subversion and git services commented on LUCENE-5622:
-

Commit 1589645 from [~dawidweiss] in branch 'dev/branches/LUCENE-5622'
[ https://svn.apache.org/r1589645 ]

Branch for LUCENE-5622

 Fail tests if they print, and tests.verbose is not set
 --

 Key: LUCENE-5622
 URL: https://issues.apache.org/jira/browse/LUCENE-5622
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-5622.patch, LUCENE-5622.patch, LUCENE-5622.patch, 
 LUCENE-5622.patch


 Some tests print so much stuff they are now undebuggable (see LUCENE-5612).
 I think it's bad that the testrunner hides this stuff; we used to stay on top 
 of it. Instead, when tests.verbose is false, we should install printstreams 
 (system.out/err) that fail the test instantly because they are noisy. 
 This will ensure that our tests don't go out of control.






[jira] [Commented] (LUCENE-5628) SpecialOperations.getFiniteStrings should not recurse

2014-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979519#comment-13979519
 ] 

Michael McCandless commented on LUCENE-5628:


Good feedback, thanks!

bq. can we put the exit condition into the while block instead of at the end 
with a break? I think it can just be while (string.length > 0)

Fixed.

bq. looking at the impl of State I think we can just use an identity hashset or 
maybe even an array, since the IDs are within known bounds, to check the 
pathStates? You could even just use a bitset and mark the state ID as visited? 
Hmm, now that I wrote it I see your comment :) I will leave it here for discussion.

I switched to IdentityHashSet.

Yeah I struggled w/ this, but the original method didn't set the state
numbers so I didn't want to change that.  Setting the numbers does a
DFS on the automaton...

bq. Somewhat unrelated, but I think the State implementation has a problem since 
it doesn't override equals, although it should since it has a hashCode impl. I wonder 
if we should either remove the hashCode or add equals, just for consistency?

I removed State.hashCode.

bq. should we rather throw IllegalState than IllegalArgument?

Hmm, IAE felt right since you passed it an invalid (cyclic) argument?

bq. just for readability it might be good to s/strings/finiteStrings/ - I had a 
hard time seeing when you do things on the string vs. strings

I changed it to results.

bq. is this a leftover? == // a.getNumberedStates();

Removed.


 SpecialOperations.getFiniteStrings should not recurse
 -

 Key: LUCENE-5628
 URL: https://issues.apache.org/jira/browse/LUCENE-5628
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5628.patch, LUCENE-5628.patch


 Today it consumes one Java stack frame per transition, which when used by 
 AnalyzingSuggester is per character in each token.  This can lead to stack 
 overflows if you have a long suggestion.






[jira] [Updated] (LUCENE-5628) SpecialOperations.getFiniteStrings should not recurse

2014-04-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5628:
---

Attachment: LUCENE-5628.patch

New patch.

 SpecialOperations.getFiniteStrings should not recurse
 -

 Key: LUCENE-5628
 URL: https://issues.apache.org/jira/browse/LUCENE-5628
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5628.patch, LUCENE-5628.patch


 Today it consumes one Java stack frame per transition, which when used by 
 AnalyzingSuggester is per character in each token.  This can lead to stack 
 overflows if you have a long suggestion.






[jira] [Commented] (LUCENE-5628) SpecialOperations.getFiniteStrings should not recurse

2014-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979520#comment-13979520
 ] 

Michael McCandless commented on LUCENE-5628:


bq. Can we reduce the number of lines of code in the new method? It's not even 
comparable to the current code. 

I'll see if I can simplify it somehow ...

bq. How much of the LOC is the cycle detection?

This is really a minuscule part of it: just look for whoever touches
the pathStates.

bq. How much of the LOC is code optimization?

It's not optimization; in fact I imagine this impl is slower.

bq. Can the old code please be added to AutomatonTestUtil as slowXXX and 
compared against the new one with random automata?

Great idea, I did that.
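
(The pattern, sketched generically in Python with invented helper names - 
slow_finite_strings standing in for the retained recursive implementation and 
random_finite_automaton for a test utility that builds a random acyclic 
automaton:)

{code}
import random

def test_finite_strings_agree(trials=100):
    # Differential test: the old (slow) and new implementations must agree
    # on randomly generated finite automata.
    for seed in range(trials):
        automaton = random_finite_automaton(random.Random(seed))
        assert sorted(finite_strings(automaton)) == sorted(slow_finite_strings(automaton))
{code}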


 SpecialOperations.getFiniteStrings should not recurse
 -

 Key: LUCENE-5628
 URL: https://issues.apache.org/jira/browse/LUCENE-5628
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5628.patch, LUCENE-5628.patch


 Today it consumes one Java stack frame per transition, which when used by 
 AnalyzingSuggester is per character in each token.  This can lead to stack 
 overflows if you have a long suggestion.






[jira] [Created] (LUCENE-5629) Comparing the Version of Lucene, the Analyzer and the similarity function that are being used for indexing and searching.

2014-04-24 Thread Isabel Mendonca (JIRA)
Isabel Mendonca created LUCENE-5629:
---

 Summary: Comparing the Version of Lucene, the Analyzer and the 
similarity function that are being used for indexing and searching.
 Key: LUCENE-5629
 URL: https://issues.apache.org/jira/browse/LUCENE-5629
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index, core/queryparser, core/search
Affects Versions: 4.7.1, 4.7
 Environment: Operating system : Windows 8.1
Software platform : Eclipse Kepler 4.3.2
Reporter: Isabel Mendonca
Priority: Minor
 Fix For: 4.7.1, 4.7


We have observed that Lucene does not check whether the same Similarity function is 
used during indexing and searching. The same problem exists for the Analyzer 
that is used. This may lead to poor or misleading results.

So we decided to create an XML file during indexing that will store information 
such as the Analyzer and the Similarity function that were used, as well as the 
version of Lucene that was used. This XML file will always be available to the 
users.

At search time, we will retrieve this information using SAX parsing and check 
whether the utils used for searching match those used for indexing. If not, a 
warning message will be displayed to the user.
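
(A minimal sketch of the proposed mechanism - names invented for illustration, 
using Python's standard library rather than whatever the actual patch does:)

{code}
import xml.etree.ElementTree as ET

FIELDS = ("lucene-version", "analyzer", "similarity")

def write_index_meta(path, values):
    # record the indexing configuration; values holds the three FIELDS
    root = ET.Element("index-meta")
    for key in FIELDS:
        ET.SubElement(root, key).text = values[key]
    ET.ElementTree(root).write(path)

def check_index_meta(path, values):
    # at search time, compare the current configuration with what was recorded
    recorded = {el.tag: el.text for el in ET.parse(path).getroot()}
    for key in FIELDS:
        if recorded.get(key) != values[key]:
            print("WARNING: %s mismatch: indexed with %r, searching with %r"
                  % (key, recorded.get(key), values[key]))
{code}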






[jira] [Commented] (SOLR-5340) Add support for named snapshots

2014-04-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979552#comment-13979552
 ] 

Noble Paul commented on SOLR-5340:
--

Sorry to raise this concern now. In deletebackup, isn't it possible for the 
check of whether the snapshot exists etc. to be done in the same thread and give 
a response back right away? That is much better than polling the status later.

I guess even the backup command should do basic checks of the location etc. 
before the call returns.
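
(In other words - sketched here in Python, with do_delete_backup and the 
response shape invented for the example - do the cheap validation synchronously 
and only hand the slow part to a background thread:)

{code}
import os
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def do_delete_backup(name, location):
    # the actual (slow) deletion work; elided in this sketch
    pass

def delete_backup(name, location):
    # cheap precondition checks run in the caller's thread, so errors come
    # back immediately instead of only via later status polling
    if not os.path.isdir(location):
        return {"status": "failed", "msg": "location does not exist"}
    if name not in os.listdir(location):
        return {"status": "failed", "msg": "no such snapshot: %s" % name}
    executor.submit(do_delete_backup, name, location)   # slow part runs async
    return {"status": "submitted", "requestid": name}
{code}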

 Add support for named snapshots
 ---

 Key: SOLR-5340
 URL: https://issues.apache.org/jira/browse/SOLR-5340
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.5
Reporter: Mike Schrag
Assignee: Noble Paul
 Attachments: SOLR-5340.patch, SOLR-5340.patch


 It would be really nice if Solr supported named snapshots. Right now if you 
 snapshot a SolrCloud cluster, every node potentially records a slightly 
 different timestamp. Correlating those back together to effectively restore 
 the entire cluster to a consistent snapshot is pretty tedious.






[jira] [Updated] (LUCENE-5629) Comparing the Version of Lucene, the Analyzer and the similarity function that are being used for indexing and searching.

2014-04-24 Thread Isabel Mendonca (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Mendonca updated LUCENE-5629:


Affects Version/s: (was: 4.7.1)
   (was: 4.7)

 Comparing the Version of Lucene, the Analyzer and the similarity function 
 that are being used for indexing and searching.
 --

 Key: LUCENE-5629
 URL: https://issues.apache.org/jira/browse/LUCENE-5629
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index, core/queryparser, core/search
 Environment: Operating system : Windows 8.1
 Software platform : Eclipse Kepler 4.3.2
Reporter: Isabel Mendonca
Priority: Minor
  Labels: features, patch
 Fix For: 4.8, 4.9, 5.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 We have observed that Lucene does not check whether the same Similarity function 
 is used during indexing and searching. The same problem exists for the 
 Analyzer that is used. This may lead to poor or misleading results.
 So we decided to create an XML file during indexing that will store 
 information such as the Analyzer and the Similarity function that were used, 
 as well as the version of Lucene that was used. This XML file will always be 
 available to the users.
 At search time, we will retrieve this information using SAX parsing and 
 check whether the utils used for searching match those used for indexing. If not, 
 a warning message will be displayed to the user.






[jira] [Updated] (LUCENE-5629) Comparing the Version of Lucene, the Analyzer and the similarity function that are being used for indexing and searching.

2014-04-24 Thread Isabel Mendonca (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Mendonca updated LUCENE-5629:


Fix Version/s: (was: 4.7.1)
   (was: 4.7)
   5.0
   4.9
   4.8

 Comparing the Version of Lucene, the Analyzer and the similarity function 
 that are being used for indexing and searching.
 --

 Key: LUCENE-5629
 URL: https://issues.apache.org/jira/browse/LUCENE-5629
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index, core/queryparser, core/search
 Environment: Operating system : Windows 8.1
 Software platform : Eclipse Kepler 4.3.2
Reporter: Isabel Mendonca
Priority: Minor
  Labels: features, patch
 Fix For: 4.8, 4.9, 5.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 We have observed that Lucene does not check whether the same Similarity function 
 is used during indexing and searching. The same problem exists for the 
 Analyzer that is used. This may lead to poor or misleading results.
 So we decided to create an XML file during indexing that will store 
 information such as the Analyzer and the Similarity function that were used, 
 as well as the version of Lucene that was used. This XML file will always be 
 available to the users.
 At search time, we will retrieve this information using SAX parsing and 
 check whether the utils used for searching match those used for indexing. If not, 
 a warning message will be displayed to the user.






Re: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-24 Thread Simon Willnauer
+1

Smoketester says:

SUCCESS! [1:26:32.631579]

Elasticsearch is happy with the RC as well

thanks uwe

On Wed, Apr 23, 2014 at 11:08 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 +1

 SUCCESS! [0:44:41.170815]

 Mike McCandless

 http://blog.mikemccandless.com


 On Tue, Apr 22, 2014 at 2:47 PM, Uwe Schindler u...@thetaphi.de wrote:
 Hi,

 I prepared the first release candidate of Lucene and Solr 4.8.0. The 
 artifacts can be found here:

 => http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC1-rev1589150/

 It took a bit longer, because we had to fix some remaining bugs regarding 
 NativeFSLockFactory, which did not work correctly and leaked file handles. I 
 also updated the instructions about the preferred Java update versions. See 
 also Mike's blog post: 
 http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/

 Please check the artifacts and give your vote in the next 72 hrs.

 My +1 will hopefully come a little bit later because Solr tests are failing 
 constantly on my release build and smoke tester machine. The reason: it 
 seems to be a lack of file handles. A standard Ubuntu configuration has 1024 
 file handles and I want a release to pass with that common default 
 configuration. Instead, 
 org.apache.solr.cloud.TestMiniSolrCloudCluster.testBasics always fails with 
 crazy error messages (not about too few file handles, more that Jetty 
 cannot start up or bind ports, or various other stuff). This did not 
 happen when smoking 4.7.x releases.

 I will now run the smoker again without HDFS (via build.properties) and, if 
 that also fails, then once again with more file handles. But we really have 
 to fix our tests so that they succeed with the default config of 1024 file 
 handles. We can configure that in Jenkins (so the Jenkins job first sets 
 "ulimit -n 1024" and then runs ANT). But this should not block the release, I 
 just say: I gave up running those Solr tests, sorry! Anybody else can test 
 that stuff!

 Uwe

 P.S.: Here's my smoker command line:
 $  JAVA_HOME=$HOME/jdk1.7.0_55 JAVA7_HOME=$HOME/jdk1.7.0_55 python3.2 -u 
 smokeTestRelease.py ' 
 http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC1-rev1589150/'
  1589150 4.8.0 tmp

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de







[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded

2014-04-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979580#comment-13979580
 ] 

Noble Paul commented on SOLR-5681:
--

 workQueue.peekTopN(10);  If a task is being processed , this will always 
return immedietly with one item and the loop would continue without a pause , 
hogging CPU/ZK-traffic. You will need to ensure that the call returns if the 
available items in the queue are different from the ones being processed. 
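
(The concern, sketched in Python with invented names - peek_top_n mirroring 
peekTopN: if the peek keeps returning the task that is already in flight, the 
loop spins; filtering out in-flight ids means only genuinely new work is 
returned, so the consumer can wait instead of busy-looping:)

{code}
def next_runnable_tasks(work_queue, in_flight_ids, n=10):
    # Skip tasks that are already being processed; returning the same head
    # item over and over is what makes the loop spin.
    return [task for task in work_queue.peek_top_n(n)
            if task["id"] not in in_flight_ids]
{code}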
 

 Make the OverseerCollectionProcessor multi-threaded
 ---

 Key: SOLR-5681
 URL: https://issues.apache.org/jira/browse/SOLR-5681
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Anshum Gupta
Assignee: Anshum Gupta
 Attachments: SOLR-5681.patch


 Right now, the OverseerCollectionProcessor is single-threaded, i.e. submitting 
 anything long running would have it block processing of other mutually 
 exclusive tasks.
 When OCP tasks become optionally async (SOLR-5477), it'd be good to have 
 truly non-blocking behavior by multi-threading the OCP itself.
 For example, a ShardSplit call on Collection1 would block the thread and 
 thereby not process a create collection task (which would stay queued in 
 ZK) though the two tasks are mutually exclusive.
 Here are a few of the challenges:
 * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An 
 easy way to handle that is to only let 1 task per collection run at a time.
 * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
 The task from the workQueue is only removed on completion so that in case of 
 a failure, the new Overseer can re-consume the same task and retry. A queue 
 is not the right data structure in the first place to look ahead i.e. get the 
 2nd task from the queue when the 1st one is in process. Also, deleting tasks 
 which are not at the head of a queue is not really an 'intuitive' thing.
 Proposed solutions for task management:
 * Task funnel and peekAfter(): The parent thread is responsible for getting 
 and passing the request to a new thread (or one from the pool). The parent 
 method uses a peekAfter(last element) instead of a peek(). The peekAfter 
 returns the task after the 'last element'. Maintain this request information 
 and use it for deleting/cleaning up the workQueue.
 * Another (almost duplicate) queue: While offering tasks to workQueue, also 
 offer them to a new queue (call it volatileWorkQueue?). The difference is, as 
 soon as a task from this is picked up for processing by the thread, it's 
 removed from the queue. At the end, the cleanup is done from the workQueue.






[jira] [Comment Edited] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded

2014-04-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979580#comment-13979580
 ] 

Noble Paul edited comment on SOLR-5681 at 4/24/14 11:26 AM:


 workQueue.peekTopN(10);  If a task is being processed , this will always 
return immediately with one item and the loop would continue without a pause , 
hogging CPU/ZK-traffic. You will need to ensure that the call returns if the 
available items in the queue are different from the ones being processed. 
 


was (Author: noble.paul):
 workQueue.peekTopN(10);  If a task is being processed , this will always 
return immedietly with one item and the loop would continue without a pause , 
hogging CPU/ZK-traffic. You will need to ensure that the call returns if the 
available items in the queue are different from the ones being processed. 
 

 Make the OverseerCollectionProcessor multi-threaded
 ---

 Key: SOLR-5681
 URL: https://issues.apache.org/jira/browse/SOLR-5681
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Anshum Gupta
Assignee: Anshum Gupta
 Attachments: SOLR-5681.patch


 Right now, the OverseerCollectionProcessor is single-threaded, i.e. submitting 
 anything long running would have it block processing of other mutually 
 exclusive tasks.
 When OCP tasks become optionally async (SOLR-5477), it'd be good to have 
 truly non-blocking behavior by multi-threading the OCP itself.
 For example, a ShardSplit call on Collection1 would block the thread and 
 thereby not process a create collection task (which would stay queued in 
 ZK) though the two tasks are mutually exclusive.
 Here are a few of the challenges:
 * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An 
 easy way to handle that is to only let 1 task per collection run at a time.
 * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
 The task from the workQueue is only removed on completion so that in case of 
 a failure, the new Overseer can re-consume the same task and retry. A queue 
 is not the right data structure in the first place to look ahead i.e. get the 
 2nd task from the queue when the 1st one is in process. Also, deleting tasks 
 which are not at the head of a queue is not really an 'intuitive' thing.
 Proposed solutions for task management:
 * Task funnel and peekAfter(): The parent thread is responsible for getting 
 and passing the request to a new thread (or one from the pool). The parent 
 method uses a peekAfter(last element) instead of a peek(). The peekAfter 
 returns the task after the 'last element'. Maintain this request information 
 and use it for deleting/cleaning up the workQueue.
 * Another (almost duplicate) queue: While offering tasks to workQueue, also 
 offer them to a new queue (call it volatileWorkQueue?). The difference is, as 
 soon as a task from this is picked up for processing by the thread, it's 
 removed from the queue. At the end, the cleanup is done from the workQueue.






[jira] [Created] (SOLR-6010) Wrong highlighting while querying by date range with wild card in the end range

2014-04-24 Thread Mohammad Abul Khaer (JIRA)
Mohammad Abul Khaer created SOLR-6010:
-

 Summary: Wrong highlighting while querying by date range with wild 
card in the end range
 Key: SOLR-6010
 URL: https://issues.apache.org/jira/browse/SOLR-6010
 Project: Solr
  Issue Type: Bug
  Components: highlighter, query parsers
Affects Versions: 4.0
 Environment: java version 1.7.0_45
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 
x86_64 x86_64 GNU/Linux
Reporter: Mohammad Abul Khaer


Solr is returning wrong highlights when I have a date range query with a wildcard 
*in the end range*. For example my query *q* is

{noformat}
(story)+activatedate:[* TO 
2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO *]
{noformat}

In the above query activatedate and expiredate are date fields. Their definition 
in the schema file is as follows

{code}
<field name="activatedate" type="date" indexed="true" stored="false"
       omitNorms="true"/>
<field name="expiredate" type="date" indexed="true" stored="false"
       omitNorms="true"/>
{code}

In the query result I am getting wrong highlighting information. Only the 
highlighting result is shown below

{code}
"highlighting": {
  "article:3605": {
    "title": [
      "The <em>creative</em> <em>headline</em> of this <em>story</em> <em>really</em> <em>says</em> it <em>all</em>"
    ],
    "summary": [
      "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> <em>sem</em> <em>lacinia</em> <em>quam</em>."
    ]
  },
  "article:3604": {
    "title": [
      "The <em>creative</em> <em>headline</em> of this <em>story</em> <em>really</em> <em>says</em> it <em>all</em>"
    ],
    "summary": [
      "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> <em>sem</em> <em>lacinia</em> <em>quam</em>.."
    ]
  }
}
{code}

It should highlight only the word *story* but it is highlighting a lot of other words 
too. What I noticed is that this happens only if I have a wildcard * in the end 
range. If I change the above query and set a fixed date in the end range 
instead of *, then Solr returns correct highlights. The modified query is shown 
below:

{noformat}
(story)+activatedate:[* TO 
2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO 3014-04-24T09:55:00Z]
{noformat}

I guess it's a bug in Solr. If I use a filter query *fq* instead of the normal query 
*q*, then the highlighting result is OK for both queries.






[jira] [Updated] (SOLR-6010) Wrong highlighting while querying by date range with wild card in the end range

2014-04-24 Thread Mohammad Abul Khaer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Abul Khaer updated SOLR-6010:
--

Description: 
Solr is returning wrong highlights when I have a date range query with a wildcard 
*in the end range*. For example my query *q* is

{noformat}
(porta)+activatedate:[* TO 
2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO *]
{noformat}

In the above query activatedate and expiredate are date fields. Their definition 
in the schema file is as follows

{code}
<field name="activatedate" type="date" indexed="true" stored="false"
       omitNorms="true"/>
<field name="expiredate" type="date" indexed="true" stored="false"
       omitNorms="true"/>
{code}

In the query result I am getting wrong highlighting information. Only the 
highlighting result is shown below

{code}
"highlighting": {
  "article:3605": {
    "title": [
      "The <em>creative</em> <em>headline</em> of this <em>story</em> <em>really</em> <em>says</em> it <em>all</em>"
    ],
    "summary": [
      "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> <em>sem</em> <em>lacinia</em> <em>quam</em>."
    ]
  },
  "article:3604": {
    "title": [
      "The <em>creative</em> <em>headline</em> of this <em>story</em> <em>really</em> <em>says</em> it <em>all</em>"
    ],
    "summary": [
      "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> <em>sem</em> <em>lacinia</em> <em>quam</em>.."
    ]
  }
}
{code}

It should highlight only the word *story* but it is highlighting a lot of other words 
too. What I noticed is that this happens only if I have a wildcard * in the end 
range. If I change the above query and set a fixed date in the end range 
instead of *, then Solr returns correct highlights. The modified query is shown 
below:

{noformat}
(porta)+activatedate:[* TO 
2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO 3014-04-24T09:55:00Z]
{noformat}

I guess it's a bug in Solr. If I use a filter query *fq* instead of the normal query 
*q*, then the highlighting result is OK for both queries.

  was:
Solr is returning wrong highlights when I have a date range query with a wildcard 
*in the end range*. For example my query *q* is

{noformat}
(story)+activatedate:[* TO 
2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO *]
{noformat}

In the above query activatedate and expiredate are date fields. Their definition 
in the schema file is as follows

{code}
<field name="activatedate" type="date" indexed="true" stored="false"
       omitNorms="true"/>
<field name="expiredate" type="date" indexed="true" stored="false"
       omitNorms="true"/>
{code}

In the query result I am getting wrong highlighting information. Only the 
highlighting result is shown below

{code}
"highlighting": {
  "article:3605": {
    "title": [
      "The <em>creative</em> <em>headline</em> of this <em>story</em> <em>really</em> <em>says</em> it <em>all</em>"
    ],
    "summary": [
      "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> <em>sem</em> <em>lacinia</em> <em>quam</em>."
    ]
  },
  "article:3604": {
    "title": [
      "The <em>creative</em> <em>headline</em> of this <em>story</em> <em>really</em> <em>says</em> it <em>all</em>"
    ],
    "summary": [
      "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> <em>sem</em> <em>lacinia</em> <em>quam</em>.."
    ]
  }
}
{code}

It should highlight only the word *story* but it is highlighting a lot of other words 
too. What I noticed is that this happens only if I have a wildcard * in the end 
range. If I change the above query and set a fixed date in the end range 
instead of *, then Solr returns correct highlights. The modified query is shown 
below:

{noformat}
(story)+activatedate:[* TO 
2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO 3014-04-24T09:55:00Z]
{noformat}

I guess it's a bug in Solr. If I use a filter query *fq* instead of the normal query 
*q*, then the highlighting result is OK for both queries.


 Wrong highlighting while querying by date range with wild card in the end 
 range
 ---

 Key: SOLR-6010
 URL: https://issues.apache.org/jira/browse/SOLR-6010
 Project: Solr
  Issue Type: Bug
  Components: highlighter, query parsers
Affects Versions: 4.0
 Environment: java version 1.7.0_45
 Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
 Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
 Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 
 x86_64 x86_64 GNU/Linux
Reporter: Mohammad Abul Khaer
  Labels: date, highlighting, range, solr

 Solr is returning wrong 

[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-24 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979630#comment-13979630
 ] 

Mark Miller commented on SOLR-5473:
---

bq. The latest patch is not a final one.

I have to stick with my veto - the API changes are way too crazy. There is no 
quick fix here. I'm going to insist on my code veto and that this gets on track 
in a branch.
 
bq. The objective of that patch was to get the naming right.

But it was barely even a start. However, the names are not even the problem, 
which is why this needs way more work. Even if we rename the methods, it's 
still all super crazy compared to what we have, and straddling two worlds in a 
way that both are ugly and the combination is just whacked. I realize this was 
done because doing it while keeping sensible APIs is harder given what you 
would like to do, but that's not a good enough reason.

I'm also not sold on the new watch approach. I am sold on splitting up 
clusterstate.json, but you have tied a lot of other stuff into this commit that 
is much more controversial.

I'm sticking to my code veto and this should be reverted and moved to a branch.

 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, ec2-23-20-119-52_solr.log, 
 ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node






Re: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-24 Thread Mark Miller
+1.

SUCCESS! [0:37:04.608776]

-- 
Mark Miller
about.me/markrmiller

On April 22, 2014 at 2:47:50 PM, Uwe Schindler (u...@thetaphi.de) wrote:

Hi,  

I prepared the first release candidate of Lucene and Solr 4.8.0. The artifacts 
can be found here:  

=> http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC1-rev1589150/

It took a bit longer, because we had to fix some remaining bugs regarding 
NativeFSLockFactory, which did not work correctly and leaked file handles. I 
also updated the instructions about the preferred Java update versions. See 
also Mike's blog post: 
http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/  

Please check the artifacts and give your vote in the next 72 hrs.  

My +1 will hopefully come a little bit later because Solr tests are failing 
constantly on my release build and smoke tester machine. The reason: it seems 
to be a lack of file handles. A standard Ubuntu configuration has 1024 file 
handles and I want a release to pass with that common default configuration. 
Instead, org.apache.solr.cloud.TestMiniSolrCloudCluster.testBasics always fails 
with crazy error messages (not about too few file handles, more that Jetty 
cannot start up or bind ports, or various other stuff). This did not happen 
when smoking 4.7.x releases.

I will run now the smoker again without HDFS (via build.properties) and if that 
also fails then once again with more file handles. But we really have to fix 
our tests that they succeed with the default config of 1024 file handles. We 
can configure that in Jenkins (so the Jenkins job first sets and then runs ANT 
ulimit -n 1024). But this should not block the release, I just say: I gave 
up running those Solr tests, sorry! Anybody else can test that stuff!  
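
As a diagnostic aid for this kind of file-handle debugging, the JDK-specific com.sun.management API can report the limit the JVM actually sees; a minimal sketch (an illustration, not part of the smoker):

{code}
import java.lang.management.ManagementFactory;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdLimit {
  public static void main(String[] args) {
    Object os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
      // The per-process limit (what ulimit -n controls) and current usage.
      System.out.println("max fds:  " + unixOs.getMaxFileDescriptorCount());
      System.out.println("open fds: " + unixOs.getOpenFileDescriptorCount());
    }
  }
}
{code}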

Uwe  

P.S.: Here's my smoker command line:  
$ JAVA_HOME=$HOME/jdk1.7.0_55 JAVA7_HOME=$HOME/jdk1.7.0_55 python3.2 -u 
smokeTestRelease.py ' 
http://people.apache.org/~uschindler/staging_area/lucene-solr-4.8.0-RC1-rev1589150/'
 1589150 4.8.0 tmp  

-  
Uwe Schindler  
H.-H.-Meier-Allee 63, D-28213 Bremen  
http://www.thetaphi.de  
eMail: u...@thetaphi.de  




-  
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org  
For additional commands, e-mail: dev-h...@lucene.apache.org  



[jira] [Commented] (LUCENE-5610) Add Terms min/max

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979743#comment-13979743
 ] 

ASF subversion and git services commented on LUCENE-5610:
-

Commit 1589729 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1589729 ]

LUCENE-5610: add Terms.getMin/Max

 Add Terms min/max 
 --

 Key: LUCENE-5610
 URL: https://issues.apache.org/jira/browse/LUCENE-5610
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5610.patch, LUCENE-5610.patch, LUCENE-5610.patch, 
 LUCENE-5610.patch, LUCENE-5610.patch


 Having upper/lower bounds on terms could be useful for various optimizations 
 in the future, e.g. to accelerate sorting (if a segment can't compete, don't 
 even search it), and so on.
 It's pretty obvious how to get the smallest term, but the maximum term for a 
 field is tricky; worst case you can do it in ~ log(N) time by binary-searching 
 term space.
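
For illustration, a minimal sketch of consuming the new accessors once this lands (assumes Lucene 4.9+; the index path in args[0] and the field name "title" are hypothetical):

{code}
import java.io.File;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class TermBounds {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File(args[0])); // existing index
    DirectoryReader reader = DirectoryReader.open(dir);
    Terms terms = MultiFields.getTerms(reader, "title"); // hypothetical field
    if (terms != null) {
      BytesRef min = terms.getMin(); // smallest term in the field
      BytesRef max = terms.getMax(); // largest term in the field
      System.out.println("min=" + min.utf8ToString() + " max=" + max.utf8ToString());
    }
    reader.close();
    dir.close();
  }
}
{code}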



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6010) Wrong highlighting while querying by date range with wild card in the end range

2014-04-24 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979749#comment-13979749
 ] 

Ahmet Arslan commented on SOLR-6010:


Can't you set {{hl.requireFieldMatch}} to true?
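
A minimal SolrJ sketch of that suggestion (the server URL, core name, and highlight fields are assumptions; whether it cures this particular wildcard-range case is untested):

{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightFieldMatch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery(
        "(porta) +activatedate:[* TO 2014-04-24T09:55:00Z] +expiredate:[2014-04-24T09:55:00Z TO *]");
    q.setHighlight(true);
    q.set("hl.fl", "title,summary");
    // Only highlight terms that actually matched in the field being highlighted.
    q.set("hl.requireFieldMatch", "true");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getHighlighting());
    server.shutdown();
  }
}
{code}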

 Wrong highlighting while querying by date range with wild card in the end 
 range
 ---

 Key: SOLR-6010
 URL: https://issues.apache.org/jira/browse/SOLR-6010
 Project: Solr
  Issue Type: Bug
  Components: highlighter, query parsers
Affects Versions: 4.0
 Environment: java version 1.7.0_45
 Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
 Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
 Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 
 x86_64 x86_64 GNU/Linux
Reporter: Mohammad Abul Khaer
  Labels: date, highlighting, range, solr

 Solr is returning wrong highlights when I have a date range query with a 
 wildcard *in the end range*. For example, my query *q* is:
 {noformat}
 (porta)+activatedate:[* TO 
 2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO *]
 {noformat}
 In the above query, activatedate and expiredate are date fields. Their 
 definition in the schema file is as follows:
 {code}
 <field name="activatedate" type="date" indexed="true" stored="false"
        omitNorms="true"/>
 <field name="expiredate" type="date" indexed="true" stored="false"
        omitNorms="true"/>
 {code}
 In the query result I am getting wrong highlighting information. Only the 
 highlighting result is shown below:
 {code}
  "highlighting": {
     "article:3605": {
       "title": [
         "The <em>creative</em> <em>headline</em> of this <em>story</em> 
         <em>really</em> <em>says</em> it <em>all</em>"
       ],
       "summary": [
         "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> 
         <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> 
         <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> 
         <em>sem</em> <em>lacinia</em> <em>quam</em>."
       ]
     },
     "article:3604": {
       "title": [
         "The <em>creative</em> <em>headline</em> of this <em>story</em> 
         <em>really</em> <em>says</em> it <em>all</em>"
       ],
       "summary": [
         "<em>Etiam</em> <em>porta</em> <em>sem</em> <em>malesuada</em> 
         <em>magna</em> <em>mollis</em> <em>euismod</em> <em>aenean</em> <em>eu</em> 
         <em>leo</em> <em>quam</em>. <em>Pellentesque</em> <em>ornare</em> 
         <em>sem</em> <em>lacinia</em> <em>quam</em>.."
       ]
     }
  }
 {code}
 It should highlight only the word *story*, but it is highlighting a lot of 
 other words as well. What I noticed is that this happens only if I have a 
 wildcard * in the end range. If I change the above query and set a fixed date 
 in the end range instead of *, then Solr returns the correct highlights. The 
 modified query is shown below:
 {noformat}
 (porta)+activatedate:[* TO 
 2014-04-24T09:55:00Z]+expiredate:[2014-04-24T09:55:00Z TO 
 3014-04-24T09:55:00Z]
 {noformat}
 I guess it's a bug in Solr. If I use the filter query *fq* instead of the 
 normal query *q*, then the highlighting result is OK for both queries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5610) Add Terms min/max

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979788#comment-13979788
 ] 

ASF subversion and git services commented on LUCENE-5610:
-

Commit 1589749 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1589749 ]

LUCENE-5610: add Terms.getMin/Max

 Add Terms min/max 
 --

 Key: LUCENE-5610
 URL: https://issues.apache.org/jira/browse/LUCENE-5610
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5610.patch, LUCENE-5610.patch, LUCENE-5610.patch, 
 LUCENE-5610.patch, LUCENE-5610.patch


 Having upper/lower bounds on terms could be useful for various optimizations 
 in the future, e.g. to accelerate sorting (if a segment can't compete, don't 
 even search it), and so on.
 It's pretty obvious how to get the smallest term, but the maximum term for a 
 field is tricky; worst case you can do it in ~ log(N) time by binary-searching 
 term space.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5610) Add Terms min/max

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979792#comment-13979792
 ] 

ASF subversion and git services commented on LUCENE-5610:
-

Commit 1589752 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1589752 ]

LUCENE-5610: improve CheckIndex checking; javadocs

 Add Terms min/max 
 --

 Key: LUCENE-5610
 URL: https://issues.apache.org/jira/browse/LUCENE-5610
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5610.patch, LUCENE-5610.patch, LUCENE-5610.patch, 
 LUCENE-5610.patch, LUCENE-5610.patch


 Having upper/lower bounds on terms could be useful for various optimizations 
 in the future, e.g. to accelerate sorting (if a segment can't compete, don't 
 even search it), and so on.
 It's pretty obvious how to get the smallest term, but the maximum term for a 
 field is tricky; worst case you can do it in ~ log(N) time by binary-searching 
 term space.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5610) Add Terms min/max

2014-04-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5610.


   Resolution: Fixed
Fix Version/s: 5.0
   4.9

 Add Terms min/max 
 --

 Key: LUCENE-5610
 URL: https://issues.apache.org/jira/browse/LUCENE-5610
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5610.patch, LUCENE-5610.patch, LUCENE-5610.patch, 
 LUCENE-5610.patch, LUCENE-5610.patch


 Having upper/lower bounds on terms could be useful for various optimizations 
 in the future, e.g. to accelerate sorting (if a segment can't compete, don't 
 even search it), and so on.
 It's pretty obvious how to get the smallest term, but the maximum term for a 
 field is tricky; worst case you can do it in ~ log(N) time by binary-searching 
 term space.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.8-Linux (64bit/jdk1.7.0_60-ea-b14) - Build # 57 - Still Failing!

2014-04-24 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.8-Linux/57/
Java: 64bit/jdk1.7.0_60-ea-b14 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  
org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message:
Timeout occured while waiting response from server at: https://127.0.0.1:57149

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting 
response from server at: https://127.0.0.1:57149
at 
__randomizedtesting.SeedInfo.seed([4D372A03589DF004:CCD1A41B2FC29038]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:562)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at 
org.apache.solr.BaseDistributedSearchTestCase.indexDoc(BaseDistributedSearchTestCase.java:440)
at 
org.apache.solr.BaseDistributedSearchTestCase.index(BaseDistributedSearchTestCase.java:429)
at 
org.apache.solr.handler.clustering.DistributedClusteringComponentTest.doTest(DistributedClusteringComponentTest.java:36)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:871)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 

[jira] [Updated] (LUCENE-5628) SpecialOperations.getFiniteStrings should not recurse

2014-04-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5628:
---

Attachment: LUCENE-5628.patch

New patch, with some simplification: I moved all the hairy logic about next 
label/transition into the PathNode.  I think this helps.

I put a nocommit to use Stack instead of PathNode[] ... this would be simpler 
(push/pop instead of .get/.remove) ... the only downside is it would mean a new 
Java object on each push vs. now, where it re-uses them.
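
For readers not following the patch, the recursion-to-explicit-stack pattern under discussion looks roughly like this toy version (an acyclic transition graph with a made-up encoding, not Lucene's actual Automaton/PathNode API):

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterativeFiniteStrings {

  // transitions[state] is an array of {label, destState} pairs,
  // accept[state] marks accepting states, state 0 is the start state.
  // The automaton must be acyclic, otherwise the string set is infinite.
  static List<String> finiteStrings(int[][][] transitions, boolean[] accept) {
    List<String> result = new ArrayList<String>();
    Deque<int[]> stack = new ArrayDeque<int[]>(); // frames: {state, nextTransition}
    StringBuilder path = new StringBuilder();
    stack.push(new int[] {0, 0});
    while (!stack.isEmpty()) {
      int[] frame = stack.peek();
      int state = frame[0];
      if (frame[1] == 0 && accept[state]) {
        result.add(path.toString()); // first visit of an accepting state
      }
      if (frame[1] < transitions[state].length) {
        int[] t = transitions[state][frame[1]++]; // advance to next transition
        path.append((char) t[0]);
        stack.push(new int[] {t[1], 0});
      } else {
        stack.pop(); // state exhausted: backtrack
        if (path.length() > 0) {
          path.setLength(path.length() - 1); // drop the label that led here
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Accepts "ab" and "ac": 0 -a-> 1, 1 -b-> 2, 1 -c-> 3; states 2 and 3 accept.
    int[][][] transitions = { { {'a', 1} }, { {'b', 2}, {'c', 3} }, {}, {} };
    boolean[] accept = { false, false, true, true };
    System.out.println(finiteStrings(transitions, accept)); // [ab, ac]
  }
}
{code}

The heap-allocated frame array plays the role the patch gives PathNode; the Stack-vs-array question above is exactly whether those frames are pooled or allocated per push.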

 SpecialOperations.getFiniteStrings should not recurse
 -

 Key: LUCENE-5628
 URL: https://issues.apache.org/jira/browse/LUCENE-5628
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5628.patch, LUCENE-5628.patch, LUCENE-5628.patch


 Today it consumes one Java stack frame per transition, which when used by 
 AnalyzingSuggester is per character in each token.  This can lead to stack 
 overflows if you have a long suggestion.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5611) Simplify the default indexing chain

2014-04-24 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5611:
---

Attachment: LUCENE-5611.patch

New patch, I think it's ready.  I fixed all nocommits, javadocs.

I removed the specialization for String/NumericField; these gave decent 
performance gains, but we should pursue it separately.

 Simplify the default indexing chain
 ---

 Key: LUCENE-5611
 URL: https://issues.apache.org/jira/browse/LUCENE-5611
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5611.patch, LUCENE-5611.patch


 I think Lucene's current indexing chain has too many classes /
 hierarchy / abstractions, making it look much more complex than it
 really should be, and discouraging users from experimenting/innovating
 with their own indexing chains.
 Also, if it were easier to understand/approach, then new developers
 would more likely try to improve it ... it really should be simpler.
 So I'm exploring a pared back indexing chain, and have a starting patch
 that I think is looking ok: it seems more approachable than the
 current indexing chain, or at least has fewer strange classes.
 I also thought this could give some speedup for tiny documents (a more
 common use of Lucene lately), and it looks like, with the evil
 optimizations, this is a ~25% speedup for Geonames docs.  Even without
 those evil optos it's a bit faster.
 This is very much a work in progress / nocommits, and there are some
 behavior changes e.g. the new chain requires all fields to have the
 same TV options (rather than auto-upgrading all fields by the same
 name that the current chain does)...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded

2014-04-24 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-5681:
---

Attachment: SOLR-5681.patch

Another patch with a few things fixed: got rid of a bit of the hard-coded 
logic and a probable multi-threading race condition.
Also, the main thread loop now continues if there's nothing new in the 
work-queue.

 Make the OverseerCollectionProcessor multi-threaded
 ---

 Key: SOLR-5681
 URL: https://issues.apache.org/jira/browse/SOLR-5681
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Anshum Gupta
Assignee: Anshum Gupta
 Attachments: SOLR-5681.patch, SOLR-5681.patch


 Right now, the OverseerCollectionProcessor is single-threaded, i.e. submitting 
 anything long-running would have it block the processing of other mutually 
 exclusive tasks.
 When OCP tasks become optionally async (SOLR-5477), it'd be good to have 
 truly non-blocking behavior by multi-threading the OCP itself.
 For example, a ShardSplit call on Collection1 would block the thread and 
 thereby, not processing a create collection task (which would stay queued in 
 zk) though both the tasks are mutually exclusive.
 Here are a few of the challenges:
 * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An 
 easy way to handle that is to only let 1 task per collection run at a time 
 (see the sketch after this list).
 * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
 The task from the workQueue is only removed on completion so that in case of 
 a failure, the new Overseer can re-consume the same task and retry. A queue 
 is not the right data structure in the first place to look ahead i.e. get the 
 2nd task from the queue when the 1st one is in process. Also, deleting tasks 
 which are not at the head of a queue is not really an 'intuitive' thing.
 Proposed solutions for task management:
 * Task funnel and peekAfter(): The parent thread is responsible for getting 
 and passing the request to a new thread (or one from the pool). The parent 
 method uses a peekAfter(last element) instead of a peek(). The peekAfter 
 returns the task after the 'last element'. Maintain this request information 
 and use it for deleting/cleaning up the workQueue.
 * Another (almost duplicate) queue: While offering tasks to workQueue, also 
 offer them to a new queue (call it volatileWorkQueue?). The difference is, as 
 soon as a task from this is picked up for processing by the thread, it's 
 removed from the queue. At the end, the cleanup is done from the workQueue.
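
A minimal sketch of the one-task-per-collection idea from the first bullet above (illustrative only; the real patch also has to manage the ZK work-queue and retries):

{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerCollectionExecutor {
  private final ExecutorService pool = Executors.newFixedThreadPool(8);
  private final Set<String> busy =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  /** Returns false if a task for this collection is already in flight. */
  public boolean trySubmit(final String collection, final Runnable task) {
    if (!busy.add(collection)) {
      return false; // caller should leave the task queued and retry later
    }
    pool.submit(new Runnable() {
      public void run() {
        try {
          task.run();
        } finally {
          busy.remove(collection); // free the collection for the next task
        }
      }
    });
    return true;
  }
}
{code}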



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 19681 - Failure!

2014-04-24 Thread builder
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/19681/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestTerms.testTermMinMaxRandom

Error Message:


Stack Trace:
java.lang.AssertionError
at 
__randomizedtesting.SeedInfo.seed([3F2528AD7379C20F:7356A7D91FE5196A]:0)
at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:563)
at 
org.apache.lucene.codecs.lucene3x.TermInfosWriter.compareToLastTerm(TermInfosWriter.java:187)
at 
org.apache.lucene.codecs.lucene3x.TermInfosWriter.add(TermInfosWriter.java:217)
at 
org.apache.lucene.codecs.lucene3x.PreFlexRWFieldsWriter$PreFlexTermsWriter.finishTerm(PreFlexRWFieldsWriter.java:209)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:548)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:465)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:518)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:628)
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2942)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3101)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3068)
at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:320)
at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:257)
at 
org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 

[jira] [Created] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Yonik Seeley (JIRA)
Yonik Seeley created SOLR-6011:
--

 Summary: inOrder does not work with the complexphrase parser
 Key: SOLR-6011
 URL: https://issues.apache.org/jira/browse/SOLR-6011
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Priority: Critical


{code}
{!complexphrase}vol* high*
{code}
does not match the Solr document containing ... high volume web ... (this is 
correct).

But adding inOrder=false still fails to make it match:
{code}
{!complexphrase inOrder=false}vol* high*
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5629) Comparing the Version of Lucene, the Analyzer and the similarity function that are being used for indexing and searching.

2014-04-24 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979875#comment-13979875
 ] 

Ahmet Arslan commented on LUCENE-5629:
--

bq. The same problem exists for the Analyzer that is used.
Can't we use different analyzers for indexing and searching? e.g. 
WordDelimiterFilter, SynonymFilter, NGramFilter, etc. 

 Comparing the Version of Lucene, the Analyzer and the similarity function 
 that are being used for indexing and searching.
 --

 Key: LUCENE-5629
 URL: https://issues.apache.org/jira/browse/LUCENE-5629
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index, core/queryparser, core/search
 Environment: Operating system : Windows 8.1
 Software platform : Eclipse Kepler 4.3.2
Reporter: Isabel Mendonca
Priority: Minor
  Labels: features, patch
 Fix For: 4.8, 4.9, 5.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 We have observed that Lucene does not check if the same Similarity function 
 is used during indexing and searching. The same problem exists for the 
 Analyzer that is used. This may lead to poor or misleading results.
 So we decided to create an XML file during indexing that will store 
 information such as the Analyzer and the Similarity function that were used, 
 as well as the version of Lucene that was used. This XML file will always be 
 available to the users.
 At search time, we will retrieve this information using SAX parsing and check 
 if the utils used for searching match those used for indexing. If not, a 
 warning message will be displayed to the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 19681 - Failure!

2014-04-24 Thread Michael McCandless
I'll dig.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Apr 24, 2014 at 11:51 AM,  buil...@flonkings.com wrote:
 Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/19681/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestTerms.testTermMinMaxRandom

 Error Message:


 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([3F2528AD7379C20F:7356A7D91FE5196A]:0)
 at 
 org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:563)
 at 
 org.apache.lucene.codecs.lucene3x.TermInfosWriter.compareToLastTerm(TermInfosWriter.java:187)
 at 
 org.apache.lucene.codecs.lucene3x.TermInfosWriter.add(TermInfosWriter.java:217)
 at 
 org.apache.lucene.codecs.lucene3x.PreFlexRWFieldsWriter$PreFlexTermsWriter.finishTerm(PreFlexRWFieldsWriter.java:209)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:548)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:465)
 at 
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:518)
 at 
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:628)
 at 
 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2942)
 at 
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3101)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3068)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:320)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:257)
 at 
 org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:84)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 

[jira] [Closed] (LUCENE-5629) Comparing the Version of Lucene, the Analyzer and the similarity function that are being used for indexing and searching.

2014-04-24 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson closed LUCENE-5629.
--

Resolution: Not a Problem

Closing; if you still think this is a problem we can re-open.

Allowing different analyzers at index and query time is a deliberate decision. 
Otherwise all the effort that went into allowing independent index and query 
analysis chains could have been avoided. In particular, synonyms are often 
defined at index time but not at query time.
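
For concreteness, a minimal sketch of deliberately different index-time and query-time chains (Lucene 4.x-style APIs; the synonym pair and version constant are illustrative):

{code}
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.Version;

public class IndexQueryAnalyzers {
  public static void main(String[] args) throws Exception {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);
    builder.add(new CharsRef("tv"), new CharsRef("television"), true);
    final SynonymMap synonyms = builder.build();

    // Index-time chain expands synonyms...
    Analyzer indexAnalyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String field, Reader reader) {
        Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_48, reader);
        return new TokenStreamComponents(tok, new SynonymFilter(tok, synonyms, true));
      }
    };
    // ...while the query-time chain does not; the asymmetry is intentional.
    Analyzer queryAnalyzer = new WhitespaceAnalyzer(Version.LUCENE_48);
    // Hand indexAnalyzer to IndexWriterConfig and queryAnalyzer to the query parser.
  }
}
{code}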

 Comparing the Version of Lucene, the Analyzer and the similarity function 
 that are being used for indexing and searching.
 --

 Key: LUCENE-5629
 URL: https://issues.apache.org/jira/browse/LUCENE-5629
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index, core/queryparser, core/search
 Environment: Operating system : Windows 8.1
 Software platform : Eclipse Kepler 4.3.2
Reporter: Isabel Mendonca
Priority: Minor
  Labels: features, patch
 Fix For: 4.8, 4.9, 5.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 We have observed that Lucene does not check if the same Similarity function 
 is used during indexing and searching. The same problem exists for the 
 Analyzer that is used. This may lead to poor or misleading results.
 So we decided to create an XML file during indexing that will store 
 information such as the Analyzer and the Similarity function that were used, 
 as well as the version of Lucene that was used. This XML file will always be 
 available to the users.
 At search time, we will retrieve this information using SAX parsing and check 
 if the utils used for searching match those used for indexing. If not, a 
 warning message will be displayed to the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5610) Add Terms min/max

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979885#comment-13979885
 ] 

ASF subversion and git services commented on LUCENE-5610:
-

Commit 1589782 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1589782 ]

LUCENE-5610: don't use Lucene3x codec (the test writes arbitrary binary terms)

 Add Terms min/max 
 --

 Key: LUCENE-5610
 URL: https://issues.apache.org/jira/browse/LUCENE-5610
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5610.patch, LUCENE-5610.patch, LUCENE-5610.patch, 
 LUCENE-5610.patch, LUCENE-5610.patch


 Having upper/lower bounds on terms could be useful for various optimizations 
 in the future, e.g. to accelerate sorting (if a segment can't compete, don't 
 even search it), and so on.
 It's pretty obvious how to get the smallest term, but the maximum term for a 
 field is tricky; worst case you can do it in ~ log(N) time by binary-searching 
 term space.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 19681 - Failure!

2014-04-24 Thread Michael McCandless
I committed a fix, disabling Lucene3x for that test case.  I didn't
use @SuppressCodecs because the other tests should work fine w/
Lucene3x.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Apr 24, 2014 at 11:57 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 I'll dig.

 Mike McCandless

 http://blog.mikemccandless.com


 On Thu, Apr 24, 2014 at 11:51 AM,  buil...@flonkings.com wrote:
 Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/19681/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestTerms.testTermMinMaxRandom

 Error Message:


 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([3F2528AD7379C20F:7356A7D91FE5196A]:0)
 at 
 org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:563)
 at 
 org.apache.lucene.codecs.lucene3x.TermInfosWriter.compareToLastTerm(TermInfosWriter.java:187)
 at 
 org.apache.lucene.codecs.lucene3x.TermInfosWriter.add(TermInfosWriter.java:217)
 at 
 org.apache.lucene.codecs.lucene3x.PreFlexRWFieldsWriter$PreFlexTermsWriter.finishTerm(PreFlexRWFieldsWriter.java:209)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:548)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:465)
 at 
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:518)
 at 
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:628)
 at 
 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2942)
 at 
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3101)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3068)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:320)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:257)
 at 
 org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:84)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)

[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Brett Lucey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979898#comment-13979898
 ] 

Brett Lucey commented on SOLR-2894:
---

Andrew actually raised that question to me yesterday as well and I spent a 
little bit of time looking into it.  For the initial request to a shard, we 
only lower the mincount if the facet limit is set to something other than -1.  
In your case, this would be 10 for the top level pivot.  We know we will (at 
most) get back 15 terms from each shard in this case.  Because we are only 
faceting on a limited number of terms, having a mincount of 0 here provides us 
the benefit of potentially avoiding refinement.  In refinement requests, we 
still need to know when a shard has responded to us with its count for a term, 
so the mincount is -1 in that case because we are interested in the term even 
if the count is zero.  It allows us to mark the shard as having responded and 
continue on.  It's possible that we might be able to change this, but at the 
point of refinement, it's a rather targeted request so I don't expect there to 
be a significant benefit to doing so.  In your case, with the facet limit being 
-1 on f2-f5, no refinement would be performed anyway.

When we designed this implementation, the most important factor for us was 
speed, and we were willing to get it at a cost of memory.  By making these 
changes, we reduced queries which previously took around 70 seconds for us down 
to around 600 milliseconds.  I suspect that the biggest factor in the poor 
memory utilization is the wide open nature of using a facet.limit of -1, 
especially on a pivot so deep.  Keep in mind that for each level of depth you 
add to a pivot, memory and time required will grow exponentially.

Don't forget that if you are querying a node and all of the shards are located 
within the same Java VM, you are incurring the memory cost of both shards plus 
the node responding to the user query all within the same heap.

I took a quick look at the code today while waiting for some other processes to 
finish, and I don't see any obvious low hanging fruit to free up a small amount 
of memory.  
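
Condensed into code, the shard-request rule described above is roughly the following (a sketch with illustrative names, not Solr's actual code):

{code}
public class ShardFacetParams {
  /**
   * Effective facet.mincount for a shard request. Refinement requests use -1
   * because the term must come back even with a zero count (it marks the
   * shard as having responded). Initial requests use 0 when a facet limit
   * bounds the candidate terms (zero counts are cheap and can avoid
   * refinement), and 1 when facet.limit is -1 (unbounded, so zero counts
   * would bloat the response).
   */
  static int shardMinCount(int facetLimit, boolean refinementRequest) {
    if (refinementRequest) {
      return -1;
    }
    return facetLimit == -1 ? 1 : 0;
  }
}
{code}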

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Brett Lucey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979898#comment-13979898
 ] 

Brett Lucey edited comment on SOLR-2894 at 4/24/14 4:20 PM:


Andrew actually raised that question to me yesterday as well and I spent a 
little bit of time looking into it.  For the initial request to a shard, we 
only lower the mincount to 0 if the facet limit is set to something other than 
-1.  If the facet limit is -1, we lower the mincount to 1.  In your case, the 
limit would be 10 for the top level pivot, so we know we will (at 
most) get back 15 terms from each shard in this case.  Because we are only 
faceting on a limited number of terms, having a mincount of 0 here provides us 
the benefit of potentially avoiding refinement.  In refinement requests, we 
still need to know when a shard has responded to us with its count for a term, 
so the mincount is -1 in that case because we are interested in the term even 
if the count is zero.  It allows us to mark the shard as having responded and 
continue on.  It's possible that we might be able to change this, but at the 
point of refinement, it's a rather targeted request so I don't expect there to 
be a significant benefit to doing so.  In your case, with the facet limit being 
-1 on f2-f5, no refinement would be performed anyway.

When we designed this implementation, the most important factor for us was 
speed, and we were willing to get it at a cost of memory.  By making these 
changes, we reduced queries which previously took around 70 seconds for us down 
to around 600 milliseconds.  I suspect that the biggest factor in the poor 
memory utilization is the wide open nature of using a facet.limit of -1, 
especially on a pivot so deep.  Keep in mind that for each level of depth you 
add to a pivot, memory and time required will grow exponentially.

Don't forget that if you are querying a node and all of the shards are located 
within the same Java VM, you are incurring the memory cost of both shards plus 
the node responding to the user query all within the same heap.

I took a quick look at the code today while waiting for some other processes to 
finish, and I don't see any obvious low hanging fruit to free up a small amount 
of memory.  


was (Author: brett.lucey):
Andrew actually raised that question to me yesterday as well and I spent a 
little bit of time looking into it.  For the initial request to a shard, we 
only lower the mincount if the facet limit is set to something other than -1.  
In your case, this would be 10 for the top level pivot.  We know we will (at 
most) get back 15 terms from each shard in this case.  Because we are only 
faceting on a limited number of terms, having a mincount of 0 here provides us 
the benefit of potentially avoiding refinement.  In refinement requests, we 
still need to know when a shard has responded to us with its count for a term, 
so the mincount is -1 in that case because we are interested in the term even 
if the count is zero.  It allows us to mark the shard as having responded and 
continue on.  It's possible that we might be able to change this, but at the 
point of refinement, it's a rather targeted request so I don't expect there to 
be a significant benefit to doing so.  In your case, with the facet limit being 
-1 on f2-f5, no refinement would be performed anyway.

When we designed this implementation, the most important factor for us was 
speed, and we were willing to get it at a cost of memory.  By making these 
changes, we reduced queries which previously took around 70 seconds for us down 
to around 600 milliseconds.  I suspect that the biggest factor in the poor 
memory utilization is the wide open nature of using a facet.limit of -1, 
especially on a pivot so deep.  Keep in mind that for each level of depth you 
add to a pivot, memory and time required will grow exponentially.

Don't forget that if you are querying a node and all of the shards are located 
within the same Java VM, you are incurring the memory cost of both shards plus 
the node responding to the user query all within the same heap.

I took a quick look at the code today while waiting for some other processes to 
finish, and I don't see any obvious low hanging fruit to free up a small amount 
of memory.  

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 

RE: svn commit: r1589782 - /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestTerms.java

2014-04-24 Thread Uwe Schindler
Why not use the @SuppressCodecs annotation?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
 Sent: Thursday, April 24, 2014 6:08 PM
 To: comm...@lucene.apache.org
 Subject: svn commit: r1589782 -
 /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/
 index/TestTerms.java
 
 Author: mikemccand
 Date: Thu Apr 24 16:07:30 2014
 New Revision: 1589782
 
 URL: http://svn.apache.org/r1589782
 Log:
 LUCENE-5610: don't use Lucene3x codec (the test writes arbitrary binary
 terms)
 
 Modified:
 
 lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
 ndex/TestTerms.java
 
 Modified:
 lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
 ndex/TestTerms.java
 URL:
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/cor
 e/src/test/org/apache/lucene/index/TestTerms.java?rev=1589782&r1=1589
 781&r2=1589782&view=diff
 ==
 
 ---
 lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
 ndex/TestTerms.java (original)
 +++
 lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene
 +++ /index/TestTerms.java Thu Apr 24 16:07:30 2014
 @@ -20,6 +20,8 @@ package org.apache.lucene.index;  import java.util.*;
 
  import org.apache.lucene.analysis.CannedBinaryTokenStream;
 +import org.apache.lucene.codecs.Codec;
 +import org.apache.lucene.codecs.lucene3x.Lucene3xCodec;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.DoubleField;
  import org.apache.lucene.document.Field; @@ -51,6 +53,7 @@ public class
 TestTerms extends LuceneTes
}
 
public void testTermMinMaxRandom() throws Exception {
 +assumeFalse("test writes binary terms", Codec.getDefault()
 + instanceof Lucene3xCodec);
  Directory dir = newDirectory();
  RandomIndexWriter w = new RandomIndexWriter(random(), dir);
  int numDocs = atLeast(100);



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

2014-04-24 Thread Brett Lucey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979898#comment-13979898
 ] 

Brett Lucey edited comment on SOLR-2894 at 4/24/14 4:21 PM:


Andrew actually raised that question to me yesterday as well and I spent a 
little bit of time looking into it.  For the initial request to a shard, we 
only lower the mincount to 0 if the facet limit is set to something other than 
-1.  If the facet limit is -1, we lower the mincount to 1.  In your case, the 
limit would be 10 for the top level pivot, so we know we will (at 
most) get back 15 terms from each shard in this case.  Because we are only 
faceting on a limited number of terms, having a mincount of 0 here provides us 
the benefit of potentially avoiding refinement.  In refinement requests, we 
still need to know when a shard has responded to us with its count for a term, 
so the mincount is -1 in that case because we are interested in the term even 
if the count is zero.  It allows us to mark the shard as having responded and 
continue on.  It's possible that we might be able to change this, but at the 
point of refinement, it's a rather targeted request so I don't expect there to 
be a significant benefit to doing so.  In your case, with the facet limit being 
-1 on f2-f5, no refinement would be performed anyway.

When we designed this implementation, the most important factor for us was 
speed, and we were willing to get it at a cost of memory.  By making these 
changes, we reduced queries which previously took around 70 seconds for us down 
to around 600 milliseconds.  I suspect that the biggest factor in the poor 
memory utilization is the wide open nature of using a facet.limit of -1, 
especially on a pivot so deep.  Keep in mind that for each level of depth you 
add to a pivot, memory and time required will grow exponentially.

Don't forget that if you are querying a node and all of the shards are located 
within the same Java VM, you are incurring the memory cost of both shards plus 
the node responding to the user query all within the same heap.

I took a quick look at the code today while waiting for some other processes to 
finish, and I don't see any obvious low hanging fruit to free up memory.  


was (Author: brett.lucey):
Andrew actually raised that question to me yesterday as well and I spent a 
little bit of time looking into it.  For the initial request to a shard, we 
only lower the mincount to 0 if the facet limit is set to something other than 
-1.  If the facet limit is -1, we lower the mincount to 1.  In your case, the 
limit would be 10 for the top level pivot, so we know we will (at 
most) get back 15 terms from each shard in this case.  Because we are only 
faceting on a limited number of terms, having a mincount of 0 here provides us 
the benefit of potentially avoiding refinement.  In refinement requests, we 
still need to know when a shard has responded to us with its count for a term, 
so the mincount is -1 in that case because we are interested in the term even 
if the count is zero.  It allows us to mark the shard as having responded and 
continue on.  It's possible that we might be able to change this, but at the 
point of refinement, it's a rather targeted request so I don't expect there to 
be a significant benefit to doing so.  In your case, with the facet limit being 
-1 on f2-f5, no refinement would be performed anyway.

When we designed this implementation, the most important factor for us was 
speed, and we were willing to get it at a cost of memory.  By making these 
changes, we reduced queries which previously took around 70 seconds for us down 
to around 600 milliseconds.  I suspect that the biggest factor in the poor 
memory utilization is the wide open nature of using a facet.limit of -1, 
especially on a pivot so deep.  Keep in mind that for each level of depth you 
add to a pivot, memory and time required will grow exponentially.

Don't forget that if you are querying a node and all of the shards are located 
within the same Java VM, you are incurring the memory cost of both shards plus 
the node responding to the user query all within the same heap.

I took a quick look at the code today while waiting for some other processes to 
finish, and I don't see any obvious low hanging fruit to free up a small amount 
of memory.  

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 

RE: svn commit: r1589782 - /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestTerms.java

2014-04-24 Thread Chris Hostetter

: Why not use the @SuppressCodecs annotation?

mike mentioned this in his reply to the jenkins failure...

 I committed a fix, disabling Lucene3x for that test case.  I didn't
 use @SuppressCodecs because the other tests should work fine w/
 Lucene3x.

...perhaps putting this in the test as a comment would be useful?


: 
: -
: Uwe Schindler
: H.-H.-Meier-Allee 63, D-28213 Bremen
: http://www.thetaphi.de
: eMail: u...@thetaphi.de
: 
: 
:  -Original Message-
:  From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
:  Sent: Thursday, April 24, 2014 6:08 PM
:  To: comm...@lucene.apache.org
:  Subject: svn commit: r1589782 -
:  /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/
:  index/TestTerms.java
:  
:  Author: mikemccand
:  Date: Thu Apr 24 16:07:30 2014
:  New Revision: 1589782
:  
:  URL: http://svn.apache.org/r1589782
:  Log:
:  LUCENE-5610: don't use Lucene3x codec (the test writes arbitrary binary
:  terms)
:  
:  Modified:
:  
:  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
:  ndex/TestTerms.java
:  
:  Modified:
:  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
:  ndex/TestTerms.java
:  URL:
:  http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/cor
:  e/src/test/org/apache/lucene/index/TestTerms.java?rev=1589782&r1=1589
:  781&r2=1589782&view=diff
:  ==
:  
:  ---
:  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
:  ndex/TestTerms.java (original)
:  +++
:  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene
:  +++ /index/TestTerms.java Thu Apr 24 16:07:30 2014
:  @@ -20,6 +20,8 @@ package org.apache.lucene.index;  import java.util.*;
:  
:   import org.apache.lucene.analysis.CannedBinaryTokenStream;
:  +import org.apache.lucene.codecs.Codec;
:  +import org.apache.lucene.codecs.lucene3x.Lucene3xCodec;
:   import org.apache.lucene.document.Document;
:   import org.apache.lucene.document.DoubleField;
:   import org.apache.lucene.document.Field; @@ -51,6 +53,7 @@ public class
:  TestTerms extends LuceneTes
: }
:  
: public void testTermMinMaxRandom() throws Exception {
:  +assumeFalse("test writes binary terms", Codec.getDefault()
:  + instanceof Lucene3xCodec);
:   Directory dir = newDirectory();
:   RandomIndexWriter w = new RandomIndexWriter(random(), dir);
:   int numDocs = atLeast(100);
: 
: 
: 
: -
: To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
: For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 

-Hoss
http://www.lucidworks.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1589782 - /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/inde x/TestTerms.java

2014-04-24 Thread Michael McCandless
Yeah, I didn't want to disable the full test, just that one method,
because I want Terms.getMin/Max testing for Lucene3x too.

Would be nice if we could @SuppressCodecs for just one method ...

I'll add a comment.
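
Something like this, for instance (a sketch -- the assumeFalse line is the 
one from the commit quoted below in this thread):

{code}
public void testTermMinMaxRandom() throws Exception {
  // Lucene3x cannot index the arbitrary binary terms this method writes.
  // Only this method is affected, so we use assumeFalse here instead of
  // @SuppressCodecs on the class: the other test methods should still
  // exercise Lucene3x.
  assumeFalse("test writes binary terms",
              Codec.getDefault() instanceof Lucene3xCodec);
  Directory dir = newDirectory();
  RandomIndexWriter w = new RandomIndexWriter(random(), dir);
  // ... rest of the test unchanged
}
{code}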

Mike McCandless

http://blog.mikemccandless.com


On Thu, Apr 24, 2014 at 12:26 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : Why not use the @SuppressCodes annotation?

 mike mentioned this in his reply to the jenkins failure...

 I committed a fix, disabling Lucene3x for that test case.  I didn't
 use @SuppressCodecs because the other tests should work fine w/
 Lucene3x.

 ...perhaps putting this in the test as a comment would be useful?


 :
 : -
 : Uwe Schindler
 : H.-H.-Meier-Allee 63, D-28213 Bremen
 : http://www.thetaphi.de
 : eMail: u...@thetaphi.de
 :
 :
 :  -Original Message-
 :  From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
 :  Sent: Thursday, April 24, 2014 6:08 PM
 :  To: comm...@lucene.apache.org
 :  Subject: svn commit: r1589782 -
 :  /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/
 :  index/TestTerms.java
 : 
 :  Author: mikemccand
 :  Date: Thu Apr 24 16:07:30 2014
 :  New Revision: 1589782
 : 
 :  URL: http://svn.apache.org/r1589782
 :  Log:
 :  LUCENE-5610: don't use Lucene3x codec (the test writes arbitrary binary
 :  terms)
 : 
 :  Modified:
 : 
 :  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
 :  ndex/TestTerms.java
 : 
 :  Modified:
 :  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
 :  ndex/TestTerms.java
 :  URL:
 :  http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/cor
 :  e/src/test/org/apache/lucene/index/TestTerms.java?rev=1589782&r1=1589
 :  781&r2=1589782&view=diff
 :  ==
 :  
 :  ---
 :  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/i
 :  ndex/TestTerms.java (original)
 :  +++
 :  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene
 :  +++ /index/TestTerms.java Thu Apr 24 16:07:30 2014
 :  @@ -20,6 +20,8 @@ package org.apache.lucene.index;  import java.util.*;
 : 
 :   import org.apache.lucene.analysis.CannedBinaryTokenStream;
 :  +import org.apache.lucene.codecs.Codec;
 :  +import org.apache.lucene.codecs.lucene3x.Lucene3xCodec;
 :   import org.apache.lucene.document.Document;
 :   import org.apache.lucene.document.DoubleField;
 :   import org.apache.lucene.document.Field; @@ -51,6 +53,7 @@ public class
 :  TestTerms extends LuceneTes
 : }
 : 
 : public void testTermMinMaxRandom() throws Exception {
 :  +assumeFalse("test writes binary terms", Codec.getDefault()
 :  + instanceof Lucene3xCodec);
 :   Directory dir = newDirectory();
 :   RandomIndexWriter w = new RandomIndexWriter(random(), dir);
 :   int numDocs = atLeast(100);
 :
 :
 :
 : -
 : To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 : For additional commands, e-mail: dev-h...@lucene.apache.org
 :
 :

 -Hoss
 http://www.lucidworks.com/

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6012) Dutch language stemming issues

2014-04-24 Thread Ashokkumar Balasubramanian (JIRA)
Ashokkumar Balasubramanian created SOLR-6012:


 Summary: Dutch language stemming issues
 Key: SOLR-6012
 URL: https://issues.apache.org/jira/browse/SOLR-6012
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.5
 Environment: Linux
Reporter: Ashokkumar Balasubramanian
Priority: Minor


I am trying to search for the Dutch word Brievenbussen.  This is the proper 
Dutch word and it should result in some matches, but it returns 0 matches.  
The misspelled word Brievenbusen (one 's' removed) does return matches.

The problem seems to be that the stemmer does not handle the vowel in the 
final syllable 'bus' correctly.  When the penultimate character of the stem is 
a vowel (in this case 'u'), the consonant is doubled in the plural, so the 
proper Dutch form is Brievenbussen.

Can you please confirm whether this is a problem in version 3.5.

Please let me know if you need more information.
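
A quick way to triage this is to print what the Dutch analyzer actually 
produces for each form.  The following is a sketch against the Lucene 3.5 API 
that this report is filed against; the field name "f" is arbitrary:

{code}
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.nl.DutchAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemCheck {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new DutchAnalyzer(Version.LUCENE_35);
    for (String word : new String[] {"Brievenbussen", "Brievenbusen", "brievenbus"}) {
      TokenStream ts = analyzer.tokenStream("f", new StringReader(word));
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      // print every stem emitted for the input word
      while (ts.incrementToken()) {
        System.out.println(word + " -> " + term.toString());
      }
      ts.end();
      ts.close();
    }
  }
}
{code}

If Brievenbussen and brievenbus stem to different tokens while Brievenbusen 
matches, that would confirm the report.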



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: release votes

2014-04-24 Thread Aric Coady
On Apr 24, 2014, at 8:07 AM, Thomas Koch k...@orbiteam.de wrote:
 Hi Andi,
 
 I don't agree that it is unimportant to make PyLucene releases. Without a
 ready-to-run software package the hurdles to use PyLucene are raised. It is
 already not quite simple (for beginners) to install PyLucene on the various
 platforms. Having a packaged release that is tested by some users provides a
 benefit to the community in my opinion. 

Relatedly, I have a pull request open to add a pylucene formula to Homebrew, 
so Mac users will be able to simply ‘brew install pylucene’.  That would be 
more consistent with having releases.

 However I can understand your arguments - there has been little feedback on
 your release announcements on the list recently. On the other hand there are
 frequent discussions about PyLucene on the list so I don't think the
 interest has declined. Did you check the number of downloads of the PyLucene
 distributions (if this is possible at all - due to the distributed releases
 on the apache mirrors ...)? This would be a more accurate indicator from my
 point of view.
 
 I must also admit that I did never understand the voting process in detail -
 i.e. who are the PMC members and what impact have  votes of non PMC users.
 Maybe some more transparency and another call for action would help to
 raise awareness in the community. 
 
 Just my thoughts...
 
 
 regards,
 Thomas 
 --
 OrbiTeam Software GmbH  Co. KG
 http://www.orbiteam.de
 
 
 -Ursprüngliche Nachricht-
 Von: Andi Vajda [mailto:va...@apache.org]
 Gesendet: Donnerstag, 24. April 2014 02:28
 An: pylucene-...@lucene.apache.org
 Betreff: release votes
 
 
  Hi all,
 
 Given the tiny amount of interest the pylucene releases create, it's maybe
 become unimportant to actually make PyLucene releases ?
 
 The release votes have had an increasingly difficult time to garner the
 three
 required PMC votes to pass. Non PMC users are also eerily quiet.
 
 Maybe the time has come to switch to a different model:
 
  - when a Lucene release happens, a PyLucene branch gets created with all
the necessary changes to build successfully and pass all tests against
this Lucene release
  - users interested in PyLucene check out that branch
  - done
 
  - no more releases, no more votes
 
 JCC can continue to be released to PyPI independently as it is today.
 That doesn't require any voting anyway (?).
 
 What do readers of this list think ?
 
 Andi..
 



Re: 4.8 Solr Ref Guide Release Plan

2014-04-24 Thread Anshum Gupta
I'm still trying to document some stuff (async core admin calls) but I'm
having issues saving it.
Once I'm able to save it, I should be done with everything that's on my
mind for the 4.8 documentation.


On Wed, Apr 23, 2014 at 10:33 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : 2) I'll review the TODO list around 24 hours after the first
 Lucene/Solr 4.8
 : RC VOTE is called -- if it doesn't look like anyone is in the middle of
 : working on stuff, I'll go ahead and cut a ref-guide RC.  If it looks like

 FYI: Tim Potter reached out to me that he's working on documenting the REST
 Manager stuff today -- so I'll plan on doing the RC around 34 hours from
 now.

 If you see any low hanging fruit, jump on it today.


 -Hoss
 http://www.lucidworks.com/

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 

Anshum Gupta
http://www.anshumgupta.net


[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979944#comment-13979944
 ] 

Ahmet Arslan commented on SOLR-6011:


{code}
{!complexphrase df=manu inOrder=true}high* vol*
{code}

is explained as 
{code}
weight(spanNear([spanOr([manu:high]), spanOr([manu:volume])], 0, true) in 31)
{code}

and

{code}
{!complexphrase df=manu inOrder=false}high* vol*
{code}

is explained as 
{code}
weight(spanNear([spanOr([manu:high]), spanOr([manu:volume])], 0, false) in 31)
{code}

It looks like the local param {{inOrder}} is correctly propagated to the 
constructor of {{SpanNearQuery}}. However, both queries return the following 
example document.
{code:xml}
doc
  field name=id100-435805/field
  field name=manuhigh volume web/field
/doc
{code}

On the other hand
{code}
{!complexphrase df=manu inOrder=true}vol* high*
{code}

and
{code}
{!complexphrase df=manu inOrder=false}vol* high*
{code}
do not return the example document. Weird...

 inOrder does not work with the complexphrase parser
 ---

 Key: SOLR-6011
 URL: https://issues.apache.org/jira/browse/SOLR-6011
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Priority: Critical

 {code}
 {!complexphrase}vol* high*
 does not match the Solr document containing ... high volume web ... (this 
 is correct)
 But adding inOrder=false still fails to make it match.
 {!complexphrase inOrder=false}vol* high*
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.8 Solr Ref Guide Release Plan

2014-04-24 Thread Chris Hostetter

: I'm still trying to document some stuff (async core admin calls) but I'm
: having issues saving that stuff.
: Once I'm able to save that, I should be done with everything that's on my
: mind for 4.8 documentation.

Yeah ... cwiki seems to be having some performance issues at the moment, 
so releasing is on hold until it stabilizes and we can finish off some of 
the final edits and Uwe can update the macro that handles the 4.8 javadoc 
links.


-Hoss
http://www.lucidworks.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-6011:
---

Attachment: SOLR-6011.patch

Here's a test that fails.

I changed the testcase to not use separate schema / solrconfig files (it's 
crazy to add extra files for this).  It was also necessary to switch to a stock 
solrconfig to reproduce the bugs I was seeing.

After a quick look, it looks like hashCode / equals are not implemented 
correctly (they do not take into account inOrder) and hence the query cache can 
return incorrect results.

I'll work on a fix.
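
For readers following along, the shape of such a fix is roughly the following 
(a sketch only -- field and class details are assumptions, not the actual 
patch):

{code}
// Sketch: make the query cache distinguish the two inOrder variants.
@Override
public int hashCode() {
  return super.hashCode() * 31 + (inOrder ? 1 : 0);
}

@Override
public boolean equals(Object obj) {
  if (!super.equals(obj)) return false;  // also checks class and boost
  ComplexPhraseQuery other = (ComplexPhraseQuery) obj;
  return inOrder == other.inOrder;
}
{code}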

 inOrder does not work with the complexphrase parser
 ---

 Key: SOLR-6011
 URL: https://issues.apache.org/jira/browse/SOLR-6011
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Priority: Critical
 Attachments: SOLR-6011.patch


 {code}
 {!complexphrase}vol* high*
 does not match the Solr document containing ... high volume web ... (this 
 is correct)
 But adding inOrder=false still fails to make it match.
 {!complexphrase inOrder=false}vol* high*
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5473) Make one state.json per collection

2014-04-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979969#comment-13979969
 ] 

Noble Paul edited comment on SOLR-5473 at 4/24/14 5:13 PM:
---

hi [~markrmil...@gmail.com] I'm working on a reverse patch for this, and we 
can move further development to a dedicated branch.

Meanwhile I would like to know what your concerns are on the following:

* The idea of splitting the clusterstate.json itself
* The external interface: the public API and the changes we make to the zk 
nodes (I mean all the public things that impact the user)
* The idea of selectively watching states, and other implementation details, 
or any other particular solutions which you think are better
* The APIs which are 'undesirable'

I would like to work towards a consensus and resolve this.


was (Author: noble.paul):
hi [~markrmil...@gmail.com] I'm  work on a reverse patch for this, and we can 
move the further development to a dedicated branch 

Meanwhile I would like to know what your concerns are on the following

* The idea of splitting the clusterstate.json itself
* The external interface . The public API, the changes we make to the zk nodes 
(I mean all the public things that impact the user)
* The idea of selectively watching states . And other implementation details, 
Or any other particular solutions which you think are better
* The API's which are 'undesirable' 

I would like to work towards a consensus  and resolve this 

 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, ec2-23-20-119-52_solr.log, 
 ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979969#comment-13979969
 ] 

Noble Paul commented on SOLR-5473:
--

hi [~markrmil...@gmail.com] I'm working on a reverse patch for this, and we 
can move further development to a dedicated branch.

Meanwhile I would like to know what your concerns are on the following:

* The idea of splitting the clusterstate.json itself
* The external interface: the public API and the changes we make to the zk 
nodes (I mean all the public things that impact the user)
* The idea of selectively watching states, and other implementation details, 
or any other particular solutions which you think are better
* The APIs which are 'undesirable'

I would like to work towards a consensus and resolve this.

 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, ec2-23-20-119-52_solr.log, 
 ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-24 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979983#comment-13979983
 ] 

Mark Miller commented on SOLR-5473:
---

Thank you.

bq. The idea of splitting the clusterstate.json itself

I have no problem with this.

bq. The external interface

I've already gone into a lot of the points. We can dig more into what is not 
clear from the above.

bq. The idea of selectively watching states 

This is also probably fine, though I'm not sure it's right to change it by 
default, and I'm not sure it should be so tied into "The idea of splitting the 
clusterstate.json itself". Breaking things into parts makes them easier to 
digest, build, and document properly.

bq. The API's which are 'undesirable'

I go into that above - again, I can answer specific questions. Look at all the 
get-collection methods - look at all the crazy different behaviors depending on 
what you call - look at the lack of documentation. Future developers will be 
hopelessly lost. Anyway, I've brought up enough issues above to get started on 
understanding what some of the current problems are. If you look at the APIs 
now, you can see it's just a mess. It all seems to work okay, and that is good, 
but it needs to be done thoughtfully as well, and I don't think anyone can 
easily deal with the APIs as they are.

bq. I would like to work towards a consensus and resolve this

I'm sure that can be done - I do think there is a lot to do and it's too core 
to rush it in.

I think a good approach would be to break it up and do things in discrete 
parts - e.g. splitting up clusterstate.json seems independent of a lot of 
these other changes. That part is not the most important part though - mostly, 
we have to get to some well-documented APIs that make sense - especially on 5x 
where we don't even necessarily have back-compat concerns.

 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, ec2-23-20-119-52_solr.log, 
 ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-6011:
---

Attachment: SOLR-6011.patch

OK, here's an updated patch that fixes the issue.

 inOrder does not work with the complexphrase parser
 ---

 Key: SOLR-6011
 URL: https://issues.apache.org/jira/browse/SOLR-6011
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Priority: Critical
 Attachments: SOLR-6011.patch, SOLR-6011.patch


 {code}
 {!complexphrase}vol* high*
 does not match the Solr document containing ... high volume web ... (this 
 is correct)
 But adding inOrder=false still fails to make it match.
 {!complexphrase inOrder=false}vol* high*
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-24 Thread Yonik Seeley
I ran into a very serious bug during my manual testing of 4.8 that I
think warrants a respin.
https://issues.apache.org/jira/browse/SOLR-6011

In a normal Solr setup, incorrect results are returned from complex
phrase queries if inOrder is ever changed for the same query.  This
would be maddening for most users to try and track down.

-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5819) Investigate reduce size of ref-guide PDF

2014-04-24 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980009#comment-13980009
 ] 

Hoss Man commented on SOLR-5819:


Since the CWIKI upgrade still hasn't happened, I used rmuir's iText as the 
basis for a new micro-project on github...

https://github.com/hossman/pdf-shrinker

...people building the ref guide can manually run this to reduce the PDF size 
until the CWIKI upgrade is complete.

 Investigate  reduce size of ref-guide PDF
 --

 Key: SOLR-5819
 URL: https://issues.apache.org/jira/browse/SOLR-5819
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
 Attachments: img-0007.png, img-0008.png, img-0009.png, img-0010.png, 
 img-0011.png, img-0012.png, img-0013.png, img-0014.png


 As noted on the solr-user mailing list in response to the ANNOUNCE about the 
 4.7 ref guide, the sizes of the 4.4, 4.5 & 4.6 PDF files were all under 5MB, 
 but the 4.7 PDF was 30MB.
 Opening this issue to track trying to reduce this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5819) Investigate reduce size of ref-guide PDF

2014-04-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5819:
---

Description: 
As noted on the solr-user mailing list in response to the ANNOUNCE about the 
4.7 ref guide, the sizes of the 4.4, 4.5 & 4.6 PDF files were all under 5MB, 
but the 4.7 PDF was 30MB.

We've determined that the root cause is a bug in Confluence 5.0 (related to 
duplicating images) that is fixed in 5.4.3 -- the next version Infra currently 
plans to upgrade to.

Until such time as the upgrade is finished, a workaround is to use a manual 
PDF shrinking tool such as this one to eliminate the duplication...

https://github.com/hossman/pdf-shrinker

  was:
As noted on the solr-user mailing list in response to the ANNOUNCE about the 
4.7 ref guide, the size of the 4.4, 4.5 & 4.6 PDF files were all under 5MB, but 
the 4.7 PDF was 30MB.

opening this issue to track trying to reduce this


 Investigate  reduce size of ref-guide PDF
 --

 Key: SOLR-5819
 URL: https://issues.apache.org/jira/browse/SOLR-5819
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
 Attachments: img-0007.png, img-0008.png, img-0009.png, img-0010.png, 
 img-0011.png, img-0012.png, img-0013.png, img-0014.png


 As noted on the solr-user mailing list in response to the ANNOUNCE about the 
 4.7 ref guide, the sizes of the 4.4, 4.5 & 4.6 PDF files were all under 5MB, 
 but the 4.7 PDF was 30MB.
 We've determined that the root cause is a bug in Confluence 5.0 (related to 
 duplicating images) that is fixed in 5.4.3 -- the next version Infra 
 currently plans to upgrade to.
 Until such time as the upgrade is finished, a workaround is to use a manual 
 PDF shrinking tool such as this one to eliminate the duplication...
 https://github.com/hossman/pdf-shrinker



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980013#comment-13980013
 ] 

Noble Paul commented on SOLR-5473:
--

bq.The external interface
bq. I've already gone into a lot of the points. We can dig more into what is 
not clear from the above.

The stateFormat attribute in the collection, and the state node having the 
.json suffix. Please add anything I missed.

bq.The idea of selectively watching states
bq.The is also probably fine, though I'm not sure it's right to change it by 
default, and I'm not sure it should be so tied into The idea of splitting the 
clusterstate.json itself. Breaking things into parts makes it easier to digest 
and build and document properly.

I'm not sure if it is possible to split them completely. The moment I split 
the states, my choices are:
# all nodes watch all the collections
# nodes selectively watch collections
# nodes watch no collections and read them all in real time

One or more of these three solutions needs to be newly built into the system, 
and I have only added the 2nd because I thought only that one would be useful 
(a sketch of it follows below). Do you think the other solutions are 
worthwhile to build, or can you think of a better solution that I may have 
missed?
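
For concreteness, the 2nd option amounts to something like the following: a 
minimal sketch against the plain ZooKeeper client API, where the 
per-collection path follows this issue's layout and the helper names are 
illustrative, not actual Solr code:

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class CollectionStateWatcher {
  private final ZooKeeper zk;

  public CollectionStateWatcher(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Watch only the state of one collection this node hosts. */
  public void watchCollection(final String collection) throws Exception {
    String path = "/collections/" + collection + "/state.json";
    byte[] data = zk.getData(path, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        try {
          // ZooKeeper watches are one-shot: re-register and re-read.
          watchCollection(collection);
        } catch (Exception e) {
          // handle/log; omitted in this sketch
        }
      }
    }, null);
    updateLocalState(collection, data);  // hypothetical local cache update
  }

  private void updateLocalState(String collection, byte[] stateJson) {
    // parse stateJson and replace the cached state for this collection
  }
}
{code}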

bq.The API's which are 'undesirable'

I will take another look at these. Meanwhile, if you can visualize what the 
APIs should look like, please post them here.






 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, ec2-23-20-119-52_solr.log, 
 ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-24 Thread Uwe Schindler
OK,
I'll wait for the fix. Is this a new bug in 4.8?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: Thursday, April 24, 2014 7:33 PM
 To: Lucene/Solr Dev
 Subject: Re: [VOTE] Lucene/Solr 4.8.0 RC1
 
 I ran into a very serious bug during my manual testing of 4.8 that I think
 warrants a respin.
 https://issues.apache.org/jira/browse/SOLR-6011
 
 In a normal Solr setup, incorrect results are returned from complex phrase
 queries if inOrder is ever changed for the same query.  This would be
 maddening for most users to try and track down.
 
 -Yonik
 http://heliosearch.org - solve Solr GC pauses with off-heap filters and
 fieldcache
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-6011:


Fix Version/s: 5.0
   4.9
   4.8

 inOrder does not work with the complexphrase parser
 ---

 Key: SOLR-6011
 URL: https://issues.apache.org/jira/browse/SOLR-6011
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Priority: Critical
 Fix For: 4.8, 4.9, 5.0

 Attachments: SOLR-6011.patch, SOLR-6011.patch


 {code}
 {!complexphrase}vol* high*
 does not match the Solr document containing ... high volume web ... (this 
 is correct)
 But adding inOrder=false still fails to make it match.
 {!complexphrase inOrder=false}vol* high*
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-24 Thread Yonik Seeley
On Thu, Apr 24, 2014 at 1:57 PM, Uwe Schindler u...@thetaphi.de wrote:
 OK,
 I'll wait for the fix. Is this a new bug in 4.8?

OK, thanks.  It's an old bug in Lucene, but a new bug in Solr (since
complex phrase queries weren't exposed before).
I'll commit now.

-Yonik

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5626:
--

Priority: Blocker  (was: Major)

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5626:
--

Fix Version/s: 4.8

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-5626:
---


As we respin 4.8, I will backport this one too, because otherwise it could 
happen that somebody (like me) hits this while smoke testing...

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980035#comment-13980035
 ] 

ASF subversion and git services commented on LUCENE-5626:
-

Commit 1589811 from [~thetaphi] in branch 'dev/branches/lucene_solr_4_8'
[ https://svn.apache.org/r1589811 ]

Merged revision(s) 1589397 from lucene/dev/branches/branch_4x:
Merged revision(s) 1589394 from lucene/dev/trunk:
LUCENE-5626: Fix bug in SimpleFSLockFactory's obtain() that sometimes throwed 
IOException (ERROR_ACESS_DENIED) on Windows if the lock file was created 
concurrently
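
The approach the commit message describes looks roughly like this (a sketch, 
not the verbatim patch -- the real change also records the failure reason):

{code}
@Override
public boolean obtain() throws IOException {
  // ... ensure the lock directory exists (omitted) ...
  try {
    return lockFile.createNewFile();
  } catch (IOException ioe) {
    // On Windows, a concurrently created lock file can surface as an
    // "access denied" IOException instead of createNewFile() returning
    // false; treat it as a failed lock attempt rather than an error.
    return false;
  }
}
{code}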

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980037#comment-13980037
 ] 

ASF subversion and git services commented on SOLR-6011:
---

Commit 1589812 from [~yo...@apache.org] in branch 'dev/trunk'
[ https://svn.apache.org/r1589812 ]

SOLR-6011: ComplexPhraseQuery hashCode/equals fix for inOrder

 inOrder does not work with the complexphrase parser
 ---

 Key: SOLR-6011
 URL: https://issues.apache.org/jira/browse/SOLR-6011
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Priority: Critical
 Fix For: 4.8, 4.9, 5.0

 Attachments: SOLR-6011.patch, SOLR-6011.patch


 {code}
 {!complexphrase}vol* high*
 does not match the Solr document containing ... high volume web ... (this 
 is correct)
 But adding inOrder=false still fails to make it match.
 {!complexphrase inOrder=false}vol* high*
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980046#comment-13980046
 ] 

ASF subversion and git services commented on LUCENE-5626:
-

Commit 1589813 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1589813 ]

LUCENE-5626: Move changes entry

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6013) Fix method visibility of Evaluator, refactor DateFormatEvaluator for extensibility

2014-04-24 Thread Aaron LaBella (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron LaBella updated SOLR-6013:


Attachment: 
0001-change-method-variable-visibility-and-refactor-for-extensibility.patch

Attaching the patch for review/comments.

 Fix method visibility of Evaluator, refactor DateFormatEvaluator for 
 extensibility
 --

 Key: SOLR-6013
 URL: https://issues.apache.org/jira/browse/SOLR-6013
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.7
Reporter: Aaron LaBella
 Fix For: 4.8

 Attachments: 
 0001-change-method-variable-visibility-and-refactor-for-extensibility.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 This is similar to issue 5981: the Evaluator class is declared as abstract, 
 yet the parseParams method is package-private.  Surely this is an oversight, 
 as I wouldn't expect everyone writing their own evaluators to have to deal 
 with parsing the parameters.
 Similarly, I needed to refactor DateFormatEvaluator because I need to do some 
 custom date math/parsing and it wasn't written in a way that lets me extend it.
 Please review/apply my attached patch to the next version of Solr, i.e. 4.8, 
 or 4.9 if I must wait.
 Thanks!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980049#comment-13980049
 ] 

ASF subversion and git services commented on LUCENE-5626:
-

Commit 1589814 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1589814 ]

Merged revision(s) 1589813 from lucene/dev/trunk:
LUCENE-5626: Move changes entry

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980054#comment-13980054
 ] 

ASF subversion and git services commented on LUCENE-5626:
-

Commit 1589824 from [~thetaphi] in branch 'dev/branches/lucene_solr_4_8'
[ https://svn.apache.org/r1589824 ]

Merged revision(s) 1589814 from lucene/dev/branches/branch_4x:
Merged revision(s) 1589813 from lucene/dev/trunk:
LUCENE-5626: Move changes entry (merge props only)

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6013) Fix method visibility of Evaluator, refactor DateFormatEvaluator for extensibility

2014-04-24 Thread Aaron LaBella (JIRA)
Aaron LaBella created SOLR-6013:
---

 Summary: Fix method visibility of Evaluator, refactor 
DateFormatEvaluator for extensibility
 Key: SOLR-6013
 URL: https://issues.apache.org/jira/browse/SOLR-6013
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.7
Reporter: Aaron LaBella
 Fix For: 4.8
 Attachments: 
0001-change-method-variable-visibility-and-refactor-for-extensibility.patch

This is similar to issue 5981: the Evaluator class is declared as abstract, yet 
the parseParams method is package-private.  Surely this is an oversight, as I 
wouldn't expect everyone writing their own evaluators to have to deal with 
parsing the parameters.

Similarly, I needed to refactor DateFormatEvaluator because I need to do some 
custom date math/parsing and it wasn't written in a way that lets me extend it.

Please review/apply my attached patch to the next version of Solr, i.e. 4.8, 
or 4.9 if I must wait.

Thanks!
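
The kind of extension being requested would look roughly like this (a sketch 
only -- the parseParams signature and the helper below are guesses for 
illustration, not the actual DataImportHandler API):

{code}
import java.util.List;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Evaluator;

public class FiscalDateEvaluator extends Evaluator {
  @Override
  public String evaluate(String expression, Context context) {
    // This only compiles once parseParams is protected/public; the
    // signature used here is a hypothetical one.
    List<Object> params = parseParams(expression, context.getVariableResolver());
    return computeFiscalDate(params);  // hypothetical custom date math
  }

  private String computeFiscalDate(List<Object> params) {
    // ... custom parsing/formatting (omitted in this sketch) ...
    return null;
  }
}
{code}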



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-24 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980053#comment-13980053
 ] 

Timothy Potter commented on SOLR-5473:
--

Thought I'd add my 2 cents on this one as I've worked on some of this code and 
want to get a better sense of how to move forward. Reverting and moving out to 
a branch sounds like a good idea.

In general, I think it would be good to split the discussion about this topic 
into 3 sections: 1) overall design / architecture, 2) implementation and impact 
on public API, 3) testing. Moving forward we should start with identifying 
where we have common ground in these areas and which aspects are more 
controversial and need more hashing out between us. 

Here's what I think I know but please correct where I'm off-base:

1) Overall Design / Architecture

It sounds like we're all on-board with splitting cluster state into a 
per-collection state znode. Do we intend to support both formats or do we 
intend to just migrate to the split approach? I think the answer is the latter, 
that going forward, SolrCloud will keep state in a separate znode per 
collection.

Noble's idea is that once the state is split, cores only need to watch the 
znode for the collection/shard each is linked to. In other words, each 
SolrCore watches a specific state znode and thus does not receive any state 
change updates for other collections.

In terms of what's watched and what is not watched, this patch includes code 
from 5474 (as they were too intimately tied together to keep separated) which 
doesn't watch collection state changes on the client side. Instead the client 
relies on a _stateVer_ check during request processing (sketched below) and 
receives an error from the server if the client state is stale. I too think 
this is a little controversial / confusing and maybe we don't have to keep 
that as part of this solution. It was our mistake to merge those two into a 
single patch. We originally were thinking 5474 was needed to keep the number 
of watchers on a znode to a minimum in the event of many clients using many 
collections. However, I do think this feature can be split out and dealt with 
in a better way, if at all. In other words, split state znodes are watched 
from both the server and client side.
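
A minimal sketch of that staleness check on the server side (the parameter 
format and the error code are assumptions for illustration, not the exact 
5474 code):

{code}
// Sketch: reject a request whose cached collection state is out of date.
void checkStateVersion(SolrQueryRequest req, ClusterState clusterState) {
  String stateVer = req.getParams().get("_stateVer_");  // e.g. "coll1:42"
  if (stateVer == null) return;                         // client sent none
  String[] parts = stateVer.split(":");
  DocCollection coll = clusterState.getCollection(parts[0]);
  if (coll.getZNodeVersion() != Integer.parseInt(parts[1])) {
    // The client cached a stale per-collection state; returning an error
    // here tells it to re-fetch /collections/<name>/state.json and retry.
    throw new SolrException(SolrException.ErrorCode.INVALID_STATE,
        "stale state for collection " + parts[0]);
  }
}
{code}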

Are there any other design / architecture issues that are controversial?

2) Implementation (and API impact)

This seems like the biggest area of contention right now. The main issue is 
that the API changes still give the impression of two state tracking formats, 
whereas we really only want one format.

The common ground here is that there should be no mention of "external" in 
any public method or state format for that matter, right?

Noble: Assuming we're moving forward with stateFormat == 2 and the unified 
/clusterstate.json is going away, is it possible to not change any of the 
existing public methods? In other words, we're changing the internals of where 
state is kept, so why does that have to impact the public API? If not, let's 
come up with a plan for each change and how we can minimize impact of this. It 
seems to me that we need to be more diligent about API impacts of this change 
and focus on not breaking the public view of cluster state as much as possible. 
It would be helpful to have a bullet list of API impacts that are needed for 
this so we don't have to scour the patch looking for all possible changes.

3) Testing

I just wanted to mention that we've been doing a fair amount of integration 
testing with hundreds of "external" collections per cluster. So I realize this 
is a big change, but we have been testing it extensively in our QA labs. I 
only mention this so that others know that we have been concentrating on 
hardening this feature over the past couple of months. Once we sort out the 
API problems, I'm confident that this approach will be solid.

To recap, I see a lot of common ground here. To move forward, we need to move 
this out to a branch and off trunk, where we'll focus on cleaning up the API 
impacts of this work and support only the split format going forward (with a 
migration plan for existing installations). We also want to revisit the 
thinking behind not watching state changes on the client, as that wasn't clear 
in the patch to this point.



 Make one state.json per collection
 --

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, 

[jira] [Resolved] (LUCENE-5626) SimpleFSLockFactory access denied on windows.

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-5626.
---

Resolution: Fixed

Backported to 4.8 for respin.

 SimpleFSLockFactory access denied on windows.
 ---

 Key: LUCENE-5626
 URL: https://issues.apache.org/jira/browse/LUCENE-5626
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.8, 4.9, 5.0

 Attachments: LUCENE-5626.patch, LUCENE-5626.patch


 This happened twice in jenkins:
 {noformat}
 [lockStressTest2] Exception in thread main java.io.IOException:
 Access is denied
 [lockStressTest2] at
 java.io.WinNTFileSystem.createFileExclusively(Native Method)
 [lockStressTest2] at java.io.File.createNewFile(File.java:1012)
 [lockStressTest2] at
 org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:135)
 {noformat}
 My windows machine got struck by lightning, so I cannot fix this easily. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980062#comment-13980062
 ] 

ASF subversion and git services commented on SOLR-6011:
---

Commit 1589826 from [~yo...@apache.org] in branch 'dev/branches/lucene_solr_4_8'
[ https://svn.apache.org/r1589826 ]

SOLR-6011: ComplexPhraseQuery hashCode/equals fix for inOrder

 inOrder does not work with the complexphrase parser
 ---

 Key: SOLR-6011
 URL: https://issues.apache.org/jira/browse/SOLR-6011
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Priority: Critical
 Fix For: 4.8, 4.9, 5.0

 Attachments: SOLR-6011.patch, SOLR-6011.patch


 {code}
 {!complexphrase}vol* high*
 does not match the Solr document containing ... high volume web ... (this 
 is correct)
 But adding inOrder=false still fails to make it match.
 {!complexphrase inOrder=false}vol* high*
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5981) Please change method visibility of getSolrWriter in DataImportHandler to public (or at least protected)

2014-04-24 Thread Aaron LaBella (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980061#comment-13980061
 ] 

Aaron LaBella commented on SOLR-5981:
-

I'm not seeing this fix in the git mirror of lucene-solr. I'm also wondering 
why it was moved from 4.8 to 4.9 -- I thought it was ready to go?

Thanks.

 Please change method visibility of getSolrWriter in DataImportHandler to 
 public (or at least protected)
 ---

 Key: SOLR-5981
 URL: https://issues.apache.org/jira/browse/SOLR-5981
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.0
 Environment: Linux 3.13.9-200.fc20.x86_64
 Solr 4.6.0
Reporter: Aaron LaBella
Assignee: Shawn Heisey
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: SOLR-5981.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I've been using the org.apache.solr.handler.dataimport.DataImportHandler for 
 a bit and it's an excellent model and architecture.  I'd like to extend it to 
 plug in my own DIHWriter, but the code doesn't allow for it.
 Please change ~line 227 in the DataImportHandler class to be:
 public SolrWriter getSolrWriter
 instead of:
 private SolrWriter getSolrWriter
 or, at a minimum, protected, so that I can extend DataImportHandler and 
 override this method.
 Thank you *sincerely* in advance for the quick turn-around on this.  If the 
 change can be made in 4.6.0 and upstream, that'd be ideal.
 Thanks!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980060#comment-13980060
 ] 

ASF subversion and git services commented on SOLR-6011:
---

Commit 1589825 from [~yo...@apache.org] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1589825 ]

SOLR-6011: ComplexPhraseQuery hashCode/equals fix for inOrder




Re: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-24 Thread Yonik Seeley
On Thu, Apr 24, 2014 at 2:03 PM, Yonik Seeley yo...@heliosearch.com wrote:
 On Thu, Apr 24, 2014 at 1:57 PM, Uwe Schindler u...@thetaphi.de wrote:
 OK,
 I'll wait for the fix. Is this a new bug in 4.8?

 OK, thanks.  It's an old bug in Lucene, but a new bug in Solr (since
 complex phrase queries weren't exposed before).
 I'll commit now.

Actually, it looks like it was a new bug in Lucene as well, since
inOrder was just added for 4.8 in
https://issues.apache.org/jira/browse/LUCENE-3758

Anyway, the fix is now in the 4.8 branch.


-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980067#comment-13980067
 ] 

Uwe Schindler commented on SOLR-6011:
-

Somehow, because of a merge conflict, the changes got lost in 4.x and 4.8. Sorry 
for the concurrent merge!




[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980071#comment-13980071
 ] 

Yonik Seeley commented on SOLR-6011:


That's OK... I realized that it was a new bug for both Lucene and Solr, so there's 
no need for it to be in CHANGES at all...




[jira] [Resolved] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-6011.


Resolution: Fixed




Re: svn commit: r1589782 - /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestTerms.java

2014-04-24 Thread Robert Muir
On Thu, Apr 24, 2014 at 12:31 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Yeah, I didn't want to disable the full test, just that one method,
 because I want Terms.getMin/Max testing for Lucene3x too.

 Would be nice if we could @SuppressCodecs for just one method ...

That's impossible: many tests create their indexes in @BeforeClass, so
codec selection really must be a per-class thing.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980087#comment-13980087
 ] 

ASF subversion and git services commented on SOLR-6011:
---

Commit 1589832 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1589832 ]

SOLR-6011: Remove changes entry also in trunk




[jira] [Commented] (SOLR-6011) inOrder does not work with the complexphrase parser

2014-04-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980090#comment-13980090
 ] 

Uwe Schindler commented on SOLR-6011:
-

OK, I removed the changes entry in trunk, too.




RE: svn commit: r1589782 - /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestTerms.java

2014-04-24 Thread Uwe Schindler
Another possibility that works is to move this test to a separate class
annotated with @SuppressCodecs (if it does not depend on indexes created in
@BeforeClass).
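
A minimal sketch of that split, with a hypothetical class name (the real min/max assertions live in TestTerms):

{code:java}
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.LuceneTestCase.SuppressCodecs;

// Hypothetical split-out class: only the affected method moves here, so the
// class-level codec restriction no longer disables the rest of TestTerms.
@SuppressCodecs("Lucene3x")
public class TestTermsMinMax extends LuceneTestCase {
  public void testMinMax() throws Exception {
    // the Terms.getMin()/Terms.getMax() assertions from TestTerms would move
    // here, building their index inside the test method, not in @BeforeClass
  }
}
{code}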

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, April 24, 2014 8:41 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1589782 -
 /lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/
 index/TestTerms.java
 
 On Thu, Apr 24, 2014 at 12:31 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
  Yeah, I didn't want to disable the full test, just that one method,
  because I want Terms.getMin/Max testing for Lucene3x too.
 
  Would be nice if we could @SuppressCodecs for just one method ...
 
 That's impossible: many tests create their indexes in @BeforeClass, so codec
 selection really must be a per-class thing.
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [VOTE] Lucene/Solr 4.8.0 RC1

2014-04-24 Thread Uwe Schindler
OK, I'll wait a bit and respin before going to bed (to give Jenkins a chance to 
test it finally) :-)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: Thursday, April 24, 2014 8:31 PM
 To: Lucene/Solr Dev
 Subject: Re: [VOTE] Lucene/Solr 4.8.0 RC1
 
 On Thu, Apr 24, 2014 at 2:03 PM, Yonik Seeley yo...@heliosearch.com
 wrote:
  On Thu, Apr 24, 2014 at 1:57 PM, Uwe Schindler u...@thetaphi.de
 wrote:
  OK,
  I'll wait for the fix. Is this a new bug in 4.8?
 
  OK, thanks.  It's an old bug in Lucene, but a new bug in Solr (since
  complex phrase queries weren't exposed before).
  I'll commit now.
 
 Actually, it looks like it was a new bug in Lucene as well, since inOrder was
 just added for 4.8 in
 https://issues.apache.org/jira/browse/LUCENE-3758
 
 Anyway, the fix is now in the 4.8 branch.
 
 
 -Yonik
 http://heliosearch.org - solve Solr GC pauses with off-heap filters and
 fieldcache
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1589838 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/analysis/ lucene/analysis/common/src/resources/META-INF/services/ lucene/analysis/common/src/test/org/apache/lucene/anal

2014-04-24 Thread Uwe Schindler
Can you commit this to 4.8? Otherwise the uppercase factory does not work in 
Solr. There's still time to do this.

Uwe

On 24. April 2014 21:18:40 MESZ, rm...@apache.org wrote:
Author: rmuir
Date: Thu Apr 24 19:18:39 2014
New Revision: 1589838

URL: http://svn.apache.org/r1589838
Log:
fix TestAllAnalyzersHaveFactories to actually work, and add missing SPI
entry

Modified:
lucene/dev/branches/branch_4x/   (props changed)
lucene/dev/branches/branch_4x/lucene/   (props changed)
lucene/dev/branches/branch_4x/lucene/analysis/   (props changed)
lucene/dev/branches/branch_4x/lucene/analysis/common/src/resources/META-INF/services/org.apache.lucene.analysis.util.TokenFilterFactory
lucene/dev/branches/branch_4x/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestAllAnalyzersHaveFactories.java

Modified:
lucene/dev/branches/branch_4x/lucene/analysis/common/src/resources/META-INF/services/org.apache.lucene.analysis.util.TokenFilterFactory
URL:
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/analysis/common/src/resources/META-INF/services/org.apache.lucene.analysis.util.TokenFilterFactory?rev=1589838&r1=1589837&r2=1589838&view=diff
==
---
lucene/dev/branches/branch_4x/lucene/analysis/common/src/resources/META-INF/services/org.apache.lucene.analysis.util.TokenFilterFactory
(original)
+++
lucene/dev/branches/branch_4x/lucene/analysis/common/src/resources/META-INF/services/org.apache.lucene.analysis.util.TokenFilterFactory
Thu Apr 24 19:18:39 2014
@@ -30,6 +30,7 @@ org.apache.lucene.analysis.compound.Hyph
 org.apache.lucene.analysis.core.LowerCaseFilterFactory
 org.apache.lucene.analysis.core.StopFilterFactory
 org.apache.lucene.analysis.core.TypeTokenFilterFactory
+org.apache.lucene.analysis.core.UpperCaseFilterFactory
 org.apache.lucene.analysis.cz.CzechStemFilterFactory
 org.apache.lucene.analysis.de.GermanLightStemFilterFactory
 org.apache.lucene.analysis.de.GermanMinimalStemFilterFactory

Modified:
lucene/dev/branches/branch_4x/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestAllAnalyzersHaveFactories.java
URL:
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestAllAnalyzersHaveFactories.java?rev=1589838&r1=1589837&r2=1589838&view=diff
==
---
lucene/dev/branches/branch_4x/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestAllAnalyzersHaveFactories.java
(original)
+++
lucene/dev/branches/branch_4x/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestAllAnalyzersHaveFactories.java
Thu Apr 24 19:18:39 2014
@@ -130,6 +130,7 @@ public class TestAllAnalyzersHaveFactori
 || crazyComponents.contains(c)
 || oddlyNamedComponents.contains(c)
 || deprecatedDuplicatedComponents.contains(c)
+|| c.isAnnotationPresent(Deprecated.class) // deprecated ones are typically back compat hacks
|| !(Tokenizer.class.isAssignableFrom(c) ||
TokenFilter.class.isAssignableFrom(c) ||
CharFilter.class.isAssignableFrom(c))
   ) {
 continue;
@@ -151,7 +152,7 @@ public class TestAllAnalyzersHaveFactori
   }
   assertSame(c, instance.create(new StringReader("")).getClass());
 } catch (IllegalArgumentException e) {
-  if (!e.getMessage().contains("SPI")) {
+  if (!e.getMessage().contains("SPI") || e.getMessage().contains("does not exist")) {
 throw e;
   }
// TODO: For now pass because some factories have not yet a default
config that always works
@@ -173,7 +174,7 @@ public class TestAllAnalyzersHaveFactori
 assertSame(c, createdClazz);
   }
 } catch (IllegalArgumentException e) {
-  if (!e.getMessage().contains("SPI")) {
+  if (!e.getMessage().contains("SPI") || e.getMessage().contains("does not exist")) {
 throw e;
   }
// TODO: For now pass because some factories have not yet a default
config that always works
@@ -195,7 +196,7 @@ public class TestAllAnalyzersHaveFactori
 assertSame(c, createdClazz);
   }
 } catch (IllegalArgumentException e) {
-  if (!e.getMessage().contains("SPI")) {
+  if (!e.getMessage().contains("SPI") || e.getMessage().contains("does not exist")) {
 throw e;
   }
// TODO: For now pass because some factories have not yet a default
config that always works

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
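
For readers wondering why the missing services entry matters: factories are resolved by short name through the analysis SPI, which reads exactly these META-INF/services files. A minimal sketch of that lookup; the luceneMatchVersion argument is assumed to be required by the factory in 4.x:

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.util.TokenFilterFactory;

public class SpiLookupDemo {
  public static void main(String[] args) {
    Map<String, String> params = new HashMap<String, String>();
    params.put("luceneMatchVersion", "4.8"); // assumed mandatory for this factory

    // Resolves the short name via META-INF/services; without the SPI entry for
    // UpperCaseFilterFactory this lookup fails with an IllegalArgumentException.
    TokenFilterFactory factory = TokenFilterFactory.forName("uppercase", params);
    System.out.println(factory.getClass().getName());
  }
}
{code}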

[jira] [Created] (LUCENE-5630) Improve TestAllAnalyzersHaveFactories

2014-04-24 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5630:
---

 Summary: Improve TestAllAnalyzersHaveFactories
 Key: LUCENE-5630
 URL: https://issues.apache.org/jira/browse/LUCENE-5630
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


This test wasn't working at all; it would always pass.

It is sensitive to the strings inside exception messages; if we change those,
it might suddenly stop working.

It would be great to improve this test to be less fragile.






[jira] [Updated] (SOLR-6013) Fix method visibility of Evaluator, refactor DateFormatEvaluator for extensibility

2014-04-24 Thread Aaron LaBella (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron LaBella updated SOLR-6013:


Attachment: 0001-add-getters-for-datemathparser.patch

One more small patch, after fully testing the changes for extensibility.

 Fix method visibility of Evaluator, refactor DateFormatEvaluator for 
 extensibility
 --

 Key: SOLR-6013
 URL: https://issues.apache.org/jira/browse/SOLR-6013
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.7
Reporter: Aaron LaBella
 Fix For: 4.8

 Attachments: 0001-add-getters-for-datemathparser.patch, 
 0001-change-method-variable-visibility-and-refactor-for-extensibility.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 This is similar to issue 5981: the Evaluator class is declared as abstract, 
 yet the parseParams method is package-private. Surely this is an oversight, 
 as I wouldn't expect everyone writing their own evaluators to have to deal 
 with parsing the parameters.
 Similarly, I needed to refactor DateFormatEvaluator because I need to do some 
 custom date math/parsing, and it wasn't written in a way that lets me extend it.
 Please review/apply my attached patch to the next version of Solr, i.e. 4.8, or 
 4.9 if I must wait.
 Thanks!
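
To make the request concrete, here is a hedged sketch of a custom evaluator, assuming the evaluate(String, Context) signature of the 4.x DIH Evaluator; with parseParams package-private, argument parsing has to be re-implemented by hand (names are illustrative):

{code:java}
import java.util.Locale;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Evaluator;

// Illustrative custom evaluator, not part of Solr.
public class UpperCaseEvaluator extends Evaluator {
  @Override
  public String evaluate(String expression, Context context) {
    // parseParams is package-private, so the expression must be parsed by
    // hand; this naive version just upper-cases the raw argument text
    return expression == null ? null : expression.trim().toUpperCase(Locale.ROOT);
  }
}
{code}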






[jira] [Commented] (LUCENE-5559) Argument validation for TokenFilters having numeric constructor parameter(s)

2014-04-24 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980175#comment-13980175
 ] 

Ahmet Arslan commented on LUCENE-5559:
--

Pinging [~rcmuir], in case there is interest in the last patch, which covers two 
overlooked TokenFilters: {{CapitalizationFilter}} and {{CodepointCountFilter}}.

 Argument validation for TokenFilters having numeric constructor parameter(s)
 

 Key: LUCENE-5559
 URL: https://issues.apache.org/jira/browse/LUCENE-5559
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.7
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5559.patch, LUCENE-5559.patch, LUCENE-5559.patch, 
 LUCENE-5559.patch, LUCENE-5559.patch


 Some TokenFilters have numeric arguments in their constructors. They should 
 throw {{IllegalArgumentException}} for negative or meaningless values. 
 Here are some examples that demonstrate invalid/meaningless arguments:
 {code:xml}
  <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="-10" />
 {code}
 {code:xml}
  <filter class="solr.LengthFilterFactory" min="-5" max="-1" />
 {code}
 {code:xml}
  <filter class="solr.LimitTokenPositionFilterFactory" maxTokenPosition="-3" />
 {code}
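
The fix amounts to eager validation in the constructor. A self-contained sketch of the idea (illustrative class, not the actual Lucene source):

{code:java}
// Illustrative only: numeric constructor arguments are validated up front
// so a misconfigured filter fails at construction, not at analysis time.
public class BoundedLengthConfig {
  final int min, max;

  public BoundedLengthConfig(int min, int max) {
    if (min < 0) {
      throw new IllegalArgumentException("minimum length must be >= 0, got " + min);
    }
    if (max < min) {
      throw new IllegalArgumentException(
          "maximum length must not be less than minimum, got min=" + min + ", max=" + max);
    }
    this.min = min;
    this.max = max;
  }
}
{code}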






[jira] [Commented] (LUCENE-5559) Argument validation for TokenFilters having numeric constructor parameter(s)

2014-04-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980187#comment-13980187
 ] 

Robert Muir commented on LUCENE-5559:
-

Oops, looks like I missed this patch. Thanks Ahmet, I will take care of it.




[jira] [Commented] (LUCENE-5559) Argument validation for TokenFilters having numeric constructor parameter(s)

2014-04-24 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980202#comment-13980202
 ] 

Ahmet Arslan commented on LUCENE-5559:
--

bq. looks like I missed this patch?
No, actually I found those two after your commit.




[jira] [Updated] (LUCENE-5630) Improve TestAllAnalyzersHaveFactories

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5630:
--

Attachment: LUCENE-5630.patch

This patch fixes the issue.

In fact the whole check was wrong and really too fragile. The new approach is
100% safe:
- Separately look up the class before doing anything. If this throws any
exception, then the component is really missing.
- Then do the same checks as before (to actually verify that instantiation
works), but don't check the message; it's easier. The newInstance method throws
IAE, which wraps a NoSuchMethodException, so just check IAE#getCause().

I will commit this fix to all three branches.
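
A self-contained sketch of the cause-chain check described above (illustrative only, not the actual test code):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class CauseCheckDemo {
  // Mirrors the described behavior: instantiation failure due to a missing
  // Map-arg constructor surfaces as an IAE wrapping a NoSuchMethodException.
  static Object newFactoryInstance(Class<?> clazz, Map<String, String> args) {
    try {
      return clazz.getConstructor(Map.class).newInstance(args);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    try {
      // String has no Map-arg constructor, so this throws the wrapped IAE
      newFactoryInstance(String.class, new HashMap<String, String>());
    } catch (IllegalArgumentException e) {
      // the robust check: inspect the cause type, not the message string
      if (e.getCause() instanceof NoSuchMethodException) {
        System.out.println("no Map-arg constructor: acceptable for this check");
      } else {
        throw e;
      }
    }
  }
}
{code}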




[jira] [Updated] (LUCENE-5630) Improve TestAllAnalyzersHaveFactories

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5630:
--

Fix Version/s: 5.0
   4.9
   4.8




[jira] [Commented] (LUCENE-5630) Improve TestAllAnalyzersHaveFactories

2014-04-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980219#comment-13980219
 ] 

Robert Muir commented on LUCENE-5630:
-

Looks great, thanks!




[jira] [Assigned] (LUCENE-5630) Improve TestAllAnalyzersHaveFactories

2014-04-24 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-5630:
-

Assignee: Uwe Schindler



