date:20121006


 [ 
https://issues.apache.org/jira/browse/SOLR-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3915:


Attachment: SOLR-3915-screenshot.png

Yeah, of course: screenshot attached. I will update it with any new patches.

 Color Legend for Cloud UI
 -

 Key: SOLR-3915
 URL: https://issues.apache.org/jira/browse/SOLR-3915
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.1

 Attachments: SOLR-3915.patch, SOLR-3915-screenshot.png


 The meaning of the shard colors used is not really clear; integrate a kind of 
 legend for all possible node states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4463) add support for running the same test method many times

2012-10-06 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470936#comment-13470936
 ] 

Robert Muir commented on LUCENE-4463:
-

{quote}
if you can settle for reusing the same JVM, tests.iters combined with 
testmethod (or tests.method=...*) will give you different test seeds every time 
– only the global seed will be the same 
{quote}

Right, we can't really settle for that in Lucene's tests. That's because things 
like Codec are set at class level, so I could run this 100 times, 
press commit, and watch it fail because Jenkins gets a different codec. And 
we have a lot of these, with more being added.

It's a tradeoff. Sure, we could set Codec per-writer, e.g. in our 
newIndexWriterConfig, instead of per-test-class, but I think it makes debugging 
much simpler to treat it as a parameterized test class at this level of 
Codec API maturity, so we can easily see which one the test got when it failed.

So really we need a different per-class seed too: the same as you would get when 
doing 'ant test' in a loop with an inefficient shell script.
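
The loop described here could be sketched roughly as follows. This is only an illustration, not the attached patch: the helper names are hypothetical, and passing an explicit -Dtests.seed per run (so that a failing iteration can be repeated) is an assumption about how one might want to drive it. Each command would start a separate ant process, and therefore a fresh JVM.

```python
import random
import subprocess


def beast_commands(testcase, testmethod, iterations, rng=None):
    """Build one 'ant test' command line per iteration, each with its own seed."""
    rng = rng or random.Random()
    commands = []
    for _ in range(iterations):
        # Hypothetical seed format: 64 random bits as 16 hex digits.
        seed = f"{rng.getrandbits(64):016X}"
        commands.append([
            "ant", "test",
            f"-Dtestcase={testcase}",
            f"-Dtestmethod={testmethod}",
            f"-Dtests.seed={seed}",
        ])
    return commands


def run_all(commands):
    """Run each command in its own process; return the commands that failed."""
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed


if __name__ == "__main__":
    # Only print the command lines; call run_all(...) to actually execute them.
    for cmd in beast_commands("TestFoo", "testBar", 3, random.Random(0)):
        print(" ".join(cmd))
```

Running the commands one after another is the "inefficient" variant; the list could equally be handed to a process pool to use several JVMs at once.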


 add support for running the same test method many times
 ---

 Key: LUCENE-4463
 URL: https://issues.apache.org/jira/browse/LUCENE-4463
 Project: Lucene - Core
  Issue Type: Wish
  Components: general/build
Reporter: Robert Muir
 Attachments: LUCENE-4463.patch


 I have a shell script for this, Mike has a Python script, and it's annoying :)
 I want to do something like this:
 ant beast -Dtestcase= -Dtestmethod= -Diterations=100
 I would be happy with a simple loop that just invokes 'test' somehow: getting 
 a fresh JVM for each iteration is desirable anyway (so you get fresh 
 codecs, etc.). 
 -Dtests.iters is not really useful for this because it does not allow 
 -Dtestmethod and it does not give a fresh JVM.
 Bonus points if it can use multiple JVMs at the same time, though :)


[jira] [Commented] (LUCENE-4463) add support for running the same test method many times

[
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470937#comment-13470937
]

Dawid Weiss commented on LUCENE-4463:
-

Let me start from the beginning. I talked about it once but I can't find it
now.

1. testmethod vs. tests.method

The complication in testmethod vs. tests.method stems from how JUnit works.
A test description must (in practice) be unique, otherwise tools just go crazy.
So to make a test repeat, its name must be made unique. That's why, if you use
tests.iters=X, every repetition of a single test method will in fact be named
uniquely as:

testMethod {#seq seed=[...]}

These are not just things added to the report; this is the method name as the
JUnit Description object sees it. It's odd, but it's a workaround that works and
it is (as far as I'm concerned) the only one possible.

So when you use -Dtestmethod=X, this is an alias for -Dtests.method=X, which
will filter out all these repeated tests (because effectively they don't match
the pattern). In order to include them, you need to add a glob like:
-Dtests.method=X*. Hoss and I added this to the test-help to make it
clear(er) a while back.
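
To illustrate the filtering effect, here is a small sketch that uses Python's fnmatch as a stand-in for the runner's method filter. The real filter implementation is not shown in this thread, so the matching semantics here are an assumption, and the seed text in the generated names is a placeholder.

```python
from fnmatch import fnmatchcase


def repeated_names(method, iters):
    """Hypothetical stand-in for the unique names given to repeated tests."""
    # The seed text is a placeholder, not a real seed.
    return [f"{method} {{#{i} seed=[...]}}" for i in range(iters)]


def select(names, pattern):
    """Keep the names matching the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = repeated_names("testFoo", 3)
print(select(names, "testFoo"))   # exact name: none of the repetitions match
print(select(names, "testFoo*"))  # trailing glob: all repetitions match
```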

2. Seeds and tests.dups

The master test seed is passed to junit4 task once and it just stays there.
Everything else is derived from it. The duplication you see is a simple trick
-- we just duplicate the file name on input. Because every suite gets the same
random seed (mixed with its class name to make it more random across a single
run), a duplicated identical suite will still get the same master seed every
time.

This option was meant for accelerating a test scenario in which we want to
repeat a single suite/seed combination many times and want to do it using
multiple parallel JVMs.
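
A minimal sketch of why a duplicated suite ends up with the same seed: if the per-suite seed is a pure function of the master seed and the class name, two copies of the same suite name cannot differ. The SHA-256 mixing below is an assumption chosen for illustration; it is not the runner's actual mixer, and the suite names and master seed are made up.

```python
import hashlib


def suite_seed(master_seed, class_name):
    """Derive a per-suite seed from the master seed and the suite's class name."""
    # Assumed mixing function; the real mixer is not shown in this thread.
    data = f"{master_seed}:{class_name}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


master = 0xDEADBEEF  # hypothetical master seed
suites = ["TestIndexWriter", "TestIndexWriter", "TestCodecs"]  # first suite duplicated
for name in suites:
    print(name, hex(suite_seed(master, name)))
```

The two TestIndexWriter entries print the same value, while TestCodecs gets a different one.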

3. What Robert wants (across-JVM repetition of a single suite with a different
seed each time).

This is effectively impossible right now without re-spinning junit4 with a
different seed each time.

I don't see how I could marry all this into working with both the scenario above
and what Robert wants, although I admit both are useful. A script (loop) doing
an antcall would work, but this seems like overkill. Fixing this at the JUnit4
level isn't trivial either, because the seed is currently picked even before
junit4 is started (to select the target charset).


[jira] [Commented] (LUCENE-4463) add support for running the same test method many times


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470940#comment-13470940
 ] 

Dawid Weiss commented on LUCENE-4463:
-

I have two ideas, but they both have shortcomings. I could make tests.dups run 
with a different seed for each suite, but they'd be the same sequence _on each 
forked JVM_ (add a static field to the current class-name-mixer and just mix 
with the repetition of the same suite name). An alternative is to modify junit4 
and do the same, but then to allow both the same seed and a different seed each 
time (different scenarios) we'd need... yet another -D option :)
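
The first idea and its shortcoming can be sketched like this. The class below is hypothetical: a per-process counter plays the role of the static field, and the SHA-256 mixing is an assumption rather than the actual class-name-mixer.

```python
import hashlib
from collections import Counter


class RepetitionMixer:
    """Mixes the master seed with the suite name and its repetition count."""

    def __init__(self, master_seed):
        self.master_seed = master_seed
        self.seen = Counter()  # stands in for the static field, one per JVM

    def seed_for(self, class_name):
        repetition = self.seen[class_name]
        self.seen[class_name] += 1
        data = f"{self.master_seed}:{class_name}:{repetition}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


# Two forked JVMs started with the same master seed.
jvm1 = RepetitionMixer(42)
jvm2 = RepetitionMixer(42)
seeds_jvm1 = [jvm1.seed_for("TestFoo") for _ in range(3)]
seeds_jvm2 = [jvm2.seed_for("TestFoo") for _ in range(3)]
print(seeds_jvm1 == seeds_jvm2)
```

Within one JVM every repetition gets a different seed, but because each JVM starts counting from zero, both JVMs walk through the same sequence of seeds.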


[jira] [Commented] (LUCENE-4463) add support for running the same test method many times

2012-10-06 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470943#comment-13470943
 ] 

Robert Muir commented on LUCENE-4463:
-

I know it's not easy: that's why it's a wish task :)

I guess I'm basically listing what could be seen as an expert feature here, but
it's arguably necessary if you are trying to do randomized tests.

The fact is that things don't always reproduce 100%, and this is definitely a
failure in our tests (e.g. the current situation motivated me to open
LUCENE-4460). But part of random testing is that you don't have to write
targeted tests; you just throw hardware at the problem (which I'm doing... my
office is really hot right now!).

The frustrating part is that ideally you want to treat this whole randomized
test situation like a normal deterministic unit test, as a normal developer
would have. Even if the test isn't great and doesn't reproduce 100%, you want
to know the bug is really fixed rather than taking blind stabs and waiting to
see whether all the computers in your house running full throttle will trip a
bug in 24 hours before declaring success :)

So I'm opening this wish task to try to think of ways to make this easier and
more efficient.

I'd actually go so far as to say that tests.iters is really outdated for
Lucene's tests these days (since we have so much class-level parameterization),
and we should be focusing on tests.dups (and maybe removing tests.iters
totally). Maybe that's just particular to us, but as I mentioned above, I think
we show some real use cases for parameterizing the entire test class with
certain things because it simplifies debugging.



[jira] [Updated] (SOLR-3915) Color Legend for Cloud UI


 [ 
https://issues.apache.org/jira/browse/SOLR-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3915:


Attachment: SOLR-3915.patch
SOLR-3915-screenshot.png

Next version, using the same markup as the graph itself.

For documentation, the definitions come from [Mark's comment on 
SOLR-3174|https://issues.apache.org/jira/browse/SOLR-3174?focusedCommentId=13255923&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13255923]

 Color Legend for Cloud UI
 -

 Key: SOLR-3915
 URL: https://issues.apache.org/jira/browse/SOLR-3915
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.1

 Attachments: SOLR-3915.patch, SOLR-3915.patch, 
 SOLR-3915-screenshot.png, SOLR-3915-screenshot.png


 The meaning of the shard colors used is not really clear; integrate a kind of 
 legend for all possible node states.


[jira] [Commented] (SOLR-3174) Visualize Cluster State


[ 
https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470945#comment-13470945
 ] 

Stefan Matheis (steffkes) commented on SOLR-3174:
-

bq. Another thing we should probably do is add a key for the meaning of the 
colors.
oO didn't see this comment yet ... but now we have one, coming with SOLR-3915 =)

 Visualize Cluster State
 ---

 Key: SOLR-3174
 URL: https://issues.apache.org/jira/browse/SOLR-3174
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Ryan McKinley
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.0-ALPHA

 Attachments: SOLR-3174-graph.png, SOLR-3174-graph.png, 
 SOLR-3174-graph.png, SOLR-3174.patch, SOLR-3174.patch, SOLR-3174.patch, 
 SOLR-3174.patch, SOLR-3174.patch, SOLR-3174-rgraph.png, SOLR-3174-rgraph.png, 
 SOLR-3174-rgraph.png


 It would be great to visualize the cluster state in the new UI. 
 See Mark's wish:
 https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272


[jira] [Commented] (LUCENE-4463) add support for running the same test method many times

2012-10-06 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470946#comment-13470946
 ] 

Robert Muir commented on LUCENE-4463:
-

And just to give a little more background: the stuff we are dealing with is
really crazy in some sense.

You don't see the normal Jenkins servers emitting failures. What happened is
that we realized we hadn't really been testing XYZ, which we thought we had
been testing for months. I'm trying to help make up for lost time :(

So the failures you have seen this week have typically been nasty-to-debug,
hard-to-reproduce, long-tail failures that would normally take a ton of time to
show up. It's just been Mike debugging and fixing, and me trying to figure out
more efficient ways to provoke these failures, like good cop/bad cop.



VOTE: release 4.0 (RC2)

2012-10-06 Thread Robert Muir

artifacts here: http://s.apache.org/lusolr40rc2

Thanks for the careful inspection of RC1, which turned up test bugs
and other bugs!
I am happy this was all discovered and sorted out before release.

The vote stays open until Wednesday; the weekend is just extra time for
evaluating the RC.


[jira] [Commented] (SOLR-3640) Core Admin UI issues on Chrome


[ 
https://issues.apache.org/jira/browse/SOLR-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470957#comment-13470957
 ] 

Stefan Matheis (steffkes) commented on SOLR-3640:
-

Sorry for the late response. [~astubbs], is this still valid? Otherwise I'd like 
to resolve it.

 Core Admin UI issues on Chrome
 --

 Key: SOLR-3640
 URL: https://issues.apache.org/jira/browse/SOLR-3640
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0-ALPHA
Reporter: Antony Stubbs
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg, Screen Shot 
 2012-07-18 at 3.05.10 PM.png


 Trying to click on any of the buttons apparently has no effect. They also 
 have no icons next to them anymore and appear down the left.


[jira] [Created] (SOLR-3917) Partial State is not defined for Dynamic Fields & Types

Stefan Matheis (steffkes) created SOLR-3917:
---

 Summary: Partial State is not defined for Dynamic Fields & Types
 Key: SOLR-3917
 URL: https://issues.apache.org/jira/browse/SOLR-3917
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.1


SOLR-3734 introduced a partial state for fields that are referenced, e.g. 
within a copyField, but are not explicitly declared in the schema. Because the 
state is not checked correctly, the schema browser throws an error for dynamic 
fields and types.


[jira] [Updated] (SOLR-3917) Partial State is not defined for Dynamic Fields & Types


 [ 
https://issues.apache.org/jira/browse/SOLR-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3917:


Attachment: SOLR-3917.patch

 Partial State is not defined for Dynamic Fields & Types
 ---

 Key: SOLR-3917
 URL: https://issues.apache.org/jira/browse/SOLR-3917
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.1

 Attachments: SOLR-3917.patch


 SOLR-3734 introduced a partial state for fields that are referenced, e.g. 
 within a copyField, but are not explicitly declared in the schema. Because the 
 state is not checked correctly, the schema browser throws an error for dynamic 
 fields and types.


[jira] [Commented] (SOLR-3734) Forever loop in schema browser


[ 
https://issues.apache.org/jira/browse/SOLR-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470962#comment-13470962
 ] 

Stefan Matheis (steffkes) commented on SOLR-3734:
-

Committed revision 1394980. lucene_solr_4_0

 Forever loop in schema browser
 --

 Key: SOLR-3734
 URL: https://issues.apache.org/jira/browse/SOLR-3734
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, web gui
Reporter: Lance Norskog
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.1

 Attachments: SOLR-3734.patch, SOLR-3734.patch, 
 SOLR-3734_schema_browser_blocks_solr_conf_dir.zip


 When I start Solr with the attached conf directory, and hit the Schema 
 Browser, the loading circle spins permanently. 
 I don't know if the problem is in the UI or in Solr. The UI does not display 
 the Ajax solr calls, and I don't have a debugging proxy.


[jira] [Updated] (SOLR-3917) Partial State is not defined for Dynamic Fields & Types


 [ 
https://issues.apache.org/jira/browse/SOLR-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3917:


Attachment: SOLR-3917.patch

updated patch, using {{is_f}} to identify if we are displaying the details of a 
field

 Partial State is not defined for Dynamic Fields & Types
 ---

 Key: SOLR-3917
 URL: https://issues.apache.org/jira/browse/SOLR-3917
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.1

 Attachments: SOLR-3917.patch, SOLR-3917.patch


 SOLR-3734 introduced a partial state for fields that are referenced, e.g. 
 within a copyField, but are not explicitly declared in the schema. Because the 
 state is not checked correctly, the schema browser throws an error for dynamic 
 fields and types.


[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-10-06 Thread Christian Moen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470967#comment-13470967
 ] 

Christian Moen commented on LUCENE-3921:


Lance,

The idea I had in mind for Japanese uses language-specific characteristics for 
katakana terms and perhaps weights that are dictionary-specific as well.  
However, we are hacking our statistical model here and there are 
limitations as to how far we can go with this.

I don't know a whole lot about the Smart Chinese toolkit, but I believe the 
same approach to compound segmentation could work for Chinese as well.  
However, weights and implementation would likely be separate.  Note that the 
above is really about one specific kind of compound segmentation that applies 
to Japanese, so the thinking was to add additional heuristics for this specific 
type, which is particularly tricky.

It might be a good idea to also approach this problem using the 
{{DictionaryCompoundWordTokenFilter}} and collectively build some lexical 
assets for compound splitting for the relevant languages, rather than hacking 
our models.
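
As a rough illustration of the dictionary-driven alternative, the sketch below splits a katakana compound into dictionary words using longest-match-first with backtracking. It is neither Kuromoji's statistical model nor the algorithm used by {{DictionaryCompoundWordTokenFilter}}; the function name and the tiny word list are made up for the example.

```python
def decompose(token, dictionary):
    """Split token into dictionary words, longest match first.

    Returns the list of parts, or None if the token cannot be covered
    completely by dictionary words.
    """
    if not token:
        return []
    for end in range(len(token), 0, -1):
        head = token[:end]
        if head in dictionary:
            rest = decompose(token[end:], dictionary)
            if rest is not None:
                return [head] + rest
    return None


# Hypothetical lexical asset for compound splitting.
words = {"トート", "ショルダー", "バッグ"}
print(decompose("トートバッグ", words))
print(decompose("ショルダーバッグ", words))
print(decompose("トートケース", words))  # "ケース" is not in the word list
```

A token that cannot be fully covered is left alone (None here), which avoids emitting fragments that are not real words.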

 Add decompose compound Japanese Katakana token capability to Kuromoji
 -

 Key: LUCENE-3921
 URL: https://issues.apache.org/jira/browse/LUCENE-3921
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0-ALPHA
 Environment: CentOS 5, IPA Dictionary, run with Search mode
Reporter: Kazuaki Hiraga
  Labels: features

 The Japanese morphological analyzer Kuromoji doesn't have the capability to 
 decompose every Japanese Katakana compound token into sub-tokens. It seems 
 that some Katakana tokens can be decomposed, but this cannot be applied to every 
 Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ 
 don't decompose into トート バッグ and ショルダー バッグ although the IPA dictionary 
 has バッグ in its entries.  I would like to apply the decompose feature to every 
 Katakana token if the sub-tokens are in the dictionary, or add the capability 
 to force-apply the decompose feature to every Katakana token.


[jira] [Updated] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive

2012-10-06 Thread Arcadius Ahouansou (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arcadius Ahouansou updated LUCENE-1822:
---

Attachment: LUCENE-1822-tests.patch

Hi Koji.

Thanks for the patch update.
The failing tests have been fixed.
Some are obvious. 
For the tests checking subInfo, 
we have something like
expected: subInfos=(theboth((195,203)))/0.86791086(189,289)
actual  : subInfos=(theboth((195,203)))/0.86791086(149,249)


Honestly, I haven't gone into the detail of verifying/counting the offset 
positions for the search terms.


Could you have a look please?

Thanks.


 FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
 naive
 --

 Key: LUCENE-1822
 URL: https://issues.apache.org/jira/browse/LUCENE-1822
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 2.9
 Environment: any
Reporter: Alex Vigdor
Assignee: Koji Sekiguchi
Priority: Minor
 Attachments: LUCENE-1822.patch, LUCENE-1822.patch, 
 LUCENE-1822-tests.patch


 The new FastVectorHighlighter performs extremely well, however I've found in 
 testing that the window of text chosen per fragment is often very poor, as it 
 is hard coded in SimpleFragListBuilder to always select starting 6 characters 
 to the left of the first phrase match in a fragment.  When selecting long 
 fragments, this often means that there is barely any context before the 
 highlighted word, and lots after; even worse, when highlighting a phrase at 
 the end of a short text the beginning is cut off, even though the entire 
 phrase would fit in the specified fragCharSize.  For example, highlighting 
 "Punishment" in "Crime and Punishment" returns "e and <b>Punishment</b>" no 
 matter what fragCharSize is specified.  I am going to attach a patch that 
 improves the text window selection by recalculating the starting margin once 
 all phrases in the fragment have been identified - this way if a single word 
 is matched in a fragment, it will appear in the middle of the highlight, 
 instead of 6 characters from the beginning.  This way one can also guarantee 
 that the entirety of short texts are represented in a fragment by specifying 
 a large enough fragCharSize.
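
 The window selection described above might look roughly like the following sketch. It is not the attached patch: the function name and parameters are hypothetical, and it handles only a single match span. The idea is to split the spare characters evenly around the match and then clamp the window to the text bounds.

```python
def fragment_window(text_len, match_start, match_end, frag_char_size):
    """Return (start, end) offsets of a fragment that centers the match."""
    if frag_char_size >= text_len:
        # The whole text fits, so nothing needs to be cut off.
        return 0, text_len
    match_len = match_end - match_start
    margin = max(0, (frag_char_size - match_len) // 2)
    start = max(0, match_start - margin)
    end = min(text_len, start + frag_char_size)
    # If the window hit the end of the text, shift it left to keep its size.
    start = max(0, end - frag_char_size)
    return start, end


text = "Crime and Punishment"
start, end = fragment_window(len(text), 10, 20, 100)
print(text[start:end])
start, end = fragment_window(len(text), 10, 20, 14)
print(text[start:end])
```

 With a fragCharSize of 100 the whole title is returned; with 14 the window is shifted left so that the match at the end of the text still gets leading context.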


[jira] [Resolved] (SOLR-3917) Partial State is not defined for Dynamic Fields & Types


 [ 
https://issues.apache.org/jira/browse/SOLR-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-3917.
-

Resolution: Fixed

Committed revision 1394983. trunk
Committed revision 1394987. branch_4x
Committed revision 1394990. lucene_solr_4_0


RE: VOTE: release 4.0 (RC2)

2012-10-06 Thread Uwe Schindler

Hi,

+1 to release this time!

- I ran smoketester on Linux (JDKs: 1.6.0_33, 1.7.0_07, server, 64bit), passed!
- I used the PANGAEA index (version 3.6.1, copied from our production system) and 
ran CheckIndex on it (both JDKs): passed. Index size and deletions were reported 
correctly this time. I also checked a force-merged PANGAEA index, which passed too.
- I used IndexUpgrader to upgrade both 3.6.1 indexes, passed.
- I checked the output of IndexUpgrader again with CheckIndex; the output for 
the force-merged 3.6.1 index was identical to the migrated 4.0 index (number of 
terms, ...).
- I compared the index sizes of the single-segment 3.6.1 and 4.0.0 indexes: 4.0 
was slightly larger. You have to know that this index contains thousands of 
fields (it allows searching in XML in an XQuery-like way, so we have a field for 
every possible XPath of our rather complex XML schema in the index). The search 
speed improved dramatically (because of the separate term dictionaries for 
every field). So the bigger size is also caused by splitting 
fields into separate term dictionaries vs. one in 3.6. 4.0 also has more 
statistics. I think an index with 10 fields may be smaller; I will try this a 
little bit later with another index.
- I used the demo module to run some text-only queries, they passed.

One small thing: We mention (in Lucene's package) that Java 6 is needed, so we 
should at least mention that in the release notes. We should improve our 
docs/index.(html|xsl) to mention system requirements. Same for Solr. We have a 
system requirements page on the website, but that is unversioned, so we should 
also add a section for 4.0 there. But this is not the way to go. We should also 
mention that Java 7 is the preferred Java version, if you have at least 
1.7.0_01.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Saturday, October 06, 2012 10:11 AM
 To: dev@lucene.apache.org
 Subject: VOTE: release 4.0 (RC2)
 
 artifacts here: http://s.apache.org/lusolr40rc2
 
 Thanks for the good inspection of rc#1 and finding bugs, which found test bugs
 and other bugs!
 I am happy this was all discovered and sorted out before release.
 
 vote stays open until wednesday, the weekend is just extra time for evaluating
 the RC.
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: VOTE: release 4.0 (RC2)

2012-10-06 Thread Uwe Schindler

I mean: we don't mention system requirements correctly - sorry

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



[jira] [Commented] (LUCENE-4463) add support for running the same test method many times

[
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470975#comment-13470975
]

Dawid Weiss commented on LUCENE-4463:
-

I absolutely understand. There seem to be a few recurring scenarios:
- random test (exploring the combinations space; typically jenkins)
- random test, many repetitions of a single test method, constant seed
(-Dtestcase=... -Dtests.iters=... -Dtests.seed=XXX:YYY)
- random test, many repetitions of a single test method, variable seed starting
from a single master (-Dtestcase=... -Dtests.iters=... -Dtests.seed=XXX)
- random test, many repetitions of a single suite, constant seed
(-Dtestcase=... -Dtests.dups=... -Dtests.seed=...); this also applies to
repeating a single test method within a suite, but accelerated to run on
multiple cores if one has many.
- random test, many repetitions of a single suite, random seed (-Dtestcase=...
-Dtests.dups=...).

We currently seem to have all these except for the last one. I have a working
patch in my head, I'll attach shortly.

Btw. I don't think there's anything I can do to make Mike NOT run his
Python/SSH magic because he scatters tests across a farm of machines... I plan
to do this for junit4 around year 2020, he, he. Not that it's very complicated
technically but it'd require a lot of refactorings and then testing for
potential infrastructure problems, detecting hung processes/sockets/jvms, etc.

add support for running the same test method many times
---

Key: LUCENE-4463
URL: https://issues.apache.org/jira/browse/LUCENE-4463
Project: Lucene - Core
Issue Type: Wish
Components: general/build
Reporter: Robert Muir
Attachments: LUCENE-4463.patch


[jira] [Updated] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds


 [ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-4463:


Summary: Add support for running the same test method/class many times with 
different class seeds  (was: add support for running the same test method many 
times)

 Add support for running the same test method/class many times with different 
 class seeds
 

 Key: LUCENE-4463
 URL: https://issues.apache.org/jira/browse/LUCENE-4463
 Project: Lucene - Core
  Issue Type: Wish
  Components: general/build
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-4463.patch


 I have a shell script for this, Mike has a Python script, it's annoying :)
 I want to do something like this:
 ant beast -Dtestcase= -Dtestmethod= -Diterations=100
 I would be happy with a simple loop that just invokes 'test' somehow: getting 
 a fresh new JVM for each iteration is desirable anyway (so you get fresh 
 codecs, etc). 
 The -Dtests.iters is not really useful for this because it does not allow 
 -Dtestmethod and it does not give a fresh JVM.
 Bonus points if it can use multiple JVMs at the same time though :)
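
A minimal sketch of the "simple loop" idea from the description, in Python because the existing helpers are scripts too. It only builds one `ant test` command line per iteration, each with its own explicit `-Dtests.seed` so that a failing iteration can be replayed. The helper name `beast_commands`, the 16-hex-digit seed format (modeled on the seeds printed in the Jenkins mails) and the choice to pass the seed explicitly are assumptions for illustration; this is not the attached patch.

```python
import random


def beast_commands(testcase, testmethod, iterations, rng=None):
    """Return one `ant test` argv per iteration, each with a fresh seed.

    Hypothetical helper: every command is meant to start a new JVM, so
    class-level randomization (codec etc.) is re-drawn on each iteration.
    """
    rng = rng or random.Random()
    commands = []
    for _ in range(iterations):
        # 64-bit seed rendered as 16 uppercase hex digits (assumed format).
        seed = "%016X" % rng.getrandbits(64)
        commands.append([
            "ant", "test",
            "-Dtestcase=%s" % testcase,
            "-Dtestmethod=%s" % testmethod,
            "-Dtests.seed=%s" % seed,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; pass each argv to
    # subprocess.call(...) to actually execute the iterations.
    for argv in beast_commands("TestCodecs", "testRandomPostings", 3):
        print(" ".join(argv))
```

Running several of these command lines at the same time (for example from a process pool) would cover the "multiple JVMs" bonus, at the cost of one JVM start per iteration.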

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (LUCENE-4463) add support for running the same test method many times


 [ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned LUCENE-4463:
---

Assignee: Dawid Weiss


[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b58) - Build # 1572 - Failure!

2012-10-06 Thread Policeman Jenkins Server

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/1572/
Java: 64bit/jdk1.8.0-ea-b58 -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 9044 lines...]
[junit4:junit4] ERROR: JVM J1 ended with an exception, command line: 
/mnt/ssd/jenkins/tools/java/64bit/jdk1.8.0-ea-b58/jre/bin/java 
-XX:+UseParallelGC -Dtests.prefix=tests -Dtests.seed=EA34BF7EB6C56752 -Xmx512M 
-Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false 
-Dtests.lockdir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build 
-Dtests.codec=random -Dtests.postingsformat=random -Dtests.locale=random 
-Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.1 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/testlogging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=3 -DtempDir=. 
-Djava.io.tmpdir=. 
-Dtests.sandbox.dir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build/solr-core
 
-Dclover.db.dir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/junit4/tests.policy
 -Dlucene.version=4.1-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -classpath

[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds

2012-10-06 Thread Uwe Schindler (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470976#comment-13470976
]

Uwe Schindler commented on LUCENE-4463:
---

bq. Btw. I don't think there's anything I can do to make Mike NOT run his
Python/SSH magic because he scatters tests across a farm of machines... I plan
to do this for junit4 around year 2020, he, he. Not that it's very complicated
technically but it'd require a lot of refactorings and then testing for
potential infrastructure problems, detecting hung processes/sockets/jvms, etc.

I don't think you need to do that: He should install Jenkins on his farmhouse
machine and then set up a slave in the GUI for every combine harvester operated
by his slaves. He can then create a job, not bound to a specific node, and run
it on all slaves in parallel. Very easy to set up, the SSH magic is included in
Jenkins (dumb slave): Jenkins connects via SSH to the slave, copies the
slave.jar via scp and starts the jenkins cows. On the dairies you don't need to
set up anything beyond a VM in $PATH.


[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470980#comment-13470980
 ] 

Dawid Weiss commented on LUCENE-4463:
-

Yeah, I'm sure Mike will stick to his SSH scripts though :)

Anyway, my first idea won't work. The seed decorator has to have a constant mix 
function for a given class -- it cannot change over time for the same class 
because then you wouldn't know the actual seed (and be able to repeat it) if a 
failure happened at iteration > 1.

I'll try with my second idea which requires modification to the runner. The 
problem again is that this involves a longer cycle of releasing via maven, etc. 


[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds

2012-10-06 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470991#comment-13470991
]

Michael McCandless commented on LUCENE-4463:

The "scatter tests across a bunch of networked machines" script is here:
http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/runRemoteTests.py
... it just uses randomizedrunner to execute the tests, but ssh to distribute
each test case to the N machines. All JVMs (N per remote machine) pull from a
single shared tasks queue, in order of slowest test to fastest test ... it
communicates with randomizedrunner using its nice stdin/.events API :)

It runs [nearly!] all Lucene/Solr tests across N machines and reports any
failures ... the source code is scary and has hardwired constants for my env
... but it makes running all tests wicked fast.
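
The scheduling idea described above (all JVMs pull from one shared queue, slowest test first) can be sketched as a small simulation. This is not the code of runRemoteTests.py; the function name, the input format and the timings in the usage note are assumptions made for illustration.

```python
import heapq


def simulate_shared_queue(durations, n_jvms):
    """Simulate n_jvms workers pulling tests from one shared queue.

    durations: dict of test name -> expected run time in seconds.
    Returns (assignment, makespan), where assignment maps each JVM
    index to the list of tests it ran, in order.
    """
    # Slowest tests go first so they do not end up as stragglers.
    queue = sorted(durations, key=lambda name: (-durations[name], name))
    # Heap of (time the JVM becomes free, JVM index).
    free_at = [(0.0, jvm) for jvm in range(n_jvms)]
    heapq.heapify(free_at)
    assignment = {jvm: [] for jvm in range(n_jvms)}
    makespan = 0.0
    for test in queue:
        now, jvm = heapq.heappop(free_at)  # next JVM to become idle
        done = now + durations[test]
        assignment[jvm].append(test)
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, jvm))
    return assignment, makespan
```

With made-up timings `{"A": 5, "B": 3, "C": 2, "D": 2}` and two JVMs, the first JVM runs A then D and the second runs B then C, so the whole run takes 7 seconds instead of the 12 a single JVM would need.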

But that's a very different use case than beasting a single test (this
issue). For that I use
http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/repeatLuceneTest.py
... however, it's single threaded, and does not run on remote machines ...
would be fun to fix that!

bq. He should install Jenkins on his farmhouse machine and then setup a slave

Well I think we need to solve this issue first (how to run many iters of a
single testcase testmethod, each w/ a different seed)? Then I agree Jenkins
could be used for distribution instead of ssh + scripts.


[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds

2012-10-06 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470993#comment-13470993
 ] 

Michael McCandless commented on LUCENE-4463:


bq. Yeah, I'm sure Mike will stick to his SSH scripts though

Not if we had an efficient way to distribute tests across N JVMs running on M 
machines from a single queue.

One big problem w/ runRemoteTests.py is it does CLASSPATH pollution, ie the 
CLASSPATH it runs with is the union of all CLASSPATHs for all tests ... this is 
bad because then it fails to catch dependency problems, or cases when module X 
shouldn't use module Y but does.  This also causes certain tests to false fail 
...


Re: VOTE: release 4.0 (RC2)

2012-10-06 Thread Michael McCandless

+1

Smoke tester is happy in my env (Ubuntu 12.04, Javas 1.6.0_32 / 1.7.0_04).

Mike McCandless

http://blog.mikemccandless.com

On Sat, Oct 6, 2012 at 4:10 AM, Robert Muir rcm...@gmail.com wrote:
 artifacts here: http://s.apache.org/lusolr40rc2

 Thanks for the good inspection of rc#1 and finding bugs, which found
 test bugs and other bugs!
 I am happy this was all discovered and sorted out before release.

 vote stays open until wednesday, the weekend is just extra time for
 evaluating the RC.


Build failed in Jenkins: slow-io-beasting #2325

See http://sierranevada.servebeer.com:8080/job/slow-io-beasting/2325/

--
[...truncated 682 lines...]
[junit4:junit4] OK  0.03s J2 | TestOmitNorms.testNoNrmFile
[junit4:junit4] Completed on J2 in 1.27s, 5 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestThreadedForceMerge
[junit4:junit4] OK  1.01s J3 | TestThreadedForceMerge.testThreadedForceMerge
[junit4:junit4] Completed on J3 in 1.03s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestCodecs
[junit4:junit4] OK  0.02s J0 | TestCodecs.testFixedPostings
[junit4:junit4] OK  0.13s J0 | TestCodecs.testSepPositionAfterMerge
[junit4:junit4] OK  0.13s J0 | TestCodecs.testRandomPostings
[junit4:junit4] Completed on J0 in 0.30s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestMixedCodecs
[junit4:junit4] OK  0.42s J2 | TestMixedCodecs.test
[junit4:junit4] Completed on J2 in 0.45s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestIndexInput
[junit4:junit4] OK  0.03s J3 | TestIndexInput.testBufferedIndexInputRead
[junit4:junit4] OK  0.03s J3 | TestIndexInput.testRawIndexInputRead
[junit4:junit4] OK  0.02s J3 | TestIndexInput.testByteArrayDataInput
[junit4:junit4] Completed on J3 in 0.33s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestParallelCompositeReader
[junit4:junit4] OK  0.02s J2 | TestParallelCompositeReader.testIncompatibleIndexes2
[junit4:junit4] OK  0.02s J2 | TestParallelCompositeReader.testIncompatibleIndexes1
[junit4:junit4] OK  0.08s J2 | TestParallelCompositeReader.testIgnoreStoredFields
[junit4:junit4] OK  0.08s J2 | TestParallelCompositeReader.testRefCounts1
[junit4:junit4] OK  0.17s J2 | TestParallelCompositeReader.testQueriesCompositeComposite
[junit4:junit4] OK  0.00s J2 | TestParallelCompositeReader.testRefCounts2
[junit4:junit4] OK  0.20s J2 | TestParallelCompositeReader.testIncompatibleIndexes3
[junit4:junit4] OK  0.10s J2 | TestParallelCompositeReader.testQueries
[junit4:junit4] Completed on J2 in 0.67s, 8 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping
[junit4:junit4] OK  0.52s J3 | TestLazyProxSkipping.testLazySkipping
[junit4:junit4] OK  0.14s J3 | TestLazyProxSkipping.testSeek
[junit4:junit4] Completed on J3 in 0.67s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestCrash
[junit4:junit4]   1 TEST: initIndex
[junit4:junit4]   1 TEST: done initIndex
[junit4:junit4]   1 TEST: now crash
[junit4:junit4] OK  0.11s J0 | TestCrash.testWriterAfterCrash
[junit4:junit4] OK  0.22s J0 | TestCrash.testCrashAfterClose
[junit4:junit4] OK  0.06s J0 | TestCrash.testCrashAfterCloseNoWait
[junit4:junit4] OK  0.40s J0 | TestCrash.testCrashWhileIndexing
[junit4:junit4] OK  0.17s J0 | TestCrash.testCrashAfterReopen
[junit4:junit4] Completed on J0 in 1.13s, 5 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestOmitTf
[junit4:junit4] OK  0.27s J3 | TestOmitTf.testNoPrxFile
[junit4:junit4] OK  0.00s J3 | TestOmitTf.testStats
[junit4:junit4] OK  0.00s J3 | TestOmitTf.testOmitTermFreqAndPositions
[junit4:junit4] OK  0.02s J3 | TestOmitTf.testMixedRAM
[junit4:junit4] OK  0.03s J3 | TestOmitTf.testBasic
[junit4:junit4] OK  0.16s J3 | TestOmitTf.testMixedMerge
[junit4:junit4] Completed on J3 in 0.49s, 6 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestDocValuesTypeCompatibility
[junit4:junit4] OK  0.05s J2 | TestDocValuesTypeCompatibility.testIncompatibleTypesBytes
[junit4:junit4] OK  0.09s J2 | TestDocValuesTypeCompatibility.testAddCompatibleByteTypes
[junit4:junit4] OK  0.30s J2 | TestDocValuesTypeCompatibility.testAddCompatibleDoubleTypes
[junit4:junit4] OK  0.09s J2 | TestDocValuesTypeCompatibility.testAddCompatibleIntTypes
[junit4:junit4] OK  0.27s J2 | TestDocValuesTypeCompatibility.testAddCompatibleDoubleTypes2
[junit4:junit4] Completed on J2 in 0.89s, 5 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestForceMergeForever
[junit4:junit4] OK  0.64s J0 | TestForceMergeForever.test
[junit4:junit4] Completed on J0 in 0.66s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestTermVectorsWriter
[junit4:junit4] OK  0.00s J2 | TestTermVectorsWriter.testTermVectorCorruption3
[junit4:junit4] OK  0.02s J2 | TestTermVectorsWriter.testNoTermVectorAfterTermVectorMerge
[junit4:junit4] OK  0.00s J2 | TestTermVectorsWriter.testEndOffsetPositionCharAnalyzer
[junit4:junit4] OK  0.02s J2 | TestTermVectorsWriter.testDoubleOffsetCounting2
[junit4:junit4] OK  0.00s J2 | TestTermVectorsWriter.testTermVectorCorruption
[junit4:junit4] OK  0.02s J2 | TestTermVectorsWriter.testEndOffsetPositionWithCachingTokenFilter
[junit4:junit4] OK  0.06s J2 |

Jenkins build is back to normal : slow-io-beasting #2326

See http://sierranevada.servebeer.com:8080/job/slow-io-beasting/2326/



[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471005#comment-13471005
 ] 

Dawid Weiss commented on LUCENE-4463:
-

bq. Not if we had an efficient way to distribute tests across N JVMs running on 
M machines from a single queue.

Yeah... I'll try to fix this issue so that you can run across N JVMs, but still 
locally. I don't think I'll have the time in the near future to work on a 
truly distributed mode.


[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Kazuaki Hiraga (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471068#comment-13471068
]

Kazuaki Hiraga commented on LUCENE-3922:

Sorry for this late reply.

Although I have some requests for improving its capabilities, this is a very
helpful and nice charfilter for me.
Thank you! Christian!!

My requests are the following:

Is it difficult to support numbers with a period, such as the following?
３．２兆円
５．２億円

On the other hand, I agree with Christian about not preserving leading zeros. So,
◯◯七 doesn't need to become 007.

I think it would be helpful if this charfilter supported old Kanji numeric
characters (KYU-KANJI or DAIJI) such as 壱, 壹 (One), 弌, 弐, 貳 (Two), 弍, 参, 參
(Three), or if that were configurable.

Add Japanese Kanji number normalization to Kuromoji
---

Key: LUCENE-3922
URL: https://issues.apache.org/jira/browse/LUCENE-3922
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Affects Versions: 4.0-ALPHA
Reporter: Kazuaki Hiraga
Labels: features
Attachments: LUCENE-3922.patch

Japanese people use Kanji numerals instead of Arabic numerals for writing
prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2
Nibancho) and 十二月 (December). So, we would like to normalize those Kanji
numerals to Arabic numerals (I don't think we need to have a capability to
normalize to Kanji numerals).
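
To make the requested normalization concrete, here is a rough Python sketch of the arithmetic involved. It is not the attached patch and not a Kuromoji API; the names `kanji_to_int` and `normalize` and the small daiji table are assumptions for illustration. It handles plain integers only, so decimal forms such as ３．２兆円 are deliberately left out.

```python
import re

# Kanji digits; the issue comment writes zero as ◯, so accept that too.
DIGITS = {"〇": 0, "◯": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
# A few old-style (daiji) forms mapped to their modern digit.
DAIJI = {"壱": "一", "壹": "一", "弐": "二", "貳": "二", "参": "三", "參": "三"}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}

_CHARS = "".join(list(DIGITS) + list(DAIJI) + list(SMALL_UNITS) + list(LARGE_UNITS))
NUMERAL_RUN = re.compile("[0-9０-９" + _CHARS + "]+")


def kanji_to_int(text):
    """Convert a numeral such as '12万4800' or '十二' to an int."""
    total = 0   # sum of completed 万/億/兆 groups
    group = 0   # value of the current group below 10^4
    number = 0  # pending run of plain digits
    for ch in text:
        ch = DAIJI.get(ch, ch)
        if ch in DIGITS:
            number = number * 10 + DIGITS[ch]
        elif ch.isdecimal():  # ASCII and fullwidth digits
            number = number * 10 + int(ch)
        elif ch in SMALL_UNITS:
            group += (number or 1) * SMALL_UNITS[ch]
            number = 0
        elif ch in LARGE_UNITS:
            total += ((group + number) or 1) * LARGE_UNITS[ch]
            group = number = 0
        else:
            raise ValueError("not a numeral character: %r" % ch)
    return total + group + number


def normalize(text):
    """Replace every run of numeral characters with Arabic digits."""
    return NUMERAL_RUN.sub(lambda m: str(kanji_to_int(m.group())), text)
```

For example, `normalize("12万4800円")` gives `124800円` and `normalize("十二月")` gives `12月`; leading zeros disappear, so 〇〇七 becomes 7. A character-level replacement like this also rewrites numerals inside ordinary words (千 or 参 in a name, for instance), which is one reason a real filter would need token context.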


[jira] [Commented] (LUCENE-4285) Improve FST API usability for mere mortals

2012-10-06 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471081#comment-13471081
 ] 

David Smiley commented on LUCENE-4285:
--

I admit checked exceptions would have alerted me to my bug, but that doesn't 
make the API any nicer -- I still need null checks littered through my FST user 
code now.  I don't know the FST internals but I'd be surprised to hear that 
adding support for an empty FST adds appreciable overhead.  If this overhead 
we're discussing is a simple conditional check, then this is net-zero since as 
it is I need these null checks on my end of the API due to my FST being 
potentially null.

 Improve FST API usability for mere mortals
 --

 Key: LUCENE-4285
 URL: https://issues.apache.org/jira/browse/LUCENE-4285
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: David Smiley

 FST technology is something that has brought amazing advances to Lucene, yet 
 the API is hard to use for the vast majority of users like me.  I know that 
 performance of FSTs is really important, but surely a lot can be done without 
 sacrificing that.
 (comments will hold specific ideas and problems)


[jira] [Commented] (LUCENE-4285) Improve FST API usability for mere mortals


[ 
https://issues.apache.org/jira/browse/LUCENE-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471083#comment-13471083
 ] 

Dawid Weiss commented on LUCENE-4285:
-

Things are more difficult than they seem on the surface. An elegant solution 
would encode an empty automaton without any extra flags or checks. In an arc 
based representation there is simply no notion of an empty set of arcs though 
-- there needs to be at least one and if it's present on the root state then, 
well, it's no longer an empty automaton. Like I said -- this can be modeled 
with an initial state transition (the symbol doesn't matter); if this 
transition is final then the automaton is empty (there is no actual root 
state). But this also changes how traversals are implemented and would affect 
all of the existing code.


[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds


[ 
https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471090#comment-13471090
 ] 

Dawid Weiss commented on LUCENE-4463:
-

I thought about it for a bit longer and exercised a few scenarios. The problem 
is that I designed everything (and I mean everything) with two ideas in mind:
- every random element (be it a selection of components, shuffling of order or 
whatever) is a derivative of a single master seed. This seed is picked by 
junit4 task and is then used to sort suites to be executed, pick parameters, 
then is passed to suites to log messages, stack traces, etc.
- execution of a test suite (in the sense of a single class) is isolated from 
anything else -- any other class running before or after. So you can provide 
the same master seed for a single class and it should execute identically, even 
if it's detached from the entire sequence of suites than ran during the full 
test. The seed decorators that we currently use alter the master seed with a 
hash of the test class's name to make it different for each class running under 
the same master seed, but this is an independent operation -- whether something 
ran before or after doesn't matter.
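
The two assumptions above can be sketched in a few lines of Python. This is only an illustration: the hashing scheme, the codec names and the function names are hypothetical, not what the junit4 task actually does. It shows that the per-class seed depends on nothing but the master seed and the class name, so a suite can be replayed in isolation.

```python
import hashlib
import random


def class_seed(master_seed: int, class_name: str) -> int:
    """Derive a per-class seed from the master seed and the class name only.

    Hypothetical scheme: XOR the master seed with a hash of the class name.
    Nothing about execution order or previously run suites is involved.
    """
    digest = hashlib.sha256(class_name.encode("utf-8")).digest()
    name_hash = int.from_bytes(digest[:8], "big")
    return (master_seed ^ name_hash) & 0xFFFFFFFFFFFFFFFF


def pick_codec(master_seed: int, class_name: str) -> str:
    """Pick a class-level component (placeholder codec names) from the derived seed."""
    rng = random.Random(class_seed(master_seed, class_name))
    return rng.choice(["CodecA", "CodecB", "CodecC"])
```

Calling pick_codec twice with the same master seed and class name gives the same answer, whether the class ran alone or as part of a full build. Once a suite is run under a seed that is not derived from the master seed, that property is gone.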

The idea of running the same suite many times with a _different_ master seed 
each time conflicts with these assumptions because then every subsequent 
execution of the same class will _not_ be a derivative of the master seed 
anymore (and will most likely depend on how many classes executed before or 
even be random). 

Let me illustrate this with an example. Let's say the master seed is XXX; we use 
this seed to pick file.encoding and for this seed it becomes UTF-8. If we now 
pick a random master seed (say, YYY) for a concrete class and it fails, it'll 
report YYY back to the console. But if you run ant -Dtests.seed=YYY then the 
selection of file.encoding will be different because, ehm, it's not XXX 
anymore. file.encoding has to be picked before the JVM is started so it cannot 
be done from within the running test runner, etc.

This is just one of the problem scenarios, there are more but I hope you get 
the picture.

A clean solution to the problem would be to make a loop inside ant, around 
the contents of the test-macro (so that the entire sequence of picking the 
master seed, picking parameters, spawning JVMs, etc. is repeated). This isn't 
really going to make matters much faster because it'll fork new JVMs etc.

A dirty solution is to screw the above idealistic point of view and have a 
seed decorator which affects the master seed before it is propagated to each 
suite. This will cause all the headaches mentioned above PLUS you'll have to 
get the failing seed directly from the failing test (stack trace or whatever 
other message is printed) because it won't be the master seed JUnit4 greets you 
with. Then you could indeed run as many concurrent instances of the same suite 
with random seeds as you like (JVMs reused). This does sound like a 
super-advanced and convoluted piece of functionality for something that will 
probably be used pretty frequently (which means lots of wtfs on the mailing list).

Don't know, really.



 Add support for running the same test method/class many times with different 
 class seeds
 

 Key: LUCENE-4463
 URL: https://issues.apache.org/jira/browse/LUCENE-4463
 Project: Lucene - Core
  Issue Type: Wish
  Components: general/build
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-4463.patch


 I have a shell script for this, Mike has a Python script, it's annoying :)
 I want to do something like this:
 ant beast -Dtestcase= -Dtestmethod= -Diterations=100
 I would be happy with a simple loop that just invokes 'test' somehow: getting 
 a fresh new JVM to each iteration is desirable anyway (so you get fresh 
 codecs, etc). 
 the -Dtests.iters is not really useful for this because it does not allow 
 -Dtestmethod and it does not give a fresh jvm.
 bonus points if it can use multiple jvms at the same time though :)
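
The "simple loop that just invokes 'test'" could look roughly like the Python sketch below. The function names and the stop-at-first-failure behaviour are assumptions for illustration; only `ant test`, `-Dtestcase` and `-Dtestmethod` come from the description above. Each iteration is a separate ant invocation, so no state is carried over from one run to the next.

```python
import subprocess


def beast_command(testcase, testmethod=None):
    """Build the ant command line for one iteration."""
    cmd = ["ant", "test", f"-Dtestcase={testcase}"]
    if testmethod:
        cmd.append(f"-Dtestmethod={testmethod}")
    return cmd


def beast(testcase, testmethod=None, iterations=100, runner=subprocess.call):
    """Run the same test repeatedly, one ant process per iteration.

    Returns the 1-based number of the first failing iteration,
    or 0 if every iteration passed. `runner` is injectable so the
    loop can be exercised without actually starting ant.
    """
    cmd = beast_command(testcase, testmethod)
    for i in range(1, iterations + 1):
        if runner(cmd) != 0:
            return i
    return 0
```

A call such as beast("TestFoo", "testBar", iterations=100) would stop at the first failure. Running several iterations at the same time is not covered here.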


[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.6.0_35) - Build # 1065 - Failure!

2012-10-06 Thread Policeman Jenkins Server

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1065/
Java: 64bit/jdk1.6.0_35 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 26494 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:245: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:551: Unable to delete file C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build\analysis\common\lucene-analyzers-common-4.1-SNAPSHOT.jar

Total time: 65 minutes 15 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.6.0_35 -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure




Re: VOTE: release 4.0 (RC2)

2012-10-06 Thread Tommaso Teofili

+1 smoke tests are ok.

Tommaso

2012/10/6 Michael McCandless luc...@mikemccandless.com

 +1

 Smoke tester is happy in my env (Ubuntu 12.04, Javas 1.6.0_32 / 1.7.0_04).

 Mike McCandless

 http://blog.mikemccandless.com

 On Sat, Oct 6, 2012 at 4:10 AM, Robert Muir rcm...@gmail.com wrote:
  artifacts here: http://s.apache.org/lusolr40rc2
 
  Thanks for the good inspection of rc#1 and finding bugs, which found
  test bugs and other bugs!
  I am happy this was all discovered and sorted out before release.
 
  vote stays open until wednesday, the weekend is just extra time for
  evaluating the RC.
 

[jira] [Created] (SOLR-3918) Create dist-excl-slf4j target

2012-10-06 Thread Shawn Heisey (JIRA)

Shawn Heisey created SOLR-3918:
--

 Summary: Create dist-excl-slf4j target
 Key: SOLR-3918
 URL: https://issues.apache.org/jira/browse/SOLR-3918
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0-BETA, 3.6.1
Reporter: Shawn Heisey
Priority: Trivial
 Fix For: 4.0, 3.6.2, 4.1, 5.0


If you want to create an entire dist target but leave out slf4j bindings, you 
must currently use this:

ant dist-solrj dist-core dist-test-framework dist-contrib dist-war-excl-slf4j

It would be better to have a single target.  Attaching a patch against 
branch_4x for this.



[jira] [Updated] (SOLR-3918) Create dist-excl-slf4j target

2012-10-06 Thread Shawn Heisey (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-3918:
---

Attachment: SOLR-3918.patch

 Create dist-excl-slf4j target
 -

 Key: SOLR-3918
 URL: https://issues.apache.org/jira/browse/SOLR-3918
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.6.1, 4.0-BETA
Reporter: Shawn Heisey
Priority: Trivial
 Fix For: 4.0, 3.6.2, 4.1, 5.0

 Attachments: SOLR-3918.patch


 If you want to create an entire dist target but leave out slf4j bindings, you 
 must currently use this:
 ant dist-solrj dist-core dist-test-framework dist-contrib 
 dist-war-excl-slf4j
 It would be better to have a single target.  Attaching a patch against 
 branch_4x for this.


[jira] [Updated] (SOLR-3918) Create dist-excl-slf4j target

2012-10-06 Thread Shawn Heisey (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-3918:
---

Fix Version/s: (was: 4.0)

 Create dist-excl-slf4j target
 -

 Key: SOLR-3918
 URL: https://issues.apache.org/jira/browse/SOLR-3918
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.6.1, 4.0-BETA
Reporter: Shawn Heisey
Priority: Trivial
 Fix For: 3.6.2, 4.1, 5.0

 Attachments: SOLR-3918.patch


 If you want to create an entire dist target but leave out slf4j bindings, you 
 must currently use this:
 ant dist-solrj dist-core dist-test-framework dist-contrib 
 dist-war-excl-slf4j
 It would be better to have a single target.  Attaching a patch against 
 branch_4x for this.


[jira] [Commented] (SOLR-3916) fl parsing is sensitive to newlines at the end of field names

2012-10-06 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471116#comment-13471116
 ] 

Yonik Seeley commented on SOLR-3916:


bq. If you look at the patch, you can see my point quite easily: when parsing 
the fl, ReturnFields is naively only treating the ' ' character as whitespace 
and not recognizing any other whitespace characters that might exist between 
field names.

I had looked at the patch, and I still didn't consider the lack of checks for 
other types of whitespace between field names a bug, since we never promised to 
support that.  If you look at the code that was used before ReturnFields, it 
also used a pattern that only split on comma or space.  The previous code did 
handle leading/trailing whitespace by calling String.trim() first, though.
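
The difference between the three behaviours discussed here can be shown with a small Python sketch. It is not the Solr code; the function names are made up and the regular expressions are only an approximation of "split on comma or space" versus "split on any whitespace".

```python
import re


def split_comma_or_space(fl):
    """Split only on ',' or ' ' without trimming (the behaviour reported as a bug)."""
    return [f for f in re.split(r"[, ]", fl) if f]


def trim_then_split(fl):
    """Trim leading/trailing whitespace first, then split on ',' or ' '
    (roughly what the comment above says the older code did)."""
    return [f for f in re.split(r"[, ]", fl.strip()) if f]


def split_any_whitespace(fl):
    """Treat any whitespace character, including newlines, as a separator."""
    return [f for f in re.split(r"[,\s]+", fl.strip()) if f]


# fl value as it would arrive from the multi-line <str> element in the report
fl = "\nsku,store_slug\n  "
print(split_comma_or_space(fl))   # field names keep their newlines
print(trim_then_split(fl))        # clean field names
print(split_any_whitespace(fl))   # clean field names
```

With the value from the report, the first variant yields names that still contain a newline, while the other two yield sku and store_slug. The last two only differ when a newline sits between two field names, as in "sku,\nstore_slug".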


 fl parsing is sensitive to newlines at the end of field names
 -

 Key: SOLR-3916
 URL: https://issues.apache.org/jira/browse/SOLR-3916
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0-BETA
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.0, 4.1, 5.0

 Attachments: SOLR-3916.patch


 As reported by giovanni.bricconi on the user list, there is a bug in fl 
 parsing that causes solr to get confused when a field name is followed by a 
 newline character -- eg: in a requestHandler default like...
 {noformat}
 <!-- newlines showing using $ -->$
 <str name="fl">$
sku,store_slug$  
 </str>$
 {noformat}
 ...this results in solr assuming it should use function parsing to evaluate 
 the field name, which can cause misleading errors if the field name can't be 
 used in a function (eg: can not use FieldCache on multivalued field: 
 store_slug)


[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Lance Norskog (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471117#comment-13471117
]

Lance Norskog commented on LUCENE-3922:
---

bq. On the other hand, I agree with Christian about not preserving leading zeros.
So, ◯◯七 doesn't need to become 007.
This example shows why leading zeros should be preserved :)

There are different kinds of text search. Searching for media titles like James
Bond movies is a very different thing from searching newspaper articles. You
might want to find ◯◯七 as the Japanese-language release and 007 as the
English-language release. These numbers are brands, not numbers.

Add Japanese Kanji number normalization to Kuromoji
---


[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-10-06 Thread Lance Norskog (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471121#comment-13471121
]

Lance Norskog commented on LUCENE-3921:
---

Another way to look at this is that Smart Chinese and Kuromoji are systems for
minimizing bogus bigrams. This allows phrase queries to function without
finding bogus results. The CJK bigram creator generates bogus bigrams, which
cause phrase queries to find bogus results. [SOLR-3653] is the result of my
experience in supporting searching Chinese legal documents. I have some useful
numbers at the end of the page.

Add decompose compound Japanese Katakana token capability to Kuromoji
-

Key: LUCENE-3921
URL: https://issues.apache.org/jira/browse/LUCENE-3921
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.0-ALPHA
Environment: Cent OS 5, IPA Dictionary, Run with Search mode
Reporter: Kazuaki Hiraga
Labels: features

The Japanese morphological analyzer Kuromoji cannot decompose every Japanese
Katakana compound token into sub-tokens. Some Katakana tokens can be
decomposed, but not every Katakana compound token. For instance, トートバッグ
(tote bag) and ショルダーバッグ (shoulder bag) are not decomposed into トート バッグ
and ショルダー バッグ, although the IPA dictionary has an entry for バッグ. I would
like the decompose feature to apply to every Katakana token whose sub-tokens
are in the dictionary, or a capability to force the decompose feature on every
Katakana token.


[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Kazuaki Hiraga (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471123#comment-13471123
 ] 

Kazuaki Hiraga commented on LUCENE-3922:


Lance, you may be right.  Although I have never seen Japanese people use 
Kanji numbers for James Bond movies :-), I can't say that we never use Kanji 
for that kind of expression.

Christian, is it possible to choose whether to preserve leading zeros?

 Add Japanese Kanji number normalization to Kuromoji
 ---

 Key: LUCENE-3922
 URL: https://issues.apache.org/jira/browse/LUCENE-3922
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.0-ALPHA
Reporter: Kazuaki Hiraga
  Labels: features
 Attachments: LUCENE-3922.patch


 Japanese people use Kanji numerals instead of Arabic numerals for writing 
 prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 
 十二月 (December).  So, we would like to normalize those Kanji numerals to Arabic 
 numerals (I don't think we need to have a capability to normalize to Kanji 
 numerals).
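
 As a rough idea of what such a normalization involves, here is a minimal Python sketch. It is not the attached patch; the function name and the character tables are assumptions. It converts a run of numeral characters to an integer and does not handle decimal points, old-style (daiji) numerals, or finding the numeral run inside surrounding text. Leading zeros are dropped.

```python
DIGITS = {"〇": 0, "◯": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def kanji_to_int(text):
    """Convert a string of Kanji and/or Arabic numeral characters to an int."""
    total = 0    # sum of completed 万/億/兆 groups
    section = 0  # value accumulated inside the current group
    number = 0   # digits read since the last unit character
    for ch in text:
        if ch in DIGITS:
            number = number * 10 + DIGITS[ch]
        elif ch.isdecimal():
            # ASCII or full-width Arabic digit
            number = number * 10 + int(ch)
        elif ch in SMALL_UNITS:
            # a bare unit such as 十 means 1 x 10
            section += (number or 1) * SMALL_UNITS[ch]
            number = 0
        elif ch in LARGE_UNITS:
            total += (section + number) * LARGE_UNITS[ch]
            section = 0
            number = 0
        else:
            raise ValueError(f"not a numeral character: {ch!r}")
    return total + section + number


print(kanji_to_int("12万4800"))  # the price example from the description
print(kanji_to_int("十二"))      # the numeral part of 十二月
```

 Keeping leading zeros, as discussed in the comments, would need a string result instead of an integer.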
  


[jira] [Comment Edited] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-10-06 Thread Lance Norskog (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471121#comment-13471121
]

Lance Norskog edited comment on LUCENE-3921 at 10/7/12 12:33 AM:
-

Statistical models and rule-based models always have a failure rate. When you
use them you have to decide what to do about the failures. Attacking the
failures with another model drives toward Zeno's Paradox. For Chinese language
search, breaking the failures into bigrams makes a lot of sense. The CJK bigram
generator creates a massive number of bogus bigrams. Bogus bigrams cause bogus
results from sloppy phrase searches.

Smart Chinese and Kuromoji are not systems for doing natural-language
processing. They are systems for minimizing bogus bigrams. This allows sloppy
phrase queries to find fewer bogus results. In my use case, Smart Chinese
created only 2% (40k/1.8m) of the possible bigrams. [SOLR-3653] is the result
of my experience in supporting searching Chinese legal documents. I have some
useful numbers at the end of the page.
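
To make the term concrete, here is a small Python sketch of overlapping character bigrams. It assumes one possible reading of "bogus bigram", namely a bigram that straddles a word boundary; the function names are hypothetical and this is not how the CJK bigram filter is implemented.

```python
def char_bigrams(text):
    """All overlapping character bigrams, as a plain bigram generator would emit."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def boundary_crossing_bigrams(words):
    """Bigrams of the concatenated text that straddle a boundary between two words."""
    text = "".join(words)
    boundaries = set()
    pos = 0
    for word in words[:-1]:
        pos += len(word)
        boundaries.add(pos)
    # the bigram starting at index i covers characters i and i + 1,
    # so it crosses a boundary when the boundary lies at i + 1
    return [text[i:i + 2] for i in range(len(text) - 1) if i + 1 in boundaries]


print(char_bigrams("トートバッグ"))
print(boundary_crossing_bigrams(["トート", "バッグ"]))
```

For a segmentation into トート and バッグ, the only bigram that crosses the boundary is トバ. A segmenter that gets the boundary right never produces it.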

was (Author: lancenorskog):
Statistical models and rule-based models always have a failure rate. When
you use them you have to decide what to do about the failures. Attacking the
failures with another model drives toward Zeno's Paradox. For Chinese language
search, breaking the failures into bigrams makes a lot of sense.

Add decompose compound Japanese Katakana token capability to Kuromoji
-


[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-10-06 Thread Christian Moen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471132#comment-13471132
]

Christian Moen commented on LUCENE-3922:

{quote}
Is it difficult to support numbers with period as the following?
３．２兆円
５．２億円
{quote}

Supporting this is no problem and a good idea.

{quote}
I think it would be helpful if this charfilter supported old Kanji numeric
characters (KYU-KANJI or DAIJI) such as 壱, 壹, 弌 (one), 弐, 貳, 弍 (two), 参, 參
(three), or made this configurable.
{quote}

This is also easy to support.

As for making preserving zeros configurable, that's also possible, of course.

It's great to get more feedback on what sort of functionality we need and what
should be configurable options. Hopefully, we can find a good balance without
adding too much complexity.

Thanks for the feedback.

Add Japanese Kanji number normalization to Kuromoji
---


[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.7.0_07) - Build # 1068 - Failure!

2012-10-06 Thread Policeman Jenkins Server

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1068/
Java: 64bit/jdk1.7.0_07 -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 27190 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:245: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:551: Unable to delete file C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build\analysis\common\lucene-analyzers-common-4.1-SNAPSHOT.jar

Total time: 61 minutes 18 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_07 -XX:+UseG1GC
Email was triggered for: Failure
Sending email for trigger: Failure




Build failed in Jenkins: slow-io-beasting #3090

See http://sierranevada.servebeer.com:8080/job/slow-io-beasting/3090/

--
[...truncated 2472 lines...]
[junit4:junit4] OK  0.02s J0 | TestCompoundFile.testSingleFile
[junit4:junit4] Completed on J0 in 3.06s, 18 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestNRTReaderWithThreads
[junit4:junit4] OK  1.56s J2 | TestNRTReaderWithThreads.testIndexing
[junit4:junit4] Completed on J2 in 1.57s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestCrashCausesCorruptIndex
[junit4:junit4] OK  0.22s J2 | TestCrashCausesCorruptIndex.testCrashCorruptsIndexing
[junit4:junit4] Completed on J2 in 0.27s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestConcurrentMergeScheduler
[junit4:junit4] OK  0.25s J1 | TestConcurrentMergeScheduler.testDeleteMerging
[junit4:junit4] OK  0.48s J1 | TestConcurrentMergeScheduler.testNoWaitClose
[junit4:junit4] OK  0.38s J1 | TestConcurrentMergeScheduler.testNoExtraFiles
[junit4:junit4] OK  0.09s J1 | TestConcurrentMergeScheduler.testFlushExceptions
[junit4:junit4] Completed on J1 in 1.21s, 4 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestStressAdvance
[junit4:junit4] OK  0.39s J2 | TestStressAdvance.testStressAdvance
[junit4:junit4] Completed on J2 in 0.41s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.Test2BDocs
[junit4:junit4] OK  0.00s J3 | Test2BDocs.testOverflow
[junit4:junit4] OK  0.62s J3 | Test2BDocs.testExactlyAtLimit
[junit4:junit4] Completed on J3 in 1.26s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestConsistentFieldNumbers
[junit4:junit4] OK  0.08s J0 | TestConsistentFieldNumbers.testAddIndexes
[junit4:junit4] OK  0.06s J0 | TestConsistentFieldNumbers.testSameFieldNumbersAcrossSegments
[junit4:junit4] OK  0.05s J0 | TestConsistentFieldNumbers.testManyFields
[junit4:junit4] OK  0.86s J0 | TestConsistentFieldNumbers.testFieldNumberGaps
[junit4:junit4] Completed on J0 in 1.06s, 4 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestRollingUpdates
[junit4:junit4] OK  0.94s J1 | TestRollingUpdates.testUpdateSameDoc
[junit4:junit4] OK  0.23s J1 | TestRollingUpdates.testRollingUpdates
[junit4:junit4] Completed on J1 in 1.18s, 2 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterOnDiskFull
[junit4:junit4] OK  0.03s J0 | TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull
[junit4:junit4] OK  0.00s J0 | TestIndexWriterOnDiskFull.testImmediateDiskFull
[junit4:junit4] OK  0.03s J0 | TestIndexWriterOnDiskFull.testCorruptionAfterDiskFullDuringMerge
[junit4:junit4] OK  0.54s J0 | TestIndexWriterOnDiskFull.testAddIndexOnDiskFull
[junit4:junit4] Completed on J0 in 0.62s, 4 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestSegmentReader
[junit4:junit4] OK  0.13s J2 | TestSegmentReader.test
[junit4:junit4] OK  0.19s J2 | TestSegmentReader.testDocument
[junit4:junit4] OK  0.11s J2 | TestSegmentReader.testGetFieldNameVariations
[junit4:junit4] OK  0.26s J2 | TestSegmentReader.testTerms
[junit4:junit4] OK  0.11s J2 | TestSegmentReader.testNorms
[junit4:junit4] OK  0.11s J2 | TestSegmentReader.testTermVectors
[junit4:junit4] Completed on J2 in 0.95s, 6 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestMaxTermFrequency
[junit4:junit4] OK  0.36s J0 | TestMaxTermFrequency.test
[junit4:junit4] Completed on J0 in 0.37s, 1 test
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestStressIndexing2
[junit4:junit4] OK  0.30s J1 | TestStressIndexing2.testMultiConfig
[junit4:junit4] OK  0.11s J1 | TestStressIndexing2.testRandomIWReader
[junit4:junit4] OK  0.11s J1 | TestStressIndexing2.testRandom
[junit4:junit4] Completed on J1 in 0.52s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestCodecs
[junit4:junit4] OK  0.00s J3 | TestCodecs.testFixedPostings
[junit4:junit4] OK  0.02s J3 | TestCodecs.testSepPositionAfterMerge
[junit4:junit4] OK  1.40s J3 | TestCodecs.testRandomPostings
[junit4:junit4] Completed on J3 in 1.43s, 3 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestCrash
[junit4:junit4] OK  0.06s J2 | TestCrash.testCrashAfterClose
[junit4:junit4] OK  0.20s J2 | TestCrash.testCrashAfterCloseNoWait
[junit4:junit4] OK  0.03s J2 | TestCrash.testCrashWhileIndexing
[junit4:junit4]   1 TEST: initIndex
[junit4:junit4]   1 TEST: done initIndex
[junit4:junit4]   1 TEST: now crash
[junit4:junit4] OK  0.11s J2 | TestCrash.testWriterAfterCrash
[junit4:junit4] OK  0.25s J2 | TestCrash.testCrashAfterReopen
[junit4:junit4] Completed on J2 in 0.67s, 5 tests
[junit4:junit4] 
[junit4:junit4] Suite: org.apache.lucene.index.TestForceMergeForever
[junit4:junit4] OK  0.13s J3 | TestForceMergeForever.test

Jenkins build is back to normal : slow-io-beasting #3091