[jira] [Updated] (SOLR-3915) Color Legend for Cloud UI
[ https://issues.apache.org/jira/browse/SOLR-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3915: Attachment: SOLR-3915-screenshot.png Yeah, of course: screenshot attached. I'll update it with any new patches. Color Legend for Cloud UI - Key: SOLR-3915 URL: https://issues.apache.org/jira/browse/SOLR-3915 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.1 Attachments: SOLR-3915.patch, SOLR-3915-screenshot.png The meaning of the shard colors is not really clear; integrate a legend for all possible node states. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4463) add support for running the same test method many times
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470936#comment-13470936 ] Robert Muir commented on LUCENE-4463: - {quote} if you can settle for reusing the same JVM, tests.iters combined with testmethod (or tests.method=...*) will give you different test seeds every time – only the global seed will be the same {quote} Right, we can't really settle for that for Lucene's tests. That's because things like Codec are set at class level, so I could run this 100 times, press commit, and watch it fail because Jenkins gets a different codec. And we have a lot of these, with more being added all the time. It's a tradeoff: sure, we could set Codec per-writer (e.g. in our newIndexWriterConfig) instead of per-test-class, but given the current maturity of the Codec APIs I think it makes debugging much simpler to treat it as a parameterized test class, so we can easily see which codec the test got when it failed. So really we need a different per-class seed too: the same as you would get when running 'ant test' in a loop with an inefficient shell script. add support for running the same test method many times --- Key: LUCENE-4463 URL: https://issues.apache.org/jira/browse/LUCENE-4463 Project: Lucene - Core Issue Type: Wish Components: general/build Reporter: Robert Muir Attachments: LUCENE-4463.patch I have a shell script for this, Mike has a Python script, it's annoying :) I want to do something like this: ant beast -Dtestcase= -Dtestmethod= -Diterations=100 I would be happy with a simple loop that just invokes 'test' somehow: getting a fresh new JVM for each iteration is desirable anyway (so you get fresh codecs, etc). -Dtests.iters is not really useful for this because it does not allow -Dtestmethod and it does not give a fresh JVM. Bonus points if it can use multiple JVMs at the same time though :)
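A minimal sketch of the "simple loop that just invokes 'test'" described above, in Python. It assumes that the `ant test` target accepts `-Dtestcase`, `-Dtestmethod` and `-Dtests.seed` (all used elsewhere in this thread) and that a random hexadecimal string is an acceptable master seed; `beast_commands`, `run`, `TestFoo` and `testBar` are hypothetical names, not part of Lucene's build. Each iteration is a separate `ant` process, so every run gets a fresh JVM and a fresh per-class seed, and the printed seed lets a failing run be replayed.

```python
import random
import subprocess


def beast_commands(testcase, testmethod, iterations, rng=None):
    """Build one `ant test` command line per iteration, each with its own seed.

    Assumption: -Dtests.seed accepts a hexadecimal master seed string.
    """
    rng = rng or random.Random()
    commands = []
    for _ in range(iterations):
        seed = "%016X" % rng.getrandbits(64)  # fresh master seed for this run
        commands.append([
            "ant", "test",
            "-Dtestcase=" + testcase,
            "-Dtestmethod=" + testmethod,
            "-Dtests.seed=" + seed,
        ])
    return commands


def run(commands, dry_run=True):
    """Run each command in a new process (hence a new JVM); stop at the first failure."""
    for i, cmd in enumerate(commands, 1):
        print("iteration %d: %s" % (i, " ".join(cmd)))
        if dry_run:
            continue
        if subprocess.call(cmd) != 0:
            print("failed; reproduce with the seed printed above")
            return False
    return True


if __name__ == "__main__":
    run(beast_commands("TestFoo", "testBar", iterations=3))
```

Passing `dry_run=False` actually launches the builds; running several of these loops side by side would be one crude way to use multiple JVMs at once.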
[jira] [Commented] (LUCENE-4463) add support for running the same test method many times
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470937#comment-13470937 ] Dawid Weiss commented on LUCENE-4463: - Let me start from the beginning. I talked about this once but I can't find it now. 1. testmethod vs. tests.method The complication with testmethod vs. tests.method stems from how JUnit works. A test description must (in practice) be unique, otherwise tools just go crazy. So to make a test repeat, its name must be made unique. That's why, if you use tests.iters=X, every repetition of a single test method will in fact be named uniquely as: testMethod {#seq seed=[...]} These are not just things added to the report; this is the method name as the JUnit Description object sees it. It's odd, but it's a workaround that works and (as far as I'm concerned) the only one possible. So when you use -Dtestmethod=X, this is an alias for -Dtests.method=X, which will filter out all these repeated tests (because they effectively don't match the pattern). To include them, you need to add a glob like: -Dtests.method=X*. Hoss and I added this to test-help a while back to make it clear(er). 2. Seeds and tests.dups The master test seed is passed to the junit4 task once and it just stays there. Everything else is derived from it. The duplication you see is a simple trick: we just duplicate the file name on input. Because every suite gets the same random seed (mixed with its class name to make it more random across a single run), a duplicated identical suite will still get the same seed every time. This option was meant for accelerating a test scenario in which we want to repeat a single suite/seed combination many times using multiple parallel JVMs. 3. What Robert wants (across-JVM repetition of a single suite with a different seed each time). This is effectively impossible right now without re-spinning junit4 with a different seed each time. I don't see how I could reconcile both the scenario above and what Robert wants, although I admit both are useful. A script (loop) doing an antcall would work, but this seems like overkill. Fixing this at the JUnit4 level isn't trivial either, because the seed is currently picked even before junit4 is started (to select the target charset).
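To illustrate point 1: with tests.iters, each repetition gets a uniquified name such as `testMethod {#seq seed=[...]}`, so an exact method filter no longer matches any repetition, while a trailing glob matches all of them. The Python sketch below uses `fnmatch.fnmatchcase` as a stand-in for the junit4 filter; the exact name format and the real runner's matching rules are assumptions here, and `repeated_names`/`selected` are hypothetical helpers.

```python
from fnmatch import fnmatchcase


def repeated_names(method, seeds):
    """Uniquified names, loosely modeled on 'testMethod {#seq seed=[...]}'."""
    return ["%s {#%d seed=[%s]}" % (method, i, s) for i, s in enumerate(seeds)]


def selected(names, pattern):
    """Keep the names that match a -Dtests.method-style glob (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


names = repeated_names("testFoo", ["A1B2", "C3D4", "E5F6"])
print(len(selected(names, "testFoo")))   # prints 0: the exact name matches no repetition
print(len(selected(names, "testFoo*")))  # prints 3: the glob keeps every repetition
```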
[jira] [Commented] (LUCENE-4463) add support for running the same test method many times
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470940#comment-13470940 ] Dawid Weiss commented on LUCENE-4463: - I have two ideas, but both have shortcomings. I could make tests.dups run with a different seed for each suite, but the seeds would follow the same sequence _on each forked JVM_ (add a static field to the current class-name mixer and mix in the repetition count of the same suite name). An alternative is to modify junit4 to do the same, but then, to allow both the same seed and a different seed each time (different scenarios), we'd need... yet another -D option :)
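A rough Python model of the seed derivation from point 2 of the previous comment and of the first idea here. `mix` is a hypothetical stand-in for the real seed-mixing function in junit4/randomizedtesting (only its determinism matters), and `Counter` plays the role of the proposed static field. In the current scheme a suite seed depends only on the master seed and the class name, so duplicated suites repeat one seed. Mixing in a per-JVM repetition counter makes the duplicates differ, but every forked JVM starts counting from zero, so each JVM replays the same sequence of seeds.

```python
import hashlib
from collections import Counter


def mix(master_seed, *parts):
    """Hypothetical deterministic mixer: hash the master seed with extra parts."""
    data = "|".join([str(master_seed)] + [str(p) for p in parts]).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def current_scheme(master_seed, suites):
    """Suite seed = mix(master, class name): duplicates get identical seeds."""
    return [mix(master_seed, name) for name in suites]


def counter_scheme(master_seed, suites):
    """Idea 1: also mix in how often this JVM has already seen the suite name."""
    seen = Counter()  # one instance per JVM, like the proposed static field
    seeds = []
    for name in suites:
        seeds.append(mix(master_seed, name, seen[name]))
        seen[name] += 1
    return seeds


dups = ["TestFoo"] * 3  # e.g. -Dtests.dups=3 for a single suite
print(len(set(current_scheme(0xCAFE, dups))))  # prints 1: the same seed three times
jvm1 = counter_scheme(0xCAFE, dups)
jvm2 = counter_scheme(0xCAFE, dups)  # a second forked JVM starts from zero again
print(len(set(jvm1)), jvm1 == jvm2)  # prints: 3 True
```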
[jira] [Commented] (LUCENE-4463) add support for running the same test method many times
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470943#comment-13470943 ] Robert Muir commented on LUCENE-4463: - I know it's not easy: that's why it's a wish task :) I guess I'm basically listing what could be seen as an expert feature here, but one that is arguably necessary if you are trying to do randomized tests. The fact is that things don't always reproduce 100%, and you know this is definitely a failure in our tests (e.g. the current situation motivated me to open LUCENE-4460). But part of random testing is that you don't have to write targeted tests; you just throw hardware at the problem (which I'm doing... my office is really hot right now!). The frustrating part is that ideally you want to treat this whole randomized test situation like a normal deterministic unit test, the way a normal developer would, so you know you fixed the bug: even if the test isn't great and doesn't reproduce 100%, you want to know it's really fixed rather than taking blind stabs and waiting to see whether all the computers in your house running at full throttle trip the bug within 24 hours before declaring success :) So I'm opening this wish task to think of ways to make this easier and more efficient. I'd actually go so far as to say that tests.iters is really outdated for Lucene's tests these days (since we have so much class-level parameterization), and we should be focusing on tests.dups (and maybe removing tests.iters entirely). Maybe that's just particular to us, but as I mentioned above, I think we show some real use cases for parameterizing the entire test class with certain things, because it simplifies debugging.
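The question of knowing that a hard-to-reproduce bug is "really fixed" can be made rough-and-quantitative with a standard back-of-the-envelope calculation (not something proposed in this thread). If the unfixed test fails independently with probability p per run, then n consecutive clean runs happen with probability (1-p)^n, so reaching confidence c requires n ≥ ln(1-c)/ln(1-p). The failure rates and confidence levels below are illustrative assumptions, and real long-tail failures may well violate the independence assumption.

```python
import math


def runs_needed(failure_rate, confidence):
    """Smallest n with (1 - failure_rate) ** n <= 1 - confidence.

    Assumes independent runs and a fixed per-run failure probability for the
    unfixed bug, which real long-tail failures may not satisfy.
    """
    if not (0 < failure_rate < 1 and 0 < confidence < 1):
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


# A bug that trips about 1 run in 100 needs 299 clean runs for 95% confidence.
print(runs_needed(0.01, 0.95))
```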
[jira] [Updated] (SOLR-3915) Color Legend for Cloud UI
[ https://issues.apache.org/jira/browse/SOLR-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3915: Attachment: SOLR-3915.patch SOLR-3915-screenshot.png Next version, using the same markup as the graph itself. For documentation, the definitions come from [Mark's comment on SOLR-3174|https://issues.apache.org/jira/browse/SOLR-3174?focusedCommentId=13255923page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13255923] Color Legend for Cloud UI - Key: SOLR-3915 URL: https://issues.apache.org/jira/browse/SOLR-3915 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.1 Attachments: SOLR-3915.patch, SOLR-3915.patch, SOLR-3915-screenshot.png, SOLR-3915-screenshot.png The meaning of the shard colors is not really clear; integrate a legend for all possible node states.
[jira] [Commented] (SOLR-3174) Visualize Cluster State
[ https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470945#comment-13470945 ] Stefan Matheis (steffkes) commented on SOLR-3174: - bq. Another thing we should probably do is add a key for the meaning of the colors. oO, I hadn't seen this comment yet... but now we have one, coming with SOLR-3915 =) Visualize Cluster State --- Key: SOLR-3174 URL: https://issues.apache.org/jira/browse/SOLR-3174 Project: Solr Issue Type: New Feature Components: web gui Reporter: Ryan McKinley Assignee: Stefan Matheis (steffkes) Fix For: 4.0-ALPHA Attachments: SOLR-3174-graph.png, SOLR-3174-graph.png, SOLR-3174-graph.png, SOLR-3174.patch, SOLR-3174.patch, SOLR-3174.patch, SOLR-3174.patch, SOLR-3174.patch, SOLR-3174-rgraph.png, SOLR-3174-rgraph.png, SOLR-3174-rgraph.png It would be great to visualize the cluster state in the new UI. See Mark's wish: https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272
[jira] [Commented] (LUCENE-4463) add support for running the same test method many times
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470946#comment-13470946 ] Robert Muir commented on LUCENE-4463: - And just to give a little more background: the stuff we are dealing with is really crazy in some sense. You don't see the normal Jenkins servers emitting failures; what happened is we realized that for months we weren't really testing XYZ, which we thought we were testing, and I'm trying to help make up for lost time :( So all the failures you have seen this week have typically been nasty-to-debug, hard-to-reproduce, long-tail failures that would normally take a ton of time to show up. It's just been Mike debugging and fixing while I try to figure out more ways to provoke these failures more efficiently, like good-cop/bad-cop.
VOTE: release 4.0 (RC2)
Artifacts here: http://s.apache.org/lusolr40rc2 Thanks for the thorough inspection of RC1 and for finding bugs, which uncovered test bugs and other bugs! I am happy this was all discovered and sorted out before release. The vote stays open until Wednesday; the weekend is just extra time for evaluating the RC.
[jira] [Commented] (SOLR-3640) Core Admin UI issues on Chrome
[ https://issues.apache.org/jira/browse/SOLR-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470957#comment-13470957 ] Stefan Matheis (steffkes) commented on SOLR-3640: - Sorry for the late response. [~astubbs], is this still valid? Otherwise I'd like to resolve it. Core Admin UI issues on Chrome -- Key: SOLR-3640 URL: https://issues.apache.org/jira/browse/SOLR-3640 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0-ALPHA Reporter: Antony Stubbs Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg, Screen Shot 2012-07-18 at 3.05.10 PM.png Trying to click on any of the buttons apparently has no effect. They also no longer have icons next to them, and they appear down the left.
[jira] [Created] (SOLR-3917) Partial State is not defined for Dynamic Fields Types
Stefan Matheis (steffkes) created SOLR-3917: --- Summary: Partial State is not defined for Dynamic Fields Types Key: SOLR-3917 URL: https://issues.apache.org/jira/browse/SOLR-3917 Project: Solr Issue Type: Bug Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Fix For: 4.1 SOLR-3734 introduced a partial state for fields that are referenced, e.g., within a copyField but are not explicitly declared in the schema. Because the state is not checked correctly, the schema browser throws an error for dynamic fields and types.
[jira] [Updated] (SOLR-3917) Partial State is not defined for Dynamic Fields Types
[ https://issues.apache.org/jira/browse/SOLR-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3917: Attachment: SOLR-3917.patch Partial State is not defined for Dynamic Fields Types --- Key: SOLR-3917 URL: https://issues.apache.org/jira/browse/SOLR-3917 Project: Solr Issue Type: Bug Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Fix For: 4.1 Attachments: SOLR-3917.patch SOLR-3734 introduced a partial state for fields that are referenced, e.g., within a copyField but are not explicitly declared in the schema. Because the state is not checked correctly, the schema browser throws an error for dynamic fields and types.
[jira] [Commented] (SOLR-3734) Forever loop in schema browser
[ https://issues.apache.org/jira/browse/SOLR-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470962#comment-13470962 ] Stefan Matheis (steffkes) commented on SOLR-3734: - Committed revision 1394980. lucene_solr_4_0 Forever loop in schema browser -- Key: SOLR-3734 URL: https://issues.apache.org/jira/browse/SOLR-3734 Project: Solr Issue Type: Bug Components: Schema and Analysis, web gui Reporter: Lance Norskog Assignee: Stefan Matheis (steffkes) Fix For: 4.1 Attachments: SOLR-3734.patch, SOLR-3734.patch, SOLR-3734_schema_browser_blocks_solr_conf_dir.zip When I start Solr with the attached conf directory, and hit the Schema Browser, the loading circle spins permanently. I don't know if the problem is in the UI or in Solr. The UI does not display the Ajax solr calls, and I don't have a debugging proxy.
[jira] [Updated] (SOLR-3917) Partial State is not defined for Dynamic Fields Types
[ https://issues.apache.org/jira/browse/SOLR-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3917: Attachment: SOLR-3917.patch Updated patch, using {{is_f}} to identify whether we are displaying the details of a field. Partial State is not defined for Dynamic Fields Types --- Key: SOLR-3917 URL: https://issues.apache.org/jira/browse/SOLR-3917 Project: Solr Issue Type: Bug Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Fix For: 4.1 Attachments: SOLR-3917.patch, SOLR-3917.patch SOLR-3734 introduced a partial state for fields that are referenced, e.g., within a copyField but are not explicitly declared in the schema. Because the state is not checked correctly, the schema browser throws an error for dynamic fields and types.
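A minimal sketch of the guard the updated patch describes, written in Python purely for illustration: the real schema browser is JavaScript, and `field_state`, the `schema` dictionary layout and the state names are hypothetical. The idea is to evaluate the partial state only when the details being displayed belong to a field (`is_f`), so dynamic fields and types never reach the field-only lookup that previously failed.

```python
def field_state(kind, name, schema):
    """Return 'declared', 'partial', or None for the item being displayed.

    Hypothetical model: a field is 'partial' when it is referenced
    (e.g. by a copyField) but not explicitly declared in the schema.
    """
    is_f = kind == "field"
    if not is_f:
        # dynamic fields and types have no partial state: skip the lookup
        return None
    if name in schema["fields"]:
        return "declared"
    if name in schema["copy_field_refs"]:
        return "partial"
    raise KeyError("unknown field: " + name)


schema = {"fields": {"id", "title"}, "copy_field_refs": {"text"}}
print(field_state("field", "text", schema))         # prints partial
print(field_state("dynamic_field", "*_s", schema))  # prints None
```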
[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470967#comment-13470967 ] Christian Moen commented on LUCENE-3921: Lance, the idea I had in mind for Japanese uses language-specific characteristics of katakana terms, and perhaps weights that are dictionary-specific as well. However, we are hacking our statistical model here, and there are limits to how far we can go with this. I don't know a whole lot about the Smart Chinese toolkit, but I believe the same approach to compound segmentation could work for Chinese as well; however, the weights and implementation would likely be separate. Note that the above is really about one specific kind of compound segmentation that applies to Japanese, so the thinking was to add extra heuristics for this specific type, which is particularly tricky. It might also be a good idea to approach this problem using the {{DictionaryCompoundWordTokenFilter}} and collectively build some lexical assets for compound splitting for the relevant languages, rather than hacking our models. Add decompose compound Japanese Katakana token capability to Kuromoji - Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0-ALPHA Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ (shoulder bag) don't decompose into トート バッグ and ショルダー バッグ, although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token if the sub-tokens are in the dictionary, or add the capability to force the decompose feature on every Katakana token.
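A simplified Python sketch of the dictionary-driven approach mentioned in the comment, loosely modeled on the idea behind `DictionaryCompoundWordTokenFilter`: keep the original token and also emit every dictionary entry found inside it, within a sub-word size window. This is not Lucene's implementation; `decompose`, the toy dictionary and the size limits are assumptions chosen to reproduce the examples from the issue.

```python
def decompose(token, dictionary, min_subword=2, max_subword=15):
    """Return the original token followed by every dictionary word found inside it.

    Scans every start position and every sub-word length in
    [min_subword, max_subword]; the whole token itself is not re-emitted.
    """
    parts = [token]
    n = len(token)
    for start in range(n - min_subword + 1):
        longest = min(max_subword, n - start)
        for length in range(min_subword, longest + 1):
            sub = token[start:start + length]
            if sub != token and sub in dictionary:
                parts.append(sub)
    return parts


dictionary = {"トート", "ショルダー", "バッグ"}  # toy stand-in for the IPA dictionary
print(decompose("トートバッグ", dictionary))      # ['トートバッグ', 'トート', 'バッグ']
print(decompose("ショルダーバッグ", dictionary))  # ['ショルダーバッグ', 'ショルダー', 'バッグ']
```

Keeping the original token alongside the parts means a query for the full compound still matches, while the sub-words add recall.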
[jira] [Updated] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arcadius Ahouansou updated LUCENE-1822: --- Attachment: LUCENE-1822-tests.patch Hi Koji. Thanks for the patch update. The failing tests have been fixed; some were obvious. For the tests checking subInfos, we have something like: expected: subInfos=(theboth((195,203)))/0.86791086(189,289) actual: subInfos=(theboth((195,203)))/0.86791086(149,249) Honestly, I haven't gone into the detail of verifying/counting the offset positions for the search terms. Could you have a look, please? Thanks. FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive -- Key: LUCENE-1822 URL: https://issues.apache.org/jira/browse/LUCENE-1822 Project: Lucene - Core Issue Type: Improvement Components: modules/highlighter Affects Versions: 2.9 Environment: any Reporter: Alex Vigdor Assignee: Koji Sekiguchi Priority: Minor Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822-tests.patch The new FastVectorHighlighter performs extremely well; however, I've found in testing that the window of text chosen per fragment is often very poor, as it is hard-coded in SimpleFragListBuilder to always start 6 characters to the left of the first phrase match in a fragment. When selecting long fragments, this often means that there is barely any context before the highlighted word and lots after; even worse, when highlighting a phrase at the end of a short text, the beginning is cut off, even though the entire phrase would fit in the specified fragCharSize. For example, highlighting "Punishment" in "Crime and Punishment" returns "e and <b>Punishment</b>" no matter what fragCharSize is specified. I am going to attach a patch that improves the text window selection by recalculating the starting margin once all phrases in the fragment have been identified; this way, if a single word is matched in a fragment, it will appear in the middle of the highlight instead of 6 characters from the beginning. This also guarantees that short texts are represented in their entirety in a fragment when a large enough fragCharSize is specified.
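The two windowing strategies in the description can be compared with a small Python sketch. `naive_start` mimics the fixed 6-character left margin as the issue describes it, and `centered_start` is one possible reading of the proposed recalculation (center the matched span in a window of fragCharSize characters, clamped to the text). Neither is the actual SimpleFragListBuilder code; both function names are hypothetical.

```python
MARGIN = 6  # hard-coded margin described in the issue


def naive_start(match_start, margin=MARGIN):
    """Start the fragment a fixed number of characters left of the first match."""
    return max(0, match_start - margin)


def centered_start(match_start, match_end, frag_char_size, text_len):
    """Center the matched span in the window, keeping the window inside the text."""
    slack = max(0, frag_char_size - (match_end - match_start))
    start = match_start - slack // 2
    start = min(start, text_len - frag_char_size)  # don't run past the end
    return max(0, start)                           # don't start before the text


text = "Crime and Punishment"
m_start = text.index("Punishment")
m_end = m_start + len("Punishment")
print(text[naive_start(m_start):])                            # e and Punishment
print(text[centered_start(m_start, m_end, 100, len(text)):])  # Crime and Punishment
```

With a short text and a generous fragCharSize, the clamping alone is enough to return the whole text, which is the guarantee the description asks for.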
[jira] [Resolved] (SOLR-3917) Partial State is not defined for Dynamic Fields Types
[ https://issues.apache.org/jira/browse/SOLR-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-3917. - Resolution: Fixed Committed revision 1394983. trunk Committed revision 1394987. branch_4x Committed revision 1394990. lucene_solr_4_0
RE: VOTE: release 4.0 (RC2)
Hi, +1 to release this time! - I ran the smoke tester on Linux (JDKs: 1.6.0_33, 1.7.0_07, server, 64-bit): passed! - I used the PANGAEA index (version 3.6.1, copied from our production system) and ran CheckIndex on it (both JDKs): passed. Index size and deletions were reported correctly this time. I also checked a force-merged PANGAEA index: passed as well. - I used IndexUpgrader to upgrade both 3.6.1 indexes: passed. - I checked the output of IndexUpgrader again with CheckIndex; the output for the force-merged 3.6.1 index was identical to that of the migrated 4.0 index (number of terms, ...). - I compared the index sizes of the single-segment 3.6.1 and 4.0.0 indexes: 4.0 was slightly larger. You have to know that this index contains thousands of fields (it allows XQuery-like searches in XML, so we have a field for every possible XPath of our rather complex XML schema). Search speed improved dramatically (because of the separate term dictionary for every field), so the bigger size is also caused by splitting the fields into separate term dictionaries vs. one in 3.6. 4.0 also stores more statistics. I think an index with 10 fields may be smaller; I'll try this a little later with another index. - I used the demo module to run some text-only queries: they passed. One small thing: we mention (in Lucene's package) that Java 6 is needed, so we should at least mention that in the release notes. We should improve our docs/index.(html|xsl) to mention system requirements. Same for Solr. We have a system requirements page on the website, but it is unversioned, so we should also add a section for 4.0 there. But this is not the way to go. We should also mention that Java 7 is the preferred Java version, provided you have at least 1.7.0_01.
Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Saturday, October 06, 2012 10:11 AM To: dev@lucene.apache.org Subject: VOTE: release 4.0 (RC2)
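The upgrade-then-verify steps from the vote email can be scripted. The Python sketch below only builds and optionally runs the command lines; it assumes the command-line entry points `org.apache.lucene.index.IndexUpgrader` and `org.apache.lucene.index.CheckIndex` from lucene-core, and the jar name and index path are placeholders. Run it on a copy of the index, since IndexUpgrader rewrites the index in place.

```python
import subprocess


def upgrade_and_check_commands(core_jar, index_dir):
    """Command lines to upgrade an index in place and then verify it."""
    base = ["java", "-cp", core_jar]
    return [
        base + ["org.apache.lucene.index.IndexUpgrader", index_dir],
        base + ["org.apache.lucene.index.CheckIndex", index_dir],
    ]


def run_all(commands):
    """Run the steps in order; stop at the first non-zero exit status."""
    for cmd in commands:
        if subprocess.call(cmd) != 0:
            return False
    return True


# Placeholder jar and path; point these at a real lucene-core jar and an index copy.
cmds = upgrade_and_check_commands("lucene-core-4.0.0.jar", "/tmp/index-copy")
for c in cmds:
    print(" ".join(c))
```

Calling `run_all(cmds)` executes both steps and stops early if the upgrade fails, so CheckIndex never runs against a half-upgraded index.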
RE: VOTE: release 4.0 (RC2)
I mean: we don't mention system requirements correctly - sorry - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Saturday, October 06, 2012 1:19 PM To: dev@lucene.apache.org Subject: RE: VOTE: release 4.0 (RC2) Hi, +1 to release this time! - I ran smoketester on Linux (JDKs: 1.6.0_33, 1.7.0_07, server, 64bit), passed! - I used the PANGAEA index (version 3.6.1, copied from our production system), ran checkindex on it (both JDKs) passed. Index size and deletions were reported correctly this time. I also checked an force-merged PANGAEA index, passed also. - I used IndexUpgrader to upgrade both 3.6.1 indexes, passed. - I checked the output of IndexUpgrader again with CheckIndex, the output for the force-merged 3.6.1 index was identical to the migrated 4.0 index (number of terms,) - I compared the index sizes of the single-segment 3.6.1 and 4.0.0 indexes: 4.0 was slightly larger. You have to know that this index contains thousands of fields (it allows to search in XML based on XQuery-like, so we have a field for every possible xpath of our rather complex XML schema in the index). The search speed improved dramatically (because of the separate term dictionaries for every field). In addition, so the bigger size is also caused by splitting fields to separate term dictionaries vs. one in 3.6. 4.0 also has more statistics. I think an index with 10 fields may be smaller, will try this a little bit later with another index - I used the demo module to run some text-only queries, they passed. On small thing: We mention (in lucene's package), that Java 6 is needed, so we should at least mention that in the release notes. We should improve our docs/index.(html|xsl) to mention system requirements. Same for Solr. We have a system requirements page on the website, but that is unversioned, so we should also add a section for 4.0 there. But this is not the way to go. 
We should also mention that Java 7 is the preferred Java version, provided you have at least 1.7.0_01.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Saturday, October 06, 2012 10:11 AM
To: dev@lucene.apache.org
Subject: VOTE: release 4.0 (RC2)
artifacts here: http://s.apache.org/lusolr40rc2
Thanks for the good inspection of rc#1 and finding bugs, which found test bugs and other bugs! I am happy this was all discovered and sorted out before release.
vote stays open until Wednesday; the weekend is just extra time for evaluating the RC.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
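Uwe's version advice (Java 6 is the minimum, Java 7 is preferred only from 1.7.0_01 on) is easy to state precisely as a check. This is a minimal Python sketch, assuming the legacy `1.x.y_uu` version strings printed by Java 6/7 (as in `1.6.0_33` and `1.7.0_07` above); the function names are hypothetical and not part of any Lucene tooling, and early-access strings such as `1.8.0-ea-b58` are deliberately rejected.

```python
import re

# Minimum Java 7 update recommended in the thread.
MIN_JAVA7 = (1, 7, 0, 1)


def parse_java_version(s):
    """Parse a legacy version string like '1.7.0_07' into
    (major, minor, micro, update); a missing '_update' counts as 0."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", s.strip())
    if m is None:
        raise ValueError(f"unrecognized Java version: {s!r}")
    major, minor, micro, update = m.groups()
    return (int(major), int(minor), int(micro), int(update or 0))


def preferred_java(version):
    """True only for a Java 7 at least as new as 1.7.0_01."""
    v = parse_java_version(version)
    return v[:2] == (1, 7) and v >= MIN_JAVA7
```

Tuple comparison does the ordering work here, so `1.7.0` (update 0) is correctly treated as older than `1.7.0_01`.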
[jira] [Commented] (LUCENE-4463) add support for running the same test method many times
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470975#comment-13470975 ] Dawid Weiss commented on LUCENE-4463: - I absolutely understand. There seem to be a few recurring scenarios:
- random test (exploring the combinations space; typically jenkins)
- random test, many repetitions of a single test method, constant seed (-Dtestcase=... -Dtests.iters=... -Dtests.seed=XXX:YYY)
- random test, many repetitions of a single test method, variable seed starting from a single master (-Dtestcase=... -Dtests.iters=... -Dtests.seed=XXX)
- random test, many repetitions of a single suite, constant seed (-Dtestcase=... -Dtests.dups=... -Dtests.seed=...); this also applies to repeating a single test method within a suite, but accelerated to run on multiple cores if one has many.
- random test, many repetitions of a single suite, random seed (-Dtestcase=... -Dtests.dups=...).
We currently seem to have all of these except for the last one. I have a working patch in my head; I'll attach it shortly. Btw. I don't think there's anything I can do to make Mike NOT run his Python/SSH magic because he scatters tests across a farm of machines... I plan to do this for junit4 around year 2020, he, he. Not that it's very complicated technically, but it'd require a lot of refactorings and then testing for potential infrastructure problems, detecting hung processes/sockets/jvms, etc.
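The missing scenario (many repetitions of a single suite, each with a fresh random seed) is roughly what an external beasting script does today. A minimal Python sketch, assuming an `ant test` target that honours `-Dtestcase`, `-Dtests.seed` and `-Dtests.method` as discussed in this thread; the helper names are hypothetical, and this is not the patch attached to the issue.

```python
import random
import subprocess
from concurrent.futures import ThreadPoolExecutor


def random_master_seed(rng):
    """A 64-bit seed as 16 upper-case hex digits, the same shape as
    -Dtests.seed=EA34BF7EB6C56752 in the Jenkins logs."""
    return f"{rng.getrandbits(64):016X}"


def beast_commands(testcase, iterations, testmethod=None, rng=None):
    """One 'ant test' command per iteration, each with its own master seed,
    so every run gets a new JVM and new class-level choices (codec, ...)."""
    rng = rng or random.Random()
    cmds = []
    for _ in range(iterations):
        cmd = ["ant", "test", f"-Dtestcase={testcase}",
               f"-Dtests.seed={random_master_seed(rng)}"]
        if testmethod:
            cmd.append(f"-Dtests.method={testmethod}")
        cmds.append(cmd)
    return cmds


def run_beast(cmds, jobs=2, runner=None):
    """Run the commands with up to 'jobs' concurrent processes and return
    the failing ones, whose seeds can then be replayed."""
    runner = runner or (lambda c: subprocess.run(c).returncode)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(runner, cmds))
    return [c for c, rc in zip(cmds, codes) if rc != 0]
```

Because each iteration is a separate `ant` process, each one gets a fresh JVM. Running several at once from one checkout may clash on shared build directories, so `jobs > 1` assumes separate working copies or build dirs.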
add support for running the same test method many times --- Key: LUCENE-4463 URL: https://issues.apache.org/jira/browse/LUCENE-4463 Project: Lucene - Core Issue Type: Wish Components: general/build Reporter: Robert Muir Attachments: LUCENE-4463.patch I have a shell script for this, Mike has a python script, it's annoying :) I want to do something like this: ant beast -Dtestcase= -Dtestmethod= -Diterations=100 I would be happy with a simple loop that just invokes 'test' somehow: getting a fresh new JVM for each iteration is desirable anyway (so you get fresh codecs, etc). the -Dtests.iters is not really useful for this because it does not allow -Dtestmethod and it does not give a fresh jvm. bonus points if it can use multiple jvms at the same time though :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-4463: Summary: Add support for running the same test method/class many times with different class seeds (was: add support for running the same test method many times)
[jira] [Assigned] (LUCENE-4463) add support for running the same test method many times
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned LUCENE-4463: --- Assignee: Dawid Weiss
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b58) - Build # 1572 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/1572/
Java: 64bit/jdk1.8.0-ea-b58 -XX:+UseParallelGC
All tests passed
Build Log:
[...truncated 9044 lines...]
[junit4:junit4] ERROR: JVM J1 ended with an exception, command line: /mnt/ssd/jenkins/tools/java/64bit/jdk1.8.0-ea-b58/jre/bin/java -XX:+UseParallelGC -Dtests.prefix=tests -Dtests.seed=EA34BF7EB6C56752 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.lockdir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build -Dtests.codec=random -Dtests.postingsformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.1 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/testlogging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=3 -DtempDir=. -Djava.io.tmpdir=. -Dtests.sandbox.dir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build/solr-core -Dclover.db.dir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/junit4/tests.policy -Dlucene.version=4.1-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -classpath
[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470976#comment-13470976 ] Uwe Schindler commented on LUCENE-4463: --- bq. Btw. I don't think there's anything I can do to make Mike NOT run his Python/SSH magic because he scatters tests across a farm of machines... I plan to do this for junit4 around year 2020, he, he. Not that it's very complicated technically but it'd require a lot of refactorings and then testing for potential infrastructure problems, detecting hung processes/sockets/jvms, etc.
I don't think you need to do that: He should install Jenkins on his farmhouse machine and then set up a slave in the GUI for every combine harvester operated by his slaves. He can then create a job, not bound to a specific node, and run it on all slaves in parallel. Very easy to set up; the SSH-Magic is included in Jenkins (dumb slave): Jenkins connects via SSH to the slave, copies the slave.jar via scp and starts the jenkins cows. On the dairies you don't need to set up anything beyond a JVM in $PATH.
[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470980#comment-13470980 ] Dawid Weiss commented on LUCENE-4463: - Yeah, I'm sure Mike will stick to his SSH scripts though :) Anyway, my first idea won't work. The seed decorator has to have a constant mix function for a given class -- it cannot change over time for the same class, because then you wouldn't know the actual seed (and be able to repeat it) if a failure happened at iteration 1. I'll try my second idea, which requires a modification to the runner. The problem again is that this involves a longer cycle of releasing via maven, etc.
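Why the mix must be constant per class can be shown with a toy model. This Python sketch assumes a hypothetical SHA-256 based class-name hash, an XOR mix, and an iteration-dependent variant; the real RandomizedRunner seed decorators use their own mixing, which is not reproduced here.

```python
import hashlib

MASK64 = (1 << 64) - 1


def stable_hash(name):
    """Deterministic 64-bit hash of a class name (Python's built-in hash()
    is salted per process, so it would not give reproducible seeds)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")


def class_seed(master, class_name):
    """Constant mix: depends only on the master seed and the class."""
    return (master ^ stable_hash(class_name)) & MASK64


def class_seed_per_iteration(master, class_name, iteration):
    """Time-varying mix: also depends on how often the class already ran,
    so the master seed alone no longer identifies a failing iteration."""
    return (class_seed(master, class_name) + iteration * 0x9E3779B97F4A7C15) & MASK64
```

With the constant mix, re-running with the same master seed always reproduces the class seed. With the time-varying mix, iteration 1 gets a seed that a fresh run (iteration 0) never sees, so the effective seed would have to be printed separately.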
[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470991#comment-13470991 ] Michael McCandless commented on LUCENE-4463: The script that scatters tests across a bunch of networked machines is here: http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/runRemoteTests.py ... it just uses randomizedrunner to execute the tests, but ssh to distribute each test case to the N machines. All JVMs (N per remote machine) pull from a single shared task queue, in order from slowest test to fastest test ... it communicates with randomizedrunner using its nice stdin/.events API :) It runs [nearly!] all Lucene/Solr tests across N machines and reports any failures ... the source code is scary and has hardwired constants for my env ... but it makes running all tests wicked fast. But that's a very different use case than beasting a single test (this issue). For that I use http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/repeatLuceneTest.py ... however, it's single threaded, and does not run on remote machines ... would be fun to fix that! bq. He should install Jenkins on his farmhouse machine and then setup a slave
Well, I think we need to solve this issue first (how to run many iters of a single testcase/testmethod, each w/ a different seed)? Then I agree Jenkins could be used for distribution instead of ssh + scripts.
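The scheduling idea described above (many JVMs pulling from one shared queue, slowest tests first) can be sketched locally, with threads standing in for remote JVMs. This is a hypothetical Python illustration, not code from runRemoteTests.py; the `run` callback is where a real script would ssh to a machine or spawn a JVM, and the durations would come from previous runs.

```python
import queue
import threading


def schedule(tasks, durations, n_workers, run):
    """Workers pull from one shared queue ordered slowest-first, so long
    suites start early and short ones fill the gaps at the end.
    'durations' maps task -> last observed seconds (unknown tasks count as 0).
    Returns the order in which tasks were taken off the queue."""
    q = queue.Queue()
    for t in sorted(tasks, key=lambda t: durations.get(t, 0.0), reverse=True):
        q.put(t)
    dispatched = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            with lock:
                dispatched.append(t)
            run(t)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return dispatched
```

Putting the slowest suites first is a simple longest-processing-time-first heuristic: it avoids ending a run with one long suite still executing while every other worker sits idle.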
[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470993#comment-13470993 ] Michael McCandless commented on LUCENE-4463: bq. Yeah, I'm sure Mike will stick to his SSH scripts though
Not if we had an efficient way to distribute tests across N JVMs running on M machines from a single queue. One big problem w/ runRemoteTests.py is that it does CLASSPATH pollution, i.e. the CLASSPATH it runs with is the union of all CLASSPATHs for all tests ... this is bad because then it fails to catch dependency problems, or cases when module X shouldn't use module Y but does. This also causes certain tests to fail falsely ...
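The CLASSPATH-pollution problem comes down to set arithmetic. A minimal Python sketch, with hypothetical module and jar names, of what a union classpath hides from each module:

```python
def union_classpath(module_cps):
    """What a shared runner ends up with: every jar that any module needs."""
    cp = set()
    for jars in module_cps.values():
        cp |= set(jars)
    return cp


def hidden_by_union(module, module_cps):
    """Jars visible to 'module' under the union classpath that its own
    classpath would not provide. An accidental dependency on any of them
    passes under the union but would fail in a per-module run."""
    return union_classpath(module_cps) - set(module_cps[module])
```

Anything in the returned set is a place where "module X shouldn't use module Y but does" goes unnoticed. Running each module's tests with only its own classpath makes that set empty by construction.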
Re: VOTE: release 4.0 (RC2)
+1 Smoke tester is happy in my env (Ubuntu 12.04, Javas 1.6.0_32 / 1.7.0_04). Mike McCandless http://blog.mikemccandless.com On Sat, Oct 6, 2012 at 4:10 AM, Robert Muir rcm...@gmail.com wrote: artifacts here: http://s.apache.org/lusolr40rc2 Thanks for the good inspection of rc#1 and finding bugs, which found test bugs and other bugs! I am happy this was all discovered and sorted out before release. vote stays open until wednesday, the weekend is just extra time for evaluating the RC.
Build failed in Jenkins: slow-io-beasting #2325
See http://sierranevada.servebeer.com:8080/job/slow-io-beasting/2325/
--
[...truncated 682 lines...]
[junit4:junit4] OK 0.03s J2 | TestOmitNorms.testNoNrmFile
[junit4:junit4] Completed on J2 in 1.27s, 5 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestThreadedForceMerge
[junit4:junit4] OK 1.01s J3 | TestThreadedForceMerge.testThreadedForceMerge
[junit4:junit4] Completed on J3 in 1.03s, 1 test
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestCodecs
[junit4:junit4] OK 0.02s J0 | TestCodecs.testFixedPostings
[junit4:junit4] OK 0.13s J0 | TestCodecs.testSepPositionAfterMerge
[junit4:junit4] OK 0.13s J0 | TestCodecs.testRandomPostings
[junit4:junit4] Completed on J0 in 0.30s, 3 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestMixedCodecs
[junit4:junit4] OK 0.42s J2 | TestMixedCodecs.test
[junit4:junit4] Completed on J2 in 0.45s, 1 test
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestIndexInput
[junit4:junit4] OK 0.03s J3 | TestIndexInput.testBufferedIndexInputRead
[junit4:junit4] OK 0.03s J3 | TestIndexInput.testRawIndexInputRead
[junit4:junit4] OK 0.02s J3 | TestIndexInput.testByteArrayDataInput
[junit4:junit4] Completed on J3 in 0.33s, 3 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestParallelCompositeReader
[junit4:junit4] OK 0.02s J2 | TestParallelCompositeReader.testIncompatibleIndexes2
[junit4:junit4] OK 0.02s J2 | TestParallelCompositeReader.testIncompatibleIndexes1
[junit4:junit4] OK 0.08s J2 | TestParallelCompositeReader.testIgnoreStoredFields
[junit4:junit4] OK 0.08s J2 | TestParallelCompositeReader.testRefCounts1
[junit4:junit4] OK 0.17s J2 | TestParallelCompositeReader.testQueriesCompositeComposite
[junit4:junit4] OK 0.00s J2 | TestParallelCompositeReader.testRefCounts2
[junit4:junit4] OK 0.20s J2 | TestParallelCompositeReader.testIncompatibleIndexes3
[junit4:junit4] OK 0.10s J2 | TestParallelCompositeReader.testQueries
[junit4:junit4] Completed on J2 in 0.67s, 8 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping
[junit4:junit4] OK 0.52s J3 | TestLazyProxSkipping.testLazySkipping
[junit4:junit4] OK 0.14s J3 | TestLazyProxSkipping.testSeek
[junit4:junit4] Completed on J3 in 0.67s, 2 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestCrash
[junit4:junit4] 1 TEST: initIndex
[junit4:junit4] 1 TEST: done initIndex
[junit4:junit4] 1 TEST: now crash
[junit4:junit4] OK 0.11s J0 | TestCrash.testWriterAfterCrash
[junit4:junit4] OK 0.22s J0 | TestCrash.testCrashAfterClose
[junit4:junit4] OK 0.06s J0 | TestCrash.testCrashAfterCloseNoWait
[junit4:junit4] OK 0.40s J0 | TestCrash.testCrashWhileIndexing
[junit4:junit4] OK 0.17s J0 | TestCrash.testCrashAfterReopen
[junit4:junit4] Completed on J0 in 1.13s, 5 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestOmitTf
[junit4:junit4] OK 0.27s J3 | TestOmitTf.testNoPrxFile
[junit4:junit4] OK 0.00s J3 | TestOmitTf.testStats
[junit4:junit4] OK 0.00s J3 | TestOmitTf.testOmitTermFreqAndPositions
[junit4:junit4] OK 0.02s J3 | TestOmitTf.testMixedRAM
[junit4:junit4] OK 0.03s J3 | TestOmitTf.testBasic
[junit4:junit4] OK 0.16s J3 | TestOmitTf.testMixedMerge
[junit4:junit4] Completed on J3 in 0.49s, 6 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestDocValuesTypeCompatibility
[junit4:junit4] OK 0.05s J2 | TestDocValuesTypeCompatibility.testIncompatibleTypesBytes
[junit4:junit4] OK 0.09s J2 | TestDocValuesTypeCompatibility.testAddCompatibleByteTypes
[junit4:junit4] OK 0.30s J2 | TestDocValuesTypeCompatibility.testAddCompatibleDoubleTypes
[junit4:junit4] OK 0.09s J2 | TestDocValuesTypeCompatibility.testAddCompatibleIntTypes
[junit4:junit4] OK 0.27s J2 | TestDocValuesTypeCompatibility.testAddCompatibleDoubleTypes2
[junit4:junit4] Completed on J2 in 0.89s, 5 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestForceMergeForever
[junit4:junit4] OK 0.64s J0 | TestForceMergeForever.test
[junit4:junit4] Completed on J0 in 0.66s, 1 test
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestTermVectorsWriter
[junit4:junit4] OK 0.00s J2 | TestTermVectorsWriter.testTermVectorCorruption3
[junit4:junit4] OK 0.02s J2 | TestTermVectorsWriter.testNoTermVectorAfterTermVectorMerge
[junit4:junit4] OK 0.00s J2 | TestTermVectorsWriter.testEndOffsetPositionCharAnalyzer
[junit4:junit4] OK 0.02s J2 | TestTermVectorsWriter.testDoubleOffsetCounting2
[junit4:junit4] OK 0.00s J2 | TestTermVectorsWriter.testTermVectorCorruption
[junit4:junit4] OK 0.02s J2 | TestTermVectorsWriter.testEndOffsetPositionWithCachingTokenFilter
[junit4:junit4] OK 0.06s J2 |
Jenkins build is back to normal : slow-io-beasting #2326
See http://sierranevada.servebeer.com:8080/job/slow-io-beasting/2326/
[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471005#comment-13471005 ] Dawid Weiss commented on LUCENE-4463: - bq. Not if we had an efficient way to distribute tests across N JVMs running on M machines from a single queue. Yeah... I'll try to fix this issue so that you can run across N JVMs, but still locally. I don't think I'll have the time in the near future to work on a truly distributed mode.
[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471068#comment-13471068 ] Kazuaki Hiraga commented on LUCENE-3922: Sorry for the late reply. Although I have some requests for improvements, this is a very helpful and nice charfilter for me. Thank you, Christian!! My requests are the following: Would it be difficult to support numbers with a decimal point, such as the following? 3.2兆円 5.2億円 On the other hand, I agree with Christian about not preserving leading zeros, so ◯◯七 doesn't need to become 007. I think it would be helpful if this charfilter supported old Kanji numeric characters (KYU-KANJI or DAIJI) such as 壱, 壹, 弌 (One), 弐, 貳, 弍 (Two), 参, 參 (Three), or made this configurable. Add Japanese Kanji number normalization to Kuromoji --- Key: LUCENE-3922 URL: https://issues.apache.org/jira/browse/LUCENE-3922 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.0-ALPHA Reporter: Kazuaki Hiraga Labels: features Attachments: LUCENE-3922.patch Japanese people use Kanji numerals instead of Arabic numerals for writing prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those Kanji numerals to Arabic numerals (I don't think we need to have a capability to normalize to Kanji numerals).
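To make the requested behaviour concrete, here is a minimal Python sketch of the normalization rules discussed: positional digits, the 十/百/千 and 万/億/兆 multipliers, an optional decimal point, leading zeros dropped, and DAIJI variants behind a flag. It is an illustration under simplified assumptions, not the LUCENE-3922 charfilter: it ignores context (words that merely contain numeral characters), leaves a lone 万 untouched, and does not track character offsets, which a real Lucene CharFilter must correct.

```python
from decimal import Decimal

DIGITS = {c: str(i) for i, c in enumerate("〇一二三四五六七八九")}
DIGITS["◯"] = "0"  # also used as a zero, as in ◯◯七
DIGITS.update({str(i): str(i) for i in range(10)})  # Arabic digits mix in
DAIJI = {"壱": "1", "壹": "1", "弌": "1", "弐": "2", "貳": "2", "弍": "2",
         "参": "3", "參": "3"}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def _value(run, digits_map):
    """Convert one run of numeral characters, e.g. '12万4800' or '3.2兆'."""
    total, section, digits = Decimal(0), Decimal(0), ""
    for ch in run:
        if ch in SMALL_UNITS:  # 二百 -> 2 * 100; bare 百 -> 100
            section += (Decimal(digits) if digits else 1) * SMALL_UNITS[ch]
            digits = ""
        elif ch in BIG_UNITS:  # close the current group: (section) * 10^4, ...
            section += Decimal(digits) if digits else 0
            total += section * BIG_UNITS[ch]
            section, digits = Decimal(0), ""
        else:  # a digit or a decimal point
            digits += digits_map.get(ch, ch)
    total += section + (Decimal(digits) if digits else 0)
    if total == total.to_integral_value():
        return str(int(total))  # also drops leading zeros: 007 -> 7
    return format(total.normalize(), "f")


def normalize_kanji_numbers(text, daiji=False):
    """Replace each run of Kanji/Arabic numerals in text with Arabic digits."""
    digits_map = {**DIGITS, **DAIJI} if daiji else DIGITS
    numeral = set(digits_map) | set(SMALL_UNITS) | set(BIG_UNITS)
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and (
            text[j] in numeral
            or (text[j] == "." and j > i and text[j - 1] in digits_map
                and j + 1 < len(text) and text[j + 1] in digits_map)
        ):
            j += 1
        if j == i:  # not a numeral: copy through
            out.append(text[i])
            i += 1
            continue
        run = text[i:j]
        if any(c in digits_map or c in SMALL_UNITS for c in run):
            out.append(_value(run, digits_map))
        else:  # e.g. a lone 万: leave as-is
            out.append(run)
        i = j
    return "".join(out)
```

With this sketch, `12万4800円` becomes `124800円`, `二番町三ノ二` becomes `2番町3ノ2`, and `3.2兆円` becomes `3200000000000円`. `Decimal` is used so that decimal amounts multiply exactly instead of picking up floating-point error.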
[jira] [Commented] (LUCENE-4285) Improve FST API usability for mere mortals
[ https://issues.apache.org/jira/browse/LUCENE-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471081#comment-13471081 ] David Smiley commented on LUCENE-4285: -- I admit checked exceptions would have alerted me to my bug, but that doesn't make the API any nicer -- I still need null checks littered through my FST user code now. I don't know the FST internals but I'd be surprised to hear that adding support for an empty FST adds appreciable overhead. If this overhead we're discussing is a simple conditional check, then this is net-zero since as it is I need these null checks on my end of the API due to my FST being potentially null. Improve FST API usability for mere mortals -- Key: LUCENE-4285 URL: https://issues.apache.org/jira/browse/LUCENE-4285 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Reporter: David Smiley FST technology is something that has brought amazing advances to Lucene, yet the API is hard to use for the vast majority of users like me. I know that performance of FSTs is really important, but surely a lot can be done without sacrificing that. (comments will hold specific ideas and problems)
[jira] [Commented] (LUCENE-4285) Improve FST API usability for mere mortals
[ https://issues.apache.org/jira/browse/LUCENE-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471083#comment-13471083 ] Dawid Weiss commented on LUCENE-4285: - Things are more difficult than they seem on the surface. An elegant solution would encode an empty automaton without any extra flags or checks. In an arc-based representation there is simply no notion of an empty set of arcs though -- there needs to be at least one, and if it's present on the root state then, well, it's no longer an empty automaton. Like I said -- this can be modeled with an initial state transition (the symbol doesn't matter); if this transition is final then the automaton is empty (there is no actual root state). But this also changes how traversals are implemented and would affect all of the existing code.
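The "initial transition" idea can be illustrated with a toy arc-based automaton. This Python sketch is a plain trie with hypothetical names, not Lucene's FST encoding. It only shows how an empty automaton can be represented by an initial transition that leads to no root state, so that the emptiness test lives once inside traversal instead of in every caller.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class State:
    arcs: Dict[str, "State"] = field(default_factory=dict)
    accepting: bool = False


@dataclass
class Automaton:
    # Target of the synthetic initial transition: the root state, or None
    # when the transition is terminal, i.e. the automaton accepts nothing.
    root: Optional[State]

    @staticmethod
    def build(words):
        if not words:
            return Automaton(root=None)  # empty set: no actual root state
        root = State()
        for w in words:
            s = root
            for ch in w:
                s = s.arcs.setdefault(ch, State())
            s.accepting = True
        return Automaton(root=root)

    def accepts(self, word):
        # Traversal starts by following the initial transition, so "empty
        # automaton" and "no matching arc" share the same code path.
        s = self.root
        for ch in word:
            if s is None:
                return False
            s = s.arcs.get(ch)
        return s is not None and s.accepting
```

Callers never test whether the automaton itself is null; as noted above, the cost is that every traversal must be written with the initial transition in mind.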
[jira] [Commented] (LUCENE-4463) Add support for running the same test method/class many times with different class seeds
[ https://issues.apache.org/jira/browse/LUCENE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471090#comment-13471090 ] Dawid Weiss commented on LUCENE-4463: - I thought about it for a bit longer and exercised a few scenarios. The problem is that I designed everything (and I mean everything) with two ideas in mind:
- every random element (be it a selection of components, shuffling of order or whatever) is a derivative of a single master seed. This seed is picked by the junit4 task and is then used to sort suites to be executed and pick parameters, and is then passed to suites to log messages, stack traces, etc.
- execution of a test suite (in the sense of a single class) is isolated from anything else -- any other class running before or after. So you can provide the same master seed for a single class and it should execute identically, even if it's detached from the entire sequence of suites that ran during the full test. The seed decorators that we currently use alter the master seed with a hash of the test class's name to make it different for each class running under the same master seed, but this is an independent operation -- whether something ran before or after doesn't matter.
The idea of running the same suite many times with a _different_ master seed each time conflicts with these assumptions, because then every subsequent execution of the same class will _not_ be a derivative of the master seed anymore (and will most likely depend on how many classes executed before, or even be random). Let me illustrate this with an example. Let's say the master seed is XXX; we use this seed to pick file.encoding and for this seed it becomes UTF-8. If we now pick a random master seed (say, YYY) for a concrete class and it fails, it'll report YYY back to the console. But if you run ant -Dtests.seed=YYY then the selection of file.encoding will be different because, ehm, it's not XXX anymore.
file.encoding has to be picked before the JVM is started so it cannot be done from within the running test runner, etc. This is just one of the problem scenarios, there are more but I hope you get the picture. A clean solution to the problem would be to make a loop inside ant, around the contents of the test-macro (so that the entire sequence of picking the master seed, picking parameters, spawning JVMs, etc. is repeated). This isn't really going to make matters much faster because it'll fork new JVMs etc. A dirty solution is to screw the above idealistic point of view and have a seed decorator which affects the master seed before it is propagated to each suite. This will cause all the headaches mentioned above PLUS you'll have to get the failing seed directly from the failing test (stack trace or whatever other message is printed) because it won't be the master seed JUnit4 greets you with. Then you could indeed run as many concurrent instances of the same suite with random seeds as you like (JVMs reused). This does sound like super-advanced and convoluted piece of functionality for something that will be probably used pretty frequently (which means lots of wtfs on the mailing list). Don't know, really. Add support for running the same test method/class many times with different class seeds Key: LUCENE-4463 URL: https://issues.apache.org/jira/browse/LUCENE-4463 Project: Lucene - Core Issue Type: Wish Components: general/build Reporter: Robert Muir Assignee: Dawid Weiss Attachments: LUCENE-4463.patch I have a shell script for this, mike has a python script, its annoying :) I want to do something like this: ant beast -Dtestcase= -Dtestmethod= -Diterations=100 I would be happy with a simple loop that just invokes 'test' somehow: getting a fresh new JVM to each iteration is desirable anyway (so you get fresh codecs, etc). the -Dtests.iters is not really useful for this because it does not allow -Dtestmethod and it does not give a fresh jvm. 
bonus points if it can use multiple jvms at the same time though :)
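The two design ideas Dawid describes can be sketched in plain Java. Everything is derived from one master seed, and per-suite seeds mix the master seed with the class name, so execution order does not matter. This is a minimal illustration, not the randomizedtesting/junit4 code. The class `SeedDerivationSketch`, the MurmurHash3-style mixing function, `pickEncoding` and the `org.example.*` class names are all hypothetical stand-ins.

```java
import java.util.Random;

// Hypothetical sketch of master-seed derivation; not the randomizedtesting implementation.
public final class SeedDerivationSketch {

    // MurmurHash3-style 64-bit finalizer: a bijective bit mixer.
    static long mix(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    // Per-suite seed: depends only on the master seed and the suite's class name,
    // never on which suites ran before it.
    static long suiteSeed(long masterSeed, String suiteClassName) {
        long nameHash = suiteClassName.hashCode() * 0x9E3779B97F4A7C15L;
        return mix(masterSeed ^ nameHash);
    }

    // A JVM-level parameter (like file.encoding) chosen from the master seed alone,
    // before any suite runs.
    static String pickEncoding(long masterSeed) {
        String[] candidates = {"UTF-8", "ISO-8859-1", "UTF-16"};
        return candidates[new Random(masterSeed).nextInt(candidates.length)];
    }

    public static void main(String[] args) {
        long master = 0xCAFEL;
        System.out.println("file.encoding = " + pickEncoding(master));
        System.out.println("suite seed A  = " + Long.toHexString(suiteSeed(master, "org.example.TestA")));
        System.out.println("suite seed B  = " + Long.toHexString(suiteSeed(master, "org.example.TestB")));
        // Re-running with a per-suite seed as the new master re-derives file.encoding
        // from a different value, so it may not match the original run.
        System.out.println("re-run file.encoding = " + pickEncoding(suiteSeed(master, "org.example.TestA")));
    }
}
```

Because `suiteSeed` looks only at the master seed and the class name, a suite gets the same seed whether it runs first, last or alone. `pickEncoding(suiteSeed(...))` shows the reproduction problem from the comment: the JVM-level choice is re-derived from a different value.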
[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.6.0_35) - Build # 1065 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1065/
Java: 64bit/jdk1.6.0_35 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 26494 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:245: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:551: Unable to delete file C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build\analysis\common\lucene-analyzers-common-4.1-SNAPSHOT.jar

Total time: 65 minutes 15 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.6.0_35 -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure
Re: VOTE: release 4.0 (RC2)
+1 smoke tests are ok.

Tommaso

2012/10/6 Michael McCandless luc...@mikemccandless.com
> +1
>
> Smoke tester is happy in my env (Ubuntu 12.04, Javas 1.6.0_32 / 1.7.0_04).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sat, Oct 6, 2012 at 4:10 AM, Robert Muir rcm...@gmail.com wrote:
> > artifacts here: http://s.apache.org/lusolr40rc2
> >
> > Thanks for the good inspection of rc#1 and finding bugs, which found test bugs and other bugs! I am happy this was all discovered and sorted out before release.
> >
> > vote stays open until Wednesday, the weekend is just extra time for evaluating the RC.
[jira] [Created] (SOLR-3918) Create dist-excl-slf4j target
Shawn Heisey created SOLR-3918:

Summary: Create dist-excl-slf4j target Key: SOLR-3918 URL: https://issues.apache.org/jira/browse/SOLR-3918 Project: Solr Issue Type: Improvement Affects Versions: 4.0-BETA, 3.6.1 Reporter: Shawn Heisey Priority: Trivial Fix For: 4.0, 3.6.2, 4.1, 5.0

If you want to create an entire dist but leave out the slf4j bindings, you currently have to use this:
ant dist-solrj dist-core dist-test-framework dist-contrib dist-war-excl-slf4j
It would be better to have a single target. Attaching a patch against branch_4x for this.
[jira] [Updated] (SOLR-3918) Create dist-excl-slf4j target
[ https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-3918:
Attachment: SOLR-3918.patch

Create dist-excl-slf4j target
Key: SOLR-3918 URL: https://issues.apache.org/jira/browse/SOLR-3918 Project: Solr Issue Type: Improvement Affects Versions: 3.6.1, 4.0-BETA Reporter: Shawn Heisey Priority: Trivial Fix For: 4.0, 3.6.2, 4.1, 5.0 Attachments: SOLR-3918.patch

If you want to create an entire dist but leave out the slf4j bindings, you currently have to use this:
ant dist-solrj dist-core dist-test-framework dist-contrib dist-war-excl-slf4j
It would be better to have a single target. Attaching a patch against branch_4x for this.
[jira] [Updated] (SOLR-3918) Create dist-excl-slf4j target
[ https://issues.apache.org/jira/browse/SOLR-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-3918:
Fix Version/s: (was: 4.0)

Create dist-excl-slf4j target
Key: SOLR-3918 URL: https://issues.apache.org/jira/browse/SOLR-3918 Project: Solr Issue Type: Improvement Affects Versions: 3.6.1, 4.0-BETA Reporter: Shawn Heisey Priority: Trivial Fix For: 3.6.2, 4.1, 5.0 Attachments: SOLR-3918.patch

If you want to create an entire dist but leave out the slf4j bindings, you currently have to use this:
ant dist-solrj dist-core dist-test-framework dist-contrib dist-war-excl-slf4j
It would be better to have a single target. Attaching a patch against branch_4x for this.
[jira] [Commented] (SOLR-3916) fl parsing is sensitive to newlines at the end of field names
[ https://issues.apache.org/jira/browse/SOLR-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471116#comment-13471116 ] Yonik Seeley commented on SOLR-3916:

bq. If you look at the patch, you can see my point quite easily: when parsing the fl, ReturnFields is naively only treating the ' ' character as whitespace and not recognizing any other whitespace characters that might exist between field names.

I had looked at the patch, and I still didn't consider the lack of checks for other types of whitespace between field names a bug, since we never promised to support that. If you look at the code that was used before ReturnFields, it also used a pattern that only split on comma or space. The previous code did handle leading/trailing whitespace by calling String.trim() first, though.

fl parsing is sensitive to newlines at the end of field names
Key: SOLR-3916 URL: https://issues.apache.org/jira/browse/SOLR-3916 Project: Solr Issue Type: Bug Affects Versions: 4.0-BETA Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.0, 4.1, 5.0 Attachments: SOLR-3916.patch

As reported by giovanni.bricconi on the user list, there is a bug in fl parsing that causes Solr to get confused when a field name is followed by a newline character, e.g. in a requestHandler default like...
{noformat}
<!-- newlines showing using $ -->$
<str name="fl">$
sku,store_slug$
</str>$
{noformat}
...this results in Solr assuming it should use function parsing to evaluate the field name, which can cause misleading errors if the field name can't be used in a function (e.g.: can not use FieldCache on multivalued field: store_slug)
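A small sketch shows why trimming handles a trailing newline but not one between field names. It is not Solr's ReturnFields code. `FlSplitSketch` and its methods are hypothetical, and the pre-ReturnFields behavior is only approximated here as "String.trim(), then split on comma or space".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fl splitting strategies; not Solr's ReturnFields.
public final class FlSplitSketch {

    // Only ',' and ' ' separate names, so a '\n' stays glued to a name.
    static List<String> naiveSplit(String fl) {
        List<String> out = new ArrayList<>();
        for (String part : fl.split("[, ]+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    // Approximation of the older behavior: trim leading/trailing whitespace first.
    static List<String> trimThenSplit(String fl) {
        return naiveSplit(fl.trim());
    }

    // Any character for which Character.isWhitespace is true, or ',', separates names.
    static List<String> whitespaceAwareSplit(String fl) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < fl.length(); i++) {
            char c = fl.charAt(i);
            if (c == ',' || Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String fromConfig = "\n  sku,store_slug\n"; // like the <str name="fl"> default above
        System.out.println(naiveSplit(fromConfig).size() + " pieces from naiveSplit");
        System.out.println(trimThenSplit(fromConfig));
        System.out.println(whitespaceAwareSplit("sku,\nstore_slug"));
    }
}
```

For the reported config, `trimThenSplit` already yields `sku` and `store_slug`. Only a newline *between* names (`"sku,\nstore_slug"`) still needs the whitespace-aware variant, which matches Yonik's distinction.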
[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471117#comment-13471117 ] Lance Norskog commented on LUCENE-3922:

bq. On the other hand, I agree with Christian on not preserving leading zeros. So, ◯◯七 doesn't need to become 007.

This example shows why leading zeros should be preserved :) There are different kinds of text search. Searching for media titles like James Bond movies is a very different thing from searching newspaper articles. You might want to find ◯◯七 as the Japanese-language release and 007 as the English-language release. These numbers are brands, not numbers.

Add Japanese Kanji number normalization to Kuromoji
Key: LUCENE-3922 URL: https://issues.apache.org/jira/browse/LUCENE-3922 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.0-ALPHA Reporter: Kazuaki Hiraga Labels: features Attachments: LUCENE-3922.patch

Japanese people use Kanji numerals instead of Arabic numerals for writing prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those Kanji numerals to Arabic numerals (I don't think we need the capability to normalize to Kanji numerals).
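Lance's argument is about whether ◯◯七 should normalize to 7 or 007. A minimal sketch of a Kanji-numeral normalizer with a `preserveLeadingZeros` switch makes the two choices concrete. It is hypothetical. `KanjiNumberSketch` and `normalize` are not from the LUCENE-3922 patch, and the sketch handles a single run of numerals only (no 円, ノ, decimals or old-style numerals). A bare unit such as 十 is read as "one ten".

```java
import java.util.Map;

// Hypothetical sketch of Kanji-numeral normalization; not the LUCENE-3922 patch.
public final class KanjiNumberSketch {
    private static final Map<Character, Integer> KANJI_DIGITS = Map.ofEntries(
        Map.entry('〇', 0), Map.entry('◯', 0),
        Map.entry('一', 1), Map.entry('二', 2), Map.entry('三', 3),
        Map.entry('四', 4), Map.entry('五', 5), Map.entry('六', 6),
        Map.entry('七', 7), Map.entry('八', 8), Map.entry('九', 9));
    private static final Map<Character, Long> SMALL_UNITS = Map.of('十', 10L, '百', 100L, '千', 1_000L);
    private static final Map<Character, Long> BIG_UNITS =
        Map.of('万', 10_000L, '億', 100_000_000L, '兆', 1_000_000_000_000L);

    // Digit value of a Kanji digit or any Unicode decimal digit (ASCII or full-width), else -1.
    private static int digit(char c) {
        Integer kanji = KANJI_DIGITS.get(c);
        return kanji != null ? kanji : Character.digit(c, 10);
    }

    public static String normalize(String number, boolean preserveLeadingZeros) {
        if (preserveLeadingZeros && number.chars().allMatch(c -> digit((char) c) >= 0)) {
            // Pure digit sequence such as ◯◯七: convert digit by digit, keeping zeros.
            StringBuilder sb = new StringBuilder();
            for (char c : number.toCharArray()) {
                sb.append(digit(c));
            }
            return sb.toString();
        }
        long total = 0;   // value already closed off by 万, 億 or 兆
        long section = 0; // value accumulated below the next big unit
        long pending = 0; // digits not yet attached to a unit
        for (char c : number.toCharArray()) {
            int d = digit(c);
            if (d >= 0) {
                pending = pending * 10 + d;
            } else if (SMALL_UNITS.containsKey(c)) {
                section += (pending == 0 ? 1 : pending) * SMALL_UNITS.get(c);
                pending = 0;
            } else if (BIG_UNITS.containsKey(c)) {
                section += pending;
                total += (section == 0 ? 1 : section) * BIG_UNITS.get(c);
                section = 0;
                pending = 0;
            } else {
                throw new IllegalArgumentException("not a numeral: " + c);
            }
        }
        return Long.toString(total + section + pending);
    }

    public static void main(String[] args) {
        System.out.println(normalize("十二万四千八百", false));
        System.out.println(normalize("◯◯七", false) + " vs " + normalize("◯◯七", true));
    }
}
```

With `preserveLeadingZeros=false`, ◯◯七 becomes 7. With `true`, a pure digit run keeps its zeros (007), while positional forms such as 十二万四千八百 are still converted to 124800.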
[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471121#comment-13471121 ] Lance Norskog commented on LUCENE-3921:

Statistical models and rule-based models always have a failure rate. When you use them you have to decide what to do about the failures. Attacking the failures with another model drives toward Zeno's paradox. For Chinese language search, breaking the failures into bigrams makes a lot of sense. Another way to look at this is that Smart Chinese and Kuromoji are systems for minimizing bogus bigrams. This allows phrase queries to function without finding bogus results. The CJK bigram creator generates bogus bigrams, which cause phrase queries to find bogus results. [SOLR-3653] is the result of my experience in supporting search over Chinese legal documents. I have some useful numbers at the end of the page.

Add decompose compound Japanese Katakana token capability to Kuromoji
Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0-ALPHA Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features

The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ (shoulder bag) don't decompose into トート バッグ and ショルダー バッグ, although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature to apply to every Katakana token.
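Lance's point about "bogus bigrams" is easiest to see by bigramming a Katakana compound. The following is a deliberately simplified sketch of overlapping bigrams. `BigramSketch` is hypothetical, and Lucene's real CJKBigramFilter does more (for example, handling script boundaries and characters outside the BMP).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of overlapping CJK bigramming; not Lucene's CJKBigramFilter.
public final class BigramSketch {

    // Emits every pair of adjacent chars; assumes the run contains only BMP characters.
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("トートバッグ"));
    }
}
```

For トートバッグ this yields five bigrams, and トバ straddles the boundary between トート and バッグ. A phrase or sloppy-phrase query can match on such bigrams even though no word contains them, which is the effect a dictionary-based segmenter avoids.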
[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471123#comment-13471123 ] Kazuaki Hiraga commented on LUCENE-3922:

Lance, you may be right. Although I have never seen Japanese people use Kanji numbers for James Bond movies :-), I can't say that we never use Kanji for that kind of expression. Christian, is it possible to make preserving leading zeros optional?

Add Japanese Kanji number normalization to Kuromoji
Key: LUCENE-3922 URL: https://issues.apache.org/jira/browse/LUCENE-3922 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.0-ALPHA Reporter: Kazuaki Hiraga Labels: features Attachments: LUCENE-3922.patch

Japanese people use Kanji numerals instead of Arabic numerals for writing prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those Kanji numerals to Arabic numerals (I don't think we need the capability to normalize to Kanji numerals).
[jira] [Comment Edited] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471121#comment-13471121 ] Lance Norskog edited comment on LUCENE-3921 at 10/7/12 12:33 AM:

Statistical models and rule-based models always have a failure rate. When you use them you have to decide what to do about the failures. Attacking the failures with another model drives toward Zeno's paradox. For Chinese language search, breaking the failures into bigrams makes a lot of sense.

The CJK bigram generator creates a massive number of bogus bigrams. Bogus bigrams cause bogus results from sloppy phrase searches. Smart Chinese and Kuromoji are not systems for doing natural-language processing. They are systems for minimizing bogus bigrams. This allows sloppy phrase queries to find fewer bogus results. In my use case, Smart Chinese created only 2% (40k/1.8m) of the possible bigrams.

[SOLR-3653] is the result of my experience in supporting search over Chinese legal documents. I have some useful numbers at the end of the page.
Add decompose compound Japanese Katakana token capability to Kuromoji
Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0-ALPHA Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features

The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ (shoulder bag) don't decompose into トート バッグ and ショルダー バッグ, although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature to apply to every Katakana token.
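The behavior the issue asks for can be sketched as a fewest-pieces dynamic program over a word list: split a Katakana compound only when every piece is a dictionary entry. This is a hypothetical illustration. The three-entry `Set` stands in for the IPA dictionary, `KatakanaDecompoundSketch` is a made-up name, and this is not how Kuromoji segments text (it works on a cost-based lattice).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of dictionary-driven decompounding; not Kuromoji's logic.
public final class KatakanaDecompoundSketch {

    /** Splits word into the fewest dictionary entries; returns [word] if no full split exists. */
    static List<String> decompose(String word, Set<String> dictionary) {
        int n = word.length();
        int[] best = new int[n + 1]; // best[i]: fewest pieces covering word[0, i)
        int[] cut = new int[n + 1];  // cut[i]: start index of the last piece in that cover
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (best[start] == Integer.MAX_VALUE) continue; // prefix cannot be covered
                if (!dictionary.contains(word.substring(start, end))) continue;
                if (best[start] + 1 < best[end]) {
                    best[end] = best[start] + 1;
                    cut[end] = start;
                }
            }
        }
        if (best[n] == Integer.MAX_VALUE) {
            return List.of(word); // leave unknown compounds untouched
        }
        List<String> pieces = new ArrayList<>();
        for (int end = n; end > 0; end = cut[end]) {
            pieces.add(word.substring(cut[end], end));
        }
        Collections.reverse(pieces);
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("トート", "バッグ", "ショルダー");
        System.out.println(decompose("トートバッグ", dict));
        System.out.println(decompose("ショルダーバッグ", dict));
    }
}
```

Preferring the fewest pieces means a compound that is itself a dictionary entry stays whole. A "force decompose" option, as the issue suggests, would instead require at least two pieces whenever a split exists.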
[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471132#comment-13471132 ] Christian Moen commented on LUCENE-3922:

{quote}
Is it difficult to support numbers with a period, such as the following?
3.2兆円
5.2億円
{quote}
Supporting this is no problem and a good idea.

{quote}
I think it would be helpful if this charfilter supported old Kanji numeric characters (KYU-KANJI or DAIJI) such as 壱, 壹, 弌 (One), 弐, 貳, 弍 (Two), 参, 參 (Three), or made them configurable.
{quote}
This is also easy to support. As for making the preservation of leading zeros configurable, that's also possible, of course. It's great to get more feedback on what sort of functionality we need and what should be configurable. Hopefully, we can find a good balance without adding too much complexity. Thanks for the feedback.

Add Japanese Kanji number normalization to Kuromoji
Key: LUCENE-3922 URL: https://issues.apache.org/jira/browse/LUCENE-3922 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.0-ALPHA Reporter: Kazuaki Hiraga Labels: features Attachments: LUCENE-3922.patch

Japanese people use Kanji numerals instead of Arabic numerals for writing prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those Kanji numerals to Arabic numerals (I don't think we need the capability to normalize to Kanji numerals).
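Christian says two requests are easy to support: old-style DAIJI numerals and decimal amounts such as 3.2兆円. Roughly, they could look like this. The sketch is hypothetical. The class and method names are made up, and the folding table covers only the characters listed in the quote. `decimalWithUnit` expects an Arabic decimal followed by exactly one of 万, 億 or 兆, with any currency suffix like 円 stripped beforehand.

```java
import java.math.BigDecimal;
import java.util.Map;

// Hypothetical helpers; not part of Kuromoji or the LUCENE-3922 patch.
public final class KanjiNumberExtrasSketch {
    // Old-style (DAIJI) numerals from the comment, folded to ordinary Kanji digits.
    private static final Map<Character, Character> DAIJI = Map.of(
        '壱', '一', '壹', '一', '弌', '一',
        '弐', '二', '貳', '二', '弍', '二',
        '参', '三', '參', '三');

    public static String foldDaiji(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(DAIJI.getOrDefault(c, c));
        }
        return sb.toString();
    }

    // "3.2兆" -> "3200000000000": an Arabic decimal followed by one big unit.
    public static String decimalWithUnit(String text) {
        char unit = text.charAt(text.length() - 1);
        int exponent;
        switch (unit) {
            case '万': exponent = 4; break;
            case '億': exponent = 8; break;
            case '兆': exponent = 12; break;
            default: throw new IllegalArgumentException("unsupported unit: " + unit);
        }
        BigDecimal mantissa = new BigDecimal(text.substring(0, text.length() - 1));
        return mantissa.scaleByPowerOfTen(exponent).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(foldDaiji("壱万弐千参百"));
        System.out.println(decimalWithUnit("3.2兆") + " " + decimalWithUnit("5.2億"));
    }
}
```

Folding DAIJI into ordinary numerals first lets the rest of a normalizer deal with a single character set. Using `BigDecimal.scaleByPowerOfTen` instead of `double` avoids floating-point rounding, so 3.2兆 becomes exactly 3200000000000.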
[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.7.0_07) - Build # 1068 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1068/
Java: 64bit/jdk1.7.0_07 -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 27190 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:245: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:551: Unable to delete file C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build\analysis\common\lucene-analyzers-common-4.1-SNAPSHOT.jar

Total time: 61 minutes 18 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_07 -XX:+UseG1GC
Email was triggered for: Failure
Sending email for trigger: Failure
Build failed in Jenkins: slow-io-beasting #3090
See http://sierranevada.servebeer.com:8080/job/slow-io-beasting/3090/

[...truncated 2472 lines...]
[junit4:junit4] OK 0.02s J0 | TestCompoundFile.testSingleFile
[junit4:junit4] Completed on J0 in 3.06s, 18 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestNRTReaderWithThreads
[junit4:junit4] OK 1.56s J2 | TestNRTReaderWithThreads.testIndexing
[junit4:junit4] Completed on J2 in 1.57s, 1 test
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestCrashCausesCorruptIndex
[junit4:junit4] OK 0.22s J2 | TestCrashCausesCorruptIndex.testCrashCorruptsIndexing
[junit4:junit4] Completed on J2 in 0.27s, 1 test
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestConcurrentMergeScheduler
[junit4:junit4] OK 0.25s J1 | TestConcurrentMergeScheduler.testDeleteMerging
[junit4:junit4] OK 0.48s J1 | TestConcurrentMergeScheduler.testNoWaitClose
[junit4:junit4] OK 0.38s J1 | TestConcurrentMergeScheduler.testNoExtraFiles
[junit4:junit4] OK 0.09s J1 | TestConcurrentMergeScheduler.testFlushExceptions
[junit4:junit4] Completed on J1 in 1.21s, 4 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestStressAdvance
[junit4:junit4] OK 0.39s J2 | TestStressAdvance.testStressAdvance
[junit4:junit4] Completed on J2 in 0.41s, 1 test
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.Test2BDocs
[junit4:junit4] OK 0.00s J3 | Test2BDocs.testOverflow
[junit4:junit4] OK 0.62s J3 | Test2BDocs.testExactlyAtLimit
[junit4:junit4] Completed on J3 in 1.26s, 2 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestConsistentFieldNumbers
[junit4:junit4] OK 0.08s J0 | TestConsistentFieldNumbers.testAddIndexes
[junit4:junit4] OK 0.06s J0 | TestConsistentFieldNumbers.testSameFieldNumbersAcrossSegments
[junit4:junit4] OK 0.05s J0 | TestConsistentFieldNumbers.testManyFields
[junit4:junit4] OK 0.86s J0 | TestConsistentFieldNumbers.testFieldNumberGaps
[junit4:junit4] Completed on J0 in 1.06s, 4 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestRollingUpdates
[junit4:junit4] OK 0.94s J1 | TestRollingUpdates.testUpdateSameDoc
[junit4:junit4] OK 0.23s J1 | TestRollingUpdates.testRollingUpdates
[junit4:junit4] Completed on J1 in 1.18s, 2 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterOnDiskFull
[junit4:junit4] OK 0.03s J0 | TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull
[junit4:junit4] OK 0.00s J0 | TestIndexWriterOnDiskFull.testImmediateDiskFull
[junit4:junit4] OK 0.03s J0 | TestIndexWriterOnDiskFull.testCorruptionAfterDiskFullDuringMerge
[junit4:junit4] OK 0.54s J0 | TestIndexWriterOnDiskFull.testAddIndexOnDiskFull
[junit4:junit4] Completed on J0 in 0.62s, 4 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestSegmentReader
[junit4:junit4] OK 0.13s J2 | TestSegmentReader.test
[junit4:junit4] OK 0.19s J2 | TestSegmentReader.testDocument
[junit4:junit4] OK 0.11s J2 | TestSegmentReader.testGetFieldNameVariations
[junit4:junit4] OK 0.26s J2 | TestSegmentReader.testTerms
[junit4:junit4] OK 0.11s J2 | TestSegmentReader.testNorms
[junit4:junit4] OK 0.11s J2 | TestSegmentReader.testTermVectors
[junit4:junit4] Completed on J2 in 0.95s, 6 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestMaxTermFrequency
[junit4:junit4] OK 0.36s J0 | TestMaxTermFrequency.test
[junit4:junit4] Completed on J0 in 0.37s, 1 test
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestStressIndexing2
[junit4:junit4] OK 0.30s J1 | TestStressIndexing2.testMultiConfig
[junit4:junit4] OK 0.11s J1 | TestStressIndexing2.testRandomIWReader
[junit4:junit4] OK 0.11s J1 | TestStressIndexing2.testRandom
[junit4:junit4] Completed on J1 in 0.52s, 3 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestCodecs
[junit4:junit4] OK 0.00s J3 | TestCodecs.testFixedPostings
[junit4:junit4] OK 0.02s J3 | TestCodecs.testSepPositionAfterMerge
[junit4:junit4] OK 1.40s J3 | TestCodecs.testRandomPostings
[junit4:junit4] Completed on J3 in 1.43s, 3 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestCrash
[junit4:junit4] OK 0.06s J2 | TestCrash.testCrashAfterClose
[junit4:junit4] OK 0.20s J2 | TestCrash.testCrashAfterCloseNoWait
[junit4:junit4] OK 0.03s J2 | TestCrash.testCrashWhileIndexing
[junit4:junit4] 1 TEST: initIndex
[junit4:junit4] 1 TEST: done initIndex
[junit4:junit4] 1 TEST: now crash
[junit4:junit4] OK 0.11s J2 | TestCrash.testWriterAfterCrash
[junit4:junit4] OK 0.25s J2 | TestCrash.testCrashAfterReopen
[junit4:junit4] Completed on J2 in 0.67s, 5 tests
[junit4:junit4]
[junit4:junit4] Suite: org.apache.lucene.index.TestForceMergeForever
[junit4:junit4] OK 0.13s J3 | TestForceMergeForever.test
Jenkins build is back to normal : slow-io-beasting #3091
See http://sierranevada.servebeer.com:8080/job/slow-io-beasting/3091/