Fwd: Maven Central is probably blocked in China
Does it mean we can remove that additional mirror from ivy settings? Dawid -- Forwarded message -- From: Niclas Hedhman nic...@hedhman.org Date: Thu, Jul 12, 2012 at 3:20 AM Subject: Re: Maven Central is probably blocked in China To: Brian Fox bri...@infinity.nu Cc: Maven Developers List d...@maven.apache.org Yes. repo1 repo2 central central01 central02 all resolve now to hosts that are not blocked. Well done!! Cheers Niclas On Thu, Jul 12, 2012 at 6:20 AM, Brian Fox bri...@infinity.nu wrote: Niclas, I'm told it's working now. Can you confirm? On Tue, Jul 10, 2012 at 1:11 PM, Brian Fox bri...@infinity.nu wrote: The network team confirmed that this is only Unicom with the issue. They are looking at alternate routes that would hopefully work. On Mon, Jul 9, 2012 at 5:31 PM, Niclas Hedhman nic...@hedhman.org wrote: Ok, good to know that it is not completely blocked. It is likely that there are multiple FirewallOps across China regions and the (I think) 3 ISPs (China Telecom, China Mobile and Unicom). As I mentioned, the edgecast address couldn't be reached, but the akamai one could. I am personally in downtown Shanghai, using China Unicom's Fiber To The Building. I am seen as 58.246.154.81 from the outside at the moment, can reach your a978.g1.akamai.net, but not wpc.829D.edgecastcdn.net. I can also VPN to Beijing, to a 163.com datacenter (which I think is a China Telecom subsidiary), having IP number 60.191.221.179. From there, both hosts above are reachable. So, yes, it seems to be regionalized or per ISP (which makes this less of a problem than I thought). I also mentioned that I am personally on VPN and I am not really affected, but developers I have met are not willing to pay for that service and don't have it. Cheers Niclas On Tue, Jul 10, 2012 at 1:58 AM, Brian Fox bri...@infinity.nu wrote: Niclas, We are seeing a lot of traffic to Central from China, so this certainly isn't a case of the Great Firewall blocking everything, rather it seems a little more localized. Can you send more more info about your source ip and geo location that we could use to see what's up? Possibly we can get the traffic routed to a China friendly ip. On Mon, Jul 9, 2012 at 12:08 PM, Brian E. Fox bri...@infinity.nu wrote: Hi Nicolas, this isn't intentional of course. Let me see what I can dig up based in your traces. --Brian (mobile) On Jul 7, 2012, at 11:45 PM, Niclas Hedhman nic...@hedhman.org wrote: (I am not subscribed, so please CC me on any responses) I live in China. I normally have a VPN enabled to circumvent various blocking (YouTube, Twitter, ++) that the Chinese government has in place. I normally don't think much about it. But, today I had my computer rebooted and couldn't build a project, because Maven Central couldn't be reached. So, before I realized that my VPN wasn't running I tracerouted a bit. repo1 resolved to niclas:~ niclas$ dig repo1.maven.org | grep ^[a-z] repo1.maven.org.1751INCNAMEcentral.maven.org. central.maven.org.212INCNAMEcentral02.maven.org. central02.maven.org.7112INCNAME wpc.829D.edgecastcdn.net. wpc.829D.edgecastcdn.net. 3164INCNAME gs1.wpc.edgecastcdn.net. gs1.wpc.edgecastcdn.net. 2292INA68.232.45.253 and from that I also tried central01 niclas:~ niclas$ dig central01.maven.org | grep ^[a-z] central01.maven.org.6477INCNAME central01.maven.org.edgesuite.net. central01.maven.org.edgesuite.net. 20877 IN CNAME a978.g1.akamai.net. a978.g1.akamai.net.4INA124.40.42.31 a978.g1.akamai.net.4INA124.40.42.6 And with tracerouting both (see below), it struck me that VPN might not be enabled and the IP on edgecastcdn.net is probably blocked by China potentially serving something they don't like, could be anything... Yeah, China is BAD, we all know that, but shouldn't we (Apache) try to minimize the problem for your ordinary Chinese developer, could be a student, hobbyist, small entrepreneur and so on, who isn't anti-government (most people here are quite content with the government) to be able to use Apache projects? The fact is now, that without reasonably reliable access to Maven Central, one can not really participate in many, many of the Java projects at ASF. I don't know how the DNS and host resolution is supposed to work, who is participating in the hosting and under what terms. But I think Maven/Sonatype should have in its interest to NOT EXCLUDE some staggering amount of Java programmers, and perhaps try to find a way to get a better SLA here. If you need help from someone to check from the inside the Great Firewall, just let me know... Cheers Niclas traceroute to gs1.wpc.edgecastcdn.net
[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task
[ https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412568#comment-13412568 ] Dawid Weiss commented on LUCENE-3950: - {code} +typedef resource=org/apache/rat/anttasks/antlib.xml uri=antlib:org.apache.rat.anttasks classpath -fileset dir=. includes=rat*.jar/ +fileset dir=${common.dir}/tools/lib includes=apache-rat-0.8.jar/ /classpath /typedef {code} I don't like this duplication of version numbers in ivy and ant files. I think it'd be nicer to use ivy's fileset or path to resolve these JARs if they're not part of the distribution? load rat via ivy for rat-sources task - Key: LUCENE-3950 URL: https://issues.apache.org/jira/browse/LUCENE-3950 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-3950.patch we now fail the build on rat problems (LUCENE-1866), so we should make it easy to run rat-sources for people to test locally (it takes like 3 seconds total for the whole trunk) Also this is safer than putting rat in your ~/.ant/lib because that adds some classes from commons to your ant classpath (which we currently wrongly use in compile). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4216) Token X exceeds length of provided text sized X
Ibrahim created LUCENE-4216: --- Summary: Token X exceeds length of provided text sized X Key: LUCENE-4216 URL: https://issues.apache.org/jira/browse/LUCENE-4216 Project: Lucene - Java Issue Type: Bug Components: modules/highlighter Affects Versions: 4.0-ALPHA Environment: Windows 7, jdk1.6.0_27 Reporter: Ibrahim I'm facing this exception: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم exceeds length of provided text sized 170 at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233) at classes.myApp$16$1.run(myApp.java:1508) I tried to find anything wrong in my code when i start migrating Lucene 3.6 to 4.0 without successful. i found similar issues with HTMLStripCharFilter e.g. LUCENE-3690, LUCENE-2208 but not with SimpleHTMLFormatter so I'm triggering this here to see if there is really a bug or it is something wrong in my code with v4. The code that im using: final Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(font color=red, /font), new QueryScorer(query)); ... final TokenStream tokenStream = TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, Line, analyzer); final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, doc.get(Line), false, 10); Please note that this is working fine with v3.6 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS
[ https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412603#comment-13412603 ] Yonik Seeley commented on SOLR-3613: bq. I'm not talking about precluding running as a webapp [...] so I'm going to pimp [jetty] it out +1 I also don't think we should force solr. for all the system properties. If someone ads the ability to optionally check for the webapp prefix, then I think we should still be free to use zkHost, collection.*, etc, in the examples/doc. bq. a thin HTTP layer around Lucene I've certainly never thought of Solr as that. Solr had faceting, numerics, etc, years before Lucene. Solr is about being a practical useful search server... and lately more morphing into a NoSQL server with first-class full-text search. Namespace Solr's JAVA OPTIONS - Key: SOLR-3613 URL: https://issues.apache.org/jira/browse/SOLR-3613 Project: Solr Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Jan Høydahl Fix For: 4.0 Solr being a web-app, should play nicely in a setting where users deploy it on a shared appServer. To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid name clashes and for clarity when reading your appserver startup script. We currently do that with most: {{solr.solr.home, solr.data.dir, solr.abortOnConfigurationError, solr.directoryFactory, solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we fail to do so. Before release of 4.0 we should make sure to clean this up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task
[ https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412605#comment-13412605 ] Uwe Schindler commented on LUCENE-3950: --- I had the same problem with this commit, but I remember that Robert said, there was actually a problem with RAT running from ivy:cachepath/. I would also really prefer to have this one only in cache, as we dont ship with this tool, so we dont have to take care of license,... We use all tasks from cachepatch (pegdown for converting markdown-HTML, cpptasks,...). Side note: I am thinking about adding clover, too. The required license file can be shipped together with our src package in the tools directory (Atlassian allowed this to the ASF, because the license only allows to check org.apache.* packages) and clover-2.6.1.jar can be downloaded via Ivy. load rat via ivy for rat-sources task - Key: LUCENE-3950 URL: https://issues.apache.org/jira/browse/LUCENE-3950 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-3950.patch we now fail the build on rat problems (LUCENE-1866), so we should make it easy to run rat-sources for people to test locally (it takes like 3 seconds total for the whole trunk) Also this is safer than putting rat in your ~/.ant/lib because that adds some classes from commons to your ant classpath (which we currently wrongly use in compile). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task
[ https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412611#comment-13412611 ] Dawid Weiss commented on LUCENE-3950: - but I remember that Robert said, there was actually a problem with RAT running from ivy:cachepath/ Robert you recall what was that problem? load rat via ivy for rat-sources task - Key: LUCENE-3950 URL: https://issues.apache.org/jira/browse/LUCENE-3950 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Robert Muir Fix For: 4.0, 5.0 Attachments: LUCENE-3950.patch we now fail the build on rat problems (LUCENE-1866), so we should make it easy to run rat-sources for people to test locally (it takes like 3 seconds total for the whole trunk) Also this is safer than putting rat in your ~/.ant/lib because that adds some classes from commons to your ant classpath (which we currently wrongly use in compile). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-3950) load rat via ivy for rat-sources task
[ https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-3950: --- Assignee: Uwe Schindler Hi, I reopen, as it works with cachepath. No fckd up lib folder with tools we dont need for compile. It is now behaving identical to cpptasks, junit, pegdown, maven-ant-tasks and all other build-tools. No License checks required. load rat via ivy for rat-sources task - Key: LUCENE-3950 URL: https://issues.apache.org/jira/browse/LUCENE-3950 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-3950.patch we now fail the build on rat problems (LUCENE-1866), so we should make it easy to run rat-sources for people to test locally (it takes like 3 seconds total for the whole trunk) Also this is safer than putting rat in your ~/.ant/lib because that adds some classes from commons to your ant classpath (which we currently wrongly use in compile). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated SOLR-3524: - Attachment: SOLR-3524.patch Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory --- Key: SOLR-3524 URL: https://issues.apache.org/jira/browse/SOLR-3524 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Kazuaki Hiraga Assignee: Christian Moen Priority: Minor Attachments: SOLR-3524.patch, SOLR-3524.patch, kuromoji_discard_punctuation.patch.txt JapaneseTokenizer, Kuromoji doesn't provide configuration option to preserve punctuation in Japanese text, although It has a parameter to change this behavior. JapaneseTokenizerFactory always set third parameter, which controls this behavior, to true to remove punctuation. I would like to have an option I can configure this behavior by fieldtype definition in schema.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412627#comment-13412627 ] Christian Moen commented on SOLR-3524: -- Patch updated due to recent configuration changes. Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory --- Key: SOLR-3524 URL: https://issues.apache.org/jira/browse/SOLR-3524 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Kazuaki Hiraga Assignee: Christian Moen Priority: Minor Attachments: SOLR-3524.patch, SOLR-3524.patch, kuromoji_discard_punctuation.patch.txt JapaneseTokenizer, Kuromoji doesn't provide configuration option to preserve punctuation in Japanese text, although It has a parameter to change this behavior. JapaneseTokenizerFactory always set third parameter, which controls this behavior, to true to remove punctuation. I would like to have an option I can configure this behavior by fieldtype definition in schema.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412628#comment-13412628 ] Christian Moen commented on SOLR-3524: -- Committed revision 1360592 on {{trunk}} Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory --- Key: SOLR-3524 URL: https://issues.apache.org/jira/browse/SOLR-3524 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Kazuaki Hiraga Assignee: Christian Moen Priority: Minor Attachments: SOLR-3524.patch, SOLR-3524.patch, kuromoji_discard_punctuation.patch.txt JapaneseTokenizer, Kuromoji doesn't provide configuration option to preserve punctuation in Japanese text, although It has a parameter to change this behavior. JapaneseTokenizerFactory always set third parameter, which controls this behavior, to true to remove punctuation. I would like to have an option I can configure this behavior by fieldtype definition in schema.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3950) load rat via ivy for rat-sources task
[ https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3950: -- Attachment: LUCENE-3950-cachepath.patch Patch. Works fine on different machines. I have no RAT in my .lib folder, maybe that was Robert's problem (conflict with cachepath)? load rat via ivy for rat-sources task - Key: LUCENE-3950 URL: https://issues.apache.org/jira/browse/LUCENE-3950 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-3950-cachepath.patch, LUCENE-3950.patch we now fail the build on rat problems (LUCENE-1866), so we should make it easy to run rat-sources for people to test locally (it takes like 3 seconds total for the whole trunk) Also this is safer than putting rat in your ~/.ant/lib because that adds some classes from commons to your ant classpath (which we currently wrongly use in compile). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3614) XML parsing in XPathEntityProcessor doesn't respect ENTITY declarations?
[ https://issues.apache.org/jira/browse/SOLR-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412639#comment-13412639 ] Thomas Beckers commented on SOLR-3614: -- I guess this behaviour was introduced with a fix for SOLR-964. XML parsing in XPathEntityProcessor doesn't respect ENTITY declarations? Key: SOLR-3614 URL: https://issues.apache.org/jira/browse/SOLR-3614 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-3614.patch As reported by Michael Belenki on solr-user, pointing XPathEntityProcessor at XML files that use DTD ENTITY declarations causes XML parse errors of the form... {noformat} org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:testdata.xml rows processed:0 ... Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity uuml ... {noformat} ...even when the entity is specifically declared. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412659#comment-13412659 ] Christian Moen commented on SOLR-3524: -- Committed revision 1360613 on {{branch_4x}} Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory --- Key: SOLR-3524 URL: https://issues.apache.org/jira/browse/SOLR-3524 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Kazuaki Hiraga Assignee: Christian Moen Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-3524.patch, SOLR-3524.patch, kuromoji_discard_punctuation.patch.txt JapaneseTokenizer, Kuromoji doesn't provide configuration option to preserve punctuation in Japanese text, although It has a parameter to change this behavior. JapaneseTokenizerFactory always set third parameter, which controls this behavior, to true to remove punctuation. I would like to have an option I can configure this behavior by fieldtype definition in schema.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen resolved SOLR-3524. -- Resolution: Fixed Fix Version/s: 5.0 4.0 Thanks, Kazu and Ohtani-san! Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory --- Key: SOLR-3524 URL: https://issues.apache.org/jira/browse/SOLR-3524 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Kazuaki Hiraga Assignee: Christian Moen Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-3524.patch, SOLR-3524.patch, kuromoji_discard_punctuation.patch.txt JapaneseTokenizer, Kuromoji doesn't provide configuration option to preserve punctuation in Japanese text, although It has a parameter to change this behavior. JapaneseTokenizerFactory always set third parameter, which controls this behavior, to true to remove punctuation. I would like to have an option I can configure this behavior by fieldtype definition in schema.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License
Uwe Schindler created LUCENE-4217: - Summary: Load clover.jar from ivy-cachepath andy ship sources with License Key: LUCENE-4217 URL: https://issues.apache.org/jira/browse/LUCENE-4217 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0, 5.0 When clover granted use the license for their clover-2.6.3.jar file they allowed us to ship this license file to every developer. Currently clover setup is very hard for users, so this issue will make it simple. If you want to run tests with clover, just pass -Drun.clover=true to ant clean test. ANT will then download clover via IVY and point it to the license file in our tools folder. The license is supplemented by the original mail from Atlassian, that everybody is allowed to use it with code in the org.apache. java package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2566) + - operators allow any amount of whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412673#comment-13412673 ] Karsten R. commented on LUCENE-2566: Should StandardQueryParser work like QueryParser? In current branches and trunk TestQPHelper still contains the line assertQueryEquals(a OR ! b, null, a -b); (and also a - b is parsed as a -b) + - operators allow any amount of whitespace Key: LUCENE-2566 URL: https://issues.apache.org/jira/browse/LUCENE-2566 Project: Lucene - Java Issue Type: Bug Components: core/queryparser Affects Versions: 3.6 Reporter: Yonik Seeley Assignee: Jan Høydahl Priority: Minor Fix For: 4.0-ALPHA, 3.6.1 Attachments: LUCENE-2566-3x.patch, LUCENE-2566.patch As an example, (foo - bar) is treated like (foo -bar). It seems like for +- to be treated as unary operators, they should be immediately followed by the operand. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task
[ https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412674#comment-13412674 ] Robert Muir commented on LUCENE-3950: - {quote} Robert you recall what was that problem? {quote} I think the problem was i tried to use the fine grained maven artifacts (rat-core + rat-tasks) using the big 'rat' jar with all its dependencies in one thing works great, and if it works on cachepath, even better. i dont care about actual jars, just that the task works :) load rat via ivy for rat-sources task - Key: LUCENE-3950 URL: https://issues.apache.org/jira/browse/LUCENE-3950 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-3950-cachepath.patch, LUCENE-3950.patch we now fail the build on rat problems (LUCENE-1866), so we should make it easy to run rat-sources for people to test locally (it takes like 3 seconds total for the whole trunk) Also this is safer than putting rat in your ~/.ant/lib because that adds some classes from commons to your ant classpath (which we currently wrongly use in compile). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=1360619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve depends=resolve-tools + target name=resolve sequential ant dir=test-framework target=resolve inheritall=false propertyset refid=uptodate.and.compiled.properties/ This part of the commit is a bug, it should go back to depending upon resolve-tools (or please remove the ASM.jar!!!) Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what happens. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Replication and proxy settings
Hello, I work near Cambridge at the EMBL-EBI and I would like to contribute to SOLR. Our current project involved 2 teams developing a full text search service based on SOLR 3.6. We have had issues when trying to replicate a master copy of an index to a slave using HTTP proxy settings passed to the JRE. To solve this issue, I've created a copy of the SnapPuller and the ReplicationHandler, modified the code to manage proxy settings and modified the configuration of the SOLR slave to use this new handler. We have tested the replication using proxy settings in our environment with success. What I would like to do now is to apply these changes to the SnapPuller directly: - check proxy settings before creating the httpClient - apply the proxy setting to the httpClient HostConfiguration. I don't want to patch 3.6 and would like to apply the change to SOLR 4. Please, tell me what do you think about this change and if I can proceed. What is the usual procedure to commit code? Best regards, Gautier
RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=136 0619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve depends=resolve-tools + target name=resolve sequential ant dir=test-framework target=resolve inheritall=false propertyset refid=uptodate.and.compiled.properties/ This part of the commit is a bug, it should go back to depending upon resolve- tools (or please remove the ASM.jar!!!) Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what happens. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=136 0619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve depends=resolve-tools + target name=resolve sequential ant dir=test-framework target=resolve inheritall=false propertyset refid=uptodate.and.compiled.properties/ This part of the commit is a bug, it should go back to depending upon resolve- tools (or please remove the ASM.jar!!!) Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what happens. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev= 136 0619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve depends=resolve-tools + target name=resolve sequential ant dir=test-framework target=resolve inheritall=false propertyset refid=uptodate.and.compiled.properties/ This part of the commit is a bug, it should go back to depending upon resolve- tools (or please remove the ASM.jar!!!) Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what happens. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
again, please run 'ant jar-checksums' you will see the problem. On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote: Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev= 136 0619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve depends=resolve-tools + target name=resolve sequential ant dir=test-framework target=resolve inheritall=false propertyset refid=uptodate.and.compiled.properties/ This part of the commit is a bug, it should go back to depending upon resolve- tools (or please remove the ASM.jar!!!) Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what happens. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412685#comment-13412685 ] Christian Moen commented on SOLR-3524: -- {{CHANGES.txt}} for some reason didn't make it into {{branch_4x}}. Fixed this in revision 1360622. Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory --- Key: SOLR-3524 URL: https://issues.apache.org/jira/browse/SOLR-3524 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Kazuaki Hiraga Assignee: Christian Moen Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-3524.patch, SOLR-3524.patch, kuromoji_discard_punctuation.patch.txt JapaneseTokenizer, Kuromoji doesn't provide configuration option to preserve punctuation in Japanese text, although It has a parameter to change this behavior. JapaneseTokenizerFactory always set third parameter, which controls this behavior, to true to remove punctuation. I would like to have an option I can configure this behavior by fieldtype definition in schema.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
What does this f*cking task do? These checksums are a no-go for me. I hate them and please remove them completely! It took me a hour on the weekend to get this shitty task working! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:55 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt again, please run 'ant jar-checksums' you will see the problem. On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote: Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev= 136 0619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve depends=resolve-tools + target name=resolve sequential ant dir=test-framework target=resolve inheritall=false propertyset refid=uptodate.and.compiled.properties/ This part of the commit is a bug, it should go back to depending upon resolve- tools (or please remove the ASM.jar!!!) Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what happens. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
ok but currently they are required, if you add a 3rd party jar and dont add a checksum for it, the build fails. so if we add back the dependency to resolve-tools for top-level lucene resolve (build.xml, only invoked a single time), then it all works. - target name=resolve depends=resolve-tools + target name=resolve Otherwise, jar-checksums task will fail, because it will remove the asm jar's checksum. On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote: What does this f*cking task do? These checksums are a no-go for me. I hate them and please remove them completely! It took me a hour on the weekend to get this shitty task working! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:55 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt again, please run 'ant jar-checksums' you will see the problem. On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote: Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev= 136 0619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve depends=resolve-tools + target name=resolve sequential ant dir=test-framework target=resolve inheritall=false propertyset refid=uptodate.and.compiled.properties/ This part of the commit is a bug, it should go back to depending upon resolve- tools (or please remove the ASM.jar!!!) Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what happens. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
Right, ever since then the jar-checksums task has not worked correctly. I dont know how you added a checksum, maybe with 'openssl sha1' yourself manually? But i needed to add a checksum for the commit to succeed, so i had to fix this task. On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote: What is different to Sunday afternoon? I added asm-all-debug.jar with a checksum generated by my local windows tools and it worked? I don’t care about this ant task. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:03 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt ok but currently they are required, if you add a 3rd party jar and dont add a checksum for it, the build fails. so if we add back the dependency to resolve-tools for top-level lucene resolve (build.xml, only invoked a single time), then it all works. - target name=resolve depends=resolve-tools + target name=resolve Otherwise, jar-checksums task will fail, because it will remove the asm jar's checksum. On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote: What does this f*cking task do? These checksums are a no-go for me. I hate them and please remove them completely! It took me a hour on the weekend to get this shitty task working! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:55 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt again, please run 'ant jar-checksums' you will see the problem. On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote: Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt Modified: lucene/dev/trunk/lucene/build.xml lucene/dev/trunk/lucene/common-build.xml lucene/dev/trunk/lucene/tools/ivy.xml Modified: lucene/dev/trunk/lucene/build.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev= 136 0619r1=1360618r2=1360619view=diff == --- lucene/dev/trunk/lucene/build.xml (original) +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012 @@ -227,7 +227,7 @@ /forbidden-apis /target - target name=resolve
RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
Aha - I have no idea what you are talking about. Fix it, for me it works! I refuse to add endless chains of depends everywhere just to get a stupid tool xy working in compination yx on foobar. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:07 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Right, ever since then the jar-checksums task has not worked correctly. I dont know how you added a checksum, maybe with 'openssl sha1' yourself manually? But i needed to add a checksum for the commit to succeed, so i had to fix this task. On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote: What is different to Sunday afternoon? I added asm-all-debug.jar with a checksum generated by my local windows tools and it worked? I don’t care about this ant task. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:03 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt ok but currently they are required, if you add a 3rd party jar and dont add a checksum for it, the build fails. so if we add back the dependency to resolve-tools for top-level lucene resolve (build.xml, only invoked a single time), then it all works. - target name=resolve depends=resolve-tools + target name=resolve Otherwise, jar-checksums task will fail, because it will remove the asm jar's checksum. On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote: What does this f*cking task do? These checksums are a no-go for me. I hate them and please remove them completely! It took me a hour on the weekend to get this shitty task working! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:55 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt again, please run 'ant jar-checksums' you will see the problem. On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote: Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without documenting it anywhere. Sorry. If you want this fixed, open issue and commit it separately. But not together with unrelated stuff. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:40 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat- NOTICE.txt On Thu, Jul 12, 2012 at 6:34 AM, uschind...@apache.org wrote: Author: uschindler Date: Thu Jul 12 10:34:11 2012 New Revision: 1360619 URL: http://svn.apache.org/viewvc?rev=1360619view=rev Log: LUCENE-3950: Use ivy.cachepath for Apache RAT Removed: lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
Can you simply fix it for me? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:20 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt its not an endless chain of depends, tools/ is not really a real module and had no dependencies before. So its excluded from the ordinary modules-crawl (no documentation, javadocs, tests, packaging is done for it). Now it has dependencies, so its important that resolve defer to it if we want it to work within IDEs such as eclipse, and if we want tasks like jar-checksum to work. On Thu, Jul 12, 2012 at 7:11 AM, Uwe Schindler u...@thetaphi.de wrote: Aha - I have no idea what you are talking about. Fix it, for me it works! I refuse to add endless chains of depends everywhere just to get a stupid tool xy working in compination yx on foobar. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:07 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Right, ever since then the jar-checksums task has not worked correctly. I dont know how you added a checksum, maybe with 'openssl sha1' yourself manually? But i needed to add a checksum for the commit to succeed, so i had to fix this task. On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote: What is different to Sunday afternoon? I added asm-all-debug.jar with a checksum generated by my local windows tools and it worked? I don’t care about this ant task. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:03 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt ok but currently they are required, if you add a 3rd party jar and dont add a checksum for it, the build fails. so if we add back the dependency to resolve-tools for top-level lucene resolve (build.xml, only invoked a single time), then it all works. - target name=resolve depends=resolve-tools + target name=resolve Otherwise, jar-checksums task will fail, because it will remove the asm jar's checksum. On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote: What does this f*cking task do? These checksums are a no-go for me. I hate them and please remove them completely! It took me a hour on the weekend to get this shitty task working! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:55 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt again, please run 'ant jar-checksums' you will see the problem. On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote: Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat- NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote: You committed this without
Re: Replication and proxy settings
Gautier: It's perfectly appropriate to open a JIRA and attach your code as a patch, thanks! If you haven't already seen it, this describes now to make patches, etc... http://wiki.apache.org/solr/HowToContribute Most IDEs have the ability to create them too, and svn -diff is easy to do (execute that from the dir that contains both solr and lucene for ease-of-use, please). We typically name patches as SOLR-.patch, where ### is the JIRA number. Here's the JIRA link: https://issues.apache.org/jira/browse/solr, you have to create a user ID to add JIRAs/patches. Code is always welcome! Best Erick On Thu, Jul 12, 2012 at 6:43 AM, Gautier Koscielny kosci...@ebi.ac.uk wrote: Hello, I work near Cambridge at the EMBL-EBI and I would like to contribute to SOLR. Our current project involved 2 teams developing a full text search service based on SOLR 3.6. We have had issues when trying to replicate a master copy of an index to a slave using HTTP proxy settings passed to the JRE. To solve this issue, I've created a copy of the SnapPuller and the ReplicationHandler, modified the code to manage proxy settings and modified the configuration of the SOLR slave to use this new handler. We have tested the replication using proxy settings in our environment with success. What I would like to do now is to apply these changes to the SnapPuller directly: - check proxy settings before creating the httpClient - apply the proxy setting to the httpClient HostConfiguration. I don't want to patch 3.6 and would like to apply the change to SOLR 4. Please, tell me what do you think about this change and if I can proceed. What is the usual procedure to commit code? Best regards, Gautier - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/ solr/example/ solr/example/etc/jetty.xml
Ahm... + SystemProperty name=lucidworksLogsHome//request._mm_dd.log Maybe change this name! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: markrmil...@apache.org [mailto:markrmil...@apache.org] Sent: Thursday, July 12, 2012 1:43 PM To: comm...@lucene.apache.org Subject: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/ solr/example/ solr/example/etc/jetty.xml Author: markrmiller Date: Thu Jul 12 11:42:50 2012 New Revision: 1360640 URL: http://svn.apache.org/viewvc?rev=1360640view=rev Log: add a commented out example to jetty.xml for configuring a request log Modified: lucene/dev/branches/branch_4x/ (props changed) lucene/dev/branches/branch_4x/solr/ (props changed) lucene/dev/branches/branch_4x/solr/example/ (props changed) lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml Modified: lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/et c/jetty.xml?rev=1360640r1=1360639r2=1360640view=diff == --- lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml (original) +++ lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml Thu Jul 12 11:42:50 2012 @@ -87,6 +87,32 @@ /New /Set +!-- === -- +!-- Configure Request Log -- +!-- === -- +!-- +Ref id=Handlers + Call name=addHandler +Arg + New id=RequestLog class=org.eclipse.jetty.server.handler.RequestLogHandler +Set name=requestLog + New id=RequestLogImpl class=org.eclipse.jetty.server.NCSARequestLog +Set name=filename + SystemProperty name=lucidworksLogsHome//request._mm_dd.log +/Set +Set name=filenameDateFormat_mm_dd/Set +Set name=retainDays90/Set +Set name=appendtrue/Set +Set name=extendedfalse/Set +Set name=logCookiesfalse/Set +Set name=LogTimeZoneUTC/Set + /New +/Set + /New +/Arg + /Call +/Ref +-- !-- === -- !-- extra options -- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3617) Consider adding start scripts.
Mark Miller created SOLR-3617: - Summary: Consider adding start scripts. Key: SOLR-3617 URL: https://issues.apache.org/jira/browse/SOLR-3617 Project: Solr Issue Type: New Feature Reporter: Mark Miller I've always found that starting Solr with java -jar start.jar is a little odd if you are not a java guy, but I think there are bigger pros than looking less odd in shipping some start scripts. Not only do you get a cleaner start command: sh solr.sh or solr.bat or something But you also can do a couple other little nice things: * it becomes fairly obvious for a new casual user to see how to start the system without reading doc. * you can make the working dir the location of the script - this lets you call the start script from another dir and still have all the relative dir setup work. * have an out of the box place to save startup params like -Xmx. * we could have multiple start scripts - say solr-dev.sh that logged to the console and default to sys default for RAM - and also solr-prod which was fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) etc. You would still of course be able to make the java cmd directly - and that is probably what you would do when it's time to run as a service - but these could be good starter scripts to get people on the right track and improve the initial user experience. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3617) Consider adding start scripts.
[ https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412723#comment-13412723 ] Mark Miller commented on SOLR-3617: --- Thoughts? Consider adding start scripts. -- Key: SOLR-3617 URL: https://issues.apache.org/jira/browse/SOLR-3617 Project: Solr Issue Type: New Feature Reporter: Mark Miller I've always found that starting Solr with java -jar start.jar is a little odd if you are not a java guy, but I think there are bigger pros than looking less odd in shipping some start scripts. Not only do you get a cleaner start command: sh solr.sh or solr.bat or something But you also can do a couple other little nice things: * it becomes fairly obvious for a new casual user to see how to start the system without reading doc. * you can make the working dir the location of the script - this lets you call the start script from another dir and still have all the relative dir setup work. * have an out of the box place to save startup params like -Xmx. * we could have multiple start scripts - say solr-dev.sh that logged to the console and default to sys default for RAM - and also solr-prod which was fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) etc. You would still of course be able to make the java cmd directly - and that is probably what you would do when it's time to run as a service - but these could be good starter scripts to get people on the right track and improve the initial user experience. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/ solr/example/ solr/example/etc/jetty.xml
It's supposed to be 'logs'. I changed it in the example dir I was testing with, but missed it in the actual src it seems. On Jul 12, 2012, at 7:49 AM, Uwe Schindler wrote: Ahm... + SystemProperty name=lucidworksLogsHome//request._mm_dd.log Maybe change this name! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: markrmil...@apache.org [mailto:markrmil...@apache.org] Sent: Thursday, July 12, 2012 1:43 PM To: comm...@lucene.apache.org Subject: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/ solr/example/ solr/example/etc/jetty.xml Author: markrmiller Date: Thu Jul 12 11:42:50 2012 New Revision: 1360640 URL: http://svn.apache.org/viewvc?rev=1360640view=rev Log: add a commented out example to jetty.xml for configuring a request log Modified: lucene/dev/branches/branch_4x/ (props changed) lucene/dev/branches/branch_4x/solr/ (props changed) lucene/dev/branches/branch_4x/solr/example/ (props changed) lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml Modified: lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/et c/jetty.xml?rev=1360640r1=1360639r2=1360640view=diff == --- lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml (original) +++ lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml Thu Jul 12 11:42:50 2012 @@ -87,6 +87,32 @@ /New /Set +!-- === -- +!-- Configure Request Log -- +!-- === -- +!-- +Ref id=Handlers + Call name=addHandler +Arg + New id=RequestLog class=org.eclipse.jetty.server.handler.RequestLogHandler +Set name=requestLog + New id=RequestLogImpl class=org.eclipse.jetty.server.NCSARequestLog +Set name=filename + SystemProperty name=lucidworksLogsHome//request._mm_dd.log +/Set +Set name=filenameDateFormat_mm_dd/Set +Set name=retainDays90/Set +Set name=appendtrue/Set +Set name=extendedfalse/Set +Set name=logCookiesfalse/Set +Set name=LogTimeZoneUTC/Set + /New +/Set + /New +/Arg + /Call +/Ref +-- !-- === -- !-- extra options -- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - Mark Miller lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
Yes, ill fix! On Thu, Jul 12, 2012 at 7:25 AM, Uwe Schindler u...@thetaphi.de wrote: Can you simply fix it for me? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:20 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt its not an endless chain of depends, tools/ is not really a real module and had no dependencies before. So its excluded from the ordinary modules-crawl (no documentation, javadocs, tests, packaging is done for it). Now it has dependencies, so its important that resolve defer to it if we want it to work within IDEs such as eclipse, and if we want tasks like jar-checksum to work. On Thu, Jul 12, 2012 at 7:11 AM, Uwe Schindler u...@thetaphi.de wrote: Aha - I have no idea what you are talking about. Fix it, for me it works! I refuse to add endless chains of depends everywhere just to get a stupid tool xy working in compination yx on foobar. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:07 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt Right, ever since then the jar-checksums task has not worked correctly. I dont know how you added a checksum, maybe with 'openssl sha1' yourself manually? But i needed to add a checksum for the commit to succeed, so i had to fix this task. On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote: What is different to Sunday afternoon? I added asm-all-debug.jar with a checksum generated by my local windows tools and it worked? I don’t care about this ant task. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 1:03 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt ok but currently they are required, if you add a 3rd party jar and dont add a checksum for it, the build fails. so if we add back the dependency to resolve-tools for top-level lucene resolve (build.xml, only invoked a single time), then it all works. - target name=resolve depends=resolve-tools + target name=resolve Otherwise, jar-checksums task will fail, because it will remove the asm jar's checksum. On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote: What does this f*cking task do? These checksums are a no-go for me. I hate them and please remove them completely! It took me a hour on the weekend to get this shitty task working! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:55 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt again, please run 'ant jar-checksums' you will see the problem. On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote: Nat validate works form me. There is a checksum for asm so where is the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, July 12, 2012 12:49 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat- NOTICE.txt Well, it was pretty related. I needed to add the checksums to commit this, or 'ant validate' would fail and jenkins would have been very angry! So i had to fix jar-checksums in order to commit! On Thu, Jul 12,
Re: Replication and proxy settings
Hi Erick, I'll follow these guidelines. I'll open a IRA ticket for 3.6.1 Thank you Gautier On 12 Jul 2012, at 12:38, Erick Erickson wrote: Gautier: It's perfectly appropriate to open a JIRA and attach your code as a patch, thanks! If you haven't already seen it, this describes now to make patches, etc... http://wiki.apache.org/solr/HowToContribute Most IDEs have the ability to create them too, and svn -diff is easy to do (execute that from the dir that contains both solr and lucene for ease-of-use, please). We typically name patches as SOLR-.patch, where ### is the JIRA number. Here's the JIRA link: https://issues.apache.org/jira/browse/solr, you have to create a user ID to add JIRAs/patches. Code is always welcome! Best Erick On Thu, Jul 12, 2012 at 6:43 AM, Gautier Koscielny kosci...@ebi.ac.uk wrote: Hello, I work near Cambridge at the EMBL-EBI and I would like to contribute to SOLR. Our current project involved 2 teams developing a full text search service based on SOLR 3.6. We have had issues when trying to replicate a master copy of an index to a slave using HTTP proxy settings passed to the JRE. To solve this issue, I've created a copy of the SnapPuller and the ReplicationHandler, modified the code to manage proxy settings and modified the configuration of the SOLR slave to use this new handler. We have tested the replication using proxy settings in our environment with success. What I would like to do now is to apply these changes to the SnapPuller directly: - check proxy settings before creating the httpClient - apply the proxy setting to the httpClient HostConfiguration. I don't want to patch 3.6 and would like to apply the change to SOLR 4. Please, tell me what do you think about this change and if I can proceed. What is the usual procedure to commit code? Best regards, Gautier - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3617) Consider adding start scripts.
[ https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412727#comment-13412727 ] Robert Muir commented on SOLR-3617: --- Somewhat related: it might be worth adding /etc/init.d-type start/stop/etc scripts, at least one that works for Linux. I'm sure people have these already themselves or are writing their own. Consider adding start scripts. -- Key: SOLR-3617 URL: https://issues.apache.org/jira/browse/SOLR-3617 Project: Solr Issue Type: New Feature Reporter: Mark Miller I've always found that starting Solr with java -jar start.jar is a little odd if you are not a java guy, but I think there are bigger pros than looking less odd in shipping some start scripts. Not only do you get a cleaner start command: sh solr.sh or solr.bat or something But you also can do a couple other little nice things: * it becomes fairly obvious for a new casual user to see how to start the system without reading doc. * you can make the working dir the location of the script - this lets you call the start script from another dir and still have all the relative dir setup work. * have an out of the box place to save startup params like -Xmx. * we could have multiple start scripts - say solr-dev.sh that logged to the console and default to sys default for RAM - and also solr-prod which was fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) etc. You would still of course be able to make the java cmd directly - and that is probably what you would do when it's time to run as a service - but these could be good starter scripts to get people on the right track and improve the initial user experience. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3617) Consider adding start scripts.
[ https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412728#comment-13412728 ] Uwe Schindler commented on SOLR-3617: - As fan of Ubuntu, please also add a upstart config (/etc/init/solr.conf), thats much easier to write than stupid shell script with the well known start|stop|... switch :-) Consider adding start scripts. -- Key: SOLR-3617 URL: https://issues.apache.org/jira/browse/SOLR-3617 Project: Solr Issue Type: New Feature Reporter: Mark Miller I've always found that starting Solr with java -jar start.jar is a little odd if you are not a java guy, but I think there are bigger pros than looking less odd in shipping some start scripts. Not only do you get a cleaner start command: sh solr.sh or solr.bat or something But you also can do a couple other little nice things: * it becomes fairly obvious for a new casual user to see how to start the system without reading doc. * you can make the working dir the location of the script - this lets you call the start script from another dir and still have all the relative dir setup work. * have an out of the box place to save startup params like -Xmx. * we could have multiple start scripts - say solr-dev.sh that logged to the console and default to sys default for RAM - and also solr-prod which was fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) etc. You would still of course be able to make the java cmd directly - and that is probably what you would do when it's time to run as a service - but these could be good starter scripts to get people on the right track and improve the initial user experience. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3618) Enable replication of master using proxy settings
Gautier Koscielny created SOLR-3618: --- Summary: Enable replication of master using proxy settings Key: SOLR-3618 URL: https://issues.apache.org/jira/browse/SOLR-3618 Project: Solr Issue Type: Improvement Components: replication (java) Affects Versions: 3.6.1 Reporter: Gautier Koscielny Fix For: 3.6.1 Check whether system properties http.proxyHost and http.proxyPort are set to initialize the httpClient instance properly in the SnapPuller class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-3377: -- Assignee: Yonik Seeley eDismax: A fielded query wrapped by parens is not recognized Key: SOLR-3377 URL: https://issues.apache.org/jira/browse/SOLR-3377 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Jan Høydahl Assignee: Yonik Seeley Priority: Critical Fix For: 4.0 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch As reported by bernd on the user list, a query like this {{q=(name:test)}} will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4192) SpatialStrategy: Remove isPolyField() and createField(shape)
[ https://issues.apache.org/jira/browse/LUCENE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved LUCENE-4192. -- Resolution: Fixed Committed to 4x trunk SpatialStrategy: Remove isPolyField() and createField(shape) Key: LUCENE-4192 URL: https://issues.apache.org/jira/browse/LUCENE-4192 Project: Lucene - Java Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.0 Attachments: LUCENE-4192_remove_spatial_isPolyField_and_createField.patch On SpatialStrategy, I think the presence of isPolyField() and the single-field createField(shape) is a but much. They were probably copied from Solr's FieldType design without really thinking much if they were really needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3609) Pin down the Solr webapp to a specific directory rather than a unique random directory.
[ https://issues.apache.org/jira/browse/SOLR-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-3609. --- Resolution: Fixed Pin down the Solr webapp to a specific directory rather than a unique random directory. --- Key: SOLR-3609 URL: https://issues.apache.org/jira/browse/SOLR-3609 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3609.patch I'd like to pin down the extracted webapp dir to a constant known location. I think this is a better user experience, in that the location is fixed, and it also would allow us to write scripts that can find all of our jars. For example, there is currently some functionality in ZkController.main to handle some ZooKeeper tasks before starting Solr - something you often want to be able to do. There are more tools that would be nice to add. To create the best user experience for these tools, it would be great to have an example/cloud-tools directory with some simple scripts to make for easy cmd line execution. These scripts will need to be able to easily locate the webapps jars. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3460) Improve cmd line config bootstrap tool.
[ https://issues.apache.org/jira/browse/SOLR-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3460: -- Fix Version/s: 5.0 Improve cmd line config bootstrap tool. --- Key: SOLR-3460 URL: https://issues.apache.org/jira/browse/SOLR-3460 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0, 5.0 Attachments: SOLR-3460.patch, SOLR-3460.patch Improve cmd line tool for bootstrapping config sets. Rather than take a config set name and directory, make it work like -Dboostrap_conf=true and read solr.xml to find config sets. Config sets will be named after the collection and auto linked to the identically named collection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412825#comment-13412825 ] David Smiley commented on LUCENE-4173: -- +0. I am guessing that Ryan added this concept so that it would be easier to demonstrate easily index a variety of shapes in a variety of different ways, ignoring cases where some strategy doesn't handle some particular shape. But I think this feature if you could call it that, has dubious value otherwise. Clearly it does and should default to false so it won't harm anyone if they leave this alone. Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4173.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
SynonymFilter, FST, and Aho-Corasick algorithm
Hello. I'm embarking on developing code similar to the SynonymFilter but which merely needs to record out of band to the analysis where there is matching text in the input tokens to the corpus in the FST. I'm calling this a keyword tagger in which I shove text through it and when it's done it tells me at what offsets there is a match to a corpus of keyword phrases, and to what keywords/phrases they were exactly. It doesn't have to inject or modify the token stream because the results of this are going elsewhere. Although, it would be a fine approach to only omit the tags as I call them as a way of consuming the results, but I'm not indexing them so it doesn't matter. I noticed the following TODOs at the start: // TODO: maybe we should resolve token - wordID then run // FST on wordIDs, for better perf? I intend on doing this since my matching keyword/phrases are often more than one word, and I expect this will save memory and be faster. // TODO: a more efficient approach would be Aho/Corasick's // algorithm // http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm // It improves over the current approach here // because it does not fully re-start matching at every // token. For example if one pattern is a b c x // and another is b c d and the input is a b c d, on // trying to parse a b c x but failing when you got to x, // rather than starting over again your really should // immediately recognize that b c d matches at the next // input. I suspect this won't matter that much in // practice, but it's possible on some set of synonyms it // will. We'd have to modify Aho/Corasick to enforce our // conflict resolving (eg greedy matching) because that algo // finds all matches. This really amounts to adding a .* // closure to the FST and then determinizing it. Could someone please clarify how the problem in the example above is to be fixed? At the end it states how to solve it, but I don't know how to do that and I'm not sure if there is anything more to it since after all if it's as easy as that last sentence sounds then it would have been done already ;-) This code is intense! I wish FSTs were better documented. For example, there are no javadocs on public members of FST.Arc like output and nextFinalOutput which are pertinent since SynonymFilter directly accesses them. IMO the state of FSTs is such that those that wrote them know how they work (Robert, McCandless, Weiss) and seemingly everyone else like me doesn't touch them because we don't know how. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412872#comment-13412872 ] Mark Miller commented on SOLR-3488: --- I'm going to add a collection RELOAD command, and beef up the tests a little. Still more to do after that. Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412874#comment-13412874 ] Markus Jelsma commented on SOLR-3488: - Is it intended for a collection RELOAD action to reload all collection cores immediately? That implies downtime i assume? Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
Knowing how fsts work and being comfortable with the api that evolved through a series of exploratory patches are two different things. I like my fsa api much better and there was an effort to do something similar for lucene but i gave up at some point because the speed of development killed me. Can you describe what youre trying to achieve in more detail? Ive used fsts for pattern matching (sequences of arbitrary length) and my experience is that simple state trackers work wery well (even if they may seem to do lots of spurious tracking). On Jul 12, 2012 5:09 PM, Smiley, David W. dsmi...@mitre.org wrote: Hello. I'm embarking on developing code similar to the SynonymFilter but which merely needs to record out of band to the analysis where there is matching text in the input tokens to the corpus in the FST. I'm calling this a keyword tagger in which I shove text through it and when it's done it tells me at what offsets there is a match to a corpus of keyword phrases, and to what keywords/phrases they were exactly. It doesn't have to inject or modify the token stream because the results of this are going elsewhere. Although, it would be a fine approach to only omit the tags as I call them as a way of consuming the results, but I'm not indexing them so it doesn't matter. I noticed the following TODOs at the start: // TODO: maybe we should resolve token - wordID then run // FST on wordIDs, for better perf? I intend on doing this since my matching keyword/phrases are often more than one word, and I expect this will save memory and be faster. // TODO: a more efficient approach would be Aho/Corasick's // algorithm // http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm // It improves over the current approach here // because it does not fully re-start matching at every // token. For example if one pattern is a b c x // and another is b c d and the input is a b c d, on // trying to parse a b c x but failing when you got to x, // rather than starting over again your really should // immediately recognize that b c d matches at the next // input. I suspect this won't matter that much in // practice, but it's possible on some set of synonyms it // will. We'd have to modify Aho/Corasick to enforce our // conflict resolving (eg greedy matching) because that algo // finds all matches. This really amounts to adding a .* // closure to the FST and then determinizing it. Could someone please clarify how the problem in the example above is to be fixed? At the end it states how to solve it, but I don't know how to do that and I'm not sure if there is anything more to it since after all if it's as easy as that last sentence sounds then it would have been done already ;-) This code is intense! I wish FSTs were better documented. For example, there are no javadocs on public members of FST.Arc like output and nextFinalOutput which are pertinent since SynonymFilter directly accesses them. IMO the state of FSTs is such that those that wrote them know how they work (Robert, McCandless, Weiss) and seemingly everyone else like me doesn't touch them because we don't know how. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412888#comment-13412888 ] Mark Miller commented on SOLR-3488: --- bq. Is it intended for a collection RELOAD action to reload all collection cores immediately? Yes, at least initially. Essentially a convenience method for reloading your cores to pick up changed config or settings. There may be other ways we allow that to happen more automatically eventually, but at a minimum we need the ability to trigger a collection wide reload. There are things to consider for a truly massive cluster - do you really want every node trying to read the new configs form zk at the same time? That's in the future if I end up working on it. We'd have to see how many servers it takes before you end up with a problem, if it is indeed a problem at all. bq. That implies downtime i assume? I'm not sure why? Core reloads don't involve any down time. Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
On Jul 12, 2012, at 11:51 AM, Dawid Weiss wrote: Knowing how fsts work and being comfortable with the api that evolved through a series of exploratory patches are two different things. I like my fsa api much better and there was an effort to do something similar for lucene but i gave up at some point because the speed of development killed me. Do you mean it was slow to coordinate / get consensus or…? Just curious. Can you describe what youre trying to achieve in more detail? Ive used fsts for pattern matching (sequences of arbitrary length) and my experience is that simple state trackers work wery well (even if they may seem to do lots of spurious tracking). I rather like Wikipedia's definition: http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm The number of names I want to handle is in the millions and so use of Lucene's FST is essential as I see it. ~ David On Jul 12, 2012 5:09 PM, Smiley, David W. dsmi...@mitre.orgmailto:dsmi...@mitre.org wrote: Hello. I'm embarking on developing code similar to the SynonymFilter but which merely needs to record out of band to the analysis where there is matching text in the input tokens to the corpus in the FST. I'm calling this a keyword tagger in which I shove text through it and when it's done it tells me at what offsets there is a match to a corpus of keyword phrases, and to what keywords/phrases they were exactly. It doesn't have to inject or modify the token stream because the results of this are going elsewhere. Although, it would be a fine approach to only omit the tags as I call them as a way of consuming the results, but I'm not indexing them so it doesn't matter. I noticed the following TODOs at the start: // TODO: maybe we should resolve token - wordID then run // FST on wordIDs, for better perf? I intend on doing this since my matching keyword/phrases are often more than one word, and I expect this will save memory and be faster. // TODO: a more efficient approach would be Aho/Corasick's // algorithm // http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm // It improves over the current approach here // because it does not fully re-start matching at every // token. For example if one pattern is a b c x // and another is b c d and the input is a b c d, on // trying to parse a b c x but failing when you got to x, // rather than starting over again your really should // immediately recognize that b c d matches at the next // input. I suspect this won't matter that much in // practice, but it's possible on some set of synonyms it // will. We'd have to modify Aho/Corasick to enforce our // conflict resolving (eg greedy matching) because that algo // finds all matches. This really amounts to adding a .* // closure to the FST and then determinizing it. Could someone please clarify how the problem in the example above is to be fixed? At the end it states how to solve it, but I don't know how to do that and I'm not sure if there is anything more to it since after all if it's as easy as that last sentence sounds then it would have been done already ;-) This code is intense! I wish FSTs were better documented. For example, there are no javadocs on public members of FST.Arc like output and nextFinalOutput which are pertinent since SynonymFilter directly accesses them. IMO the state of FSTs is such that those that wrote them know how they work (Robert, McCandless, Weiss) and seemingly everyone else like me doesn't touch them because we don't know how. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.orgmailto:dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.orgmailto:dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
The development was too fast for me to keep up. And by the time i had some concept of the api mike wrote about million lines of code that would have to be rewritten ;) The current api isn't bad. Its fast. I asked for an example of what you're trying to do because then i'd be able to tell you if what i used would work. The number of entries does not matter. I did use fsts but simple fsts nothing special. On Jul 12, 2012 6:05 PM, Smiley, David W. dsmi...@mitre.org wrote: On Jul 12, 2012, at 11:51 AM, Dawid Weiss wrote: Knowing how fsts work and being comfortable with the api that evolved through a series of exploratory patches are two different things. I like my fsa api much better and there was an effort to do something similar for lucene but i gave up at some point because the speed of development killed me. Do you mean it was slow to coordinate / get consensus or…? Just curious. Can you describe what youre trying to achieve in more detail? Ive used fsts for pattern matching (sequences of arbitrary length) and my experience is that simple state trackers work wery well (even if they may seem to do lots of spurious tracking). I rather like Wikipedia's definition: http://en.wikipedia.org/wiki/Aho –Corasick_string_matching_algorithm The number of names I want to handle is in the millions and so use of Lucene's FST is essential as I see it. ~ David On Jul 12, 2012 5:09 PM, Smiley, David W. dsmi...@mitre.org wrote: Hello. I'm embarking on developing code similar to the SynonymFilter but which merely needs to record out of band to the analysis where there is matching text in the input tokens to the corpus in the FST. I'm calling this a keyword tagger in which I shove text through it and when it's done it tells me at what offsets there is a match to a corpus of keyword phrases, and to what keywords/phrases they were exactly. It doesn't have to inject or modify the token stream because the results of this are going elsewhere. Although, it would be a fine approach to only omit the tags as I call them as a way of consuming the results, but I'm not indexing them so it doesn't matter. I noticed the following TODOs at the start: // TODO: maybe we should resolve token - wordID then run // FST on wordIDs, for better perf? I intend on doing this since my matching keyword/phrases are often more than one word, and I expect this will save memory and be faster. // TODO: a more efficient approach would be Aho/Corasick's // algorithm // http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm // It improves over the current approach here // because it does not fully re-start matching at every // token. For example if one pattern is a b c x // and another is b c d and the input is a b c d, on // trying to parse a b c x but failing when you got to x, // rather than starting over again your really should // immediately recognize that b c d matches at the next // input. I suspect this won't matter that much in // practice, but it's possible on some set of synonyms it // will. We'd have to modify Aho/Corasick to enforce our // conflict resolving (eg greedy matching) because that algo // finds all matches. This really amounts to adding a .* // closure to the FST and then determinizing it. Could someone please clarify how the problem in the example above is to be fixed? At the end it states how to solve it, but I don't know how to do that and I'm not sure if there is anything more to it since after all if it's as easy as that last sentence sounds then it would have been done already ;-) This code is intense! I wish FSTs were better documented. For example, there are no javadocs on public members of FST.Arc like output and nextFinalOutput which are pertinent since SynonymFilter directly accesses them. IMO the state of FSTs is such that those that wrote them know how they work (Robert, McCandless, Weiss) and seemingly everyone else like me doesn't touch them because we don't know how. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
On Thu, Jul 12, 2012 at 12:10 PM, Dawid Weiss dawid.we...@gmail.com wrote: The development was too fast for me to keep up. And by the time i had some concept of the api mike wrote about million lines of code that would have to be rewritten ;) Mike is very happy to help rewrite that code for a better FST API :) We can and should also make incremental improvements. I do agree it's horrible to have code that only a small set of people understand: such code is effectively dead. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
The api shouldn't be the goal. Those initial fst changes were driven by real needs an optimizations and are still great. The api will eventually take better shape and form based on those use cases i'm sure. That patch that I had tried to extract a common notion of output and its grammar. It was half baked anyway so no big loss. On Jul 12, 2012 6:20 PM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Jul 12, 2012 at 12:10 PM, Dawid Weiss dawid.we...@gmail.com wrote: The development was too fast for me to keep up. And by the time i had some concept of the api mike wrote about million lines of code that would have to be rewritten ;) Mike is very happy to help rewrite that code for a better FST API :) We can and should also make incremental improvements. I do agree it's horrible to have code that only a small set of people understand: such code is effectively dead. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412920#comment-13412920 ] Yonik Seeley commented on SOLR-3377: Thanks Bernd, this looks like an improvement. After some ad-hoc testing, it seems we still have problems with q=(+id:42) Another minor concern: the change to clause.field to exclude things like '(' also means that when it's not a valid lucene query, our reconstructed query will currently drop the paren. Example: A query of (a:b with a qf=id correctly produces id:(a:b but a query of (id:b produces id:b That type of thing should really only affect exact match type fields where punctuation isn't dropped - not sure how much of an issue it really is. eDismax: A fielded query wrapped by parens is not recognized Key: SOLR-3377 URL: https://issues.apache.org/jira/browse/SOLR-3377 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Jan Høydahl Assignee: Yonik Seeley Priority: Critical Fix For: 4.0 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch As reported by bernd on the user list, a query like this {{q=(name:test)}} will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412927#comment-13412927 ] Markus Jelsma commented on SOLR-3488: - Thanks for claryfing, it makes sense. About the downtime on core reload, a load balancer pinging Solr's admin/ping handler will definately mark the node as down; the ping request will time out for up to a few seconds or even longer in case of many firstSearcher events. Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
On Thu, Jul 12, 2012 at 12:19 PM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Jul 12, 2012 at 12:10 PM, Dawid Weiss dawid.we...@gmail.com wrote: The development was too fast for me to keep up. And by the time i had some concept of the api mike wrote about million lines of code that would have to be rewritten ;) Mike is very happy to help rewrite that code for a better FST API :) We can and should also make incremental improvements. I do agree it's horrible to have code that only a small set of people understand: such code is effectively dead. I agree, and think we should make improvements to the API whenever we can. but a fast, efficient, and self-documenting FST API is probably going to be elusive I think a lot of this could be fixed with examples and docs, which we've been working at too, e.g.: http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/util/fst/package-summary.html#package_description The biggest problem I have with documentation here is when it becomes out-of-date. We've made a lot of progress here, we have a javadocs-lint task that runs in hudson and checks all of our links and fails if any are dead, etc. But this does us no good for code samples. I think we need to seriously revisit/develop a plan for code samples in documentation. All the samples we have in various docs (e.g. package documentation) is very fragile, and it discourages me totally from adding any advanced examples or any more than are minimally necessary to get started, because I'm afraid of the manual maintenance cost. Instead I think we should setup a proper examples infrastructure, where these examples are actually compiled and such. We can still link to them in javadocs. Have a look at this example from the demo/ module: http://lucene.apache.org/core/4_0_0-ALPHA/demo/overview-summary.html#Location_of_the_source I think we should have more than just SearchFiles and IndexFiles and also move our examples here, rather than being inlined in the javadocs text. This way they are compile-time checked, and we can link to them from anywhere (its safe, and we have link-checkers that prove it). I'm open to any other ideas though: this is just the best one i have now. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412957#comment-13412957 ] Mark Miller commented on SOLR-3488: --- Yeah, this sounds like something we have to fix to me. There should not be a gap in serving requests on core reload. Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3598) Provide option to allow aliased field to be included in query for EDisMax QParser
[ https://issues.apache.org/jira/browse/SOLR-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412962#comment-13412962 ] Jamie Johnson commented on SOLR-3598: - Just to make sure I understand you're saying create a pseudo field which we use for querying the actual fields? so basically pseudofield=realfield1,realfield2,realfield3 Provide option to allow aliased field to be included in query for EDisMax QParser - Key: SOLR-3598 URL: https://issues.apache.org/jira/browse/SOLR-3598 Project: Solr Issue Type: New Feature Components: query parsers Affects Versions: 3.6, 4.0-ALPHA Reporter: Jamie Johnson Priority: Minor Attachments: alias.patch I currently have a situation where I'd like the original field included in the query, for instance I have several fields with differing granularity, name, firstname and lastname. Some of my sources differentiate between these so I can fill out firstname and lastname, while others don't and I need to just place this information in the name field. When querying I'd like to be able to say name:Jamie and have it translated to name:Jamie first_name:Jamie last_name:Jamie. In order to do this it creates an alias cycle and the EDisMax Query parser throws an exception about it. Ideally there would be an option to include the original field as part of the query to support this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
I think a lot of this could be fixed with examples and docs, which We use a simple ANT task that extracts snippets of code from Java (very often unit tests) and include these in JavaDocs, unfortunately by post-processing. See the example here: http://download.carrot2.org/stable/javadoc/ and the sources (linked) are here: https://github.com/carrot2/carrot2/blob/df49d66087d0da9e87043e13a400ac148952a41c/applications/carrot2-examples/examples/org/carrot2/examples/clustering/ClusteringDocumentList.java As you can see there are simple tags of the form: [[[start:clustering-document-list-intro]]] ... [[[end:clustering-document-list-intro]]] these get extracted to separate files which are then included in the javadocs. Crude, but works. I'm sure it could be improved and probably other folks have come up with a similar idea, I just don't know of such attempts. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
On Thu, Jul 12, 2012 at 1:26 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: I think a lot of this could be fixed with examples and docs, which We use a simple ANT task that extracts snippets of code from Java (very often unit tests) and include these in JavaDocs, unfortunately by post-processing. See the example here: http://download.carrot2.org/stable/javadoc/ and the sources (linked) are here: https://github.com/carrot2/carrot2/blob/df49d66087d0da9e87043e13a400ac148952a41c/applications/carrot2-examples/examples/org/carrot2/examples/clustering/ClusteringDocumentList.java As you can see there are simple tags of the form: [[[start:clustering-document-list-intro]]] ... [[[end:clustering-document-list-intro]]] these get extracted to separate files which are then included in the javadocs. Crude, but works. I'm sure it could be improved and probably other folks have come up with a similar idea, I just don't know of such attempts. Thats more sophisticated than what we do with the javadocs linksource option in the demo. The key advantage there is that we never want to link to any trunk/development code as it may not exist. So with linksource (like my example), it basically makes an htmlized version of your code included in the javadocs. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
On Thu, Jul 12, 2012 at 1:29 PM, Robert Muir rcm...@gmail.com wrote: Thats more sophisticated than what we do with the javadocs linksource option in the demo. The key advantage there is that we never want to link to any trunk/development code as it may not exist. So with linksource (like my example), it basically makes an htmlized version of your code included in the javadocs. But your view source button seems to also do this? So we would just want to omit the github link at the bottom I think? I like this solution... how much code is it... can we have it? -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412997#comment-13412997 ] Ryan McKinley commented on LUCENE-4173: --- I'm fine removing it from the lucene strategies -- the motivation for this feature was to copy the same shape to multiple strategies and compare the behavior. this can be implemented at the solr level though... Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4173.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4207) speed up our slowest tests
[ https://issues.apache.org/jira/browse/LUCENE-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413004#comment-13413004 ] Michael Garski commented on LUCENE-4207: I have a similar MacBook to Christian (OSX Lion, Mid 2010, Core i7, 8GB RAM, 480GB SSD, ~90% full) and running ant test takes 20-25 minutes to execute. I have not run the stats that Dawid posted previously, those times are just what I have seen in the past few months. speed up our slowest tests -- Key: LUCENE-4207 URL: https://issues.apache.org/jira/browse/LUCENE-4207 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Was surprised to hear from Christian that lucene/solr tests take him 40 minutes on a modern mac. This is too much. Lets look at the slowest tests and make them reasonable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License
[ https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4217: -- Attachment: LUCENE-4217.patch Hi, attached is a patch with a complete overhaul of the Clover reporting on Lucene + Solr: - Clover is loaded by IVY from Maven Central - The License file was committed to lucene/tools/clover and is automatically used. This is possible according to the mail from Nicolas Muldon: {noformat} On Fri, Dec 18, 2009 at 1:33 AM, Nicholas Muldoon nmuld...@atlassian.com wrote: - Hi, Atlassian are excited to be presenting Apache with a site license for Clover 2.6. This Clover license can be used for any code that is under an org.apache package. Further, this license can be used by any developer on their machine in conjunction with our Eclipse or IntelliJ plugins for development on an org.apache project. {noformat} Also Mike me talked to Nicholas and Nick Pellow, and we got the following response: {noformat} On Sat, Dec 19, 2009 at 10:38 PM, Nick Pellow npel...@atlassian.com wrote: Hi Mike, That would be great if you could forward this to committ...@apache.org. The license is available to anyone working on the org.apache.* be it in IDEA/Eclipse/Ant/Maven locally, or on a central build server. Since the license will only instrument and report coverage on org.apache packages, please mention that it is fine to commit this license to each project if it makes running builds easier. ie just check out the project and run with Clover, without the need for the extra step of locating and installing the clover license. Cheers, Nick On 19/12/2009, at 1:11 AM, Michael McCandless wrote: Woops, I meant The only restriction is that it will only test coverage of packages under org.apache, below. Mike On Fri, Dec 18, 2009 at 9:05 AM, Michael McCandless luc...@mikemccandless.com wrote: Since this generous offer extends beyond Lucene... I'd like to forward this to committ...@apache.org, pointing to where the license is available (https://svn.apache.org/repos/private/committers/donated-licenses/cl over/2.6.x), explaining that Lucene upgraded (providing the link to our coverage report), etc. But I wanted to confirm with you all first: is this OK? This license may be used by anyone? The only restriction is that it will only test coverage of packages under org.apache.lucene? I can draft something up and run it by you all first, if this makes sense... {noformat} - The ANT tasks were cleaned up and now work per module without crazy filesets. Only test-framework is not clovered, as it was explicitely disabled by the managers of the new buildsystem. Unfortunately if your make compile-core in test-framework depend on clover, it will correctly clover it, but as its in src/ and not /test it will be counted as source code and not test code and appears in the report as such. I left it disabled for now until we find a solution. - Solr now reports everything also coverage on all referred Lucene modules (cool!) If you want to run a test build with clover, do: {noformat} # must be cleaned first on top-level, so all half baked code is gone ant clean # go to lucene or solr ant -Drun.clover=true test generate-clover-reports {noformat} This downloads clover from Maven central and runs all tests with clover and publishes the report. The target folder changed a bit (clenaup), we must change Jenkins config/scripts (I can do). For Lucene and Solr the Cloverage Database is always placed in Lucene's build folder (as before), this is why you must clean on top-level. Load clover.jar from ivy-cachepath andy ship sources with License - Key: LUCENE-4217 URL: https://issues.apache.org/jira/browse/LUCENE-4217 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-4217.patch When clover granted use the license for their clover-2.6.3.jar file they allowed us to ship this license file to every developer. Currently clover setup is very hard for users, so this issue will make it simple. If you want to run tests with clover, just pass -Drun.clover=true to ant clean test. ANT will then download clover via IVY and point it to the license file in our tools folder. The license is supplemented by the original mail from Atlassian, that everybody is allowed to use it with code in the org.apache. java package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License
[ https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413044#comment-13413044 ] Uwe Schindler commented on LUCENE-4217: --- I wanted to mention: The code in the attached patch is ASF of course, but the License File is of course not Apache License. But there is only one checkbox in JIRA! Load clover.jar from ivy-cachepath andy ship sources with License - Key: LUCENE-4217 URL: https://issues.apache.org/jira/browse/LUCENE-4217 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-4217.patch When clover granted use the license for their clover-2.6.3.jar file they allowed us to ship this license file to every developer. Currently clover setup is very hard for users, so this issue will make it simple. If you want to run tests with clover, just pass -Drun.clover=true to ant clean test. ANT will then download clover via IVY and point it to the license file in our tools folder. The license is supplemented by the original mail from Atlassian, that everybody is allowed to use it with code in the org.apache. java package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2891 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2891/ 3 tests failed. REGRESSION: org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([F8B2450BF5C21177:D98FB31590BF391B]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:487) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:454) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.add1document(TestSqlEntityProcessorDelta3.java:83) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete(TestSqlEntityProcessorDelta3.java:100) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED:
[jira] [Commented] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License
[ https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413050#comment-13413050 ] Uwe Schindler commented on LUCENE-4217: --- We should maybe exclude the License file from the Source ZIP/TGZ file, but keep it in SVN? it's just an excludes.../ Load clover.jar from ivy-cachepath andy ship sources with License - Key: LUCENE-4217 URL: https://issues.apache.org/jira/browse/LUCENE-4217 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-4217.patch When clover granted use the license for their clover-2.6.3.jar file they allowed us to ship this license file to every developer. Currently clover setup is very hard for users, so this issue will make it simple. If you want to run tests with clover, just pass -Drun.clover=true to ant clean test. ANT will then download clover via IVY and point it to the license file in our tools folder. The license is supplemented by the original mail from Atlassian, that everybody is allowed to use it with code in the org.apache. java package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413055#comment-13413055 ] Robert Muir commented on LUCENE-4100: - {quote} Your index at 1) does not have to be 'optimized' (it does not have to consist of one index segment only). In fact, maxscore can be more efficient with multiple segments because multiple maxscores are computed for many frequent terms for subsets of documents, resulting in tighter bounds and more effective pruning. {quote} I've been thinking about this a lot lately: while what you say is true, thats because you reprocess all segments with IndexRewriter (which is fine for a static collection). But this algorithm in general is not rank safe with incremental indexing: the problem is that when doing actual scoring, scores consist of per-segment/within document stats (term frequency, document length), but also are affected by collection-wide statistics from many other segments (IDF, average document length, ...) or even machines in a distributed collection. So I think for this to work and remain rank-safe, we cannot write the entire score into the segment, because the score at actual search time is dependent on all the other segments being searched. Instead I think this can only work when we can easily factor out an impact (e.g. in the case of DefaultSimilarity the indexed maxscore excludes the IDF component, this is instead multiplied in at search time). I don't see how it can be rank-safe with algorithms like BM25 and incremental indexing, where parameters like average document length are not simple multiplicative factors into the formula: and determine exactly how important tf versus document length play a role in the score, but I'll think about it some more. Maxscore - Efficient Scoring Key: LUCENE-4100 URL: https://issues.apache.org/jira/browse/LUCENE-4100 Project: Lucene - Java Issue Type: Improvement Components: core/codecs, core/query/scoring, core/search Affects Versions: 4.0-ALPHA Reporter: Stefan Pohl Labels: api-change, patch, performance Fix For: 4.0 Attachments: contrib_maxscore.tgz, maxscore.patch At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient algorithm first published in the IR domain in 1995 by H. Turtle J. Flood, that I find deserves more attention among Lucene users (and developers). I implemented a proof of concept and did some performance measurements with example queries and lucenebench, the package of Mike McCandless, resulting in very significant speedups. This ticket is to get started the discussion on including the implementation into Lucene's codebase. Because the technique requires awareness about it from the Lucene user/developer, it seems best to become a contrib/module package so that it consciously can be chosen to be used. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413080#comment-13413080 ] Yonik Seeley commented on SOLR-3488: bq. There should not be a gap in serving requests on core reload. Just to clarify: it's more a practical gap than a real gap... it should be impossible for a query to not be serviced - it's just that a cold core could take longer to service the query than desired. But it *should* be pretty easy to allow waiting for that searcher in the new core. Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413081#comment-13413081 ] Erik Hatcher commented on SOLR-1725: bq. i plan to cmmit backport to 4x in the next 24 hours. Hoss - you go! Thank you for wrangling this one and polishing out the pedantic details needed to get it to this state. Way +1. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Assignee: Erik Hatcher Labels: UpdateProcessor Fix For: 4.1 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic. The following variables are define as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}}- The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413086#comment-13413086 ] Mark Miller commented on SOLR-3488: --- Ah, did not catch it was just a timeout issue. Was wondering what the problem could be. Yeah, not as bad I thought then. An option would be nice. Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4218) contrary to documentation Document.get(field) on numeric field returns null
Jamie created LUCENE-4218: - Summary: contrary to documentation Document.get(field) on numeric field returns null Key: LUCENE-4218 URL: https://issues.apache.org/jira/browse/LUCENE-4218 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 4.0-ALPHA Environment: Darwin e4-ce-8f-0f-c2-b0.dummy.porta.siemens.net 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64 Reporter: Jamie Priority: Critical A call to Numeric num = indexableField.numericValue() comes up with a correct value, whereas Document.get(field) yields null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
Thats more sophisticated than what we do with the javadocs linksource option in the demo. We don't even sometimes include full sources of these snippets. They are demonstrational and we just want to make sure they compile/ run cleanly (that's why they're typically part of tests, not core sources). But your view source button seems to also do this? So we would just I believe that's js/css magic but I'm not sure. I like this solution... how much code is it... can we have it? Sure. Staszek put it together, it's really nothing fancy. We only have a binary in c2 repository but I'm sure we can make it available -- I'll ask Staszek to put it on github. One thing I forgot was that we generate that overview from an XML/XSL file (using xincludes) and this is probably an overkill for Lucene. I'd be much faster/ easier to just html-ize those snippets and include them directly with replacement patterns (even an ANT copy/filter would do here). Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4207) speed up our slowest tests
[ https://issues.apache.org/jira/browse/LUCENE-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413109#comment-13413109 ] Dawid Weiss commented on LUCENE-4207: - It may be the I/O overhead... those tests are generating lots of files, maybe with a nearly full disk things slow down a lot (?). speed up our slowest tests -- Key: LUCENE-4207 URL: https://issues.apache.org/jira/browse/LUCENE-4207 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Was surprised to hear from Christian that lucene/solr tests take him 40 minutes on a modern mac. This is too much. Lets look at the slowest tests and make them reasonable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
bq. I rather like Wikipedia's definition: http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm I did a similar thing but: 1) based on entire words as individual tokens (instead of letter-by-letter), 2) all words present in input patterns can be encoded as a separate data structure which maps to an unique integer 3) the matcher is essentially tracking the following: Match { automaton_arc toNextNode; final int matchStart; int matchLength; } you then process the input word-by-word and advance each Match if there is an arc leaving toNextNode and matching the current word. If the toNextNode arc is final then you've hit a match and need to record it (it may not be the longest match so if you only care about the longest matches then additional processing is required). You create new Match objects and discard mismatched existing Matches as you process the input. Essentially, it's as if you tried to walk down in the automaton starting on every single position in the input. This may seem costly but in reality the matches are infrequent compared to the input text and they are rarely very, very long (to create lots of states). I used the above approach for entity matching and it worked super-fast. All this said, an Aho-Corasick transition graph would of course be more efficient. The question is how much more efficient and how much code/ work you'll need to put into it to make it work :) Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
bq. You create new Match objects and discard mismatched existing Matches I didn't say that explicitly but obviously you don't need to create new objects when you're doing this. The pool of match states can be only as big as the longest pattern so you can pool them and reuse. Zero allocation cost. Dawid On Thu, Jul 12, 2012 at 9:50 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: bq. I rather like Wikipedia's definition: http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm I did a similar thing but: 1) based on entire words as individual tokens (instead of letter-by-letter), 2) all words present in input patterns can be encoded as a separate data structure which maps to an unique integer 3) the matcher is essentially tracking the following: Match { automaton_arc toNextNode; final int matchStart; int matchLength; } you then process the input word-by-word and advance each Match if there is an arc leaving toNextNode and matching the current word. If the toNextNode arc is final then you've hit a match and need to record it (it may not be the longest match so if you only care about the longest matches then additional processing is required). You create new Match objects and discard mismatched existing Matches as you process the input. Essentially, it's as if you tried to walk down in the automaton starting on every single position in the input. This may seem costly but in reality the matches are infrequent compared to the input text and they are rarely very, very long (to create lots of states). I used the above approach for entity matching and it worked super-fast. All this said, an Aho-Corasick transition graph would of course be more efficient. The question is how much more efficient and how much code/ work you'll need to put into it to make it work :) Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
Thanks for your explanation, I already had a very rough idea of the approach. Can Aho-Corasick be implemented with Lucene's FST? Again, the SynonymFilterFactory said this RE Aho-Corasick: // This really amounts to adding a .* // closure to the FST and then determinizing it. You didn't mention FST once and that's the API I'm having trouble groking. ~ David On Jul 12, 2012, at 3:51 PM, Dawid Weiss [via Lucene] wrote: bq. I rather like Wikipedia's definition: http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithmhttp://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm I did a similar thing but: 1) based on entire words as individual tokens (instead of letter-by-letter), 2) all words present in input patterns can be encoded as a separate data structure which maps to an unique integer 3) the matcher is essentially tracking the following: Match { automaton_arc toNextNode; final int matchStart; int matchLength; } you then process the input word-by-word and advance each Match if there is an arc leaving toNextNode and matching the current word. If the toNextNode arc is final then you've hit a match and need to record it (it may not be the longest match so if you only care about the longest matches then additional processing is required). You create new Match objects and discard mismatched existing Matches as you process the input. Essentially, it's as if you tried to walk down in the automaton starting on every single position in the input. This may seem costly but in reality the matches are infrequent compared to the input text and they are rarely very, very long (to create lots of states). I used the above approach for entity matching and it worked super-fast. All this said, an Aho-Corasick transition graph would of course be more efficient. The question is how much more efficient and how much code/ work you'll need to put into it to make it work :) Dawid
Re: SynonymFilter, FST, and Aho-Corasick algorithm
This comment was probably made against Brics library FST implementation because you can't really add a .* to the algorithm that builds the FST incrementally is not possible because it accepts fixed strings (and builds an already determinized automaton). That's part of the reason my runtime solution was suboptimal -- the tradeoff is that you can construct the FST very efficiently from millions of input entries but don't need to manipulate it. Brics will probably bail out with an OOM if you try to manipulate large graphs. I'd gladly share my code because it was Lucene based... but I can't -- paid consulting job, sorry. Shouldn't be too hard to rewrite from scratch though, really. Dawid On Thu, Jul 12, 2012 at 10:07 PM, Smiley, David W. dsmi...@mitre.org wrote: Thanks for your explanation, I already had a very rough idea of the approach. Can Aho-Corasick be implemented with Lucene's FST? Again, the SynonymFilterFactory said this RE Aho-Corasick: // This really amounts to adding a .* // closure to the FST and then determinizing it. You didn't mention FST once and that's the API I'm having trouble groking. ~ David On Jul 12, 2012, at 3:51 PM, Dawid Weiss [via Lucene] wrote: bq. I rather like Wikipedia's definition: http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm I did a similar thing but: 1) based on entire words as individual tokens (instead of letter-by-letter), 2) all words present in input patterns can be encoded as a separate data structure which maps to an unique integer 3) the matcher is essentially tracking the following: Match { automaton_arc toNextNode; final int matchStart; int matchLength; } you then process the input word-by-word and advance each Match if there is an arc leaving toNextNode and matching the current word. If the toNextNode arc is final then you've hit a match and need to record it (it may not be the longest match so if you only care about the longest matches then additional processing is required). You create new Match objects and discard mismatched existing Matches as you process the input. Essentially, it's as if you tried to walk down in the automaton starting on every single position in the input. This may seem costly but in reality the matches are infrequent compared to the input text and they are rarely very, very long (to create lots of states). I used the above approach for entity matching and it worked super-fast. All this said, an Aho-Corasick transition graph would of course be more efficient. The question is how much more efficient and how much code/ work you'll need to put into it to make it work :) Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Solr posting question
Hi all, I received a report of a problem with posting data to Solr. The post method is a multi-part form, so if you inspect it, it looks something like this: boundary--- Content-Disposition: form-data; name=metadata_attribute_name Content-Type: text; charset=utf-8 abc;def;ghi ---boundary--- The problem is that, for form data, multiple values for an attribute are supposed to just be repeated form elements, e.g.: boundary--- Content-Disposition: form-data; name=metadata_attribute_name Content-Type: text; charset=utf-8 abc;def;ghi ---boundary--- Content-Disposition: form-data; name=metadata_attribute_name Content-Type: text; charset=utf-8 second value ---boundary--- What's happening, though, when this is posted to Solr is that any semicolons in the data are being interpreted as multi-value separators. So when the above is posted, Solr apparently thinks that metadata_attribute_name has 4 values, abc, def, ghi, and second value, rather than two values, abc;def;ghi and second value. Is this intended behavior, and if so, how am I supposed to escape ; characters when communicating to Solr in this way? Karl
[jira] [Updated] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl
[ https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4211: Attachment: LUCENE-4211.patch Updated patch with docsEnum and docsAndPositionsEnums asserts/state machines. Also added a AssertingDirectoryReader which ensures subreaders are wrapped with AssertingAtomicReaders. Also wraps termvectors with AssertingFields. TODO: # add state machine/asserts to TermsEnum # add an AssertingCodec that does these checks always, put it in rotation so that any code (e.g. solr) not necessarily using newSearcher() from LuceneTestCase still gets these checks. There is a problem with a function query and fieldcache insanity, i dont understand this FCInvisibleReader etc goign on here: {noformat} ant test -Dtestcase=TestOrdValues -Dtests.method=testReverseOrdFieldRank -Dtests.seed=E54A53902AE23DE0 -Dtests.slow=true -Dtests.locale=fr_CH -Dtests.timezone=America/Argentina/Cordoba -Dtests.file.encoding=UTF-8 {noformat} Its possible this is unrelated to the patch... in LuceneTestCase.maybeWrapReader: add an asserting impl Key: LUCENE-4211 URL: https://issues.apache.org/jira/browse/LUCENE-4211 Project: Lucene - Java Issue Type: Task Components: general/test Reporter: Robert Muir Attachments: LUCENE-4211.patch, LUCENE-4211.patch It would be nice to wrap with FIR here sometimes, one that returns AssertingFields, etc etc. This way we could check if consumers are doing bogus things (like reading nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after its exhausted, or things like that). This would also be nice to catch tests that do this rather than doing crazy debugging over whats not really a bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-1725. Resolution: Fixed Fix Version/s: (was: 4.1) 4.0 Committed revision 1360931. trunk Committed revision 1360952. 4x Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Assignee: Erik Hatcher Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic. The following variables are define as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}}- The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413246#comment-13413246 ] Stefan Pohl commented on LUCENE-4100: - Thanks for the feedback! You're on spot with everything you're saying. Yes, the methods as suggested in the different papers have (semi-)static indexes in mind, that is, such that batch-index many new documents, then recompute maxscores (hence, IndexRewriter) and roll out the new version of the indexes. This is a Lucene use-case common to many large installations (or part thereof) and as such important. Moreover, this approach can easily be generalized to the other Similarities, without that they necessarily have to know about maxscore, and can be simplified by some minor API changes within Lucene. The PoC code as-is might be of help to showcase dependencies in general, and such that currently are not well supported within Lucene (because there was no need for it yet). If you really want to go the full distance: I already thought about doing maxscore live and got some ideas in this regard, see below. Comments to your thoughts: [PostingsWriter] You're right. For simplicity, I was computing each term's overall contribution (as explained in the talk), including all but query-dependent factors. You can consider this as un-quantized impacts (in the sense of Anh et al.) which necessitates a second pass over a static index, hence IndexRewriter. As a side note: I noticed a drop in the PKLookup benchmark, suggesting that it might be better not to extend the size of dictionary items, but to store maxscores in the beginning of inverted lists, or next to skip data. This effect should be smaller or disappear though when maxscores are not stored for many terms. [Length normalization] Yes, this might be a necessary dependency. It should be a general design-principle though to have as many as possible statistics at hand everywhere, as long as it doesn't hurt performance in terms of efficiency. [splitting impacts / incremental indexing] Yes, this would be more intrusive, requiring Similarity-dependent maxscore computations. Here is how it could work: Very exotic scoring functions simply don't have to support maxscore and will thus fall back to the current score-all behaviour. DefaultSimilarity is simple, but BM25 and LMDirichlet can't as easily be factored out, as you correctly point out, but we could come up with bounds for collection statistics (those that go into the score) within which it is safe to use maxscore, otherwise we fallback to score-all until a merge occurs, or we notify the user to better do a merge/optimize, or Lucene does a segment-rewrite with new maxscore and bound computations on basis of more current collection stats. I got first ideas for an algorithm to compute these bounds. [docfreq=1 treatment] Definitely agree. Possibly, terms with docfreq x=10? could not store a maxscore. x configurable and default to be evaluated; x should be stored in index so that it can be determined which terms don't contain maxscores. Having a special treatment for these terms (not considering them for exclusion within the algorithm) allows for easier exchange of the core of the algorithm to get the WAND algorithm, or also to ignore a maxscore for a term for which collection stats went out of bounds. [maxscores per posting ranges] +1. As indicated in the description, having multiple maxscores per term can be more efficient, possibly leading to tighter bounds and more skipping. Chakrabarti'11 opted for one extreme, computing a maxscore for each compressed posting block, whereas the optimal choice might have been a multiple of blocks, or a postings range not well aligned with block size. Optimal choice will be very dependent on skip list implementation and its parameters, but also posting de-compression overhead. The question is how to get access to this codec-dependent information inside of the scoring algorithm, tunneled through the TermQuery? [store 4 bytes per maxscore] Possible. As long as the next higher representable real number is stored (ceil, not floor), no docs will be missed and the algorithm remains correct. But because of more loose bounds the efficiency gain will be affected at some point with too few bits. If the score is anyway factored out, it might be better to simply store all document-dependent stats (TF, doclen) of the document with the maximum score contribution (as ints) instead of one aggregate intermediate float score contribution. [implementation inside codec] Please be aware that while terms are at some point excluded from merging, they still are advanced to the docs in other lists to gain complete document knowledge and compute exact scores. Maxscores can also be used to minimize how often this happens, but the gains are often compensated by the more complex scoring. Still having to skip
[jira] [Updated] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License
[ https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4217: -- Attachment: LUCENE-4217.patch Small patch improvements and correct mail extract in README. Load clover.jar from ivy-cachepath andy ship sources with License - Key: LUCENE-4217 URL: https://issues.apache.org/jira/browse/LUCENE-4217 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-4217.patch, LUCENE-4217.patch When clover granted use the license for their clover-2.6.3.jar file they allowed us to ship this license file to every developer. Currently clover setup is very hard for users, so this issue will make it simple. If you want to run tests with clover, just pass -Drun.clover=true to ant clean test. ANT will then download clover via IVY and point it to the license file in our tools folder. The license is supplemented by the original mail from Atlassian, that everybody is allowed to use it with code in the org.apache. java package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl
[ https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4211: Attachment: LUCENE-4211.patch OK i got past the insanity issue: thankfully Uwe already had a reusable hack in place in LuceneTestCase. But then I added AssertingCodec/AssertingPostingsFormat that use these checks, and started running 'ant test -Dtests.codec=Asserting' and there are some bugs in tests I think in LuceneTestCase.maybeWrapReader: add an asserting impl Key: LUCENE-4211 URL: https://issues.apache.org/jira/browse/LUCENE-4211 Project: Lucene - Java Issue Type: Task Components: general/test Reporter: Robert Muir Attachments: LUCENE-4211.patch, LUCENE-4211.patch, LUCENE-4211.patch It would be nice to wrap with FIR here sometimes, one that returns AssertingFields, etc etc. This way we could check if consumers are doing bogus things (like reading nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after its exhausted, or things like that). This would also be nice to catch tests that do this rather than doing crazy debugging over whats not really a bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl
[ https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413266#comment-13413266 ] Robert Muir commented on LUCENE-4211: - {noformat} [junit4:junit4] Suite: org.apache.lucene.index.TestDocsAndPositions [junit4:junit4] FAILURE 0.04s J3 | TestDocsAndPositions.testRandomPositions [junit4:junit4] Throwable #1: java.lang.AssertionError: nextDoc() called after iterator is exhausted! [junit4:junit4]at __randomizedtesting.SeedInfo.seed([35D7F00507FD5A9D:4BF3893054577754]:0) [junit4:junit4]at org.apache.lucene.index.AssertingAtomicReader$AssertingDocsAndPositionsEnum.nextDoc(AssertingAtomicReader.java:207) [junit4:junit4]at org.apache.lucene.index.TestDocsAndPositions.testRandomPositions(TestDocsAndPositions.java:178) [junit4:junit4]at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4:junit4]at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit4:junit4]at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit4:junit4]at java.lang.reflect.Method.invoke(Method.java:597) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) [junit4:junit4]at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) [junit4:junit4]at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) [junit4:junit4]at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) [junit4:junit4]at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) [junit4:junit4]at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) [junit4:junit4]at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) [junit4:junit4]at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4]at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) [junit4:junit4]at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) [junit4:junit4]at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) [junit4:junit4]at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) [junit4:junit4]at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) [junit4:junit4]at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) [junit4:junit4]at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) [junit4:junit4]at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) [junit4:junit4]at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) [junit4:junit4]at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) [junit4:junit4]at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4]at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4]at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4]at
[jira] [Commented] (SOLR-3598) Provide option to allow aliased field to be included in query for EDisMax QParser
[ https://issues.apache.org/jira/browse/SOLR-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413268#comment-13413268 ] Jan Høydahl commented on SOLR-3598: --- Yes, {{f.fieldname.qf}} will wire up fieldname as a valid pseudo field to be queried, even if it does not exist in your index schema. Can you test it and report back if it solved your use case? Provide option to allow aliased field to be included in query for EDisMax QParser - Key: SOLR-3598 URL: https://issues.apache.org/jira/browse/SOLR-3598 Project: Solr Issue Type: New Feature Components: query parsers Affects Versions: 3.6, 4.0-ALPHA Reporter: Jamie Johnson Priority: Minor Attachments: alias.patch I currently have a situation where I'd like the original field included in the query, for instance I have several fields with differing granularity, name, firstname and lastname. Some of my sources differentiate between these so I can fill out firstname and lastname, while others don't and I need to just place this information in the name field. When querying I'd like to be able to say name:Jamie and have it translated to name:Jamie first_name:Jamie last_name:Jamie. In order to do this it creates an alias cycle and the EDisMax Query parser throws an exception about it. Ideally there would be an option to include the original field as part of the query to support this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl
[ https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413280#comment-13413280 ] Robert Muir commented on LUCENE-4211: - Mike committed fixes already for both these seeds... thanks Mike! Continuing testing... in LuceneTestCase.maybeWrapReader: add an asserting impl Key: LUCENE-4211 URL: https://issues.apache.org/jira/browse/LUCENE-4211 Project: Lucene - Java Issue Type: Task Components: general/test Reporter: Robert Muir Attachments: LUCENE-4211.patch, LUCENE-4211.patch, LUCENE-4211.patch It would be nice to wrap with FIR here sometimes, one that returns AssertingFields, etc etc. This way we could check if consumers are doing bogus things (like reading nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after its exhausted, or things like that). This would also be nice to catch tests that do this rather than doing crazy debugging over whats not really a bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SynonymFilter, FST, and Aho-Corasick algorithm
Some responses below: Mike McCandless http://blog.mikemccandless.com On Thu, Jul 12, 2012 at 11:08 AM, Smiley, David W. dsmi...@mitre.org wrote: Hello. I'm embarking on developing code similar to the SynonymFilter but which merely needs to record out of band to the analysis where there is matching text in the input tokens to the corpus in the FST. I'm calling this a keyword tagger in which I shove text through it and when it's done it tells me at what offsets there is a match to a corpus of keyword phrases, and to what keywords/phrases they were exactly. It doesn't have to inject or modify the token stream because the results of this are going elsewhere. Although, it would be a fine approach to only omit the tags as I call them as a way of consuming the results, but I'm not indexing them so it doesn't matter. I noticed the following TODOs at the start: // TODO: maybe we should resolve token - wordID then run // FST on wordIDs, for better perf? I intend on doing this since my matching keyword/phrases are often more than one word, and I expect this will save memory and be faster. Be sure to test this is really faster: you'll need to add a step to resolve word - id (eg via hashmap) which may net/net add cost because the FST can incrementally (quickly) determine a word doesn't exist with a given prefix. FST can also do better sharing (less RAM) of shared prefixes/suffixes. // TODO: a more efficient approach would be Aho/Corasick's // algorithm // http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm // It improves over the current approach here // because it does not fully re-start matching at every // token. For example if one pattern is a b c x // and another is b c d and the input is a b c d, on // trying to parse a b c x but failing when you got to x, // rather than starting over again your really should // immediately recognize that b c d matches at the next // input. I suspect this won't matter that much in // practice, but it's possible on some set of synonyms it // will. We'd have to modify Aho/Corasick to enforce our // conflict resolving (eg greedy matching) because that algo // finds all matches. This really amounts to adding a .* // closure to the FST and then determinizing it. Could someone please clarify how the problem in the example above is to be fixed? At the end it states how to solve it, but I don't know how to do that and I'm not sure if there is anything more to it since after all if it's as easy as that last sentence sounds then it would have been done already ;-) The FSTs we create are not malleable so implementing what that crazy comment says would not be easy. However, there is a cool paper that Robert found: http://www.cis.uni-muenchen.de/people/Schulz/Pub/dictle5.ps That I think does not require heavily modifying the minimal FST (just augmenting it w/ additional arcs that you follow on failure to match). I think it's basically Aho Corasick, done as an FST (which eg you can then compose with other FSTs to compile a chain of replacements into a single FST ... at least this was my quick understanding). Still, I would first try the obvious approach (use FST the way SynFilter does) and see if it's fast enough. I think Aho Corasick only really matters if your patterns have high overlap after shifting (eg b and ab). Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl
[ https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4211: --- Attachment: LUCENE-4211.patch Patch, just adding checks to AssertingTermsEnum. Tests pass with it ... in LuceneTestCase.maybeWrapReader: add an asserting impl Key: LUCENE-4211 URL: https://issues.apache.org/jira/browse/LUCENE-4211 Project: Lucene - Java Issue Type: Task Components: general/test Reporter: Robert Muir Attachments: LUCENE-4211.patch, LUCENE-4211.patch, LUCENE-4211.patch, LUCENE-4211.patch It would be nice to wrap with FIR here sometimes, one that returns AssertingFields, etc etc. This way we could check if consumers are doing bogus things (like reading nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after its exhausted, or things like that). This would also be nice to catch tests that do this rather than doing crazy debugging over whats not really a bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS
[ https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413310#comment-13413310 ] Jan Høydahl commented on SOLR-3613: --- bq. I also don't think we should force solr. for all the system properties. If someone ads the ability to optionally check for the webapp prefix, then I think we should still be free to use zkHost, collection.*, etc, in the examples/doc. Why not? It is consistent, short and concise. I was first thinking that the solr. prefix is better had as a convention rather than code? But say we do as you propose and add prefix logic so that given ${myProp:foo}, we'll look for: # {{solr.myProp}} # else look for {{myProp}} In this case we would need to change all literal {{solr.*}} props in all xml config files. I see two drawbacks with this approach; one is that the examples then promote the use of short form while we'd like to encourage use of namespaced form and the other is that if webapp XYZ sets {{myProp}}, and we have not explicitly set {{solr.myProp}} then Solr will pick up a faulty value for it. This last could very well happen for generic opts like the ${host:} currently defined in solr.xml. So I still think it is better to require a {{solr.}} prefix for all sys props and leave in the {{solr.}} prefix in config files as today. Another problematic one from solr.xml is this: hostPort=${jetty.port:}. It assumes Jetty as Java Application Server, and it feels awkward to say {{-Djetty.port=8080}} to tell SolrCloud that Tomcat is running on port 8080. Imagine an ops guy reading the Solr bootstrap script, scratching his head. If all we do is read the value and add +1000 to pick the port for our internal ZK, why not be explicit instead and have a {{solr.localZkPort}} prop? (No API to get the web containers port? In that case we could support relative values and default to value of +1000 which would behave as today, but less to specify on cmdLine). While in picky mode :-) I'd prefer {{zkRun}} to be {{solr.localZkRun}} to distinguish that this starts a *local* Zk as opposed to the remote one in {{zkHost}}. Also, the prop {{zkHost}} is misleading, in that it takes a list of host:port; perhaps {{solr.zkServers}} is more clear? {quote} bq. a thin HTTP layer around Lucene I've certainly never thought of Solr as that {quote} Well, not a pure HTTP layer, but still thin in as in the sense that Lucene does as much of the core features as possible Namespace Solr's JAVA OPTIONS - Key: SOLR-3613 URL: https://issues.apache.org/jira/browse/SOLR-3613 Project: Solr Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Jan Høydahl Fix For: 4.0 Solr being a web-app, should play nicely in a setting where users deploy it on a shared appServer. To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid name clashes and for clarity when reading your appserver startup script. We currently do that with most: {{solr.solr.home, solr.data.dir, solr.abortOnConfigurationError, solr.directoryFactory, solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we fail to do so. Before release of 4.0 we should make sure to clean this up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413312#comment-13413312 ] Michael McCandless commented on LUCENE-3892: Thanks Billy, I'll commit! One thing I noticed: I think we shouldn't separately read numBytes and the int header? Can't we do a single readVInt(), and that encodes numBytes as well as format (bit width and format, once we tie into oal.util.packed APIs)? Also, we shouldn't encode numInts at all, ie, this should be fixed for the whole segment, and not written per block. Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) - Key: LUCENE-3892 URL: https://issues.apache.org/jira/browse/LUCENE-3892 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.1 Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, LUCENE-3892-handle_open_files.patch, LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch On the flex branch we explored a number of possible intblock encodings, but for whatever reason never brought them to completion. There are still a number of issues opened with patches in different states. Initial results (based on prototype) were excellent (see http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html ). I think this would make a good GSoC project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413314#comment-13413314 ] Michael McCandless commented on LUCENE-3892: I didn't commit lucene/core/src/java/org/apache/lucene/codecs/pfor/ForPostingsFormat.java -- your IDE had changed it to a wildcard import (I prefer we stick with individual imports). Was the numBits==0 case for all 0s not all 1s? We may want to have it mean all 1s instead? Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) - Key: LUCENE-3892 URL: https://issues.apache.org/jira/browse/LUCENE-3892 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.1 Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, LUCENE-3892-handle_open_files.patch, LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch On the flex branch we explored a number of possible intblock encodings, but for whatever reason never brought them to completion. There are still a number of issues opened with patches in different states. Initial results (based on prototype) were excellent (see http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html ). I think this would make a good GSoC project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS
[ https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413316#comment-13413316 ] Mark Miller commented on SOLR-3613: --- bq. Another problematic one from solr.xml is this: hostPort=${jetty.port:}. It assumes Jetty as Java Application Server, Yup - another way that we have made the user experience better by assuming jetty. This is exactly what I meant - this keeps you from having to specify the port twice on the cmd line - silly when you should just be using jetty and we know the port. I have been optimizing for jetty for a while now. bq. it feels awkward to say -Djetty.port=8080 to tell SolrCloud that Tomcat is running on port 8080 They are free to change it - it's in solr.xml. I'd rather have our default system not be awkward than worry about Tomcat being awkward. This is exactly what I've been talking about. For too long we have been awkward for every thing rather than good for one thing. Namespace Solr's JAVA OPTIONS - Key: SOLR-3613 URL: https://issues.apache.org/jira/browse/SOLR-3613 Project: Solr Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Jan Høydahl Fix For: 4.0 Solr being a web-app, should play nicely in a setting where users deploy it on a shared appServer. To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid name clashes and for clarity when reading your appserver startup script. We currently do that with most: {{solr.solr.home, solr.data.dir, solr.abortOnConfigurationError, solr.directoryFactory, solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we fail to do so. Before release of 4.0 we should make sure to clean this up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS
[ https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413323#comment-13413323 ] Mark Miller commented on SOLR-3613: --- bq. I'd prefer {{zkRun}} to be {{solr.localZkRun}} to distinguish that this starts a *local* Zk as opposed to the remote one in {{zkHost}}. -0 - I like zkRun as it's short and sweet - your are running zk, or you are not and connecting to an external zk. I wouldn't fight very hard though. Yonik named it, I'll defer to you guys. bq. Also, the prop {{zkHost}} is misleading, in that it takes a list of host:port; perhaps {{solr.zkServers}} is more clear? The zk guys call it a connectString. I like zkHost because it's short, works fine with a single host url, and easily docable about using more, but again, not something I'm going to fight hard for. Personally, I liked the brevity of something like java -DzkRun -DzkHost start.jar and how that works for examples as compared to what we are getting to now: java -Dsolr.zkServers -Dsolr.localRunZk start.jar. Just starts to get dense fast. I also think doc is perfectly sufficient on top of the current names. Namespace Solr's JAVA OPTIONS - Key: SOLR-3613 URL: https://issues.apache.org/jira/browse/SOLR-3613 Project: Solr Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Jan Høydahl Fix For: 4.0 Solr being a web-app, should play nicely in a setting where users deploy it on a shared appServer. To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid name clashes and for clarity when reading your appserver startup script. We currently do that with most: {{solr.solr.home, solr.data.dir, solr.abortOnConfigurationError, solr.directoryFactory, solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we fail to do so. Before release of 4.0 we should make sure to clean this up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Solr posting question
what request handler are you using? csv? If you point to the /admin/dump handler, what do you get? http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/DumpRequestHandler.java If there is a problem with how this gets though, we will need to fix something in: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers#MultipartRequestParser ryan On Thu, Jul 12, 2012 at 1:17 PM, karl.wri...@nokia.com wrote: Hi all, I received a report of a problem with posting data to Solr. The post method is a multi-part form, so if you inspect it, it looks something like this: boundary--- Content-Disposition: form-data; name=metadata_attribute_name Content-Type: text; charset=utf-8 abc;def;ghi ---boundary--- The problem is that, for form data, multiple values for an attribute are supposed to just be repeated form elements, e.g.: boundary--- Content-Disposition: form-data; name=metadata_attribute_name Content-Type: text; charset=utf-8 abc;def;ghi ---boundary--- Content-Disposition: form-data; name=metadata_attribute_name Content-Type: text; charset=utf-8 second value ---boundary--- What’s happening, though, when this is posted to Solr is that any semicolons in the data are being interpreted as multi-value separators. So when the above is posted, Solr apparently thinks that “metadata_attribute_name” has 4 values, “abc”, “def”, “ghi”, and “second value”, rather than two values, “abc;def;ghi” and “second value”. Is this intended behavior, and if so, how am I supposed to escape “;” characters when communicating to Solr in this way? Karl - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring
[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413335#comment-13413335 ] Robert Muir commented on LUCENE-4100: - {quote} As a side note: I noticed a drop in the PKLookup benchmark, suggesting that it might be better not to extend the size of dictionary items, but to store maxscores in the beginning of inverted lists, or next to skip data. This effect should be smaller or disappear though when maxscores are not stored for many terms. {quote} I wouldn't worry about this, I noticed a few things that might speed that up: 1. Currently it does writeVInt(Float.floatToIntBits(term.maxscoreScore)) . But I think this should be writeInt, not writeVInt? So I think currently we often write 5 bytes here, with all the vint checks for each byte, and as an Int it would always be 4 and faster. 2. Yes, with low freq terms (e.g. docFreq skipMinimum), its probably best to just omit this at both read and write time. Then PK lookup would be fine. 3. As far as 4 bytes ceiling, my motivation there was not to save in the term dictionary, but instead to make these smaller and allow us to add these at regular intervals. We can take advantage of a few things, e.g. it should never be a negative number for a well-formed Similarity (i think that would screw up the algorithm looking at your tests anyway). {quote} DefaultSimilarity is simple, but BM25 and LMDirichlet can't as easily be factored out, as you correctly point out, but we could come up with bounds for collection statistics (those that go into the score) within which it is safe to use maxscore, otherwise we fallback to score-all until a merge occurs, or we notify the user to better do a merge/optimize, or Lucene does a segment-rewrite with new maxscore and bound computations on basis of more current collection stats. I got first ideas for an algorithm to compute these bounds. {quote} Ok, I'm not sure I totally see how the bounds computation can work, but if it can we might be ok in general. If the different segments are somewhat homogeneous then these stats should pretty much be very close anyway. The other idea i had was more intrusive, adding a computeImpact() etc to Similarity or whatever. {quote} If the score is anyway factored out, it might be better to simply store all document-dependent stats (TF, doclen) of the document with the maximum score contribution (as ints) instead of one aggregate intermediate float score contribution. {quote} That might be a good idea. with TF as a vint and doclen as a byte, we would typically only have two bytes but not actually lose any information (by default, all these sims encode doclen as a byte anyway). {quote} [implementation inside codec] Please be aware that while terms are at some point excluded from merging, they still are advanced to the docs in other lists to gain complete document knowledge and compute exact scores. Maxscores can also be used to minimize how often this happens, but the gains are often compensated by the more complex scoring. Still having to skip inside of excluded terms complicates your suggested implementation. But we definitely should consider architecture alternatives. The MaxscoreCollector, for instance, does currently only have a user interface function, keeping track of the top-k and their entry threshold could well be done inside the Maxscorer. I was thinking though to extend the MaxscoreCollector to provide different scoring information, e.g. an approximation of the number of hits next to the actual number of scored documents (currently totalHits). {quote} My current line of thinking is even crazier, but I don't yet have anything close to a plan. As a start, I do think that IndexSearcher.search() methods should take a Score Mode of sorts from the user (some enum), which would allow Lucene to do less work if its not necessary. We would pass this down via Weight.scorer() as a parameter... solely looking at the search side I think this would open up opportunities in general for us to optimize things: e.g. instantiate the appropriate Collector impl, and for Weights to create the most optimal Scorers. Not yet sure how it would tie into the code API. I started hacking up on a prototype that looks like this (I might have tried to refactor too hard also shoving the Sort options in here...) {noformat} /** * Different modes of search. */ public enum ScoreMode { /** * No guarantees that the ranking is correct, * the results may come back in a different order than if all * documents were actually scored. Total hit count may be * unavailable or approximate. */ APPROXIMATE, /** * Ranking is the same as {@link COMPLETE}, but total hit * count may be unavailable or approximate. */ SAFE, /** * Guarantees complete iteration over all documents, but scores * may be unavailable. */
[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413340#comment-13413340 ] Mark Miller commented on SOLR-3488: --- Commit improved tests and reload command in a moment. Another thing we will need before too long is a way to get a response I think? Right now, the client can't learn of the success or failure of the command. It's just int he Overseers logs. To get notified, I suppose the call would have to block and then get a result from the overseer. I suppose that could be done by something like: create a new emphemeral node for each job - client watches the node - when overseer is done, it sets the result as data on the node - client gets a watch notify and reads the result? Then how to clean up? Not sure about the idea overall, brainstorming ... don't see a simple way to have the over seer do the work in an async fashion and have the client easily get the results of that. Create a Collections API for SolrCloud -- Key: SOLR-3488 URL: https://issues.apache.org/jira/browse/SOLR-3488 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Solr posting question
: I received a report of a problem with posting data to Solr. The post : method is a multi-part form, so if you inspect it, it looks something : like this: ... : What's happening, though, when this is posted to Solr is that any : semicolons in the data are being interpreted as multi-value separators. : So when the above is posted, Solr apparently thinks that : metadata_attribute_name has 4 values, abc, def, ghi, and second : value, rather than two values, abc;def;ghi and second value. karl: can you be more specific about how exactly someone can recreate the the problem? specificly: where do you see 4 values? Using the HTML form below, and submitting to nc -l 8983 i was able to recreate nearly exactly the MIME content you mentioned. when i killed nc, ran the solr example in it's place, and resubmited the form, the echoParams output showed me that solr was recognizing the expected two values for metadata_attribute_name ... html headtitleTest of form data/title/head body form method=POST action=http://localhost:8983/solr/collection1/select; enctype=multipart/form-data input type=text name=q value=solr / input type=text name=echoParams value=all / input type=text name=metadata_attribute_name value=abc;def;ghi / input type=text name=metadata_attribute_name value=second value / input type=submit/ /form /body /html ... arr name=metadata_attribute_namestrabc;def;ghi/strstrsecond value/str/arr -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14854 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14854/ 1 tests failed. REGRESSION: org.apache.solr.handler.dataimport.TestSqlEntityProcessorDeltaPrefixedPk.testDeltaImport_replace_resolvesUnprefixedPk Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([46265CE1240EBBF6:44E12C48F69BB57A]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:487) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:454) at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDeltaPrefixedPk.testDeltaImport_replace_resolvesUnprefixedPk(TestSqlEntityProcessorDeltaPrefixedPk.java:118) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='0'] xml response was: ?xml version=1.0 encoding=UTF-8? response lst
RE: Solr posting question
I'll need to ask the reporter for more details since it appears the answer is not simple. It may even be an app server issue. Thanks Karl Sent from my Windows Phone -Original Message- From: ext Chris Hostetter Sent: 7/12/2012 8:29 PM To: dev@lucene.apache.org Subject: Re: Solr posting question - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org