Fwd: Maven Central is probably blocked in China

2012-07-12 Thread Dawid Weiss
Does it mean we can remove that additional mirror from ivy settings?

Dawid


-- Forwarded message --
From: Niclas Hedhman nic...@hedhman.org
Date: Thu, Jul 12, 2012 at 3:20 AM
Subject: Re: Maven Central is probably blocked in China
To: Brian Fox bri...@infinity.nu
Cc: Maven Developers List d...@maven.apache.org


Yes.
repo1
repo2
central
central01
central02

all resolve now to hosts that are not blocked. Well done!!


Cheers
Niclas

On Thu, Jul 12, 2012 at 6:20 AM, Brian Fox bri...@infinity.nu wrote:
 Niclas, I'm told it's working now. Can you confirm?


 On Tue, Jul 10, 2012 at 1:11 PM, Brian Fox bri...@infinity.nu wrote:

 The network team confirmed that this is only Unicom with the issue. They
 are looking at alternate routes that would hopefully work.


 On Mon, Jul 9, 2012 at 5:31 PM, Niclas Hedhman nic...@hedhman.org wrote:

 Ok, good to know that it is not completely blocked. It is likely that
 there are multiple FirewallOps across China regions and the (I think)
 3 ISPs (China Telecom, China Mobile and Unicom).

 As I mentioned, the edgecast address couldn't be reached, but the
 akamai one could.

 I am personally in downtown Shanghai, using China Unicom's Fiber To
 The Building.

 I am seen as 58.246.154.81 from the outside at the moment, can reach
 your a978.g1.akamai.net, but not wpc.829D.edgecastcdn.net.

 I can also VPN to Beijing, to a 163.com datacenter (which I think is a
 China Telecom subsidiary), having IP number 60.191.221.179. From
 there, both hosts above are reachable.

 So, yes, it seems to be regionalized or per ISP (which makes this less
 of a problem than I thought). I also mentioned that I am personally on
 VPN and I am not really affected, but developers I have met are not
 willing to pay for that service and don't have it.


 Cheers
 Niclas

 On Tue, Jul 10, 2012 at 1:58 AM, Brian Fox bri...@infinity.nu wrote:
  Niclas,
  We are seeing a lot of traffic to Central from China, so this certainly
  isn't a case of the Great Firewall blocking everything; rather, it seems
  a little more localized. Can you send some more info about your source IP
  and geolocation that we could use to see what's up? Possibly we can get
  the traffic routed to a China-friendly IP.
 
  On Mon, Jul 9, 2012 at 12:08 PM, Brian E. Fox bri...@infinity.nu
  wrote:
 
  Hi Niclas, this isn't intentional of course. Let me see what I can
  dig up based on your traces.
 
  --Brian (mobile)
 
 
  On Jul 7, 2012, at 11:45 PM, Niclas Hedhman nic...@hedhman.org
  wrote:
 
   (I am not subscribed, so please CC me on any responses)
  
   I live in China. I normally have a VPN enabled to circumvent various
   blocking (YouTube, Twitter, ++) that the Chinese government has in
   place. I normally don't think much about it. But, today I had my
   computer rebooted and couldn't build a project, because Maven
   Central
   couldn't be reached.
  
   So, before I realized that my VPN wasn't running I tracerouted a
   bit.
  
   repo1 resolved to
  
    niclas:~ niclas$ dig repo1.maven.org | grep ^[a-z]
    repo1.maven.org.          1751  IN  CNAME  central.maven.org.
    central.maven.org.        212   IN  CNAME  central02.maven.org.
    central02.maven.org.      7112  IN  CNAME  wpc.829D.edgecastcdn.net.
    wpc.829D.edgecastcdn.net. 3164  IN  CNAME  gs1.wpc.edgecastcdn.net.
    gs1.wpc.edgecastcdn.net.  2292  IN  A      68.232.45.253
  
   and from that I also tried central01
  
    niclas:~ niclas$ dig central01.maven.org | grep ^[a-z]
    central01.maven.org.               6477   IN  CNAME  central01.maven.org.edgesuite.net.
    central01.maven.org.edgesuite.net. 20877  IN  CNAME  a978.g1.akamai.net.
    a978.g1.akamai.net.                4      IN  A      124.40.42.31
    a978.g1.akamai.net.                4      IN  A      124.40.42.6
  
    And while tracerouting both (see below), it struck me that my VPN
    might not be enabled, and that the IP on edgecastcdn.net is probably
    blocked by China for potentially serving something they don't like;
    it could be anything... Yeah, China is BAD, we all know that, but
    shouldn't we (Apache) try to minimize the problem, so that your
    ordinary Chinese developer (a student, hobbyist, small entrepreneur
    and so on, who isn't anti-government; most people here are quite
    content with the government) is able to use Apache projects?
  
    The fact is that without reasonably reliable access to Maven
    Central, one cannot really participate in many, many of the Java
    projects at the ASF.
  
    I don't know how the DNS and host resolution is supposed to work,
    who is participating in the hosting, or under what terms. But I think
    it should be in Maven/Sonatype's interest NOT to EXCLUDE a
    staggering number of Java programmers, and perhaps to try to find a
    way to get a better SLA here. If you need help from someone to check
    from inside the Great Firewall, just let me know...
  
  
   Cheers
   Niclas
  
  
   traceroute to gs1.wpc.edgecastcdn.net 

[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task

2012-07-12 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412568#comment-13412568
 ] 

Dawid Weiss commented on LUCENE-3950:
-

{code}
+<typedef resource="org/apache/rat/anttasks/antlib.xml"
+         uri="antlib:org.apache.rat.anttasks">
   <classpath>
-    <fileset dir="." includes="rat*.jar"/>
+    <fileset dir="${common.dir}/tools/lib" includes="apache-rat-0.8.jar"/>
   </classpath>
 </typedef>
{code}

I don't like this duplication of version numbers in ivy and ant files. I think 
it'd be nicer to use ivy's fileset or path to resolve these JARs if they're not 
part of the distribution? 

 load rat via ivy for rat-sources task
 -

 Key: LUCENE-3950
 URL: https://issues.apache.org/jira/browse/LUCENE-3950
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-3950.patch


 we now fail the build on rat problems (LUCENE-1866),
 so we should make it easy to run rat-sources for people
 to test locally (it takes like 3 seconds total for the whole trunk)
 Also this is safer than putting rat in your ~/.ant/lib because that 
 adds some classes from commons to your ant classpath (which we currently
 wrongly use in compile).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4216) Token X exceeds length of provided text sized X

2012-07-12 Thread Ibrahim (JIRA)
Ibrahim created LUCENE-4216:
---

 Summary: Token X exceeds length of provided text sized X
 Key: LUCENE-4216
 URL: https://issues.apache.org/jira/browse/LUCENE-4216
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0-ALPHA
 Environment: Windows 7, jdk1.6.0_27
Reporter: Ibrahim


I'm facing this exception:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم 
exceeds length of provided text sized 170
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
at classes.myApp$16$1.run(myApp.java:1508)


I tried to find anything wrong in my code when migrating from Lucene 3.6 to 
4.0, without success. I found similar issues with HTMLStripCharFilter, e.g. 
LUCENE-3690 and LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising 
this here to see whether there is really a bug or something wrong in my code 
with v4. The code that I'm using:

final Highlighter highlighter = new Highlighter(
    new SimpleHTMLFormatter("<font color=red>", "</font>"),
    new QueryScorer(query));
...
final TokenStream tokenStream = TokenSources.getAnyTokenStream(
    defaultSearcher.getIndexReader(), j, "Line", analyzer);
final TextFragment[] frag = highlighter.getBestTextFragments(
    tokenStream, doc.get("Line"), false, 10);


Please note that this is working fine with v3.6
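For context, InvalidTokenOffsetsException is thrown when a token's offsets point past the end of the text handed to the highlighter, which usually means the token stream and the stored text got out of sync (often from re-analyzing with a different analyzer than at index time). A minimal illustration of the invariant being violated; this is a sketch, not Lucene's actual code, and `OffsetCheck`/`checkOffset` are made-up names:

```java
public class OffsetCheck {
    // Illustrative invariant only (not Lucene's implementation): a
    // highlighter can only mark up regions that exist in the text it was
    // given, so any token whose end offset lies past text.length() must
    // be rejected.
    public static void checkOffset(String token, int endOffset, String text) {
        if (endOffset > text.length()) {
            throw new IllegalArgumentException("Token " + token
                + " exceeds length of provided text sized " + text.length());
        }
    }

    public static void main(String[] args) {
        checkOffset("ok", 2, "ok then"); // within bounds: no exception
        try {
            checkOffset("broken", 200, "short"); // 200 > 5: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the offsets in the stream come from a different analysis chain than the one that produced the stored "Line" field, this invariant is exactly what breaks.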




[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS

2012-07-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412603#comment-13412603
 ] 

Yonik Seeley commented on SOLR-3613:


bq. I'm not talking about precluding running as a webapp [...] so I'm going to 
pimp [jetty] it out

+1

I also don't think we should force the "solr." prefix for all the system 
properties.  If someone adds the ability to optionally check for the webapp 
prefix, then I think we should still be free to use zkHost, collection.*, 
etc., in the examples/docs.
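The optional-prefix idea can be sketched as a two-step property lookup; `PropLookup` and its method are hypothetical names for illustration, not Solr code:

```java
public class PropLookup {
    // Hypothetical sketch of an optional webapp prefix: try the
    // namespaced name first ("solr.zkHost"), then fall back to the bare
    // name ("zkHost"), then to the supplied default, so examples and
    // docs can keep using the short names.
    public static String get(String prefix, String name, String def) {
        String v = System.getProperty(prefix + name);
        if (v == null) v = System.getProperty(name);
        return v != null ? v : def;
    }

    public static void main(String[] args) {
        System.setProperty("zkHost", "zk1:2181"); // bare name, as in the examples
        System.out.println(get("solr.", "zkHost", "localhost:2181"));
    }
}
```

A prefixed property, when present, would shadow the bare one, which is what makes shared-appserver deployments safe without breaking the examples.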


bq. a thin HTTP layer around Lucene

I've certainly never thought of Solr as that.  Solr had faceting, numerics, 
etc., years before Lucene.  Solr is about being a practical, useful search 
server... and lately it is morphing more into a NoSQL server with first-class 
full-text search.

 Namespace Solr's JAVA OPTIONS
 -

 Key: SOLR-3613
 URL: https://issues.apache.org/jira/browse/SOLR-3613
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA
Reporter: Jan Høydahl
 Fix For: 4.0


 Solr, being a web-app, should play nicely in a setting where users deploy it 
 on a shared app server.
 In this regard, Solr's JAVA_OPTS should be properly namespaced, both to avoid 
 name clashes and for clarity when reading your app server startup script. We 
 currently do that with most: {{solr.solr.home, solr.data.dir, 
 solr.abortOnConfigurationError, solr.directoryFactory, 
 solr.clustering.enabled, solr.velocity.enabled, etc.}}, but for some opts we 
 fail to do so.
 Before the release of 4.0 we should make sure to clean this up.




[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task

2012-07-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412605#comment-13412605
 ] 

Uwe Schindler commented on LUCENE-3950:
---

I had the same problem with this commit, but I remember that Robert said there 
was actually a problem with RAT running from {{ivy:cachepath}}. I would also 
really prefer to have this one only in the cache, as we don't ship this tool, 
so we don't have to take care of its license, etc. We load all such tasks from 
cachepath (pegdown for converting Markdown to HTML, cpptasks, ...).

Side note: I am thinking about adding Clover, too. The required license file 
can be shipped together with our src package in the tools directory (Atlassian 
allowed this for the ASF, because the license only allows checking org.apache.* 
packages), and clover-2.6.1.jar can be downloaded via Ivy.

 load rat via ivy for rat-sources task
 -

 Key: LUCENE-3950
 URL: https://issues.apache.org/jira/browse/LUCENE-3950
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-3950.patch


 we now fail the build on rat problems (LUCENE-1866),
 so we should make it easy to run rat-sources for people
 to test locally (it takes like 3 seconds total for the whole trunk)
 Also this is safer than putting rat in your ~/.ant/lib because that 
 adds some classes from commons to your ant classpath (which we currently
 wrongly use in compile).




[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task

2012-07-12 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412611#comment-13412611
 ] 

Dawid Weiss commented on LUCENE-3950:
-

 but I remember that Robert said, there was actually a problem with RAT 
 running from ivy:cachepath/

Robert, do you recall what that problem was?

 load rat via ivy for rat-sources task
 -

 Key: LUCENE-3950
 URL: https://issues.apache.org/jira/browse/LUCENE-3950
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-3950.patch


 we now fail the build on rat problems (LUCENE-1866),
 so we should make it easy to run rat-sources for people
 to test locally (it takes like 3 seconds total for the whole trunk)
 Also this is safer than putting rat in your ~/.ant/lib because that 
 adds some classes from commons to your ant classpath (which we currently
 wrongly use in compile).




[jira] [Reopened] (LUCENE-3950) load rat via ivy for rat-sources task

2012-07-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-3950:
---

  Assignee: Uwe Schindler

Hi, I'm reopening this, as it works with cachepath. No messed-up lib folder 
with tools we don't need for compilation. It now behaves identically to 
cpptasks, junit, pegdown, maven-ant-tasks and all other build tools. No 
license checks required.

 load rat via ivy for rat-sources task
 -

 Key: LUCENE-3950
 URL: https://issues.apache.org/jira/browse/LUCENE-3950
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: LUCENE-3950.patch


 we now fail the build on rat problems (LUCENE-1866),
 so we should make it easy to run rat-sources for people
 to test locally (it takes like 3 seconds total for the whole trunk)
 Also this is safer than putting rat in your ~/.ant/lib because that 
 adds some classes from commons to your ant classpath (which we currently
 wrongly use in compile).




[jira] [Updated] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory

2012-07-12 Thread Christian Moen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen updated SOLR-3524:
-

Attachment: SOLR-3524.patch

 Make discard-punctuation feature in Kuromoji configurable from 
 JapaneseTokenizerFactory
 ---

 Key: SOLR-3524
 URL: https://issues.apache.org/jira/browse/SOLR-3524
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Kazuaki Hiraga
Assignee: Christian Moen
Priority: Minor
 Attachments: SOLR-3524.patch, SOLR-3524.patch, 
 kuromoji_discard_punctuation.patch.txt


 JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to 
 preserve punctuation in Japanese text, although it has a parameter that 
 changes this behavior. JapaneseTokenizerFactory always sets the third 
 parameter, which controls this behavior, to true, removing punctuation.
 I would like an option to configure this behavior in the fieldtype 
 definition in schema.xml.
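The requested change amounts to reading the flag from the factory's args instead of hard-coding it. A sketch under assumed names (`DiscardPunctuationArg` and the attribute handling here are illustrative; the real factory code may differ):

```java
import java.util.HashMap;
import java.util.Map;

public class DiscardPunctuationArg {
    // Sketch only: parse an optional "discardPunctuation" attribute from
    // the fieldtype args in schema.xml, keeping the current hard-coded
    // behavior (true) as the default so existing schemas are unaffected.
    public static boolean discardPunctuation(Map<String, String> args) {
        String v = args.get("discardPunctuation");
        return v == null || Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> factoryArgs = new HashMap<>();
        System.out.println(discardPunctuation(factoryArgs)); // true (default)
        factoryArgs.put("discardPunctuation", "false");
        System.out.println(discardPunctuation(factoryArgs)); // false
    }
}
```

Defaulting to the existing behavior is the key design point: schemas that never mention the attribute keep working unchanged.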




[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory

2012-07-12 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412627#comment-13412627
 ] 

Christian Moen commented on SOLR-3524:
--

Patch updated due to recent configuration changes.

 Make discard-punctuation feature in Kuromoji configurable from 
 JapaneseTokenizerFactory
 ---

 Key: SOLR-3524
 URL: https://issues.apache.org/jira/browse/SOLR-3524
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Kazuaki Hiraga
Assignee: Christian Moen
Priority: Minor
 Attachments: SOLR-3524.patch, SOLR-3524.patch, 
 kuromoji_discard_punctuation.patch.txt


 JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to 
 preserve punctuation in Japanese text, although it has a parameter that 
 changes this behavior. JapaneseTokenizerFactory always sets the third 
 parameter, which controls this behavior, to true, removing punctuation.
 I would like an option to configure this behavior in the fieldtype 
 definition in schema.xml.




[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory

2012-07-12 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412628#comment-13412628
 ] 

Christian Moen commented on SOLR-3524:
--

Committed revision 1360592 on {{trunk}}

 Make discard-punctuation feature in Kuromoji configurable from 
 JapaneseTokenizerFactory
 ---

 Key: SOLR-3524
 URL: https://issues.apache.org/jira/browse/SOLR-3524
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Kazuaki Hiraga
Assignee: Christian Moen
Priority: Minor
 Attachments: SOLR-3524.patch, SOLR-3524.patch, 
 kuromoji_discard_punctuation.patch.txt


 JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to 
 preserve punctuation in Japanese text, although it has a parameter that 
 changes this behavior. JapaneseTokenizerFactory always sets the third 
 parameter, which controls this behavior, to true, removing punctuation.
 I would like an option to configure this behavior in the fieldtype 
 definition in schema.xml.




[jira] [Updated] (LUCENE-3950) load rat via ivy for rat-sources task

2012-07-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3950:
--

Attachment: LUCENE-3950-cachepath.patch

Patch. Works fine on different machines. I have no RAT in my .lib folder, maybe 
that was Robert's problem (conflict with cachepath)?

 load rat via ivy for rat-sources task
 -

 Key: LUCENE-3950
 URL: https://issues.apache.org/jira/browse/LUCENE-3950
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: LUCENE-3950-cachepath.patch, LUCENE-3950.patch


 we now fail the build on rat problems (LUCENE-1866),
 so we should make it easy to run rat-sources for people
 to test locally (it takes like 3 seconds total for the whole trunk)
 Also this is safer than putting rat in your ~/.ant/lib because that 
 adds some classes from commons to your ant classpath (which we currently
 wrongly use in compile).




[jira] [Commented] (SOLR-3614) XML parsing in XPathEntityProcessor doesn't respect ENTITY declarations?

2012-07-12 Thread Thomas Beckers (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412639#comment-13412639
 ] 

Thomas Beckers commented on SOLR-3614:
--

I guess this behaviour was introduced with a fix for SOLR-964.

 XML parsing in XPathEntityProcessor doesn't respect ENTITY declarations?
 

 Key: SOLR-3614
 URL: https://issues.apache.org/jira/browse/SOLR-3614
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-3614.patch


 As reported by Michael Belenki on solr-user, pointing XPathEntityProcessor at 
 XML files that use DTD ENTITY declarations causes XML parse errors of the 
 form...
 {noformat}
 org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed 
 for xml, url:testdata.xml rows processed:0
 ...
 Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: 
 Undeclared general entity uuml
 ...
 {noformat}
 ...even when the entity is specifically declared.




[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory

2012-07-12 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412659#comment-13412659
 ] 

Christian Moen commented on SOLR-3524:
--

Committed revision 1360613 on {{branch_4x}}

 Make discard-punctuation feature in Kuromoji configurable from 
 JapaneseTokenizerFactory
 ---

 Key: SOLR-3524
 URL: https://issues.apache.org/jira/browse/SOLR-3524
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Kazuaki Hiraga
Assignee: Christian Moen
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3524.patch, SOLR-3524.patch, 
 kuromoji_discard_punctuation.patch.txt


 JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to 
 preserve punctuation in Japanese text, although it has a parameter that 
 changes this behavior. JapaneseTokenizerFactory always sets the third 
 parameter, which controls this behavior, to true, removing punctuation.
 I would like an option to configure this behavior in the fieldtype 
 definition in schema.xml.




[jira] [Resolved] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory

2012-07-12 Thread Christian Moen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen resolved SOLR-3524.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.0

Thanks, Kazu and Ohtani-san!

 Make discard-punctuation feature in Kuromoji configurable from 
 JapaneseTokenizerFactory
 ---

 Key: SOLR-3524
 URL: https://issues.apache.org/jira/browse/SOLR-3524
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Kazuaki Hiraga
Assignee: Christian Moen
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3524.patch, SOLR-3524.patch, 
 kuromoji_discard_punctuation.patch.txt


 JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to 
 preserve punctuation in Japanese text, although it has a parameter that 
 changes this behavior. JapaneseTokenizerFactory always sets the third 
 parameter, which controls this behavior, to true, removing punctuation.
 I would like an option to configure this behavior in the fieldtype 
 definition in schema.xml.




[jira] [Created] (LUCENE-4217) Load clover.jar from ivy-cachepath and ship sources with License

2012-07-12 Thread Uwe Schindler (JIRA)
Uwe Schindler created LUCENE-4217:
-

 Summary: Load clover.jar from ivy-cachepath and ship sources with License
 Key: LUCENE-4217
 URL: https://issues.apache.org/jira/browse/LUCENE-4217
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0


When Atlassian granted use of the license for their clover-2.6.3.jar file, they 
allowed us to ship this license file to every developer. Currently the Clover 
setup is very hard for users, so this issue will make it simple.

If you want to run tests with Clover, just pass -Drun.clover=true to ant clean 
test. Ant will then download Clover via Ivy and point it to the license file in 
our tools folder. The license is supplemented by the original mail from 
Atlassian stating that everybody is allowed to use it with code in org.apache.* 
Java packages.




[jira] [Commented] (LUCENE-2566) + - operators allow any amount of whitespace

2012-07-12 Thread Karsten R. (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412673#comment-13412673
 ] 

Karsten R. commented on LUCENE-2566:


Should StandardQueryParser work like QueryParser?
In current branches and trunk, TestQPHelper still contains the line
{{assertQueryEquals("a OR ! b", null, "a -b");}}
(and "a - b" is also parsed as "a -b")

 + - operators allow any amount of whitespace
 

 Key: LUCENE-2566
 URL: https://issues.apache.org/jira/browse/LUCENE-2566
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 3.6
Reporter: Yonik Seeley
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 4.0-ALPHA, 3.6.1

 Attachments: LUCENE-2566-3x.patch, LUCENE-2566.patch


 As an example, (foo - bar) is treated like (foo -bar).
 It seems like for + and - to be treated as unary operators, they should be 
 immediately followed by the operand.
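The proposed rule (an operator binds only when glued to its operand) can be illustrated with a toy splitter. This is not the Lucene query parser, just a sketch of the intended distinction, and dropping a dangling operator is one possible choice among several:

```java
import java.util.ArrayList;
import java.util.List;

public class UnaryOps {
    // Toy sketch of the rule proposed here: "+" or "-" acts as a unary
    // operator only when immediately followed by its operand ("-bar");
    // a bare "-" surrounded by whitespace is treated as noise and
    // dropped rather than being attached to the next term.
    public static List<String> clauses(String query) {
        List<String> out = new ArrayList<>();
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("+") || tok.equals("-")) continue; // dangling operator
            out.add(tok); // "-bar" keeps its prohibit prefix
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(clauses("foo - bar")); // [foo, bar], not [foo, -bar]
        System.out.println(clauses("foo -bar"));  // [foo, -bar]
    }
}
```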




[jira] [Commented] (LUCENE-3950) load rat via ivy for rat-sources task

2012-07-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412674#comment-13412674
 ] 

Robert Muir commented on LUCENE-3950:
-

{quote}
Robert you recall what was that problem?
{quote}

I think the problem was that I tried to use the fine-grained Maven artifacts 
(rat-core + rat-tasks).

Using the big 'rat' jar with all its dependencies works great, and if it works 
on cachepath, even better.

I don't care about the actual jars, just that the task works :)

 load rat via ivy for rat-sources task
 -

 Key: LUCENE-3950
 URL: https://issues.apache.org/jira/browse/LUCENE-3950
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: LUCENE-3950-cachepath.patch, LUCENE-3950.patch


 we now fail the build on rat problems (LUCENE-1866),
 so we should make it easy to run rat-sources for people
 to test locally (it takes like 3 seconds total for the whole trunk)
 Also this is safer than putting rat in your ~/.ant/lib because that 
 adds some classes from commons to your ant classpath (which we currently
 wrongly use in compile).




Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Robert Muir
On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
 Author: uschindler
 Date: Thu Jul 12 10:34:11 2012
 New Revision: 1360619

 URL: http://svn.apache.org/viewvc?rev=1360619view=rev
 Log:
 LUCENE-3950: Use ivy.cachepath for Apache RAT

 Removed:
 lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
 lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
 Modified:
 lucene/dev/trunk/lucene/build.xml
 lucene/dev/trunk/lucene/common-build.xml
 lucene/dev/trunk/lucene/tools/ivy.xml

 Modified: lucene/dev/trunk/lucene/build.xml
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=1360619&r1=1360618&r2=1360619&view=diff
 ==
 --- lucene/dev/trunk/lucene/build.xml (original)
 +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
 @@ -227,7 +227,7 @@
      </forbidden-apis>
   </target>

 -  <target name="resolve" depends="resolve-tools">
 +  <target name="resolve">
     <sequential>
       <ant dir="test-framework" target="resolve" inheritall="false">
         <propertyset refid="uptodate.and.compiled.properties"/>


This part of the commit is a bug; it should go back to depending upon
resolve-tools (or please remove the ASM.jar!!!)

It's easy to see the bug: try 'ant jar-checksums' from the top level and
watch what happens.
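A sketch of the requested one-line fix, with the attribute quoting (stripped by the archive) restored; the surrounding build.xml context is assumed from the quoted diff:

```xml
<!-- Sketch of the requested fix: make the top-level resolve depend on
     resolve-tools again, so the tools jars and their checksums are
     present before resolving. -->
<target name="resolve" depends="resolve-tools">
  <sequential>
    <ant dir="test-framework" target="resolve" inheritall="false">
      <propertyset refid="uptodate.and.compiled.properties"/>
    </ant>
  </sequential>
</target>
```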

-- 
lucidimagination.com




Replication and proxy settings

2012-07-12 Thread Gautier Koscielny
Hello,

 I work near Cambridge at the EMBL-EBI and I would like to contribute to SOLR.

Our current project involves two teams developing a full-text search service 
based on SOLR 3.6.

We have had issues when trying to replicate a master copy of an index to a 
slave using HTTP proxy settings passed to the JRE.

To solve this issue, I've created copies of the SnapPuller and the 
ReplicationHandler, modified the code to manage proxy settings, and modified 
the configuration of the SOLR slave to use this new handler.
We have successfully tested replication using proxy settings in our 
environment.

What I would like to do now is to apply these changes to the SnapPuller 
directly:
-  check proxy settings before creating the HttpClient
-  apply the proxy settings to the HttpClient's HostConfiguration
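A hedged Java sketch of those two steps, using only the standard JRE proxy properties (`http.proxyHost` / `http.proxyPort`) that the original message mentions. The class and method names are hypothetical, not the actual SnapPuller code:

```java
// Hypothetical sketch (not the real Solr SnapPuller): read the standard
// JRE proxy properties before constructing the HTTP client, as the two
// steps above describe. Names here are made up for illustration.
public class ProxySettings {
    /** Returns "host:port" if http.proxyHost is set, otherwise null. */
    static String jreProxy() {
        String host = System.getProperty("http.proxyHost");
        if (host == null || host.isEmpty()) {
            return null; // no proxy configured; connect directly
        }
        String port = System.getProperty("http.proxyPort", "80");
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // Simulate -Dhttp.proxyHost=... -Dhttp.proxyPort=... JVM flags.
        System.setProperty("http.proxyHost", "proxy.example.org");
        System.setProperty("http.proxyPort", "3128");
        System.out.println(jreProxy()); // prints proxy.example.org:3128
    }
}
```

The real change would then pass the detected host and port to the HttpClient's HostConfiguration before replication starts.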

I don't want to patch 3.6 and would like to apply the change to SOLR 4.

Please tell me what you think about this change and whether I can proceed.

What is the usual procedure for contributing code? 

Best regards,
Gautier

RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Uwe Schindler
You committed this without documenting it anywhere. Sorry. If you want this 
fixed, open an issue and commit it separately, but not together with unrelated 
stuff.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:40 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
 On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
  Author: uschindler
  Date: Thu Jul 12 10:34:11 2012
  New Revision: 1360619
 
  URL: http://svn.apache.org/viewvc?rev=1360619view=rev
  Log:
  LUCENE-3950: Use ivy.cachepath for Apache RAT
 
  Removed:
  lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
  lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
  lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
  Modified:
  lucene/dev/trunk/lucene/build.xml
  lucene/dev/trunk/lucene/common-build.xml
  lucene/dev/trunk/lucene/tools/ivy.xml
 
  Modified: lucene/dev/trunk/lucene/build.xml
  URL:
  http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=136
  0619r1=1360618r2=1360619view=diff
 
 
 ==
  
  --- lucene/dev/trunk/lucene/build.xml (original)
  +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
  @@ -227,7 +227,7 @@
   /forbidden-apis
 /target
 
 -  <target name="resolve" depends="resolve-tools">
 +  <target name="resolve">
      <sequential>
        <ant dir="test-framework" target="resolve" inheritall="false">
          <propertyset refid="uptodate.and.compiled.properties"/>
 
 
 This part of the commit is a bug, it should go back to depending upon resolve-
 tools (or please remove the ASM.jar!!!)
 
 Its easy to see the bug, try 'ant jar-checksums' from top-level and watch what
 happens.
 
 --
 lucidimagination.com
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org





Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Robert Muir
Well, it was pretty related.

I needed to add the checksums to commit this, or 'ant validate' would
fail and Jenkins would have been very angry!

So I had to fix jar-checksums in order to commit!

On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote:
 You committed this without documenting it anywhere. Sorry. If you want this 
 fixed, open an issue and commit it separately, but not together with unrelated 
 stuff.

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:40 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

 On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
  Author: uschindler
  Date: Thu Jul 12 10:34:11 2012
  New Revision: 1360619
 
  URL: http://svn.apache.org/viewvc?rev=1360619view=rev
  Log:
  LUCENE-3950: Use ivy.cachepath for Apache RAT
 
  Removed:
  lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
  lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
  lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
  Modified:
  lucene/dev/trunk/lucene/build.xml
  lucene/dev/trunk/lucene/common-build.xml
  lucene/dev/trunk/lucene/tools/ivy.xml
 
  Modified: lucene/dev/trunk/lucene/build.xml
  URL:
  http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=136
  0619r1=1360618r2=1360619view=diff
 
 
 ==
  
  --- lucene/dev/trunk/lucene/build.xml (original)
  +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
  @@ -227,7 +227,7 @@
   /forbidden-apis
 /target
 
  -  <target name="resolve" depends="resolve-tools">
  +  <target name="resolve">
       <sequential>
         <ant dir="test-framework" target="resolve" inheritall="false">
           <propertyset refid="uptodate.and.compiled.properties"/>
 

 This part of the commit is a bug, it should go back to depending upon 
 resolve-
 tools (or please remove the ASM.jar!!!)

 Its easy to see the bug, try 'ant jar-checksums' from top-level and watch 
 what
 happens.

 --
 lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
lucidimagination.com




RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Uwe Schindler
Ant validate works for me. There is a checksum for asm, so where is the problem?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:49 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
 Well, it was pretty related.
 
 I needed to add the checksums to commit this, or 'ant validate' would fail and
 jenkins would have been very angry!
 
 So i had to fix jar-checksums in order to commit!
 
 On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote:
  You committed this without documenting it anywhere. Sorry. If you want this
 fixed, open issue and commit it separately. But not together with unrelated
 stuff.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 12:40 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
   Author: uschindler
   Date: Thu Jul 12 10:34:11 2012
   New Revision: 1360619
  
   URL: http://svn.apache.org/viewvc?rev=1360619view=rev
   Log:
   LUCENE-3950: Use ivy.cachepath for Apache RAT
  
   Removed:
   lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
   lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
   lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
   Modified:
   lucene/dev/trunk/lucene/build.xml
   lucene/dev/trunk/lucene/common-build.xml
   lucene/dev/trunk/lucene/tools/ivy.xml
  
   Modified: lucene/dev/trunk/lucene/build.xml
   URL:
   http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=
   136 0619r1=1360618r2=1360619view=diff
  
 
 
  ==
   
   --- lucene/dev/trunk/lucene/build.xml (original)
   +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
   @@ -227,7 +227,7 @@
/forbidden-apis
  /target
  
   -  target name=resolve depends=resolve-tools
   +  target name=resolve
sequential
  ant dir=test-framework target=resolve inheritall=false
 propertyset refid=uptodate.and.compiled.properties/
  
 
  This part of the commit is a bug, it should go back to depending upon
  resolve- tools (or please remove the ASM.jar!!!)
 
  Its easy to see the bug, try 'ant jar-checksums' from top-level and
  watch what happens.
 
  --
  lucidimagination.com
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 
 --
 lucidimagination.com
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org





Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Robert Muir
Again, please run 'ant jar-checksums'.

You will see the problem.

On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote:
 Ant validate works for me. There is a checksum for asm, so where is the 
 problem?

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:49 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

 Well, it was pretty related.

 I needed to add the checksums to commit this, or 'ant validate' would fail 
 and
 jenkins would have been very angry!

 So i had to fix jar-checksums in order to commit!

 On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote:
  You committed this without documenting it anywhere. Sorry. If you want this
 fixed, open issue and commit it separately. But not together with unrelated
 stuff.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 12:40 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
   Author: uschindler
   Date: Thu Jul 12 10:34:11 2012
   New Revision: 1360619
  
   URL: http://svn.apache.org/viewvc?rev=1360619view=rev
   Log:
   LUCENE-3950: Use ivy.cachepath for Apache RAT
  
   Removed:
   lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
   lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
   lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
   Modified:
   lucene/dev/trunk/lucene/build.xml
   lucene/dev/trunk/lucene/common-build.xml
   lucene/dev/trunk/lucene/tools/ivy.xml
  
   Modified: lucene/dev/trunk/lucene/build.xml
   URL:
   http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=
   136 0619r1=1360618r2=1360619view=diff
  
 
 
  ==
   
   --- lucene/dev/trunk/lucene/build.xml (original)
   +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
   @@ -227,7 +227,7 @@
/forbidden-apis
  /target
  
   -  target name=resolve depends=resolve-tools
   +  target name=resolve
sequential
  ant dir=test-framework target=resolve inheritall=false
 propertyset refid=uptodate.and.compiled.properties/
  
 
  This part of the commit is a bug, it should go back to depending upon
  resolve- tools (or please remove the ASM.jar!!!)
 
  Its easy to see the bug, try 'ant jar-checksums' from top-level and
  watch what happens.
 
  --
  lucidimagination.com
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 



 --
 lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
lucidimagination.com




[jira] [Commented] (SOLR-3524) Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory

2012-07-12 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412685#comment-13412685
 ] 

Christian Moen commented on SOLR-3524:
--

{{CHANGES.txt}} for some reason didn't make it into {{branch_4x}}.  Fixed this 
in revision 1360622.

 Make discard-punctuation feature in Kuromoji configurable from 
 JapaneseTokenizerFactory
 ---

 Key: SOLR-3524
 URL: https://issues.apache.org/jira/browse/SOLR-3524
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6
Reporter: Kazuaki Hiraga
Assignee: Christian Moen
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3524.patch, SOLR-3524.patch, 
 kuromoji_discard_punctuation.patch.txt


 JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to 
 preserve punctuation in Japanese text, although it has a parameter to change 
 this behavior. JapaneseTokenizerFactory always sets the third parameter, 
 which controls this behavior, to true, removing punctuation.
 I would like an option to configure this behavior in the fieldtype 
 definition in schema.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Uwe Schindler
What does this f*cking task do? These checksums are a no-go for me. I hate 
them; please remove them completely! It took me an hour on the weekend to get 
this shitty task working!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:55 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
 again, please run 'ant jar-checksums'
 
 you will see the problem.
 
 On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote:
  Ant validate works for me. There is a checksum for asm, so where is the
 problem?
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 12:49 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  Well, it was pretty related.
 
  I needed to add the checksums to commit this, or 'ant validate' would
  fail and jenkins would have been very angry!
 
  So i had to fix jar-checksums in order to commit!
 
  On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote:
   You committed this without documenting it anywhere. Sorry. If you
   want this
  fixed, open issue and commit it separately. But not together with
  unrelated stuff.
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
   -Original Message-
   From: Robert Muir [mailto:rcm...@gmail.com]
   Sent: Thursday, July 12, 2012 12:40 PM
   To: dev@lucene.apache.org
   Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
   build.xml common-build.xml tools/ivy.xml
   tools/lib/apache-rat-0.8.jar.sha1
   tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
  
   On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
Author: uschindler
Date: Thu Jul 12 10:34:11 2012
New Revision: 1360619
   
URL: http://svn.apache.org/viewvc?rev=1360619view=rev
Log:
LUCENE-3950: Use ivy.cachepath for Apache RAT
   
Removed:
lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
Modified:
lucene/dev/trunk/lucene/build.xml
lucene/dev/trunk/lucene/common-build.xml
lucene/dev/trunk/lucene/tools/ivy.xml
   
Modified: lucene/dev/trunk/lucene/build.xml
URL:
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=
136 0619r1=1360618r2=1360619view=diff
   
  
 
 
   ==

--- lucene/dev/trunk/lucene/build.xml (original)
+++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
@@ -227,7 +227,7 @@
 /forbidden-apis
   /target
   
-  target name=resolve depends=resolve-tools
+  target name=resolve
 sequential
   ant dir=test-framework target=resolve inheritall=false
  propertyset refid=uptodate.and.compiled.properties/
   
  
   This part of the commit is a bug, it should go back to depending upon
   resolve- tools (or please remove the ASM.jar!!!)
  
   Its easy to see the bug, try 'ant jar-checksums' from top-level and
   watch what happens.
  
   --
   lucidimagination.com
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
   additional commands, e-mail: dev-h...@lucene.apache.org
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
   additional commands, e-mail: dev-h...@lucene.apache.org
  
 
 
 
  --
  lucidimagination.com
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
  commands, e-mail: dev-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 
 --
 lucidimagination.com
 



Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Robert Muir
OK, but currently they are required: if you add a third-party jar and
don't add a checksum for it, the build fails.

So if we add back the dependency on resolve-tools for the top-level lucene
resolve (build.xml, only invoked a single time), then it all works.

-  <target name="resolve" depends="resolve-tools">
+  <target name="resolve">

Otherwise, the jar-checksums task will fail, because it will remove the
asm jar's checksum.
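As an illustration of the convention being argued about: each third-party jar is paired with a `.jar.sha1` sidecar file holding the hex SHA-1 digest of the jar, and validate compares the two. This self-contained Java sketch computes such a digest; the class name and input bytes are made up:

```java
// Hypothetical sketch of the checksum convention: a .jar.sha1 file holds
// the 40-character hex SHA-1 digest of the jar's bytes. Names are made up.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class JarChecksum {
    /** Hex SHA-1 of the given bytes, as a .jar.sha1 file would store it. */
    static String sha1Hex(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff)); // two hex digits per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] fakeJar = "fake jar bytes".getBytes(StandardCharsets.UTF_8);
        String sum = sha1Hex(fakeJar);
        System.out.println(sum.length()); // prints 40
    }
}
```

A manual 'openssl sha1' over the jar produces the same digest, which is presumably how a checksum could be added by hand when the Ant task misbehaves.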

On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote:
 What does this f*cking task do? These checksums are a no-go for me. I hate 
 them; please remove them completely! It took me an hour on the weekend to 
 get this shitty task working!

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:55 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

 again, please run 'ant jar-checksums'

 you will see the problem.

 On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote:
  Nat validate works form me. There is a checksum for asm so where is the
 problem?
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 12:49 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  Well, it was pretty related.
 
  I needed to add the checksums to commit this, or 'ant validate' would
  fail and jenkins would have been very angry!
 
  So i had to fix jar-checksums in order to commit!
 
  On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de wrote:
   You committed this without documenting it anywhere. Sorry. If you
   want this
  fixed, open issue and commit it separately. But not together with
  unrelated stuff.
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
   -Original Message-
   From: Robert Muir [mailto:rcm...@gmail.com]
   Sent: Thursday, July 12, 2012 12:40 PM
   To: dev@lucene.apache.org
   Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
   build.xml common-build.xml tools/ivy.xml
   tools/lib/apache-rat-0.8.jar.sha1
   tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
  
   On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
Author: uschindler
Date: Thu Jul 12 10:34:11 2012
New Revision: 1360619
   
URL: http://svn.apache.org/viewvc?rev=1360619view=rev
Log:
LUCENE-3950: Use ivy.cachepath for Apache RAT
   
Removed:
lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
Modified:
lucene/dev/trunk/lucene/build.xml
lucene/dev/trunk/lucene/common-build.xml
lucene/dev/trunk/lucene/tools/ivy.xml
   
Modified: lucene/dev/trunk/lucene/build.xml
URL:
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=
136 0619r1=1360618r2=1360619view=diff
   
  
 
 
   ==

--- lucene/dev/trunk/lucene/build.xml (original)
+++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
@@ -227,7 +227,7 @@
 /forbidden-apis
   /target
   
-  target name=resolve depends=resolve-tools
+  target name=resolve
 sequential
   ant dir=test-framework target=resolve inheritall=false
  propertyset refid=uptodate.and.compiled.properties/
   
  
   This part of the commit is a bug, it should go back to depending upon
   resolve- tools (or please remove the ASM.jar!!!)
  
   Its easy to see the bug, try 'ant jar-checksums' from top-level and
   watch what happens.
  
   --
   lucidimagination.com
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
   additional commands, e-mail: dev-h...@lucene.apache.org
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
   additional commands, e-mail: dev-h...@lucene.apache.org
  
 
 
 
  --
  lucidimagination.com
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
  commands, e-mail: 

Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Robert Muir
Right, ever since then the jar-checksums task has not worked correctly.

I don't know how you added a checksum; maybe with 'openssl sha1'
yourself, manually?

But I needed to add a checksum for the commit to succeed, so I had to
fix this task.

On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote:
 What is different from Sunday afternoon? I added asm-all-debug.jar with a 
 checksum generated by my local Windows tools, and it worked. I don't care 
 about this ant task.

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 1:03 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

 ok but currently they are required, if you add a 3rd party jar and dont add a
 checksum for it, the build fails.

 so if we add back the dependency to resolve-tools for top-level lucene 
 resolve
 (build.xml, only invoked a single time), then it all works.

 -  target name=resolve depends=resolve-tools
 +  target name=resolve

 Otherwise, jar-checksums task will fail, because it will remove the asm jar's
 checksum.

 On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote:
  What does this f*cking task do? These checksums are a no-go for me. I hate
 them and please remove them completely! It took me a hour on the weekend
 to get this shitty task working!
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 12:55 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  again, please run 'ant jar-checksums'
 
  you will see the problem.
 
  On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de wrote:
   Nat validate works form me. There is a checksum for asm so where is
   the
  problem?
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
   -Original Message-
   From: Robert Muir [mailto:rcm...@gmail.com]
   Sent: Thursday, July 12, 2012 12:49 PM
   To: dev@lucene.apache.org
   Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
   build.xml common-build.xml tools/ivy.xml
   tools/lib/apache-rat-0.8.jar.sha1
   tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
  
   Well, it was pretty related.
  
   I needed to add the checksums to commit this, or 'ant validate' would
   fail and jenkins would have been very angry!
  
   So i had to fix jar-checksums in order to commit!
  
   On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de
 wrote:
You committed this without documenting it anywhere. Sorry. If you
want this
   fixed, open issue and commit it separately. But not together with
   unrelated stuff.
   
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
eMail: u...@thetaphi.de
   
   
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, July 12, 2012 12:40 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
build.xml common-build.xml tools/ivy.xml
tools/lib/apache-rat-0.8.jar.sha1
tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
   
On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
 Author: uschindler
 Date: Thu Jul 12 10:34:11 2012
 New Revision: 1360619

 URL: http://svn.apache.org/viewvc?rev=1360619view=rev
 Log:
 LUCENE-3950: Use ivy.cachepath for Apache RAT

 Removed:
 lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
 lucene/dev/trunk/lucene/tools/lib/apache-rat-LICENSE-ASL.txt
 lucene/dev/trunk/lucene/tools/lib/apache-rat-NOTICE.txt
 Modified:
 lucene/dev/trunk/lucene/build.xml
 lucene/dev/trunk/lucene/common-build.xml
 lucene/dev/trunk/lucene/tools/ivy.xml

 Modified: lucene/dev/trunk/lucene/build.xml
 URL:

 http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=
 136 0619r1=1360618r2=1360619view=diff

   
  
 
 
==
 
 --- lucene/dev/trunk/lucene/build.xml (original)
 +++ lucene/dev/trunk/lucene/build.xml Thu Jul 12 10:34:11 2012
 @@ -227,7 +227,7 @@
  /forbidden-apis
/target

 -  target name=resolve 

RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Uwe Schindler
Aha - I have no idea what you are talking about. Fix it; for me it works! I 
refuse to add endless chains of depends everywhere just to get a stupid tool xy 
working in combination yx on foobar.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 1:07 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
 Right, ever since then the jar-checksums task has not worked correctly.
 
 I dont know how you added a checksum, maybe with 'openssl sha1'
 yourself manually?
 
 But i needed to add a checksum for the commit to succeed, so i had to fix this
 task.
 
 On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote:
  What is different to Sunday afternoon? I added asm-all-debug.jar with a
 checksum generated by my local windows tools and it worked? I don’t care
 about this ant task.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 1:03 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  ok but currently they are required, if you add a 3rd party jar and
  dont add a checksum for it, the build fails.
 
  so if we add back the dependency to resolve-tools for top-level
  lucene resolve (build.xml, only invoked a single time), then it all works.
 
   -  <target name="resolve" depends="resolve-tools">
   +  <target name="resolve">
 
  Otherwise, jar-checksums task will fail, because it will remove the
  asm jar's checksum.
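
   For reference, the per-jar checksum files this discussion is about are plain
   SHA-1 hex digests stored next to each jar as <jar>.sha1. A sketch of creating
   one by hand (hypothetical jar name, and assuming GNU coreutils `sha1sum`
   rather than the `openssl sha1` mentioned above):

   ```shell
   # Create a stand-in jar and write its bare SHA-1 digest to <jar>.sha1,
   # the layout the checksum validation expects (file name is hypothetical).
   printf 'stand-in contents' > asm-all-debug.jar
   sha1sum asm-all-debug.jar | awk '{print $1}' > asm-all-debug.jar.sha1
   cat asm-all-debug.jar.sha1
   ```

   A digest produced this way is a single 40-character lowercase hex line;
   anything else (e.g. the "hash  filename" format sha1sum prints by default)
   would not match what an automated check of bare digests expects.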
 
  On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de wrote:
   What does this f*cking task do? These checksums are a no-go for me.
   I hate
   them and please remove them completely! It took me an hour on the
  weekend to get this shitty task working!
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
   -Original Message-
   From: Robert Muir [mailto:rcm...@gmail.com]
   Sent: Thursday, July 12, 2012 12:55 PM
   To: dev@lucene.apache.org
   Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
   build.xml common-build.xml tools/ivy.xml
   tools/lib/apache-rat-0.8.jar.sha1
   tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
  
   again, please run 'ant jar-checksums'
  
   you will see the problem.
  
   On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de
 wrote:
'ant validate' works for me. There is a checksum for asm, so where is
the problem?
   
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
eMail: u...@thetaphi.de
   
   
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, July 12, 2012 12:49 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
build.xml common-build.xml tools/ivy.xml
tools/lib/apache-rat-0.8.jar.sha1
tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
   
Well, it was pretty related.
   
I needed to add the checksums to commit this, or 'ant validate' would
fail and jenkins would have been very angry!
   
So i had to fix jar-checksums in order to commit!
   
On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler u...@thetaphi.de
  wrote:
 You committed this without documenting it anywhere. Sorry. If you
 want this
fixed, open issue and commit it separately. But not together with
unrelated stuff.

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:40 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
 build.xml common-build.xml tools/ivy.xml
 tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-
 NOTICE.txt

 On Thu, Jul 12, 2012 at 6:34 AM,  uschind...@apache.org wrote:
  Author: uschindler
  Date: Thu Jul 12 10:34:11 2012
  New Revision: 1360619
 
  URL: http://svn.apache.org/viewvc?rev=1360619&view=rev
  Log:
  LUCENE-3950: Use ivy.cachepath for Apache RAT
 
  Removed:
  lucene/dev/trunk/lucene/tools/lib/apache-rat-0.8.jar.sha1
  

RE: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Uwe Schindler
Can you simply fix it for me?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 1:20 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
 It's not an endless chain of depends; tools/ is not really a real module and
 had no dependencies before.
 
 So it's excluded from the ordinary modules-crawl (no documentation, javadocs,
 tests, or packaging is done for it).
 
 Now it has dependencies, so it's important that resolve defers to it if we want
 it to work within IDEs such as Eclipse, and if we want tasks like jar-checksums
 to work.
 
 On Thu, Jul 12, 2012 at 7:11 AM, Uwe Schindler u...@thetaphi.de wrote:
  Aha - I have no idea what you are talking about. Fix it; for me it works! I
 refuse to add endless chains of depends everywhere just to get a stupid tool
 xy working in combination yx on foobar.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 1:07 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  Right, ever since then the jar-checksums task has not worked correctly.
 
  I dont know how you added a checksum, maybe with 'openssl sha1'
  yourself manually?
 
  But i needed to add a checksum for the commit to succeed, so i had to
  fix this task.
 
  On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote:
   What is different to Sunday afternoon? I added asm-all-debug.jar
   with a
  checksum generated by my local windows tools and it worked? I don’t
  care about this ant task.
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
   -Original Message-
   From: Robert Muir [mailto:rcm...@gmail.com]
   Sent: Thursday, July 12, 2012 1:03 PM
   To: dev@lucene.apache.org
   Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
   build.xml common-build.xml tools/ivy.xml
   tools/lib/apache-rat-0.8.jar.sha1
   tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
  
   ok but currently they are required, if you add a 3rd party jar and
   dont add a checksum for it, the build fails.
  
   so if we add back the dependency to resolve-tools for top-level
   lucene resolve (build.xml, only invoked a single time), then it all 
   works.
  
    -  <target name="resolve" depends="resolve-tools">
    +  <target name="resolve">
  
   Otherwise, jar-checksums task will fail, because it will remove the
   asm jar's checksum.
  
   On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de
 wrote:
What does this f*cking task do? These checksums are a no-go for me.
I hate
    them and please remove them completely! It took me an hour on the
   weekend to get this shitty task working!
   
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
eMail: u...@thetaphi.de
   
   
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, July 12, 2012 12:55 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
build.xml common-build.xml tools/ivy.xml
tools/lib/apache-rat-0.8.jar.sha1
tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
   
again, please run 'ant jar-checksums'
   
you will see the problem.
   
On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de
  wrote:
  'ant validate' works for me. There is a checksum for asm, so where
  is the problem?

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:49 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
 build.xml common-build.xml tools/ivy.xml
 tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-
 NOTICE.txt

 Well, it was pretty related.

 I needed to add the checksums to commit this, or 'ant validate'
 would
 fail and jenkins would have been very angry!

 So i had to fix jar-checksums in order to commit!

 On Thu, Jul 12, 2012 at 6:47 AM, Uwe Schindler
 u...@thetaphi.de
   wrote:
  You committed this without 

Re: Replication and proxy settings

2012-07-12 Thread Erick Erickson
Gautier:

It's perfectly appropriate to open a JIRA and attach your code as a
patch, thanks!

If you haven't already seen it, this describes how to make patches, etc...

http://wiki.apache.org/solr/HowToContribute

Most IDEs have the ability to create them too, and svn diff is easy
to do (execute that from the
dir that contains both solr and lucene for ease-of-use, please).

We typically name patches as SOLR-####.patch, where #### is the JIRA number.

Here's the JIRA link: https://issues.apache.org/jira/browse/solr, you
have to create
a user ID to add JIRAs/patches.

Code is always welcome!

Best
Erick
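
The naming convention described above can be illustrated with a plain
`diff -u` (file names here are hypothetical; in a real checkout you would run
`svn diff` from the directory containing both lucene and solr, as Erick says):

```shell
# Make a before/after pair and capture a unified diff under the
# SOLR-<issue>.patch naming convention (SOLR-3618 used as the example number).
printf 'old line\n' > Example.java.orig
printf 'new line\n' > Example.java
diff -u Example.java.orig Example.java > SOLR-3618.patch || true  # diff exits 1 when files differ
grep '^+new' SOLR-3618.patch
```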

On Thu, Jul 12, 2012 at 6:43 AM, Gautier Koscielny kosci...@ebi.ac.uk wrote:
 Hello,

  I work near Cambridge at the EMBL-EBI and I would like to contribute to
 SOLR.

 Our current project involves two teams developing a full text search service
 based on SOLR 3.6.

 We have had issues when trying to replicate a master copy of an index to a
 slave using HTTP proxy settings passed to the JRE.

 To solve this issue, I've created a copy of the SnapPuller and the
 ReplicationHandler, modified the code to manage
 proxy settings and modified the configuration of the SOLR slave to use this
 new handler.
 We have tested the replication using proxy settings in our environment with
 success.

 What I would like to do now is to apply these changes to the SnapPuller
 directly:
 -  check proxy settings before creating the httpClient
 -  apply the proxy setting to the httpClient HostConfiguration.

 I don't want to patch 3.6 and would like to apply the change to SOLR 4.

 Please tell me what you think about this change and whether I can proceed.

 What is the usual procedure to commit code?

 Best regards,
 Gautier

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/ solr/example/ solr/example/etc/jetty.xml

2012-07-12 Thread Uwe Schindler
Ahm...
+   <SystemProperty name="lucidworksLogsHome"/>/request._mm_dd.log

Maybe change this name!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: markrmil...@apache.org [mailto:markrmil...@apache.org]
 Sent: Thursday, July 12, 2012 1:43 PM
 To: comm...@lucene.apache.org
 Subject: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/
 solr/example/ solr/example/etc/jetty.xml
 
 Author: markrmiller
 Date: Thu Jul 12 11:42:50 2012
 New Revision: 1360640
 
 URL: http://svn.apache.org/viewvc?rev=1360640&view=rev
 Log:
 add a commented out example to jetty.xml for configuring a request log
 
 Modified:
 lucene/dev/branches/branch_4x/   (props changed)
 lucene/dev/branches/branch_4x/solr/   (props changed)
 lucene/dev/branches/branch_4x/solr/example/   (props changed)
 lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml
 
 Modified: lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml
 URL:
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml?rev=1360640&r1=1360639&r2=1360640&view=diff

 ==============================================================================
 --- lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml (original)
 +++ lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml Thu Jul 12
 11:42:50 2012
 @@ -87,6 +87,32 @@
       </New>
     </Set>
 
 +    <!-- =========================================================== -->
 +    <!-- Configure Request Log                                       -->
 +    <!-- =========================================================== -->
 +    <!--
 +    <Ref id="Handlers">
 +      <Call name="addHandler">
 +        <Arg>
 +          <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler">
 +            <Set name="requestLog">
 +              <New id="RequestLogImpl" class="org.eclipse.jetty.server.NCSARequestLog">
 +                <Set name="filename"><SystemProperty name="lucidworksLogsHome"/>/request._mm_dd.log</Set>
 +                <Set name="filenameDateFormat">_mm_dd</Set>
 +                <Set name="retainDays">90</Set>
 +                <Set name="append">true</Set>
 +                <Set name="extended">false</Set>
 +                <Set name="logCookies">false</Set>
 +                <Set name="LogTimeZone">UTC</Set>
 +              </New>
 +            </Set>
 +          </New>
 +        </Arg>
 +      </Call>
 +    </Ref>
 +    -->
 
     <!-- =========================================================== -->
     <!-- extra options                                               -->



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3617) Consider adding start scripts.

2012-07-12 Thread Mark Miller (JIRA)
Mark Miller created SOLR-3617:
-

 Summary: Consider adding start scripts.
 Key: SOLR-3617
 URL: https://issues.apache.org/jira/browse/SOLR-3617
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller


I've always found that starting Solr with java -jar start.jar is a little odd 
if you are not a java guy, but I think there are bigger pros than looking less 
odd in shipping some start scripts.

Not only do you get a cleaner start command:
sh solr.sh or solr.bat or something
But you also can do a couple other little nice things:
* it becomes fairly obvious for a new casual user to see how to start the 
system without reading doc.
* you can make the working dir the location of the script - this lets you call 
the start script from another dir and still have all the relative dir setup 
work.
* have an out of the box place to save startup params like -Xmx.
* we could have multiple start scripts - say solr-dev.sh that logged to the 
console and default to sys default for RAM - and also solr-prod which was fully 
configured for logging, pegged Xms and Xmx at some larger value (1GB?) etc.

You would still of course be able to make the java cmd directly - and that is 
probably what you would do when it's time to run as a service - but these could 
be good starter scripts to get people on the right track and improve the 
initial user experience.
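
A minimal sketch of what such a script could look like (names, paths, and heap
defaults are hypothetical; the script is only syntax-checked here, not run,
since it assumes a start.jar next to it):

```shell
# Write a candidate solr.sh capturing the ideas above: working dir pinned to
# the script's location, and one overridable place for startup params.
cat > solr.sh <<'EOF'
#!/bin/sh
cd "$(dirname "$0")"                       # working dir = script location
JVM_OPTS="${JVM_OPTS:--Xms512m -Xmx512m}"  # default heap, overridable via env
exec java $JVM_OPTS -jar start.jar "$@"
EOF
sh -n solr.sh && echo "syntax ok"          # parse-check only; does not start Solr
```

A solr-prod variant would differ mainly in the JVM_OPTS defaults and in adding
logging configuration, per the description above.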

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3617) Consider adding start scripts.

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412723#comment-13412723
 ] 

Mark Miller commented on SOLR-3617:
---

Thoughts?

 Consider adding start scripts.
 --

 Key: SOLR-3617
 URL: https://issues.apache.org/jira/browse/SOLR-3617
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller

 I've always found that starting Solr with java -jar start.jar is a little odd 
 if you are not a java guy, but I think there are bigger pros than looking 
 less odd in shipping some start scripts.
 Not only do you get a cleaner start command:
 sh solr.sh or solr.bat or something
 But you also can do a couple other little nice things:
 * it becomes fairly obvious for a new casual user to see how to start the 
 system without reading doc.
 * you can make the working dir the location of the script - this lets you 
 call the start script from another dir and still have all the relative dir 
 setup work.
 * have an out of the box place to save startup params like -Xmx.
 * we could have multiple start scripts - say solr-dev.sh that logged to the 
 console and default to sys default for RAM - and also solr-prod which was 
 fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
 etc.
 You would still of course be able to make the java cmd directly - and that is 
 probably what you would do when it's time to run as a service - but these 
 could be good starter scripts to get people on the right track and improve 
 the initial user experience.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/ solr/example/ solr/example/etc/jetty.xml

2012-07-12 Thread Mark Miller
It's supposed to be 'logs'. I changed it in the example dir I was testing with, 
but missed it in the actual src it seems.

On Jul 12, 2012, at 7:49 AM, Uwe Schindler wrote:

 Ahm...
 +   <SystemProperty name="lucidworksLogsHome"/>/request._mm_dd.log
 
 Maybe change this name!
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
 -Original Message-
 From: markrmil...@apache.org [mailto:markrmil...@apache.org]
 Sent: Thursday, July 12, 2012 1:43 PM
 To: comm...@lucene.apache.org
 Subject: svn commit: r1360640 - in /lucene/dev/branches/branch_4x: ./ solr/
 solr/example/ solr/example/etc/jetty.xml
 
 Author: markrmiller
 Date: Thu Jul 12 11:42:50 2012
 New Revision: 1360640
 
  URL: http://svn.apache.org/viewvc?rev=1360640&view=rev
 Log:
 add a commented out example to jetty.xml for configuring a request log
 
 Modified:
lucene/dev/branches/branch_4x/   (props changed)
lucene/dev/branches/branch_4x/solr/   (props changed)
lucene/dev/branches/branch_4x/solr/example/   (props changed)
lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml
 
 Modified: lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml
  URL:
  http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml?rev=1360640&r1=1360639&r2=1360640&view=diff

  ==============================================================================
 --- lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml (original)
 +++ lucene/dev/branches/branch_4x/solr/example/etc/jetty.xml Thu Jul 12
 11:42:50 2012
  @@ -87,6 +87,32 @@
        </New>
      </Set>
  
  +    <!-- =========================================================== -->
  +    <!-- Configure Request Log                                       -->
  +    <!-- =========================================================== -->
  +    <!--
  +    <Ref id="Handlers">
  +      <Call name="addHandler">
  +        <Arg>
  +          <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler">
  +            <Set name="requestLog">
  +              <New id="RequestLogImpl" class="org.eclipse.jetty.server.NCSARequestLog">
  +                <Set name="filename"><SystemProperty name="lucidworksLogsHome"/>/request._mm_dd.log</Set>
  +                <Set name="filenameDateFormat">_mm_dd</Set>
  +                <Set name="retainDays">90</Set>
  +                <Set name="append">true</Set>
  +                <Set name="extended">false</Set>
  +                <Set name="logCookies">false</Set>
  +                <Set name="LogTimeZone">UTC</Set>
  +              </New>
  +            </Set>
  +          </New>
  +        </Arg>
  +      </Call>
  +    </Ref>
  +    -->
  
      <!-- =========================================================== -->
      <!-- extra options                                               -->
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

- Mark Miller
lucidimagination.com












-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

2012-07-12 Thread Robert Muir
Yes, I'll fix!

On Thu, Jul 12, 2012 at 7:25 AM, Uwe Schindler u...@thetaphi.de wrote:
 Can you simply fix it for me?

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 1:20 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene: build.xml
 common-build.xml tools/ivy.xml tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt

 It's not an endless chain of depends; tools/ is not really a real module and
 had no dependencies before.

 So it's excluded from the ordinary modules-crawl (no documentation, javadocs,
 tests, or packaging is done for it).

 Now it has dependencies, so it's important that resolve defers to it if we
 want it to work within IDEs such as Eclipse, and if we want tasks like
 jar-checksums to work.

 On Thu, Jul 12, 2012 at 7:11 AM, Uwe Schindler u...@thetaphi.de wrote:
  Aha - I have no idea what you are talking about. Fix it; for me it works! I
 refuse to add endless chains of depends everywhere just to get a stupid tool
 xy working in combination yx on foobar.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Thursday, July 12, 2012 1:07 PM
  To: dev@lucene.apache.org
  Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
  build.xml common-build.xml tools/ivy.xml
  tools/lib/apache-rat-0.8.jar.sha1
  tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
 
  Right, ever since then the jar-checksums task has not worked correctly.
 
  I dont know how you added a checksum, maybe with 'openssl sha1'
  yourself manually?
 
  But i needed to add a checksum for the commit to succeed, so i had to
  fix this task.
 
  On Thu, Jul 12, 2012 at 7:06 AM, Uwe Schindler u...@thetaphi.de wrote:
   What is different to Sunday afternoon? I added asm-all-debug.jar
   with a
  checksum generated by my local windows tools and it worked? I don’t
  care about this ant task.
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
   -Original Message-
   From: Robert Muir [mailto:rcm...@gmail.com]
   Sent: Thursday, July 12, 2012 1:03 PM
   To: dev@lucene.apache.org
   Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
   build.xml common-build.xml tools/ivy.xml
   tools/lib/apache-rat-0.8.jar.sha1
   tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
  
   ok but currently they are required, if you add a 3rd party jar and
   dont add a checksum for it, the build fails.
  
   so if we add back the dependency to resolve-tools for top-level
   lucene resolve (build.xml, only invoked a single time), then it all 
   works.
  
    -  <target name="resolve" depends="resolve-tools">
    +  <target name="resolve">
  
   Otherwise, jar-checksums task will fail, because it will remove the
   asm jar's checksum.
  
   On Thu, Jul 12, 2012 at 6:59 AM, Uwe Schindler u...@thetaphi.de
 wrote:
What does this f*cking task do? These checksums are a no-go for me.
I hate
    them and please remove them completely! It took me an hour on the
   weekend to get this shitty task working!
   
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
eMail: u...@thetaphi.de
   
   
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, July 12, 2012 12:55 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
build.xml common-build.xml tools/ivy.xml
tools/lib/apache-rat-0.8.jar.sha1
tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-NOTICE.txt
   
again, please run 'ant jar-checksums'
   
you will see the problem.
   
On Thu, Jul 12, 2012 at 6:53 AM, Uwe Schindler u...@thetaphi.de
  wrote:
  'ant validate' works for me. There is a checksum for asm, so where
  is the problem?

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Thursday, July 12, 2012 12:49 PM
 To: dev@lucene.apache.org
 Subject: Re: svn commit: r1360619 - in /lucene/dev/trunk/lucene:
 build.xml common-build.xml tools/ivy.xml
 tools/lib/apache-rat-0.8.jar.sha1
 tools/lib/apache-rat-LICENSE-ASL.txt tools/lib/apache-rat-
 NOTICE.txt

 Well, it was pretty related.

 I needed to add the checksums to commit this, or 'ant validate'
 would
 fail and jenkins would have been very angry!

 So i had to fix jar-checksums in order to commit!

 On Thu, Jul 12, 

Re: Replication and proxy settings

2012-07-12 Thread Gautier Koscielny
Hi Erick,

I'll follow these guidelines.
I'll open a JIRA ticket for 3.6.1.

Thank you
Gautier


On 12 Jul 2012, at 12:38, Erick Erickson wrote:

 Gautier:
 
 It's perfectly appropriate to open a JIRA and attach your code as a
 patch, thanks!
 
 If you haven't already seen it, this describes how to make patches, etc...
 
 http://wiki.apache.org/solr/HowToContribute
 
 Most IDEs have the ability to create them too, and svn diff is easy
 to do (execute that from the
 dir that contains both solr and lucene for ease-of-use, please).
 
 We typically name patches as SOLR-####.patch, where #### is the JIRA number.
 
 Here's the JIRA link: https://issues.apache.org/jira/browse/solr, you
 have to create
 a user ID to add JIRAs/patches.
 
 Code is always welcome!
 
 Best
 Erick
 
 On Thu, Jul 12, 2012 at 6:43 AM, Gautier Koscielny kosci...@ebi.ac.uk wrote:
 Hello,
 
 I work near Cambridge at the EMBL-EBI and I would like to contribute to
 SOLR.
 
 Our current project involves two teams developing a full text search service
 based on SOLR 3.6.
 
 We have had issues when trying to replicate a master copy of an index to a
 slave using HTTP proxy settings passed to the JRE.
 
 To solve this issue, I've created a copy of the SnapPuller and the
 ReplicationHandler, modified the code to manage
 proxy settings and modified the configuration of the SOLR slave to use this
 new handler.
 We have tested the replication using proxy settings in our environment with
 success.
 
 What I would like to do now is to apply these changes to the SnapPuller
 directly:
 -  check proxy settings before creating the httpClient
 -  apply the proxy setting to the httpClient HostConfiguration.
 
 I don't want to patch 3.6 and would like to apply the change to SOLR 4.
 
 Please tell me what you think about this change and whether I can proceed.
 
 What is the usual procedure to commit code?
 
 Best regards,
 Gautier
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 



[jira] [Commented] (SOLR-3617) Consider adding start scripts.

2012-07-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412727#comment-13412727
 ] 

Robert Muir commented on SOLR-3617:
---

Somewhat related: it might be worth adding /etc/init.d-type start/stop/etc 
scripts, at least
one that works for Linux. I'm sure people have these already themselves or are 
writing their own.


 Consider adding start scripts.
 --

 Key: SOLR-3617
 URL: https://issues.apache.org/jira/browse/SOLR-3617
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller

 I've always found that starting Solr with java -jar start.jar is a little odd 
 if you are not a java guy, but I think there are bigger pros than looking 
 less odd in shipping some start scripts.
 Not only do you get a cleaner start command:
 sh solr.sh or solr.bat or something
 But you also can do a couple other little nice things:
 * it becomes fairly obvious for a new casual user to see how to start the 
 system without reading doc.
 * you can make the working dir the location of the script - this lets you 
 call the start script from another dir and still have all the relative dir 
 setup work.
 * have an out of the box place to save startup params like -Xmx.
 * we could have multiple start scripts - say solr-dev.sh that logged to the 
 console and default to sys default for RAM - and also solr-prod which was 
 fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
 etc.
 You would still of course be able to make the java cmd directly - and that is 
 probably what you would do when it's time to run as a service - but these 
 could be good starter scripts to get people on the right track and improve 
 the initial user experience.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3617) Consider adding start scripts.

2012-07-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412728#comment-13412728
 ] 

Uwe Schindler commented on SOLR-3617:
-

As a fan of Ubuntu, please also add an upstart config (/etc/init/solr.conf);
that's much easier to write than a stupid shell script with the well-known
start|stop|... switch :-)

 Consider adding start scripts.
 --

 Key: SOLR-3617
 URL: https://issues.apache.org/jira/browse/SOLR-3617
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller

 I've always found that starting Solr with java -jar start.jar is a little odd 
 if you are not a java guy, but I think there are bigger pros than looking 
 less odd in shipping some start scripts.
 Not only do you get a cleaner start command:
 sh solr.sh or solr.bat or something
 But you also can do a couple other little nice things:
 * it becomes fairly obvious for a new casual user to see how to start the 
 system without reading doc.
 * you can make the working dir the location of the script - this lets you 
 call the start script from another dir and still have all the relative dir 
 setup work.
 * have an out of the box place to save startup params like -Xmx.
 * we could have multiple start scripts - say solr-dev.sh that logged to the 
 console and default to sys default for RAM - and also solr-prod which was 
 fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
 etc.
 You would still of course be able to make the java cmd directly - and that is 
 probably what you would do when it's time to run as a service - but these 
 could be good starter scripts to get people on the right track and improve 
 the initial user experience.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3618) Enable replication of master using proxy settings

2012-07-12 Thread Gautier Koscielny (JIRA)
Gautier Koscielny created SOLR-3618:
---

 Summary: Enable replication of master using proxy settings
 Key: SOLR-3618
 URL: https://issues.apache.org/jira/browse/SOLR-3618
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Affects Versions: 3.6.1
Reporter: Gautier Koscielny
 Fix For: 3.6.1


Check whether system properties http.proxyHost and http.proxyPort are set 
to initialize the httpClient instance properly in the SnapPuller class.
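
The proposed check is simple: configure the client for a proxy only when both
properties are present, and connect directly otherwise. A shell sketch of that
branch logic (the host and port values are hypothetical stand-ins for the
-Dhttp.proxyHost / -Dhttp.proxyPort JVM system properties):

```shell
# Stand-ins for the JVM system properties; unset either to take the direct path.
HTTP_PROXY_HOST="proxy.example.org"
HTTP_PROXY_PORT="3128"

if [ -n "$HTTP_PROXY_HOST" ] && [ -n "$HTTP_PROXY_PORT" ]; then
  MODE="proxy $HTTP_PROXY_HOST:$HTTP_PROXY_PORT"   # both set: route via proxy
else
  MODE="direct"                                    # otherwise: connect directly
fi
echo "replication client: $MODE"
```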




[jira] [Assigned] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-07-12 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-3377:
--

Assignee: Yonik Seeley

 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] [Resolved] (LUCENE-4192) SpatialStrategy: Remove isPolyField() and createField(shape)

2012-07-12 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-4192.
--

Resolution: Fixed

Committed to 4x & trunk

 SpatialStrategy: Remove isPolyField() and createField(shape)
 

 Key: LUCENE-4192
 URL: https://issues.apache.org/jira/browse/LUCENE-4192
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.0

 Attachments: 
 LUCENE-4192_remove_spatial_isPolyField_and_createField.patch


 On SpatialStrategy, I think the presence of isPolyField() and the 
 single-field createField(shape) is a bit much.  They were probably copied 
 from Solr's FieldType design without much thought about whether they were 
 really needed.




[jira] [Resolved] (SOLR-3609) Pin down the Solr webapp to a specific directory rather than a unique random directory.

2012-07-12 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-3609.
---

Resolution: Fixed

 Pin down the Solr webapp to a specific directory rather than a unique random 
 directory.
 ---

 Key: SOLR-3609
 URL: https://issues.apache.org/jira/browse/SOLR-3609
 Project: Solr
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3609.patch


 I'd like to pin down the extracted webapp dir to a constant known location. I 
 think this is a better user experience, in that the location is fixed, and it 
 also would allow us to write scripts that can find all of our jars.
 For example, there is currently some functionality in ZkController.main to 
 handle some ZooKeeper tasks before starting Solr - something you often want 
 to be able to do. There are more tools that would be nice to add. To create 
 the best user experience for these tools, it would be great to have an 
 example/cloud-tools directory with some simple scripts to make for easy cmd 
 line execution. These scripts will need to be able to easily locate the 
 webapp's jars.




[jira] [Updated] (SOLR-3460) Improve cmd line config bootstrap tool.

2012-07-12 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3460:
--

Fix Version/s: 5.0

 Improve cmd line config bootstrap tool.
 ---

 Key: SOLR-3460
 URL: https://issues.apache.org/jira/browse/SOLR-3460
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0, 5.0

 Attachments: SOLR-3460.patch, SOLR-3460.patch


 Improve the cmd line tool for bootstrapping config sets. Rather than take a 
 config set name and directory, make it work like -Dbootstrap_conf=true and 
 read solr.xml to find config sets. Config sets will be named after the 
 collection and auto-linked to the identically named collection.




[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys

2012-07-12 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412825#comment-13412825
 ] 

David Smiley commented on LUCENE-4173:
--

+0.  I am guessing that Ryan added this concept so that it would be easier to 
demonstrate indexing a variety of shapes in a variety of different ways, 
ignoring cases where some strategy doesn't handle some particular shape.  But I 
think this "feature", if you could call it that, has dubious value otherwise.  
Clearly it does and should default to false, so it won't harm anyone if they 
leave it alone.  

 Remove IgnoreIncompatibleGeometry for SpatialStrategys
 --

 Key: LUCENE-4173
 URL: https://issues.apache.org/jira/browse/LUCENE-4173
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4173.patch


 Silently not indexing anything for a Shape is not okay.  Users should get an 
 Exception and then they can decide how to proceed.




SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Smiley, David W.
Hello.
  I'm embarking on developing code similar to the SynonymFilter, but which 
merely needs to record, out of band from the analysis, where there is text in 
the input tokens matching the corpus in the FST.  I'm calling this a "keyword 
tagger": I shove text through it, and when it's done it tells me at what 
offsets there is a match to a corpus of keywords & phrases, and to which 
keywords/phrases they matched exactly.  It doesn't have to inject into or 
modify the token stream because the results of this are going elsewhere.  
Although it would be a fine approach to emit the "tags", as I call them, as a 
way of consuming the results, I'm not indexing them, so it doesn't matter.

  I noticed the following TODOs at the start:

// TODO: maybe we should resolve token -> wordID then run
// FST on wordIDs, for better perf?

I intend to do this, since my matching keywords/phrases are often more than 
one word, and I expect this will save memory and be faster.

// TODO: a more efficient approach would be Aho/Corasick's
// algorithm
// http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
// It improves over the current approach here
// because it does not fully re-start matching at every
// token.  For example if one pattern is "a b c x"
// and another is "b c d" and the input is "a b c d", on
// trying to parse "a b c x" but failing when you got to "x",
// rather than starting over again you really should
// immediately recognize that "b c d" matches at the next
// input.  I suspect this won't matter that much in
// practice, but it's possible on some set of synonyms it
// will.  We'd have to modify Aho/Corasick to enforce our
// conflict resolving (eg greedy matching) because that algo
// finds all matches.  This really amounts to adding a .*
// closure to the FST and then determinizing it.

Could someone please clarify how the problem in the example above is to be 
fixed?  The end of the comment states how to solve it, but I don't know how to 
do that, and I'm not sure whether there is anything more to it - after all, if 
it's as easy as that last sentence sounds, it would have been done already ;-)
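To make the TODO's example concrete, here is a toy word-level Aho-Corasick matcher with explicit failure links - not Lucene's FST machinery nor the ".* closure + determinize" trick the comment proposes, just the classic algorithm - showing how "b c d" is recognized in "a b c d" without restarting after "a b c x" fails:

```java
import java.util.*;

/** Toy word-level Aho-Corasick matcher (illustrative, not Lucene code). */
public class AhoToy {
    static class Node {
        Map<String, Node> next = new HashMap<>();
        Node fail;  // longest proper suffix of this path that is also a prefix
        List<String> matches = new ArrayList<>();  // patterns ending here
    }

    final Node root = new Node();

    AhoToy(String... patterns) {
        for (String p : patterns) {               // 1) build a trie of token patterns
            Node n = root;
            for (String w : p.split(" ")) {
                n = n.next.computeIfAbsent(w, k -> new Node());
            }
            n.matches.add(p);
        }
        Deque<Node> q = new ArrayDeque<>();       // 2) BFS to wire failure links
        for (Node c : root.next.values()) { c.fail = root; q.add(c); }
        while (!q.isEmpty()) {
            Node n = q.remove();
            for (Map.Entry<String, Node> e : n.next.entrySet()) {
                Node f = n.fail;
                while (f != root && !f.next.containsKey(e.getKey())) f = f.fail;
                Node child = e.getValue();
                child.fail = f.next.containsKey(e.getKey()) ? f.next.get(e.getKey()) : root;
                child.matches.addAll(child.fail.matches);  // inherit suffix matches
                q.add(child);
            }
        }
    }

    /** Returns every pattern that matches somewhere in the token stream. */
    List<String> scan(String text) {
        List<String> out = new ArrayList<>();
        Node n = root;
        for (String w : text.split(" ")) {
            // on mismatch, follow failure links instead of restarting at root
            while (n != root && !n.next.containsKey(w)) n = n.fail;
            n = n.next.getOrDefault(w, root);
            out.addAll(n.matches);
        }
        return out;
    }

    public static void main(String[] args) {
        AhoToy t = new AhoToy("a b c x", "b c d");
        System.out.println(t.scan("a b c d"));  // prints [b c d]
    }
}
```

When "x" fails to arrive after "a b c", the failure link carries the matcher to the "b c" state, so the following "d" completes "b c d" immediately - exactly the behavior the TODO describes. A real implementation would still need the greedy conflict-resolution rule the comment mentions, since plain Aho-Corasick reports all matches.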

This code is intense!  I wish FSTs were better documented.  For example, there 
are no javadocs on public members of FST.Arc like "output" and 
"nextFinalOutput", which are pertinent since SynonymFilter directly accesses 
them.  IMO the state of FSTs is such that those who wrote them know how they 
work (Robert, McCandless, Weiss) and seemingly everyone else, like me, doesn't 
touch them because we don't know how.

~ David Smiley



[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412872#comment-13412872
 ] 

Mark Miller commented on SOLR-3488:
---

I'm going to add a collection RELOAD command, and beef up the tests a little. 
Still more to do after that.

 Create a Collections API for SolrCloud
 --

 Key: SOLR-3488
 URL: https://issues.apache.org/jira/browse/SOLR-3488
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, 
 SOLR-3488_2.patch







[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412874#comment-13412874
 ] 

Markus Jelsma commented on SOLR-3488:
-

Is it intended for a collection RELOAD action to reload all collection cores 
immediately? That implies downtime, I assume?

 Create a Collections API for SolrCloud
 --

 Key: SOLR-3488
 URL: https://issues.apache.org/jira/browse/SOLR-3488
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, 
 SOLR-3488_2.patch







Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
Knowing how FSTs work and being comfortable with the API that evolved
through a series of exploratory patches are two different things. I like my
fsa API much better, and there was an effort to do something similar for
Lucene, but I gave up at some point because the speed of development killed
me.

Can you describe what you're trying to achieve in more detail? I've used FSTs
for pattern matching (sequences of arbitrary length) and my experience is
that simple state trackers work very well (even if they may seem to do lots
of spurious tracking).
On Jul 12, 2012 5:09 PM, Smiley, David W. dsmi...@mitre.org wrote:

 Hello.
   I'm embarking on developing code similar to the SynonymFilter but which
 merely needs to record out of band to the analysis where there is matching
 text in the input tokens to the corpus in the FST.  I'm calling this a
 keyword tagger in which I shove text through it and when it's done it
 tells me at what offsets there is a match to a corpus of keywords & phrases,
 and to what keywords/phrases they were exactly.  It doesn't have to inject
 or modify the token stream because the results of this are going elsewhere.
  Although, it would be a fine approach to only omit the tags as I call
 them as a way of consuming the results, but I'm not indexing them so it
 doesn't matter.

   I noticed the following TODOs at the start:

 // TODO: maybe we should resolve token -> wordID then run
 // FST on wordIDs, for better perf?

 I intend on doing this since my matching keyword/phrases are often more
 than one word, and I expect this will save memory and be faster.

 // TODO: a more efficient approach would be Aho/Corasick's
 // algorithm
 //
 http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
 // It improves over the current approach here
 // because it does not fully re-start matching at every
 // token.  For example if one pattern is a b c x
 // and another is b c d and the input is a b c d, on
 // trying to parse a b c x but failing when you got to x,
 // rather than starting over again your really should
 // immediately recognize that b c d matches at the next
 // input.  I suspect this won't matter that much in
 // practice, but it's possible on some set of synonyms it
 // will.  We'd have to modify Aho/Corasick to enforce our
 // conflict resolving (eg greedy matching) because that algo
 // finds all matches.  This really amounts to adding a .*
 // closure to the FST and then determinizing it.

 Could someone please clarify how the problem in the example above is to be
 fixed?  At the end it states how to solve it, but I don't know how to do
 that and I'm not sure if there is anything more to it since after all if
 it's as easy as that last sentence sounds then it would have been done
 already ;-)

 This code is intense!  I wish FSTs were better documented.  For example,
 there are no javadocs on public members of FST.Arc like output and
 nextFinalOutput which are pertinent since SynonymFilter directly accesses
 them.  IMO the state of FSTs is such that those that wrote them know how
 they work (Robert, McCandless, Weiss) and seemingly everyone else like me
 doesn't touch them because we don't know how.

 ~ David Smiley




[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412888#comment-13412888
 ] 

Mark Miller commented on SOLR-3488:
---

bq. Is it intended for a collection RELOAD action to reload all collection 
cores immediately? 

Yes, at least initially. Essentially a convenience method for reloading your 
cores to pick up changed config or settings. There may be other ways we allow 
that to happen more automatically eventually, but at a minimum we need the 
ability to trigger a collection-wide reload. There are things to consider for a 
truly massive cluster - do you really want every node trying to read the new 
configs from ZK at the same time? That's in the future if I end up working on 
it. We'd have to see how many servers it takes before you end up with a 
problem, if it is indeed a problem at all.


bq. That implies downtime i assume?

I'm not sure why? Core reloads don't involve any down time.
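For concreteness, a collection-wide reload ends up being a single HTTP request to the Collections API. A minimal sketch of building that request follows; the action=RELOAD and name parameter shapes mirror what the Collections API eventually shipped, but treat the exact form here as illustrative rather than the API as discussed in this thread:

```java
import java.net.URLEncoder;

/** Illustrative builder for a collection-wide RELOAD request URL. */
public class ReloadRequest {
    static String reloadUrl(String solrBase, String collection) throws Exception {
        // One call fans out to a core reload on every node hosting the collection.
        return solrBase + "/admin/collections?action=RELOAD&name="
                + URLEncoder.encode(collection, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reloadUrl("http://localhost:8983/solr", "collection1"));
        // → http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1
    }
}
```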

 Create a Collections API for SolrCloud
 --

 Key: SOLR-3488
 URL: https://issues.apache.org/jira/browse/SOLR-3488
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, 
 SOLR-3488_2.patch







Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Smiley, David W.

On Jul 12, 2012, at 11:51 AM, Dawid Weiss wrote:

Knowing how FSTs work and being comfortable with the API that evolved through a 
series of exploratory patches are two different things. I like my fsa API much 
better, and there was an effort to do something similar for Lucene, but I gave 
up at some point because the speed of development killed me.

Do you mean it was slow to coordinate / get consensus or…?  Just curious.

Can you describe what you're trying to achieve in more detail? I've used FSTs 
for pattern matching (sequences of arbitrary length) and my experience is that 
simple state trackers work very well (even if they may seem to do lots of 
spurious tracking).

I rather like Wikipedia's definition:  
http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm

The number of names I want to handle is in the millions and so use of Lucene's 
FST is essential as I see it.

~ David


On Jul 12, 2012 5:09 PM, Smiley, David W. dsmi...@mitre.org wrote:
Hello.
  I'm embarking on developing code similar to the SynonymFilter but which 
merely needs to record out of band to the analysis where there is matching text 
in the input tokens to the corpus in the FST.  I'm calling this a keyword 
tagger in which I shove text through it and when it's done it tells me at what 
offsets there is a match to a corpus of keywords & phrases, and to what 
keywords/phrases they were exactly.  It doesn't have to inject or modify the 
token stream because the results of this are going elsewhere.  Although, it 
would be a fine approach to only omit the tags as I call them as a way of 
consuming the results, but I'm not indexing them so it doesn't matter.

  I noticed the following TODOs at the start:

// TODO: maybe we should resolve token -> wordID then run
// FST on wordIDs, for better perf?

I intend on doing this since my matching keyword/phrases are often more than 
one word, and I expect this will save memory and be faster.

// TODO: a more efficient approach would be Aho/Corasick's
// algorithm
// http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
// It improves over the current approach here
// because it does not fully re-start matching at every
// token.  For example if one pattern is a b c x
// and another is b c d and the input is a b c d, on
// trying to parse a b c x but failing when you got to x,
// rather than starting over again your really should
// immediately recognize that b c d matches at the next
// input.  I suspect this won't matter that much in
// practice, but it's possible on some set of synonyms it
// will.  We'd have to modify Aho/Corasick to enforce our
// conflict resolving (eg greedy matching) because that algo
// finds all matches.  This really amounts to adding a .*
// closure to the FST and then determinizing it.

Could someone please clarify how the problem in the example above is to be 
fixed?  At the end it states how to solve it, but I don't know how to do that 
and I'm not sure if there is anything more to it since after all if it's as 
easy as that last sentence sounds then it would have been done already ;-)

This code is intense!  I wish FSTs were better documented.  For example, there 
are no javadocs on public members of FST.Arc like output and 
nextFinalOutput which are pertinent since SynonymFilter directly accesses 
them.  IMO the state of FSTs is such that those that wrote them know how they 
work (Robert, McCandless, Weiss) and seemingly everyone else like me doesn't 
touch them because we don't know how.

~ David Smiley




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
The development was too fast for me to keep up. And by the time I had some
concept of the API, Mike wrote about a million lines of code that would have
to be rewritten ;)

The current API isn't bad. It's fast.

I asked for an example of what you're trying to do because then I'd be able
to tell you if what I used would work. The number of entries does not
matter. I did use FSTs, but simple FSTs, nothing special.
On Jul 12, 2012 6:05 PM, Smiley, David W. dsmi...@mitre.org wrote:


  On Jul 12, 2012, at 11:51 AM, Dawid Weiss wrote:

 Knowing how fsts work and being comfortable with the api that evolved
 through a series of exploratory patches are two different things. I like my
 fsa api much better and there was an effort to do something similar for
 lucene but i gave up at some point because the speed of development killed
 me.

 Do you mean it was slow to coordinate / get consensus or…?  Just curious.

 Can you describe what youre trying to achieve in more detail? Ive used
 fsts for pattern matching (sequences of arbitrary length) and my experience
 is that simple state trackers work wery well (even if they may seem to do
 lots of spurious tracking).

 I rather like Wikipedia's definition:
 http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm

  The number of names I want to handle is in the millions and so use of
 Lucene's FST is essential as I see it.

  ~ David


  On Jul 12, 2012 5:09 PM, Smiley, David W. dsmi...@mitre.org wrote:

 Hello.
   I'm embarking on developing code similar to the SynonymFilter but which
 merely needs to record out of band to the analysis where there is matching
 text in the input tokens to the corpus in the FST.  I'm calling this a
 keyword tagger in which I shove text through it and when it's done it
 tells me at what offsets there is a match to a corpus of keywords & phrases,
 and to what keywords/phrases they were exactly.  It doesn't have to inject
 or modify the token stream because the results of this are going elsewhere.
  Although, it would be a fine approach to only omit the tags as I call
 them as a way of consuming the results, but I'm not indexing them so it
 doesn't matter.

   I noticed the following TODOs at the start:

 // TODO: maybe we should resolve token -> wordID then run
 // FST on wordIDs, for better perf?

 I intend on doing this since my matching keyword/phrases are often more
 than one word, and I expect this will save memory and be faster.

 // TODO: a more efficient approach would be Aho/Corasick's
 // algorithm
 //
 http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
 // It improves over the current approach here
 // because it does not fully re-start matching at every
 // token.  For example if one pattern is a b c x
 // and another is b c d and the input is a b c d, on
 // trying to parse a b c x but failing when you got to x,
 // rather than starting over again your really should
 // immediately recognize that b c d matches at the next
 // input.  I suspect this won't matter that much in
 // practice, but it's possible on some set of synonyms it
 // will.  We'd have to modify Aho/Corasick to enforce our
 // conflict resolving (eg greedy matching) because that algo
 // finds all matches.  This really amounts to adding a .*
 // closure to the FST and then determinizing it.

 Could someone please clarify how the problem in the example above is to
 be fixed?  At the end it states how to solve it, but I don't know how to do
 that and I'm not sure if there is anything more to it since after all if
 it's as easy as that last sentence sounds then it would have been done
 already ;-)

 This code is intense!  I wish FSTs were better documented.  For example,
 there are no javadocs on public members of FST.Arc like output and
 nextFinalOutput which are pertinent since SynonymFilter directly accesses
 them.  IMO the state of FSTs is such that those that wrote them know how
 they work (Robert, McCandless, Weiss) and seemingly everyone else like me
 doesn't touch them because we don't know how.

 ~ David Smiley





Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Michael McCandless
On Thu, Jul 12, 2012 at 12:10 PM, Dawid Weiss dawid.we...@gmail.com wrote:
 The development was too fast for me to keep up. And by the time i had some
 concept of the api mike wrote about million lines of code that would have to
 be rewritten ;)

Mike is very happy to help rewrite that code for a better FST API :)

We can and should also make incremental improvements.

I do agree it's horrible to have code that only a small set of people
understand: such code is effectively dead.

Mike McCandless

http://blog.mikemccandless.com




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
The API shouldn't be the goal. Those initial FST changes were driven by
real needs and optimizations, and are still great. The API will eventually
take better shape and form based on those use cases, I'm sure.

The patch I had worked on tried to extract a common notion of an output and
its grammar. It was half baked anyway, so no big loss.
On Jul 12, 2012 6:20 PM, Michael McCandless luc...@mikemccandless.com
wrote:

 On Thu, Jul 12, 2012 at 12:10 PM, Dawid Weiss dawid.we...@gmail.com
 wrote:
  The development was too fast for me to keep up. And by the time i had
 some
  concept of the api mike wrote about million lines of code that would
 have to
  be rewritten ;)

 Mike is very happy to help rewrite that code for a better FST API :)

 We can and should also make incremental improvements.

 I do agree it's horrible to have code that only a small set of people
 understand: such code is effectively dead.

 Mike McCandless

 http://blog.mikemccandless.com





[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-07-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412920#comment-13412920
 ] 

Yonik Seeley commented on SOLR-3377:


Thanks Bernd, this looks like an improvement.
After some ad-hoc testing, it seems we still have problems with q=(+id:42)

Another minor concern: the change to clause.field to exclude things like '(' 
also means that when it's not a valid Lucene query, our reconstructed query 
will currently drop the paren.

Example: a query of (a:b with qf=id correctly produces id:(a:b,
but a query of (id:b produces id:b.
That type of thing should really only affect exact-match type fields where 
punctuation isn't dropped - not sure how much of an issue it really is.

 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412927#comment-13412927
 ] 

Markus Jelsma commented on SOLR-3488:
-

Thanks for clarifying, it makes sense. About the downtime on core reload: a 
load balancer pinging Solr's admin/ping handler will definitely mark the node 
as down; the ping request will time out for up to a few seconds, or even longer 
in case of many firstSearcher events.



 Create a Collections API for SolrCloud
 --

 Key: SOLR-3488
 URL: https://issues.apache.org/jira/browse/SOLR-3488
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, 
 SOLR-3488_2.patch







Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Robert Muir
On Thu, Jul 12, 2012 at 12:19 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 On Thu, Jul 12, 2012 at 12:10 PM, Dawid Weiss dawid.we...@gmail.com wrote:
 The development was too fast for me to keep up. And by the time i had some
 concept of the api mike wrote about million lines of code that would have to
 be rewritten ;)

 Mike is very happy to help rewrite that code for a better FST API :)

 We can and should also make incremental improvements.

 I do agree it's horrible to have code that only a small set of people
 understand: such code is effectively dead.


I agree, and think we should make improvements to the API whenever we can.

But a fast, efficient, and self-documenting FST API is probably going to be
elusive.

I think a lot of this could be fixed with examples and docs, which
we've been working at too, e.g.:
http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/util/fst/package-summary.html#package_description

The biggest problem I have with documentation here is when it becomes
out of date. We've made a lot of progress here: we have a
javadocs-lint task that runs in Hudson, checks all of our links, and
fails if any are dead, etc.

But this does us no good for code samples. I think we need to
seriously revisit/develop a plan for code samples in documentation.
All the samples we have in various docs (e.g. package documentation)
are very fragile, and that discourages me totally from adding any
advanced examples, or any more than are minimally necessary to get
started, because I'm afraid of the manual maintenance cost.

Instead I think we should set up a proper examples infrastructure,
where these examples are actually compiled and such. We can still link
to them in javadocs.
Have a look at this example from the demo/ module:
http://lucene.apache.org/core/4_0_0-ALPHA/demo/overview-summary.html#Location_of_the_source

I think we should have more than just SearchFiles and IndexFiles, and
also move our examples here rather than inlining them in the javadocs
text. This way they are compile-time checked, and we can link to them
from anywhere (it's safe, and we have link-checkers that prove it).

I'm open to any other ideas though: this is just the best one I have now.

-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412957#comment-13412957
 ] 

Mark Miller commented on SOLR-3488:
---

Yeah, this sounds like something we have to fix to me. There should not be a 
gap in serving requests on core reload.




[jira] [Commented] (SOLR-3598) Provide option to allow aliased field to be included in query for EDisMax QParser

2012-07-12 Thread Jamie Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412962#comment-13412962
 ] 

Jamie Johnson commented on SOLR-3598:
-

Just to make sure I understand: you're saying create a pseudo-field which we 
use for querying the actual fields? So basically

pseudofield=realfield1,realfield2,realfield3
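The pseudo-field idea can be sketched in plain Java. This is a minimal, illustrative expansion of an alias into its real fields, with the original field optionally included; the class, method names, and the self-reference rule are assumptions for illustration, not the actual EDisMax parser code.

```java
import java.util.*;

/** Sketch of edismax-style field aliasing with the option this issue asks
 *  for: expand an alias to its real fields, optionally including the
 *  original field itself, while skipping self-references so no alias
 *  cycle is created. Illustrative only; not the actual EDisMax code. */
public class AliasExpansion {
    static List<String> expand(String field, Map<String, List<String>> aliases,
                               boolean includeOriginal) {
        List<String> targets = aliases.get(field);
        if (targets == null) return List.of(field);  // no alias: query the field directly
        List<String> out = new ArrayList<>();
        if (includeOriginal) out.add(field);         // the requested option
        for (String t : targets) {
            if (!t.equals(field)) out.add(t);        // skip self-reference: avoids the cycle
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> aliases =
            Map.of("name", List.of("first_name", "last_name"));
        // name:Jamie would be expanded across these fields:
        System.out.println(expand("name", aliases, true));
        // → [name, first_name, last_name]
    }
}
```

With `includeOriginal` set, a query on `name` reaches `name`, `first_name`, and `last_name` without the parser ever following `name -> name` recursively.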

 Provide option to allow aliased field to be included in query for EDisMax 
 QParser
 -

 Key: SOLR-3598
 URL: https://issues.apache.org/jira/browse/SOLR-3598
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Affects Versions: 3.6, 4.0-ALPHA
Reporter: Jamie Johnson
Priority: Minor
 Attachments: alias.patch


 I currently have a situation where I'd like the original field included in 
 the query. For instance, I have several fields with differing granularity: 
 name, firstname, and lastname.  Some of my sources differentiate between these 
 so I can fill out firstname and lastname, while others don't and I need to 
 just place this information in the name field.  When querying, I'd like to be 
 able to say name:Jamie and have it translated to name:Jamie first_name:Jamie 
 last_name:Jamie.  Doing this creates an alias cycle, and the 
 EDisMax query parser throws an exception about it.  Ideally there would be an 
 option to include the original field as part of the query to support this use 
 case.




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
 I think a lot of this could be fixed with examples and docs, which

We use a simple ANT task that extracts snippets of code from Java
sources (very often unit tests) and includes them in the javadocs,
unfortunately by post-processing. See the example here:

http://download.carrot2.org/stable/javadoc/

and the sources (linked) are here:

https://github.com/carrot2/carrot2/blob/df49d66087d0da9e87043e13a400ac148952a41c/applications/carrot2-examples/examples/org/carrot2/examples/clustering/ClusteringDocumentList.java

As you can see there are simple tags of the form:

[[[start:clustering-document-list-intro]]]
...
[[[end:clustering-document-list-intro]]]

these get extracted to separate files which are then included in the
javadocs. Crude, but it works. I'm sure it could be improved, and
probably other folks have come up with a similar idea; I just don't
know of such attempts.
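A marker-based extractor of this kind fits in a few lines of Java. This is a minimal sketch assuming the [[[start:id]]]/[[[end:id]]] tag format shown above; it is not the actual Carrot2 ANT task.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch of a snippet extractor: pulls the text between
 *  [[[start:id]]] and [[[end:id]]] markers so a build step can splice
 *  it into javadocs. Not the real Carrot2 ANT task. */
public class SnippetExtractor {
    // DOTALL so snippets may span lines; \1 forces matching start/end ids.
    private static final Pattern SNIPPET = Pattern.compile(
        "\\[\\[\\[start:(\\w[\\w-]*)\\]\\]\\](.*?)\\[\\[\\[end:\\1\\]\\]\\]",
        Pattern.DOTALL);

    /** Returns a map from snippet id to the (trimmed) code between its markers. */
    public static Map<String, String> extract(String source) {
        Map<String, String> snippets = new LinkedHashMap<>();
        Matcher m = SNIPPET.matcher(source);
        while (m.find()) {
            snippets.put(m.group(1), m.group(2).trim());
        }
        return snippets;
    }

    public static void main(String[] args) {
        String source = "[[[start:intro]]]\nint x = 1;\n[[[end:intro]]]";
        System.out.println(extract(source));
        // → {intro=int x = 1;}
    }
}
```

In a real build, the extracted files would then be `<include>`d into the javadocs, as Dawid describes.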

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Robert Muir
On Thu, Jul 12, 2012 at 1:26 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 I think a lot of this could be fixed with examples and docs, which

 We use a simple ANT task that extracts snippets of code from Java
 (very often unit tests) and include these in JavaDocs, unfortunately
 by post-processing. See the example here:

 http://download.carrot2.org/stable/javadoc/

 and the sources (linked) are here:

 https://github.com/carrot2/carrot2/blob/df49d66087d0da9e87043e13a400ac148952a41c/applications/carrot2-examples/examples/org/carrot2/examples/clustering/ClusteringDocumentList.java

 As you can see there are simple tags of the form:

 [[[start:clustering-document-list-intro]]]
 ...
 [[[end:clustering-document-list-intro]]]

 these get extracted to separate files which are then included in the
 javadocs. Crude, but works. I'm sure it could be improved and probably
 other folks have come up with a similar idea, I just don't know of
 such attempts.

That's more sophisticated than what we do with the javadocs
linksource option in the demo.
The key point there is that we never want to link to any
trunk/development code, as it may not exist.
So with linksource (as in my example), it basically makes an HTML-ized
version of your code included in the javadocs.

-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Robert Muir
On Thu, Jul 12, 2012 at 1:29 PM, Robert Muir rcm...@gmail.com wrote:

 That's more sophisticated than what we do with the javadocs
 linksource option in the demo.
 The key point there is that we never want to link to any
 trunk/development code, as it may not exist.
 So with linksource (as in my example), it basically makes an HTML-ized
 version of your code included in the javadocs.


But your view source button seems to also do this? So we would just
want to omit the GitHub link at the bottom, I think?

I like this solution... how much code is it... can we have it?

-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys

2012-07-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412997#comment-13412997
 ] 

Ryan McKinley commented on LUCENE-4173:
---

I'm fine removing it from the Lucene strategies -- the motivation for this 
feature was to copy the same shape to multiple strategies and compare the 
behavior.

This can be implemented at the Solr level, though...


 Remove IgnoreIncompatibleGeometry for SpatialStrategys
 --

 Key: LUCENE-4173
 URL: https://issues.apache.org/jira/browse/LUCENE-4173
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4173.patch


 Silently not indexing anything for a Shape is not okay.  Users should get an 
 Exception and then they can decide how to proceed.




[jira] [Commented] (LUCENE-4207) speed up our slowest tests

2012-07-12 Thread Michael Garski (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413004#comment-13413004
 ] 

Michael Garski commented on LUCENE-4207:


I have a similar MacBook to Christian's (OS X Lion, mid-2010, Core i7, 8GB RAM, 
480GB SSD, ~90% full) and running ant test takes 20-25 minutes. I have not run 
the stats that Dawid posted previously; those times are just what I have seen 
in the past few months.

 speed up our slowest tests
 --

 Key: LUCENE-4207
 URL: https://issues.apache.org/jira/browse/LUCENE-4207
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir

 Was surprised to hear from Christian that lucene/solr tests take him 40 
 minutes on a modern mac.
 This is too much. Lets look at the slowest tests and make them reasonable.




[jira] [Updated] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License

2012-07-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4217:
--

Attachment: LUCENE-4217.patch

Hi,

attached is a patch with a complete overhaul of the Clover reporting on Lucene 
+ Solr:

- Clover is loaded by IVY from Maven Central
- The license file was committed to lucene/tools/clover and is automatically 
used. This is possible according to the mail from Nicholas Muldoon:

{noformat}
On Fri, Dec 18, 2009 at 1:33 AM, Nicholas Muldoon nmuld...@atlassian.com 
wrote:
-

Hi,
Atlassian are excited to be presenting Apache with a site license for Clover
2.6.

This Clover license can be used for any code that is under an org.apache
package. Further, this license can be used by any developer on their machine
in conjunction with our Eclipse or IntelliJ plugins for development on an
org.apache project.
{noformat}

Also, Mike and I talked to Nicholas and Nick Pellow, and we got the following 
response:

{noformat}
On Sat, Dec 19, 2009 at 10:38 PM, Nick Pellow npel...@atlassian.com wrote:
 Hi Mike,

 That would be great if you could forward this to committ...@apache.org.
 The license is available to anyone working on the org.apache.* be it 
 in IDEA/Eclipse/Ant/Maven locally, or on a central build server.

 Since the license will only instrument and report coverage on 
 org.apache packages, please mention that it is fine to commit this 
 license to each project if it makes running builds easier. ie just 
 check out the project and run with Clover, without the need for the 
 extra step of locating and installing the clover license.


 Cheers,
 Nick



 On 19/12/2009, at 1:11 AM, Michael McCandless wrote:

 Woops, I meant The only restriction is that it will only test 
 coverage of packages under org.apache, below.

 Mike

 On Fri, Dec 18, 2009 at 9:05 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Since this generous offer extends beyond Lucene...

 I'd like to forward this to committ...@apache.org, pointing to where 
 the license is available

 (https://svn.apache.org/repos/private/committers/donated-licenses/cl
 over/2.6.x), explaining that Lucene upgraded (providing the link to 
 our coverage report), etc.

 But I wanted to confirm with you all first: is this OK?  This 
 license may be used by anyone?  The only restriction is that it will 
 only test coverage of packages under org.apache.lucene?

 I can draft something up and run it by you all first, if this makes 
 sense...
{noformat}

- The ANT tasks were cleaned up and now work per module without crazy filesets. 
Only test-framework is not clovered, as it was explicitly disabled by the 
managers of the new build system. Unfortunately, if you make compile-core in 
test-framework depend on clover, it will correctly clover it, but as it's in 
src/ and not test/ it will be counted as source code rather than test code and 
appears in the report as such. I left it disabled for now until we find a 
solution.
- Solr now also reports coverage on all referenced Lucene modules (cool!)

If you want to run a test build with clover, do:

{noformat}
# must be cleaned first on top-level, so all half baked code is gone
ant clean
# go to lucene or solr
ant -Drun.clover=true test generate-clover-reports
{noformat}

This downloads Clover from Maven Central, runs all tests with Clover, and 
publishes the report. The target folder changed a bit (cleanup); we must change 
the Jenkins config/scripts (I can do that).

For Lucene and Solr the Clover coverage database is always placed in Lucene's 
build folder (as before); this is why you must clean at top level.

 Load clover.jar from ivy-cachepath andy ship sources with License
 -

 Key: LUCENE-4217
 URL: https://issues.apache.org/jira/browse/LUCENE-4217
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4217.patch


 When Atlassian granted us use of the license for their clover-2.6.3.jar file, 
 they allowed us to ship this license file to every developer. Currently the 
 Clover setup is very hard for users, so this issue will make it simple.
 If you want to run tests with clover, just pass -Drun.clover=true to ant 
 clean test. ANT will then download Clover via IVY and point it to the license 
 file in our tools folder. The license is supplemented by the original mail 
 from Atlassian stating that everybody is allowed to use it with code in the 
 org.apache Java package.


[jira] [Commented] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License

2012-07-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413044#comment-13413044
 ] 

Uwe Schindler commented on LUCENE-4217:
---

I wanted to mention: the code in the attached patch is ASF-licensed of course, 
but the license file is of course not Apache-licensed. But there is only one 
checkbox in JIRA!




[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2891 - Failure

2012-07-12 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2891/

3 tests failed.
REGRESSION:  
org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete

Error Message:
Exception during query

Stack Trace:
java.lang.RuntimeException: Exception during query
at 
__randomizedtesting.SeedInfo.seed([F8B2450BF5C21177:D98FB31590BF391B]:0)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:487)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:454)
at 
org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.add1document(TestSqlEntityProcessorDelta3.java:83)
at 
org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta3.testCompositePk_DeltaImport_delete(TestSqlEntityProcessorDelta3.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
Caused by: java.lang.RuntimeException: REQUEST FAILED: 

[jira] [Commented] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License

2012-07-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413050#comment-13413050
 ] 

Uwe Schindler commented on LUCENE-4217:
---

We should maybe exclude the license file from the source ZIP/TGZ files, but 
keep it in SVN? It's just an excludes entry...




[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring

2012-07-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413055#comment-13413055
 ] 

Robert Muir commented on LUCENE-4100:
-

{quote}
Your index at 1) does not have to be 'optimized' (it does not have to consist 
of one index segment only). In fact, maxscore can be more efficient with 
multiple segments because multiple maxscores are computed for many frequent 
terms for subsets of documents, resulting in tighter bounds and more effective 
pruning.
{quote}

I've been thinking about this a lot lately: while what you say is true, that's 
because you reprocess all segments with IndexRewriter (which is fine for a 
static collection).

But this algorithm in general is not rank-safe with incremental indexing: the 
problem is that when doing actual scoring, scores consist of per-segment, 
within-document stats (term frequency, document length), but are also affected 
by collection-wide statistics from many other segments (IDF, average document 
length, ...) or even other machines in a distributed collection.

So I think for this to work and remain rank-safe, we cannot write the entire 
score into the segment, because the score at actual search time depends on all 
the other segments being searched. Instead I think this can only work when we 
can easily factor out an impact (e.g. in the case of DefaultSimilarity, the 
indexed maxscore excludes the IDF component, which is instead multiplied in at 
search time).

I don't see how it can be rank-safe with algorithms like BM25 and incremental 
indexing, where parameters like average document length are not simple 
multiplicative factors in the formula: they determine exactly how important a 
role tf versus document length plays in the score. But I'll think about it 
some more.
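The factoring described above can be illustrated with TF-IDF-style arithmetic. This is a hedged sketch with made-up names, not Lucene code: index time stores only the per-segment maximum of the IDF-free score component, and search time multiplies the current, collection-wide IDF back in, so the stored bound stays valid no matter how the other segments change.

```java
/** Sketch of factoring a collection-wide statistic (IDF) out of a
 *  per-segment max score ("impact"), so the stored bound remains a
 *  valid upper bound under incremental indexing. Illustrative only;
 *  not Lucene code. */
public class ImpactSketch {
    /** IDF-free score component, DefaultSimilarity-style: sqrt(tf) * lengthNorm. */
    static double rawImpact(int tf, int docLen) {
        return Math.sqrt(tf) * (1.0 / Math.sqrt(docLen));
    }

    /** Index time: store only the max IDF-free impact seen in this segment. */
    static double segmentMaxImpact(int[][] postings /* {tf, docLen} pairs */) {
        double max = 0;
        for (int[] p : postings) max = Math.max(max, rawImpact(p[0], p[1]));
        return max;
    }

    /** Search time: multiply the current, collection-wide IDF back in.
     *  Because IDF is a simple multiplicative factor here, this bound
     *  dominates every real score in the segment even if IDF has
     *  drifted since the segment was written. */
    static double scoreUpperBound(double storedMaxImpact, double currentIdf) {
        return currentIdf * storedMaxImpact;
    }

    public static void main(String[] args) {
        int[][] postings = { {3, 100}, {1, 10}, {7, 400} };
        double stored = segmentMaxImpact(postings);
        double idf = 2.5; // recomputed at query time across all segments
        double bound = scoreUpperBound(stored, idf);
        for (int[] p : postings) {
            double actual = idf * rawImpact(p[0], p[1]);
            if (actual > bound) throw new AssertionError("bound violated");
        }
        System.out.println("stored max impact=" + stored + ", bound=" + bound);
    }
}
```

This works precisely because IDF factors out multiplicatively; as the message notes, BM25's average-document-length parameter does not factor out this way, which is what makes the rank-safety question harder there.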


 Maxscore - Efficient Scoring
 

 Key: LUCENE-4100
 URL: https://issues.apache.org/jira/browse/LUCENE-4100
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs, core/query/scoring, core/search
Affects Versions: 4.0-ALPHA
Reporter: Stefan Pohl
  Labels: api-change, patch, performance
 Fix For: 4.0

 Attachments: contrib_maxscore.tgz, maxscore.patch


 At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient 
 algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, 
 that I find deserves more attention among Lucene users (and developers).
 I implemented a proof of concept and did some performance measurements with 
 example queries and lucenebench, Mike McCandless's package, resulting in 
 very significant speedups.
 This ticket is to start the discussion on including the implementation 
 in Lucene's codebase. Because the technique requires awareness about it 
 from the Lucene user/developer, it seems best to become a contrib/module 
 package so that it can consciously be chosen to be used.




[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413080#comment-13413080
 ] 

Yonik Seeley commented on SOLR-3488:


bq. There should not be a gap in serving requests on core reload.

Just to clarify: it's more a practical gap than a real gap... it should be 
impossible for a query to go unserviced - it's just that a cold core could 
take longer to service the query than desired.  But it *should* be pretty easy 
to allow waiting for that searcher in the new core.





[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-07-12 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413081#comment-13413081
 ] 

Erik Hatcher commented on SOLR-1725:


bq. i plan to commit & backport to 4x in the next 24 hours.

Hoss - you go!  Thank you for wrangling this one and polishing out the pedantic 
details needed to get it to this state.  Way +1. 

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Assignee: Erik Hatcher
  Labels: UpdateProcessor
 Fix For: 4.1

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script-based UpdateRequestProcessorFactory (uses JDK 6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 *.js files will be treated as JavaScript scripts), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - the SolrQueryRequest
  * {{rsp}} - the SolrQueryResponse
  * {{logger}} - a logger that can be used for logging purposes in the script




[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413086#comment-13413086
 ] 

Mark Miller commented on SOLR-3488:
---

Ah, I did not catch that it was just a timeout issue. I was wondering what the 
problem could be.

Yeah, not as bad as I thought, then. An option would be nice.

 Create a Collections API for SolrCloud
 --

 Key: SOLR-3488
 URL: https://issues.apache.org/jira/browse/SOLR-3488
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, 
 SOLR-3488_2.patch







[jira] [Created] (LUCENE-4218) contrary to documentation Document.get(field) on numeric field returns null

2012-07-12 Thread Jamie (JIRA)
Jamie created LUCENE-4218:
-

 Summary: contrary to documentation Document.get(field) on numeric 
field returns null
 Key: LUCENE-4218
 URL: https://issues.apache.org/jira/browse/LUCENE-4218
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0-ALPHA
 Environment: Darwin e4-ce-8f-0f-c2-b0.dummy.porta.siemens.net 10.8.0 
Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; 
root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64
Reporter: Jamie
Priority: Critical


A call to Number num = indexableField.numericValue() comes up with a correct 
value, whereas Document.get(field) yields null.




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
 Thats more sophisticated than what we do with the javadocs
 linksource option in the demo.

Sometimes we don't even include the full sources of these snippets. They
are demonstrational and we just want to make sure they compile and run
cleanly (that's why they're typically part of tests, not core
sources).

 But your view source button seems to also do this? So we would just

I believe that's js/css magic but I'm not sure.

 I like this solution... how much code is it... can we have it?

Sure. Staszek put it together; it's really nothing fancy. We only have
a binary in the c2 repository but I'm sure we can make it available --
I'll ask Staszek to put it on github. One thing I forgot was that we
generate that overview from an XML/XSL file (using xincludes) and this
is probably overkill for Lucene. It'd be much faster/easier to just
html-ize those snippets and include them directly with replacement
patterns (even an ANT copy/filter would do here).

Dawid




[jira] [Commented] (LUCENE-4207) speed up our slowest tests

2012-07-12 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413109#comment-13413109
 ] 

Dawid Weiss commented on LUCENE-4207:
-

It may be the I/O overhead... those tests are generating lots of files, maybe 
with a nearly full disk things slow down a lot (?).

 speed up our slowest tests
 --

 Key: LUCENE-4207
 URL: https://issues.apache.org/jira/browse/LUCENE-4207
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir

 Was surprised to hear from Christian that lucene/solr tests take him 40 
 minutes on a modern mac.
 This is too much. Lets look at the slowest tests and make them reasonable.




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
bq. I rather like Wikipedia's definition:
http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm

I did a similar thing but:

1) based on entire words as individual tokens (instead of letter-by-letter),
2) all words present in input patterns can be encoded as a separate
data structure which maps to a unique integer
3) the matcher is essentially tracking the following:

Match {
  automaton_arc toNextNode;
  final int matchStart;
  int matchLength;
}

you then process the input word-by-word and advance each Match if
there is an arc leaving toNextNode and matching the current word. If
the toNextNode arc is final then you've hit a match and need to record
it (it may not be the longest match so if you only care about the
longest matches then additional processing is required).

You create new Match objects and discard mismatched existing Matches
as you process the input. Essentially, it's as if you tried to walk
down in the automaton starting on every single position in the input.
This may seem costly but in reality the matches are infrequent
compared to the input text and they are rarely very, very long (to
create lots of states). I used the above approach for entity matching
and it worked super-fast.

All this said, an Aho-Corasick transition graph would of course be
more efficient. The question is how much more efficient and how much
code/ work you'll need to put into it to make it work :)
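A rough Python sketch of the matching loop described above. The automaton representation here (arcs as nested dicts, a set of final nodes) is an assumption for illustration, not Lucene's FST API.

```python
def find_matches(words, arcs, final_nodes, root=0):
    """Scan `words` and return (start, length) for every pattern hit.
    arcs[node] maps a word to the next node; final_nodes marks pattern ends."""
    live, hits = [], []
    for pos, word in enumerate(words):
        # as if we tried to walk down the automaton from every input position:
        live.append({"node": root, "start": pos, "length": 0})
        survivors = []
        for m in live:
            nxt = arcs.get(m["node"], {}).get(word)
            if nxt is None:
                continue  # mismatched Match objects are simply discarded
            m["node"], m["length"] = nxt, m["length"] + 1
            if nxt in final_nodes:
                hits.append((m["start"], m["length"]))  # may not be the longest match
            survivors.append(m)
        live = survivors
    return hits
```

The dicts stand in for the pooled Match objects; as noted in the follow-up, a real implementation would reuse a fixed pool no larger than the longest pattern.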

Dawid




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
bq. You create new Match objects and discard mismatched existing Matches

I didn't say that explicitly but obviously you don't need to create
new objects when you're doing this. The pool of match states can be
only as big as the longest pattern so you can pool them and reuse.
Zero allocation cost.

Dawid

On Thu, Jul 12, 2012 at 9:50 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 bq. I rather like Wikipedia's definition:
 http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm

 I did a similar thing but:

 1) based on entire words as individual tokens (instead of letter-by-letter),
 2) all words present in input patterns can be encoded as a separate
 data structure which maps to an unique integer
 3) the matcher is essentially tracking the following:

 Match {
   automaton_arc toNextNode;
   final int matchStart;
   int matchLength;
 }

 you then process the input word-by-word and advance each Match if
 there is an arc leaving toNextNode and matching the current word. If
 the toNextNode arc is final then you've hit a match and need to record
 it (it may not be the longest match so if you only care about the
 longest matches then additional processing is required).

 You create new Match objects and discard mismatched existing Matches
 as you process the input. Essentially, it's as if you tried to walk
 down in the automaton starting on every single position in the input.
 This may seem costly but in reality the matches are infrequent
 compared to the input text and they are rarely very, very long (to
 create lots of states). I used the above approach for entity matching
 and it worked super-fast.

 All this said, an Aho-Corasick transition graph would of course be
 more efficient. The question is how much more efficient and how much
 code/ work you'll need to put into it to make it work :)

 Dawid




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Smiley, David W.
Thanks for your explanation, I already had a very rough idea of the approach.  
Can Aho-Corasick be implemented with Lucene's FST?  Again, the 
SynonymFilterFactory said this RE Aho-Corasick:
// This really amounts to adding a .*
// closure to the FST and then determinizing it.

You didn't mention FST once, and that's the API I'm having trouble grokking.

~ David

On Jul 12, 2012, at 3:51 PM, Dawid Weiss [via Lucene] wrote:

bq. I rather like Wikipedia's definition:
http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm

I did a similar thing but:

1) based on entire words as individual tokens (instead of letter-by-letter),
2) all words present in input patterns can be encoded as a separate
data structure which maps to an unique integer
3) the matcher is essentially tracking the following:

Match {
  automaton_arc toNextNode;
  final int matchStart;
  int matchLength;
}

you then process the input word-by-word and advance each Match if
there is an arc leaving toNextNode and matching the current word. If
the toNextNode arc is final then you've hit a match and need to record
it (it may not be the longest match so if you only care about the
longest matches then additional processing is required).

You create new Match objects and discard mismatched existing Matches
as you process the input. Essentially, it's as if you tried to walk
down in the automaton starting on every single position in the input.
This may seem costly but in reality the matches are infrequent
compared to the input text and they are rarely very, very long (to
create lots of states). I used the above approach for entity matching
and it worked super-fast.

All this said, an Aho-Corasick transition graph would of course be
more efficient. The question is how much more efficient and how much
code/ work you'll need to put into it to make it work :)

Dawid



Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Dawid Weiss
This comment was probably made against the Brics library FST
implementation. You can't really add a .* with the algorithm that
builds the FST incrementally, because it accepts only fixed strings
(and builds an already determinized automaton).

That's part of the reason my runtime solution was suboptimal -- the
tradeoff is that you can construct the FST very efficiently from
millions of input entries, as long as you don't need to manipulate it
afterwards. Brics will probably bail out with an OOM if you try to
manipulate large graphs.

I'd gladly share my code because it was Lucene based... but I can't --
paid consulting job, sorry. Shouldn't be too hard to rewrite from
scratch though, really.

Dawid

On Thu, Jul 12, 2012 at 10:07 PM, Smiley, David W. dsmi...@mitre.org wrote:
 Thanks for your explanation, I already had a very rough idea of the
 approach.  Can Aho-Corasick be implemented with Lucene's FST?  Again, the
 SynonymFilterFactory said this RE Aho-Corasick:
 // This really amounts to adding a .*
 // closure to the FST and then determinizing it.

 You didn't mention FST once and that's the API I'm having trouble groking.

 ~ David


 On Jul 12, 2012, at 3:51 PM, Dawid Weiss [via Lucene] wrote:

 bq. I rather like Wikipedia's definition:
 http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm

 I did a similar thing but:

 1) based on entire words as individual tokens (instead of
 letter-by-letter),
 2) all words present in input patterns can be encoded as a separate
 data structure which maps to an unique integer
 3) the matcher is essentially tracking the following:

 Match {
   automaton_arc toNextNode;
   final int matchStart;
   int matchLength;
 }

 you then process the input word-by-word and advance each Match if
 there is an arc leaving toNextNode and matching the current word. If
 the toNextNode arc is final then you've hit a match and need to record
 it (it may not be the longest match so if you only care about the
 longest matches then additional processing is required).

 You create new Match objects and discard mismatched existing Matches
 as you process the input. Essentially, it's as if you tried to walk
 down in the automaton starting on every single position in the input.
 This may seem costly but in reality the matches are infrequent
 compared to the input text and they are rarely very, very long (to
 create lots of states). I used the above approach for entity matching
 and it worked super-fast.

 All this said, an Aho-Corasick transition graph would of course be
 more efficient. The question is how much more efficient and how much
 code/ work you'll need to put into it to make it work :)

 Dawid






Solr posting question

2012-07-12 Thread karl.wright
Hi all,

I received a report of a problem with posting data to Solr.  The post method is 
a multi-part form, so if you inspect it, it looks something like this:


--boundary
Content-Disposition: form-data; name="metadata_attribute_name"
Content-Type: text; charset=utf-8

abc;def;ghi
--boundary--


The problem is that, for form data, multiple values for an attribute are 
supposed to just be repeated form elements, e.g.:


--boundary
Content-Disposition: form-data; name="metadata_attribute_name"
Content-Type: text; charset=utf-8

abc;def;ghi
--boundary
Content-Disposition: form-data; name="metadata_attribute_name"
Content-Type: text; charset=utf-8

second value
--boundary--



What's happening, though, when this is posted to Solr is that any semicolons in 
the data are being interpreted as multi-value separators.  So when the above is 
posted, Solr apparently thinks that metadata_attribute_name has 4 values, 
"abc", "def", "ghi", and "second value", rather than two values, "abc;def;ghi" 
and "second value".

Is this intended behavior, and if so, how am I supposed to escape ";" 
characters when communicating to Solr in this way?
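For reference, the intended wire format can be sketched like this: multi-valued attributes are repeated parts with the same name, not values joined inside one part. This is a stdlib-only illustration; the boundary string is a placeholder.

```python
def multipart_form_body(fields, boundary="------------boundary"):
    """Build a multipart/form-data body. A multi-valued attribute is sent
    as repeated parts with the same name= (per RFC 2388), not by joining
    the values with semicolons inside one part."""
    lines = []
    for name, value in fields:            # (name, value) pairs; names may repeat
        lines += [
            "--" + boundary,
            'Content-Disposition: form-data; name="%s"' % name,
            "Content-Type: text/plain; charset=utf-8",
            "",
            value,
        ]
    lines += ["--" + boundary + "--", ""]
    return "\r\n".join(lines)

body = multipart_form_body([
    ("metadata_attribute_name", "abc;def;ghi"),   # ONE value containing semicolons
    ("metadata_attribute_name", "second value"),  # a genuine second value
])
```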

Karl




[jira] [Updated] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl

2012-07-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4211:


Attachment: LUCENE-4211.patch

Updated patch with docsEnum and docsAndPositionsEnums asserts/state machines.

Also added an AssertingDirectoryReader which ensures subreaders are wrapped 
with AssertingAtomicReaders.

Also wraps termvectors with AssertingFields.

TODO:
# add state machine/asserts to TermsEnum
# add an AssertingCodec that does these checks always, put it in rotation so 
that any code (e.g. solr) not necessarily using newSearcher() from 
LuceneTestCase still gets these checks.

There is a problem with a function query and fieldcache insanity; I don't 
understand the FCInvisibleReader etc. going on here:
{noformat}
ant test  -Dtestcase=TestOrdValues -Dtests.method=testReverseOrdFieldRank 
-Dtests.seed=E54A53902AE23DE0 -Dtests.slow=true -Dtests.locale=fr_CH 
-Dtests.timezone=America/Argentina/Cordoba -Dtests.file.encoding=UTF-8
{noformat}

It's possible this is unrelated to the patch...

 in LuceneTestCase.maybeWrapReader: add an asserting impl
 

 Key: LUCENE-4211
 URL: https://issues.apache.org/jira/browse/LUCENE-4211
 Project: Lucene - Java
  Issue Type: Task
  Components: general/test
Reporter: Robert Muir
 Attachments: LUCENE-4211.patch, LUCENE-4211.patch


 It would be nice to wrap with FIR here sometimes,
 one that returns AssertingFields, etc etc.
 This way we could check if consumers are doing bogus things (like reading 
 nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after it's 
 exhausted, or things like that).
 This would also be nice to catch tests that do this, rather than doing
 crazy debugging over what's not really a bug.




[jira] [Resolved] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-07-12 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1725.


   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0

Committed revision 1360931 (trunk).
Committed revision 1360952 (4.x).


 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Assignee: Erik Hatcher
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script




[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring

2012-07-12 Thread Stefan Pohl (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413246#comment-13413246
 ] 

Stefan Pohl commented on LUCENE-4100:
-

Thanks for the feedback! You're spot-on with everything you're saying.

Yes, the methods as suggested in the different papers have (semi-)static 
indexes in mind, that is, setups that batch-index many new documents, then 
recompute maxscores (hence, IndexRewriter) and roll out the new version of the 
indexes. This is a Lucene use-case common to many large installations (or parts 
thereof) and as such important. Moreover, this approach can easily be 
generalized to the other Similarities, without their necessarily having to 
know about maxscore, and can be simplified by some minor API changes within 
Lucene. The PoC code as-is might be of help to showcase dependencies in 
general, including ones that currently are not well supported within Lucene 
(because there was no need for them yet).

If you really want to go the full distance: I already thought about doing 
maxscore live and got some ideas in this regard, see below.

Comments to your thoughts:

[PostingsWriter]
You're right. For simplicity, I was computing each term's overall contribution 
(as explained in the talk), including all but query-dependent factors. You can 
consider this as un-quantized impacts (in the sense of Anh et al.) which 
necessitates a second pass over a static index, hence IndexRewriter.

As a side note: I noticed a drop in the PKLookup benchmark, suggesting that it 
might be better not to extend the size of dictionary items, but to store 
maxscores in the beginning of inverted lists, or next to skip data. This effect 
should be smaller or disappear though when maxscores are not stored for many 
terms.

[Length normalization]
Yes, this might be a necessary dependency. It should be a general 
design-principle though to have as many as possible statistics at hand 
everywhere, as long as it doesn't hurt performance in terms of efficiency.

[splitting impacts / incremental indexing]
Yes, this would be more intrusive, requiring Similarity-dependent maxscore 
computations. Here is how it could work:
Very exotic scoring functions simply don't have to support maxscore and will 
thus fall back to the current score-all behaviour.
DefaultSimilarity is simple, but BM25 and LMDirichlet can't as easily be 
factored out, as you correctly point out, but we could come up with bounds for 
collection statistics (those that go into the score) within which it is safe to 
use maxscore, otherwise we fallback to score-all until a merge occurs, or we 
notify the user to better do a merge/optimize, or Lucene does a segment-rewrite 
with new maxscore and bound computations on basis of more current collection 
stats. I got first ideas for an algorithm to compute these bounds.

[docfreq=1 treatment]
Definitely agree. Possibly, terms with docfreq < x (= 10?) could not store a 
maxscore; x would be configurable, with the default to be evaluated. x should 
be stored in the index so that it can be determined which terms don't contain 
maxscores.
Having a special treatment for these terms (not considering them for exclusion 
within the algorithm) allows for easier exchange of the core of the algorithm 
to get the WAND algorithm, or also to ignore a maxscore for a term for which 
collection stats went out of bounds.
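The exclusion step referred to here (and its WAND variant) boils down to a partition like the following sketch. The representation is assumed for illustration, not taken from the patch: term upper bounds and the current k-th best score (theta) are given.

```python
def partition_terms(term_bounds, theta):
    """Split query terms into essential and non-essential sets.
    term_bounds: (term, maxscore upper bound) pairs; theta: current k-th
    best score. A document containing only non-essential terms can never
    enter the top-k, so those postings need not drive the scoring loop."""
    ordered = sorted(term_bounds, key=lambda tb: tb[1])
    nonessential, prefix = [], 0.0
    for term, bound in ordered:
        if prefix + bound < theta:
            prefix += bound          # cumulative upper bound still below theta
            nonessential.append(term)
        else:
            break
    essential = [t for t, _ in ordered[len(nonessential):]]
    return essential, nonessential
```

Terms exempted from a stored maxscore (the docfreq < x case above) would simply always land in the essential set.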

[maxscores per posting ranges]
+1. As indicated in the description, having multiple maxscores per term can be 
more efficient, possibly leading to tighter bounds and more skipping. 
Chakrabarti'11 opted for one extreme, computing a maxscore for each compressed 
posting block, whereas the optimal choice might have been a multiple of blocks, 
or a postings range not well aligned with block size.
Optimal choice will be very dependent on skip list implementation and its 
parameters, but also posting de-compression overhead.
The question is how to get access to this codec-dependent information inside of 
the scoring algorithm, tunneled through the TermQuery?

[store 4 bytes per maxscore]
Possible. As long as the next higher representable real number is stored (ceil, 
not floor), no docs will be missed and the algorithm remains correct. But with 
too few bits, the looser bounds will at some point reduce the efficiency gains.

If the score is anyway factored out, it might be better to simply store all 
document-dependent stats (TF, doclen) of the document with the maximum score 
contribution (as ints) instead of one aggregate intermediate float score 
contribution.
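The ceil-not-floor rounding mentioned above can be sketched as follows (assuming positive, finite scores; for a positive IEEE-754 float, incrementing the bit pattern yields the next larger representable value):

```python
import struct

def float32_ceil(x):
    """Round a positive double UP to the nearest float32, so a stored
    4-byte maxscore never underestimates the true upper bound."""
    f32 = struct.unpack("<f", struct.pack("<f", x))[0]  # round-to-nearest float32
    if f32 < x:
        bits = struct.unpack("<I", struct.pack("<f", f32))[0]
        f32 = struct.unpack("<f", struct.pack("<I", bits + 1))[0]  # next float32 up
    return f32
```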

[implementation inside codec]
Please be aware that while terms are at some point excluded from merging, they 
still are advanced to the docs in other lists to gain complete document 
knowledge and compute exact scores. Maxscores can also be used to minimize how 
often this happens, but the gains are often compensated by the more complex 
scoring. Still having to skip 

[jira] [Updated] (LUCENE-4217) Load clover.jar from ivy-cachepath andy ship sources with License

2012-07-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4217:
--

Attachment: LUCENE-4217.patch

Small patch improvements and correct mail extract in README.

 Load clover.jar from ivy-cachepath andy ship sources with License
 -

 Key: LUCENE-4217
 URL: https://issues.apache.org/jira/browse/LUCENE-4217
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4217.patch, LUCENE-4217.patch


 When Atlassian granted us the license for their clover-2.6.3.jar file, they 
 allowed us to ship this license file to every developer. Currently the clover 
 setup is very hard for users, so this issue will make it simple.
 If you want to run tests with clover, just pass -Drun.clover=true to ant 
 clean test. ANT will then download clover via IVY and point it to the license 
 file in our tools folder. The license is supplemented by the original mail 
 from Atlassian, that everybody is allowed to use it with code in the 
 org.apache. java package.




[jira] [Updated] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl

2012-07-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4211:


Attachment: LUCENE-4211.patch

OK, I got past the insanity issue: thankfully Uwe already had a reusable hack 
in place in LuceneTestCase.

But then I added AssertingCodec/AssertingPostingsFormat that use these checks 
and started running 'ant test -Dtests.codec=Asserting', and there are some 
bugs in tests, I think.

 in LuceneTestCase.maybeWrapReader: add an asserting impl
 

 Key: LUCENE-4211
 URL: https://issues.apache.org/jira/browse/LUCENE-4211
 Project: Lucene - Java
  Issue Type: Task
  Components: general/test
Reporter: Robert Muir
 Attachments: LUCENE-4211.patch, LUCENE-4211.patch, LUCENE-4211.patch


 It would be nice to wrap with FIR here sometimes,
 one that returns AssertingFields, etc etc.
 This way we could check if consumers are doing bogus things (like reading 
 nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after it's 
 exhausted, or things like that).
 This would also be nice to catch tests that do this, rather than doing
 crazy debugging over what's not really a bug.




[jira] [Commented] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl

2012-07-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413266#comment-13413266
 ] 

Robert Muir commented on LUCENE-4211:
-

{noformat}
[junit4:junit4] Suite: org.apache.lucene.index.TestDocsAndPositions
[junit4:junit4] FAILURE 0.04s J3 | TestDocsAndPositions.testRandomPositions
[junit4:junit4] Throwable #1: java.lang.AssertionError: nextDoc() called 
after iterator is exhausted!
[junit4:junit4]at 
__randomizedtesting.SeedInfo.seed([35D7F00507FD5A9D:4BF3893054577754]:0)
[junit4:junit4]at 
org.apache.lucene.index.AssertingAtomicReader$AssertingDocsAndPositionsEnum.nextDoc(AssertingAtomicReader.java:207)
[junit4:junit4]at 
org.apache.lucene.index.TestDocsAndPositions.testRandomPositions(TestDocsAndPositions.java:178)
[junit4:junit4]at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
[junit4:junit4]at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit4:junit4]at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit4:junit4]at java.lang.reflect.Method.invoke(Method.java:597)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
[junit4:junit4]at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[junit4:junit4]at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
[junit4:junit4]    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
[junit4:junit4]    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
[junit4:junit4]    at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
[junit4:junit4]    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
[junit4:junit4]    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
[junit4:junit4]    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
[junit4:junit4]    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
[junit4:junit4]    at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
[junit4:junit4]    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
[junit4:junit4]    at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
[junit4:junit4]    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
[junit4:junit4]    at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
[junit4:junit4]    at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
[junit4:junit4]    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
[junit4:junit4]    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
[junit4:junit4]    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[junit4:junit4]    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
[junit4:junit4]    at

[jira] [Commented] (SOLR-3598) Provide option to allow aliased field to be included in query for EDisMax QParser

2012-07-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413268#comment-13413268
 ] 

Jan Høydahl commented on SOLR-3598:
---

Yes, {{f.fieldname.qf}} will wire up fieldname as a valid pseudo field to be 
queried, even if it does not exist in your index schema. Can you test it and 
report back whether it solves your use case?
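For anyone reaching this thread later, the suggestion amounts to plain request parameters. A minimal stdlib-only Java sketch of the client side follows; the pseudo field {{person}} and the helper itself are hypothetical illustrations, not Solr API:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class EdismaxAliasDemo {
    // Build a Solr query string where the pseudo field "person" (which need
    // not exist in the schema) expands to the real fields via f.person.qf.
    static String buildQuery(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");
        params.put("q", userQuery);
        // f.person.qf wires "person" up as a queryable pseudo field.
        params.put("f.person.qf", "name first_name last_name");
        StringBuilder sb = new StringBuilder("/solr/select?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            first = false;
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("person:Jamie"));
    }
}
```

A query for person:Jamie then searches name, first_name and last_name without the alias cycle the issue describes.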

 Provide option to allow aliased field to be included in query for EDisMax 
 QParser
 -

 Key: SOLR-3598
 URL: https://issues.apache.org/jira/browse/SOLR-3598
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Affects Versions: 3.6, 4.0-ALPHA
Reporter: Jamie Johnson
Priority: Minor
 Attachments: alias.patch


 I currently have a situation where I'd like the original field included in 
 the query. For instance, I have several fields with differing granularity: 
 name, firstname and lastname.  Some of my sources differentiate between 
 these, so I can fill out firstname and lastname, while others don't, and I 
 need to just place this information in the name field.  When querying I'd 
 like to be able to say name:Jamie and have it translated to name:Jamie 
 first_name:Jamie last_name:Jamie.  Doing this creates an alias cycle, and 
 the EDisMax query parser throws an exception about it.  Ideally there would 
 be an option to include the original field as part of the query to support 
 this use case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl

2012-07-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413280#comment-13413280
 ] 

Robert Muir commented on LUCENE-4211:
-

Mike committed fixes already for both these seeds... thanks Mike!

Continuing testing...

 in LuceneTestCase.maybeWrapReader: add an asserting impl
 

 Key: LUCENE-4211
 URL: https://issues.apache.org/jira/browse/LUCENE-4211
 Project: Lucene - Java
  Issue Type: Task
  Components: general/test
Reporter: Robert Muir
 Attachments: LUCENE-4211.patch, LUCENE-4211.patch, LUCENE-4211.patch


 It would be nice to wrap with FIR here sometimes,
 one that returns AssertingFields, etc etc.
 This way we could check if consumers are doing bogus things (like reading 
 nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after its 
 exhausted, or things like that).
 This would also be nice to catch tests that do this rather than doing
 crazy debugging over whats not really a bug.




Re: SynonymFilter, FST, and Aho-Corasick algorithm

2012-07-12 Thread Michael McCandless
Some responses below:

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jul 12, 2012 at 11:08 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Hello.
 I'm embarking on developing code similar to the SynonymFilter but which 
 merely needs to record, out of band to the analysis, where there is matching 
 text in the input tokens against the corpus in the FST.  I'm calling this a 
 "keyword tagger": I shove text through it and when it's done it tells me at 
 what offsets there is a match to a corpus of keywords & phrases, and to what 
 keywords/phrases they were exactly.  It doesn't have to inject or modify the 
 token stream because the results of this are going elsewhere.  Although, it 
 would be a fine approach to only emit the "tags", as I call them, as a way 
 of consuming the results, but I'm not indexing them so it doesn't matter.

   I noticed the following TODOs at the start:

 // TODO: maybe we should resolve token -> wordID then run
 // FST on wordIDs, for better perf?

 I intend on doing this since my matching keyword/phrases are often more than 
 one word, and I expect this will save memory and be faster.

Be sure to test that this is really faster: you'll need to add a step to
resolve word -> id (eg via hashmap), which may net/net add cost because
the FST can incrementally (quickly) determine a word doesn't exist
with a given prefix.  The FST can also do better sharing (less RAM) of
shared prefixes/suffixes.

 // TODO: a more efficient approach would be Aho/Corasick's
 // algorithm
 // http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
 // It improves over the current approach here
 // because it does not fully re-start matching at every
 // token.  For example if one pattern is "a b c x"
 // and another is "b c d" and the input is "a b c d", on
 // trying to parse "a b c x" but failing when you got to x,
 // rather than starting over again you really should
 // immediately recognize that "b c d" matches at the next
 // input.  I suspect this won't matter that much in
 // practice, but it's possible on some set of synonyms it
 // will.  We'd have to modify Aho/Corasick to enforce our
 // conflict resolving (eg greedy matching) because that algo
 // finds all matches.  This really amounts to adding a .*
 // closure to the FST and then determinizing it.

 Could someone please clarify how the problem in the example above is to be 
 fixed?  At the end it states how to solve it, but I don't know how to do that 
 and I'm not sure if there is anything more to it since after all if it's as 
 easy as that last sentence sounds then it would have been done already ;-)

The FSTs we create are not malleable so implementing what that crazy
comment says would not be easy.

However, there is a cool paper that Robert found:

http://www.cis.uni-muenchen.de/people/Schulz/Pub/dictle5.ps

That I think does not require heavily modifying the minimal FST (just
augmenting it w/ additional arcs that you follow on failure to match).
I think it's basically Aho-Corasick, done as an FST (which eg you can
then compose with other FSTs to compile a chain of replacements into a
single FST ... at least this was my quick understanding).

Still, I would first try the obvious approach (use the FST the way
SynFilter does) and see if it's fast enough.  I think Aho-Corasick
only really matters if your patterns have high overlap after shifting
(eg "b" and "ab").
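For reference, the classic Aho-Corasick construction the TODO alludes to is compact enough to sketch. This is a character-level toy showing only the failure-link idea; the real SynonymFilter works on tokens, and this deliberately ignores the greedy conflict resolution the comment mentions:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class AhoCorasickDemo {
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                   // failure link
        final List<String> out = new ArrayList<>();  // patterns ending here
    }

    static Node build(String... patterns) {
        Node root = new Node();
        for (String p : patterns) {
            Node n = root;
            for (char c : p.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.out.add(p);
        }
        // BFS to set failure links; root's children fail back to root.
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            for (Map.Entry<Character, Node> e : n.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = n.fail;
                while (f != root && !f.next.containsKey(c)) f = f.fail;
                child.fail = f.next.containsKey(c) ? f.next.get(c) : root;
                child.out.addAll(child.fail.out);  // inherit shorter matches
                queue.add(child);
            }
        }
        return root;
    }

    /** Returns each match as "pattern@startOffset". */
    static List<String> matches(Node root, String text) {
        List<String> found = new ArrayList<>();
        Node n = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (n != root && !n.next.containsKey(c)) n = n.fail;
            n = n.next.getOrDefault(c, root);
            for (String p : n.out) found.add(p + "@" + (i + 1 - p.length()));
        }
        return found;
    }

    public static void main(String[] args) {
        // The TODO's own example: parsing "abcx" fails at 'd', but "bcd"
        // is still found without restarting the scan.
        Node root = build("abcx", "bcd");
        System.out.println(matches(root, "abcd")); // prints [bcd@1]
    }
}
```

On that example the failure link from the "abc" state jumps straight to the "bc" state, which is exactly the restart the comment says the current approach pays for.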

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4211) in LuceneTestCase.maybeWrapReader: add an asserting impl

2012-07-12 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4211:
---

Attachment: LUCENE-4211.patch

Patch, just adding checks to AssertingTermsEnum.  Tests pass with it ...

 in LuceneTestCase.maybeWrapReader: add an asserting impl
 

 Key: LUCENE-4211
 URL: https://issues.apache.org/jira/browse/LUCENE-4211
 Project: Lucene - Java
  Issue Type: Task
  Components: general/test
Reporter: Robert Muir
 Attachments: LUCENE-4211.patch, LUCENE-4211.patch, LUCENE-4211.patch, 
 LUCENE-4211.patch


 It would be nice to wrap with FIR here sometimes,
 one that returns AssertingFields, etc etc.
 This way we could check if consumers are doing bogus things (like reading 
 nextDoc after it returned NO_MORE_DOCS, or TermsEnum.next after its 
 exhausted, or things like that).
 This would also be nice to catch tests that do this rather than doing
 crazy debugging over whats not really a bug.




[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS

2012-07-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413310#comment-13413310
 ] 

Jan Høydahl commented on SOLR-3613:
---

bq. I also don't think we should force solr. for all the system properties. 
If someone adds the ability to optionally check for the webapp prefix, then I 
think we should still be free to use zkHost, collection.*, etc, in the 
examples/doc.

Why not? It is consistent, short and concise. I was first thinking that the 
solr. prefix would be better as a convention rather than enforced in code. But 
say we do as you propose and add prefix logic, so that given ${myProp:foo} 
we'll look for:
# {{solr.myProp}}
# else look for {{myProp}}

In this case we would need to change all literal {{solr.*}} props in all xml 
config files. I see two drawbacks with this approach: one is that the examples 
then promote the short form while we'd like to encourage the namespaced form; 
the other is that if webapp XYZ sets {{myProp}}, and we have not explicitly 
set {{solr.myProp}}, then Solr will pick up a faulty value for it. This last 
could very well happen for generic opts like the ${host:} currently defined 
in solr.xml.

So I still think it is better to require a {{solr.}} prefix for all sys props 
and leave the {{solr.}} prefix in config files as today.
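The lookup order under discussion is tiny to sketch (a hypothetical helper, not actual Solr code); it also makes the leakage drawback concrete, since a foreign {{myProp}} is picked up whenever {{solr.myProp}} is unset:

```java
import java.util.Properties;

public class PrefixedPropsDemo {
    // Resolve ${name:defaultVal}: prefer solr.name, fall back to the bare
    // name, then to the default -- the order proposed above.
    static String resolve(Properties sysProps, String name, String defaultVal) {
        String v = sysProps.getProperty("solr." + name);
        if (v == null) v = sysProps.getProperty(name);
        return v != null ? v : defaultVal;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("myProp", "fromWebapp");           // set by another webapp
        System.out.println(resolve(p, "myProp", "foo")); // prints fromWebapp: leaked!
        p.setProperty("solr.myProp", "fromSolr");
        System.out.println(resolve(p, "myProp", "foo")); // prints fromSolr: prefix wins
    }
}
```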

Another problematic one from solr.xml is this: hostPort="${jetty.port:}". It 
assumes Jetty as the Java application server, and it feels awkward to say 
{{-Djetty.port=8080}} to tell SolrCloud that Tomcat is running on port 8080. 
Imagine an ops guy reading the Solr bootstrap script, scratching his head. If 
all we do is read the value and add +1000 to pick the port for our internal 
ZK, why not be explicit instead and have a {{solr.localZkPort}} prop? (No API 
to get the web container's port? In that case we could support relative 
values, defaulting to +1000, which would behave as today but with less to 
specify on the cmdLine.)

While in picky mode :-) I'd prefer {{zkRun}} to be {{solr.localZkRun}} to 
distinguish that this starts a *local* ZK, as opposed to the remote one in 
{{zkHost}}. Also, the prop {{zkHost}} is misleading in that it takes a list of 
host:port; perhaps {{solr.zkServers}} is clearer?

{quote}
bq. a thin HTTP layer around Lucene
I've certainly never thought of Solr as that
{quote}
Well, not a pure HTTP layer, but still thin in the sense that Lucene does as 
much of the core work as possible.

 Namespace Solr's JAVA OPTIONS
 -

 Key: SOLR-3613
 URL: https://issues.apache.org/jira/browse/SOLR-3613
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA
Reporter: Jan Høydahl
 Fix For: 4.0


 Solr being a web-app, should play nicely in a setting where users deploy it 
 on a shared appServer.
 To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid 
 name clashes and for clarity when reading your appserver startup script. We 
 currently do that with most: {{solr.solr.home, solr.data.dir, 
 solr.abortOnConfigurationError, solr.directoryFactory, 
 solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we 
 fail to do so.
 Before release of 4.0 we should make sure to clean this up.




[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-07-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413312#comment-13413312
 ] 

Michael McCandless commented on LUCENE-3892:


Thanks Billy, I'll commit!

One thing I noticed: I think we shouldn't separately read numBytes and the int 
header?  Can't we do a single readVInt() that encodes numBytes as well as the 
format (bit width and format, once we tie into oal.util.packed APIs)?  Also, 
we shouldn't encode numInts at all; ie, this should be fixed for the whole 
segment, and not written per block.

 Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
 Simple9/16/64, etc.)
 -

 Key: LUCENE-3892
 URL: https://issues.apache.org/jira/browse/LUCENE-3892
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: 4.1

 Attachments: LUCENE-3892-BlockTermScorer.patch, 
 LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, 
 LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, 
 LUCENE-3892-handle_open_files.patch, LUCENE-3892_for.patch, 
 LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, 
 LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, 
 LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, 
 LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
 LUCENE-3892_settings.patch, LUCENE-3892_settings.patch


 On the flex branch we explored a number of possible intblock
 encodings, but for whatever reason never brought them to completion.
 There are still a number of issues opened with patches in different
 states.
 Initial results (based on prototype) were excellent (see
 http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
 ).
 I think this would make a good GSoC project.




[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-07-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413314#comment-13413314
 ] 

Michael McCandless commented on LUCENE-3892:


I didn't commit 
lucene/core/src/java/org/apache/lucene/codecs/pfor/ForPostingsFormat.java -- 
your IDE had changed it to a wildcard import (I prefer we stick with individual 
imports).

Was the numBits==0 case for all 0s not all 1s?  We may want to have it mean all 
1s instead?
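To make the numBits question concrete, here is a toy width computation for a frame-of-reference block (an illustration under stated assumptions, not the patch's code). A width-0 special case pays off whenever every value in the block is identical, and for delta-coded doc IDs of a dense term that common value is 1, not 0:

```java
public class ForWidthDemo {
    // Bits needed per value for a frame-of-reference block.
    static int bitsRequired(int[] block) {
        int max = 0;
        for (int v : block) max |= v;
        return 32 - Integer.numberOfLeadingZeros(max); // 0 when all values are 0
    }

    public static void main(String[] args) {
        // Delta-coded doc IDs of a dense term: every gap is 1.
        int[] gaps = {1, 1, 1, 1};
        System.out.println(bitsRequired(gaps));    // prints 1
        // If the special case instead means "all values equal 1" (or the
        // encoder subtracts the shared value first), the block body can
        // be omitted entirely:
        int[] shifted = {0, 0, 0, 0};
        System.out.println(bitsRequired(shifted)); // prints 0
    }
}
```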

 Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
 Simple9/16/64, etc.)
 -

 Key: LUCENE-3892
 URL: https://issues.apache.org/jira/browse/LUCENE-3892
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Fix For: 4.1

 Attachments: LUCENE-3892-BlockTermScorer.patch, 
 LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, 
 LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, 
 LUCENE-3892-handle_open_files.patch, LUCENE-3892_for.patch, 
 LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, 
 LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, 
 LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, 
 LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
 LUCENE-3892_settings.patch, LUCENE-3892_settings.patch


 On the flex branch we explored a number of possible intblock
 encodings, but for whatever reason never brought them to completion.
 There are still a number of issues opened with patches in different
 states.
 Initial results (based on prototype) were excellent (see
 http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
 ).
 I think this would make a good GSoC project.




[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413316#comment-13413316
 ] 

Mark Miller commented on SOLR-3613:
---

bq. Another problematic one from solr.xml is this: hostPort="${jetty.port:}". 
It assumes Jetty as the Java application server,

Yup - another way that we have made the user experience better by assuming 
jetty. This is exactly what I meant - this keeps you from having to specify the 
port twice on the cmd line - silly when you should just be using jetty and we 
know the port.

I have been optimizing for jetty for a while now.

bq.  it feels awkward to say -Djetty.port=8080 to tell SolrCloud that Tomcat is 
running on port 8080

They are free to change it - it's in solr.xml. I'd rather have our default 
system not be awkward than worry about Tomcat being awkward. This is exactly 
what I've been talking about. For too long we have been awkward for everything 
rather than good for one thing.

 Namespace Solr's JAVA OPTIONS
 -

 Key: SOLR-3613
 URL: https://issues.apache.org/jira/browse/SOLR-3613
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA
Reporter: Jan Høydahl
 Fix For: 4.0


 Solr being a web-app, should play nicely in a setting where users deploy it 
 on a shared appServer.
 To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid 
 name clashes and for clarity when reading your appserver startup script. We 
 currently do that with most: {{solr.solr.home, solr.data.dir, 
 solr.abortOnConfigurationError, solr.directoryFactory, 
 solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we 
 fail to do so.
 Before release of 4.0 we should make sure to clean this up.




[jira] [Commented] (SOLR-3613) Namespace Solr's JAVA OPTIONS

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413323#comment-13413323
 ] 

Mark Miller commented on SOLR-3613:
---

bq. I'd prefer {{zkRun}} to be {{solr.localZkRun}} to distinguish that this 
starts a *local* Zk as opposed to the remote one in {{zkHost}}. 

-0 - I like zkRun as it's short and sweet - you are running zk, or you are not 
and are connecting to an external zk. I wouldn't fight very hard though. Yonik 
named it; I'll defer to you guys.

bq. Also, the prop {{zkHost}} is misleading, in that it takes a list of 
host:port; perhaps {{solr.zkServers}} is more clear?

The zk guys call it a connectString. I like zkHost because it's short, works 
fine with a single host url, and it's easy to doc how to use more, but again, 
not something I'm going to fight hard for.

Personally, I liked the brevity of something like java -DzkRun -DzkHost 
start.jar and how that works for examples, as compared to what we are getting 
to now: java -Dsolr.zkServers -Dsolr.localZkRun start.jar.

Just starts to get dense fast.

I also think doc is perfectly sufficient on top of the current names.



 Namespace Solr's JAVA OPTIONS
 -

 Key: SOLR-3613
 URL: https://issues.apache.org/jira/browse/SOLR-3613
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA
Reporter: Jan Høydahl
 Fix For: 4.0


 Solr being a web-app, should play nicely in a setting where users deploy it 
 on a shared appServer.
 To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid 
 name clashes and for clarity when reading your appserver startup script. We 
 currently do that with most: {{solr.solr.home, solr.data.dir, 
 solr.abortOnConfigurationError, solr.directoryFactory, 
 solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we 
 fail to do so.
 Before release of 4.0 we should make sure to clean this up.




Re: Solr posting question

2012-07-12 Thread Ryan McKinley
what request handler are you using?  csv?

If you point to the /admin/dump handler, what do you get?

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/DumpRequestHandler.java

If there is a problem with how this gets through, we will need to fix
something in:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers#MultipartRequestParser

ryan





On Thu, Jul 12, 2012 at 1:17 PM,  karl.wri...@nokia.com wrote:
 Hi all,



 I received a report of a problem with posting data to Solr.  The post method
 is a multi-part form, so if you inspect it, it looks something like this:





 boundary---

 Content-Disposition: form-data; name="metadata_attribute_name"

 Content-Type: text; charset=utf-8



 abc;def;ghi

 ---boundary---

 



 The problem is that, for form data, multiple values for an attribute are
 supposed to just be repeated form elements, e.g.:





 boundary---

 Content-Disposition: form-data; name="metadata_attribute_name"

 Content-Type: text; charset=utf-8



 abc;def;ghi

 ---boundary---

 Content-Disposition: form-data; name="metadata_attribute_name"

 Content-Type: text; charset=utf-8



 second value

 ---boundary---



 



 What’s happening, though, when this is posted to Solr is that any semicolons
 in the data are being interpreted as multi-value separators.  So when the
 above is posted, Solr apparently thinks that “metadata_attribute_name” has 4
 values, “abc”, “def”, “ghi”, and “second value”, rather than two values,
 “abc;def;ghi” and “second value”.



 Is this intended behavior, and if so, how am I supposed to escape “;”
 characters when communicating to Solr in this way?



 Karl





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring

2012-07-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413335#comment-13413335
 ] 

Robert Muir commented on LUCENE-4100:
-

{quote}
As a side note: I noticed a drop in the PKLookup benchmark, suggesting that it 
might be better not to extend the size of dictionary items, but to store 
maxscores in the beginning of inverted lists, or next to skip data. This effect 
should be smaller or disappear though when maxscores are not stored for many 
terms.
{quote}

I wouldn't worry about this; I noticed a few things that might speed that up:

1. Currently it does writeVInt(Float.floatToIntBits(term.maxscoreScore)). But 
I think this should be writeInt, not writeVInt?
So I think currently we often write 5 bytes here, with all the vint checks for 
each byte, whereas as an int it would always be 4 bytes and faster.
2. Yes, with low freq terms (e.g. docFreq < skipMinimum), it's probably best 
to just omit this at both read and write time. Then PK lookup would be fine.
3. As far as the 4-byte ceiling, my motivation there was not to save in the 
term dictionary, but instead to make these smaller and allow us to add them at 
regular intervals. We can take advantage of a few things, e.g. it should never 
be a negative number for a well-formed Similarity (I think that would screw up 
the algorithm, looking at your tests anyway).
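The first point is easy to verify: a vInt spends one bit per byte on continuation flags, and floatToIntBits of any typical score has its exponent bits set high in the 32-bit word, so the vInt form almost always needs 5 bytes. A sketch of the length calculation (not the codec's actual writer):

```java
public class VIntSizeDemo {
    // Number of bytes a Lucene-style writeVInt would use for v
    // (7 payload bits per byte, high bit is the continuation flag).
    static int vIntLength(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int bits = Float.floatToIntBits(2.5f);   // 0x40200000: exponent bits are high
        System.out.println(vIntLength(bits));    // prints 5 -- one byte worse than writeInt
        System.out.println(vIntLength(127));     // prints 1 -- vInt only wins for small ints
    }
}
```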

{quote}
DefaultSimilarity is simple, but BM25 and LMDirichlet can't as easily be 
factored out, as you correctly point out, but we could come up with bounds for 
collection statistics (those that go into the score) within which it is safe to 
use maxscore, otherwise we fallback to score-all until a merge occurs, or we 
notify the user to better do a merge/optimize, or Lucene does a segment-rewrite 
with new maxscore and bound computations on basis of more current collection 
stats. I got first ideas for an algorithm to compute these bounds.
{quote}

Ok, I'm not sure I totally see how the bounds computation can work, but if it 
can, we might be ok in general. If the different segments are somewhat 
homogeneous then these stats should be very close anyway.

The other idea i had was more intrusive, adding a computeImpact() etc to 
Similarity or whatever.

{quote}
If the score is anyway factored out, it might be better to simply store all 
document-dependent stats (TF, doclen) of the document with the maximum score 
contribution (as ints) instead of one aggregate intermediate float score 
contribution.
{quote}

That might be a good idea. with TF as a vint and doclen as a byte, we would 
typically only have two bytes but not actually lose any information (by 
default, all these sims encode doclen as a byte anyway).

{quote}
[implementation inside codec]
Please be aware that while terms are at some point excluded from merging, they 
still are advanced to the docs in other lists to gain complete document 
knowledge and compute exact scores. Maxscores can also be used to minimize how 
often this happens, but the gains are often compensated by the more complex 
scoring. Still having to skip inside of excluded terms complicates your 
suggested implementation. But we definitely should consider architecture 
alternatives. The MaxscoreCollector, for instance, does currently only have a 
user interface function, keeping track of the top-k and their entry threshold 
could well be done inside the Maxscorer.
I was thinking though to extend the MaxscoreCollector to provide different 
scoring information, e.g. an approximation of the number of hits next to the 
actual number of scored documents (currently totalHits).
{quote}

My current line of thinking is even crazier, but I don't yet have anything 
close to a plan.

As a start, I do think that IndexSearcher.search() methods should take a 
ScoreMode of sorts from the user (some enum), which would allow Lucene to do 
less work when it's not necessary. We would pass this down via Weight.scorer() 
as a parameter... Solely looking at the search side, I think this would open 
up opportunities in general for us to optimize things: e.g. instantiate the 
appropriate Collector impl, and for Weights to create the most optimal 
Scorers. Not yet sure how it would tie into the codec API.

I started hacking up a prototype that looks like this (I might have tried to 
refactor too hard, also shoving the Sort options in here...):
{noformat}
/**
 * Different modes of search.
 */
public enum ScoreMode {
  /** 
   * No guarantees that the ranking is correct,
   * the results may come back in a different order than if all
   * documents were actually scored. Total hit count may be 
   * unavailable or approximate.
   */
  APPROXIMATE,
  /** 
   * Ranking is the same as {@link COMPLETE}, but total hit 
   * count may be unavailable or approximate.
   */
  SAFE,
  /**
   * Guarantees complete iteration over all documents, but scores
   * may be unavailable.
   */
  

[jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud

2012-07-12 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413340#comment-13413340
 ] 

Mark Miller commented on SOLR-3488:
---

Committing improved tests and the reload command in a moment.

Another thing we will need before too long is a way to get a response, I 
think? Right now, the client can't learn of the success or failure of the 
command. It's just in the Overseer's logs.

To get notified, I suppose the call would have to block and then get a result 
from the overseer.

I suppose that could be done by something like: create a new ephemeral node 
for each job -> client watches the node -> when the overseer is done, it sets 
the result as data on the node -> client gets a watch notification and reads 
the result? Then how to clean up? Not sure about the idea overall, 
brainstorming ... I don't see a simple way to have the overseer do the work in 
an async fashion and have the client easily get the results of that.

 Create a Collections API for SolrCloud
 --

 Key: SOLR-3488
 URL: https://issues.apache.org/jira/browse/SOLR-3488
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, 
 SOLR-3488_2.patch







Re: Solr posting question

2012-07-12 Thread Chris Hostetter

: I received a report of a problem with posting data to Solr.  The post 
: method is a multi-part form, so if you inspect it, it looks something 
: like this:
...
: What's happening, though, when this is posted to Solr is that any 
: semicolons in the data are being interpreted as multi-value separators.  
: So when the above is posted, Solr apparently thinks that 
: metadata_attribute_name has 4 values, abc, def, ghi, and second 
: value, rather than two values, abc;def;ghi and second value.

karl: can you be more specific about how exactly someone can recreate the 
problem?  specifically: where do you see 4 values?


Using the HTML form below, and submitting to nc -l 8983, i was able to 
recreate nearly exactly the MIME content you mentioned.  when i killed nc, 
ran the solr example in its place, and resubmitted the form, the 
echoParams output showed me that solr was recognizing the expected two 
values for metadata_attribute_name ...

<html>
<head><title>Test of form data</title></head>
<body>
<form method="POST" action="http://localhost:8983/solr/collection1/select"
enctype="multipart/form-data">
  <input type="text" name="q" value="solr" />
  <input type="text" name="echoParams" value="all" />
  <input type="text" name="metadata_attribute_name" value="abc;def;ghi" />
  <input type="text" name="metadata_attribute_name" value="second value" />
  <input type="submit"/>
 </form>
</body>
</html>

...

<arr name="metadata_attribute_name"><str>abc;def;ghi</str><str>second 
value</str></arr>
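A quick way to see the same thing without a Solr server (a minimal sketch, not part of the original thread): hand-build the kind of multipart/form-data body the HTML form above would send, with two parts sharing the field name metadata_attribute_name, then parse it back with Python's standard email parser. The boundary string here is arbitrary. Multipart encoding itself does not treat semicolons inside a part's body as separators, which matches the echoParams result above.

```python
from email.parser import BytesParser
from email.policy import default

# Arbitrary boundary, as a browser would generate one.
boundary = "----testboundary"

# Two parts with the same field name; the first contains semicolons.
body = (
    "--" + boundary + "\r\n"
    'Content-Disposition: form-data; name="metadata_attribute_name"\r\n'
    "\r\n"
    "abc;def;ghi\r\n"
    "--" + boundary + "\r\n"
    'Content-Disposition: form-data; name="metadata_attribute_name"\r\n'
    "\r\n"
    "second value\r\n"
    "--" + boundary + "--\r\n"
)

raw = ("Content-Type: multipart/form-data; boundary=" + boundary +
       "\r\n\r\n" + body).encode("ascii")

# Parse the body back the way a well-behaved server-side parser would.
msg = BytesParser(policy=default).parsebytes(raw)
values = [p.get_payload(decode=True).decode("ascii").rstrip("\r\n")
          for p in msg.iter_parts()]
print(values)  # -> ['abc;def;ghi', 'second value']
```

So if a server-side component reports four values, the splitting is happening after decoding, not in the multipart encoding itself.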



-Hoss




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 14854 - Failure

2012-07-12 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/14854/

1 tests failed.
REGRESSION:  
org.apache.solr.handler.dataimport.TestSqlEntityProcessorDeltaPrefixedPk.testDeltaImport_replace_resolvesUnprefixedPk

Error Message:
Exception during query

Stack Trace:
java.lang.RuntimeException: Exception during query
    at __randomizedtesting.SeedInfo.seed([46265CE1240EBBF6:44E12C48F69BB57A]:0)
    at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:487)
    at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:454)
    at org.apache.solr.handler.dataimport.TestSqlEntityProcessorDeltaPrefixedPk.testDeltaImport_replace_resolvesUnprefixedPk(TestSqlEntityProcessorDeltaPrefixedPk.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
    at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='0']
xml response was: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst 

RE: Solr posting question

2012-07-12 Thread karl.wright
I'll need to ask the reporter for more details, since it appears the answer is 
not simple.  It may even be an app server issue.

Thanks
Karl

Sent from my Windows Phone

-Original Message-
From: ext Chris Hostetter
Sent: 7/12/2012 8:29 PM
To: dev@lucene.apache.org
Subject: Re: Solr posting question




