[jira] [Updated] (LUCENE-5759) Add PackedInts.unsignedBitsRequired

2014-06-15 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5759:
-

Attachment: LUCENE-5759.patch

I don't think it's confusing: you first need to compute how many bits you 
require and then bump that to the next value supported by the 
{{DirectWriter}} API. But I agree your idea makes the API a bit easier to use, 
since there is a single method to call instead of two. Here is an updated patch.
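
The two-step flow described above (compute the bits required, then bump to the next supported width) can be sketched as follows. This is only an illustration: the class name, the method name and the list of supported widths are assumptions made for the example, not taken from the patch or from {{DirectWriter}} itself.

```java
import java.util.Arrays;

/** Illustrative only: rounds a bit width up to the next width a writer supports. */
final class SupportedWidthsSketch {
  // Assumed list of supported widths, sorted ascending. The real writer
  // defines its own list; these values are placeholders for the sketch.
  private static final int[] SUPPORTED = {1, 2, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48, 56, 64};

  /** Returns the smallest supported width that is >= bitsRequired. */
  static int roundUpToSupported(int bitsRequired) {
    if (bitsRequired < 1 || bitsRequired > 64) {
      throw new IllegalArgumentException("bitsRequired must be in [1, 64], got " + bitsRequired);
    }
    int idx = Arrays.binarySearch(SUPPORTED, bitsRequired);
    if (idx < 0) {
      // Not an exact match: binarySearch returned -(insertionPoint) - 1,
      // and the insertion point is the first width larger than bitsRequired.
      idx = -idx - 1;
    }
    return SUPPORTED[idx];
  }
}
```

A single-method API would simply combine the bit computation and this rounding step in one call.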

> Add PackedInts.unsignedBitsRequired
> ---
>
> Key: LUCENE-5759
> URL: https://issues.apache.org/jira/browse/LUCENE-5759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.9
>
> Attachments: LUCENE-5759.patch, LUCENE-5759.patch, LUCENE-5759.patch
>
>
> Across the code base, we have lots of:
> {code}
> long minValue, maxValue;
> final long delta = maxValue - minValue;
> final int bitsRequired = delta < 0 ? 64 : PackedInts.bitsRequired(delta);
> {code}
> {{PackedInts.bitsRequired(delta)}} doesn't work directly in that case since 
> it expects a non-negative value. It is important that it does so, in order 
> to get an error instead of silently being super wasteful if a negative value 
> is provided.
> Yet in some cases, such as the one depicted above, the value should be 
> interpreted as an unsigned long. So I propose to add another {{bitsRequired}} 
> method that would interpret the value as unsigned.
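
A minimal sketch of the idea described above, assuming the unsigned variant can be built on {{Long.numberOfLeadingZeros}}. The class below is a stand-alone illustration with hypothetical names; it is not the attached patch.

```java
/** Stand-alone illustration of signed-checked vs. unsigned bit counting. */
final class UnsignedBitsSketch {

  /** Bits needed for a non-negative value; rejects negatives instead of silently wasting space. */
  static int bitsRequired(long maxValue) {
    if (maxValue < 0) {
      throw new IllegalArgumentException("maxValue must be non-negative (got: " + maxValue + ")");
    }
    return unsignedBitsRequired(maxValue);
  }

  /** Interprets all 64 bits as magnitude, so a value with the sign bit set needs 64 bits. */
  static int unsignedBitsRequired(long bits) {
    // At least one bit is reported even for 0, so callers never get a zero-width field.
    return Math.max(1, 64 - Long.numberOfLeadingZeros(bits));
  }
}
```

With such a method, the {{delta < 0 ? 64 : ...}} special case at each call site becomes a single call on {{maxValue - minValue}}, because an overflowed (negative) delta is read as a large unsigned value.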



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6169) Really really remove ALIAS command

2014-06-15 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032137#comment-14032137
 ] 

Shalin Shekhar Mangar commented on SOLR-6169:
-

bq. In trunk we can just delete the method (it's not actually called from 
anywhere). In 4.x removing it would break binary compatibility for any plugin 
classes that extend CoreAdminHandler, so I propose to make the method just 
throw UnsupportedOperationException

+1

> Really really remove ALIAS command
> --
>
> Key: SOLR-6169
> URL: https://issues.apache.org/jira/browse/SOLR-6169
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.8.1, 5.0
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 4.9
>
> Attachments: SOLR-6169.patch
>
>
> The core admin ALIAS command was deprecated by SOLR-1637, in 2009.  The 
> method is, however, still there, marked as deprecated, five years later.  It 
> can probably be removed now...






Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_20-ea-b15) - Build # 4119 - Failure!

2014-06-15 Thread david.w.smi...@gmail.com
I’m on it.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 15, 2014 at 10:30 PM, Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4119/
> Java: 32bit/jdk1.8.0_20-ea-b15 -client -XX:+UseParallelGC
>
> 1 tests failed.
> FAILED:
>  org.apache.lucene.spatial.prefix.DateNRStrategyTest.testIntersects {#9
> seed=[9A471D2338218380:7D5C3DAFC19B7D24]}
>
> Error Message:
> Should have matched I#1:[-1526755-03-07T23:18:19.371 TO
> -1526755-04-01T00:22] Q:[-1526755-04 TO -1526755-04-01T02:41:51.480]
>
> Stack Trace:
> java.lang.AssertionError: Should have matched
> I#1:[-1526755-03-07T23:18:19.371 TO -1526755-04-01T00:22] Q:[-1526755-04 TO
> -1526755-04-01T02:41:51.480]
> at
> __randomizedtesting.SeedInfo.seed([9A471D2338218380:7D5C3DAFC19B7D24]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.fail(BaseNonFuzzySpatialOpStrategyTest.java:128)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperation(BaseNonFuzzySpatialOpStrategyTest.java:122)
> at
> org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperationRandomShapes(BaseNonFuzzySpatialOpStrategyTest.java:64)
> at
> org.apache.lucene.spatial.prefix.DateNRStrategyTest.testIntersects(DateNRStrategyTest.java:53)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
> at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at
> org.apache.lucene.

[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_20-ea-b15) - Build # 4119 - Failure!

2014-06-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4119/
Java: 32bit/jdk1.8.0_20-ea-b15 -client -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.lucene.spatial.prefix.DateNRStrategyTest.testIntersects {#9 seed=[9A471D2338218380:7D5C3DAFC19B7D24]}

Error Message:
Should have matched I#1:[-1526755-03-07T23:18:19.371 TO -1526755-04-01T00:22] Q:[-1526755-04 TO -1526755-04-01T02:41:51.480]

Stack Trace:
java.lang.AssertionError: Should have matched I#1:[-1526755-03-07T23:18:19.371 TO -1526755-04-01T00:22] Q:[-1526755-04 TO -1526755-04-01T02:41:51.480]
    at __randomizedtesting.SeedInfo.seed([9A471D2338218380:7D5C3DAFC19B7D24]:0)
    at org.junit.Assert.fail(Assert.java:93)
    at org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.fail(BaseNonFuzzySpatialOpStrategyTest.java:128)
    at org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperation(BaseNonFuzzySpatialOpStrategyTest.java:122)
    at org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.testOperationRandomShapes(BaseNonFuzzySpatialOpStrategyTest.java:64)
    at org.apache.lucene.spatial.prefix.DateNRStrategyTest.testIntersects(DateNRStrategyTest.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakCo

[jira] [Commented] (SOLR-3585) processing updates in multiple threads

2014-06-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032023#comment-14032023
 ] 

David Smiley commented on SOLR-3585:


That's an excellent point.  In fact, anyone using ConcurrentUpdateSolrServer 
(CUSS) doesn't, in effect, get the benefit of the updateLog either.

I think Solr should try to retain the same semantics one gets without using 
CUSS:  once you close out the HTTP message you send to Solr (which at one 
extreme might be a single document and at the other a virtually endless stream 
of documents), a successful HTTP response means that whatever you did 
is "safe" -- in the updateLog at least.  If there is no updateLog then there is 
no guarantee (there never was before this proposal either), and it'll return as 
soon as it gets into the indexed ramBuffer (beyond text analysis).  At least 
then, if there's a schema-related problem with Solr accepting the document, 
you'll know.  It would be nice if the response could include any errors on a 
per-document basis (by id), but that'd be a bonus.

I doubt you'd be able to easily re-use the CUSS logic in this implementation.  
I wish I had time to partake -- sounds like fun :-)
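
To make those semantics concrete, here is a rough sketch: documents are processed on several threads, the call returns only after every document has been handled, and failures are reported per document id. Every name in it (the {{Doc}} class, {{processAll}}, the {{indexer}} callback) is hypothetical and none of it is Solr API; a real implementation would hook into the update processor chain and the updateLog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelUpdateSketch {

  /** Hypothetical document: an id plus a body. */
  static final class Doc {
    final String id;
    final String body;

    Doc(String id, String body) {
      this.id = id;
      this.body = body;
    }
  }

  /**
   * Runs the indexer on every document using a fixed thread pool and returns
   * only after all of them have completed. The returned map holds one error
   * message per failed document id; an empty map means everything succeeded.
   */
  static Map<String, String> processAll(List<Doc> docs, Consumer<Doc> indexer, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    Map<String, String> errors = new ConcurrentHashMap<>();
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (Doc doc : docs) {
        pending.add(pool.submit(() -> {
          try {
            indexer.accept(doc);
          } catch (RuntimeException e) {
            // Record the failure by id instead of aborting the whole request.
            errors.put(doc.id, String.valueOf(e.getMessage()));
          }
        }));
      }
      // Barrier: the "response" is produced only once every task has finished.
      for (Future<?> f : pending) {
        f.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for updates", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
    return errors;
  }
}
```

The barrier before returning is what preserves the "a successful response means it is safe" contract; dropping it would give the fire-and-forget behaviour of CUSS.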

> processing updates in multiple threads
> --
>
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0-ALPHA, 5.0
>Reporter: Mikhail Khludnev
> Attachments: SOLR-3585.patch, SOLR-3585.patch, multithreadupd.patch, 
> report.tar.gz
>
>
> Hello,
> I'd like to contribute an update processor which forks many threads that 
> concurrently process the stream of commands. It may be beneficial for users 
> who stream many docs through a single request. 






[jira] [Updated] (SOLR-6169) Really really remove ALIAS command

2014-06-15 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-6169:


Attachment: SOLR-6169.patch

Trivial patch for trunk

> Really really remove ALIAS command
> --
>
> Key: SOLR-6169
> URL: https://issues.apache.org/jira/browse/SOLR-6169
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.8.1, 5.0
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 4.9
>
> Attachments: SOLR-6169.patch
>
>
> The core admin ALIAS command was deprecated by SOLR-1637, in 2009.  The 
> method is, however, still there, marked as deprecated, five years later.  It 
> can probably be removed now...






[jira] [Commented] (SOLR-6169) Really really remove ALIAS command

2014-06-15 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032020#comment-14032020
 ] 

Alan Woodward commented on SOLR-6169:
-

In trunk we can just delete the method (it's not actually called from 
anywhere).  In 4.x removing it would break binary compatibility for any plugin 
classes that extend CoreAdminHandler, so I propose to make the method just 
throw UnsupportedOperationException.  As far as I can tell the code that's 
actually there is untested and broken anyway, so nobody should be using it.
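
The 4.x approach can be sketched as below. The class and method names are hypothetical stand-ins, not the real CoreAdminHandler signatures. The point is only that the method keeps its place in the class, so previously compiled subclasses still link, while any call fails fast.

```java
/** Hypothetical stand-in for a handler class that plugins may extend. */
class AdminHandlerSketch {

  /**
   * @deprecated the ALIAS command is no longer supported; the method is kept
   *     only so that already compiled subclasses continue to link.
   */
  @Deprecated
  protected boolean handleAliasAction(String coreName, String alias) {
    throw new UnsupportedOperationException("ALIAS is no longer supported");
  }
}

/** A plugin written against the old API: it still compiles and loads. */
class LegacyPluginSketch extends AdminHandlerSketch {

  boolean tryAlias() {
    // Linking succeeds because the method still exists; the call itself throws.
    return handleAliasAction("core1", "alias1");
  }
}
```

Deleting the method outright, as proposed for trunk, would instead surface as a link error in any plugin that overrides or calls it.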

> Really really remove ALIAS command
> --
>
> Key: SOLR-6169
> URL: https://issues.apache.org/jira/browse/SOLR-6169
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.8.1, 5.0
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 4.9
>
>
> The core admin ALIAS command was deprecated by SOLR-1637, in 2009.  The 
> method is, however, still there, marked as deprecated, five years later.  It 
> can probably be removed now...






[jira] [Created] (SOLR-6169) Really really remove ALIAS command

2014-06-15 Thread Alan Woodward (JIRA)
Alan Woodward created SOLR-6169:
---

 Summary: Really really remove ALIAS command
 Key: SOLR-6169
 URL: https://issues.apache.org/jira/browse/SOLR-6169
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.8.1, 5.0
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Fix For: 4.9


The core admin ALIAS command was deprecated by SOLR-1637, in 2009.  The method 
is, however, still there, marked as deprecated, five years later.  It can 
probably be removed now...






[jira] [Updated] (LUCENE-5762) Disable old codecs as much as possible

2014-06-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5762:


Attachment: LUCENE-5762.patch

Attached is a patch: 

Lucene42Norms is made read-only with a read-write version in test-framework.

Lucene45DocValues performs a check like this when it goes to write a field:
{code}
  void checkCanWrite(FieldInfo field) {
    if ((field.getDocValuesType() == DocValuesType.NUMERIC || 
         field.getDocValuesType() == DocValuesType.BINARY) && 
        field.getDocValuesGen() != -1) {
      // ok: NUMERIC or BINARY field with a doc values generation set
    } else {
      throw new UnsupportedOperationException("this codec can only be used for reading");
    }
  }
{code}

And of course a read-write version in test-framework that just allows anything.
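
The read-only/read-write split can be illustrated with a self-contained sketch. The types below are simplified stand-ins with hypothetical names ({{FieldInfoSketch}}, {{ReadOnlyDocValuesSketch}}, {{RWDocValuesSketch}}), not the classes in the patch.

```java
/** Simplified stand-in for the doc values types mentioned above. */
enum DocValuesTypeSketch { NUMERIC, BINARY, SORTED, SORTED_SET }

/** Simplified stand-in for FieldInfo: a type plus a doc values generation (-1 = none). */
final class FieldInfoSketch {
  final DocValuesTypeSketch type;
  final long docValuesGen;

  FieldInfoSketch(DocValuesTypeSketch type, long docValuesGen) {
    this.type = type;
    this.docValuesGen = docValuesGen;
  }
}

/** Production flavour: writing is refused except for NUMERIC/BINARY fields with a generation set. */
class ReadOnlyDocValuesSketch {
  void checkCanWrite(FieldInfoSketch field) {
    boolean updatableType =
        field.type == DocValuesTypeSketch.NUMERIC || field.type == DocValuesTypeSketch.BINARY;
    if (!(updatableType && field.docValuesGen != -1)) {
      throw new UnsupportedOperationException("this codec can only be used for reading");
    }
  }
}

/** Test-framework flavour: allows anything, so old formats can still be written in tests. */
class RWDocValuesSketch extends ReadOnlyDocValuesSketch {
  @Override
  void checkCanWrite(FieldInfoSketch field) {
    // Intentionally empty: no restriction.
  }
}
```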

> Disable old codecs as much as possible
> --
>
> Key: LUCENE-5762
> URL: https://issues.apache.org/jira/browse/LUCENE-5762
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5762.patch
>
>
> Currently, because of updatable docvalues, ancient codecs are not really 
> read-only... this is a real problem because we can get confused about 
> backwards compatibility or even introduce bugs.
> It's only necessary to make BINARY and NUMERIC work here; we should throw UOE 
> in every other possible place and prevent use of old codecs to the greatest 
> extent possible.






[jira] [Comment Edited] (LUCENE-5763) HTMLStripCharFilter += HTML5

2014-06-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031891#comment-14031891
 ] 

Steve Rowe edited comment on LUCENE-5763 at 6/15/14 2:00 PM:
-

Apparently the HTML5 named character entity set is almost a superset of 
HTML4's, but not quite: {{&lang;}} and {{&rang;}} expand to different 
characters.  I don't think this blocks switching, just something that needs to 
be documented.  Some background here: 
https://www.w3.org/Bugs/Public/show_bug.cgi?id=14429


was (Author: steve_rowe):
Apparently the HTML5 named character entity set is almost a superset of 
HTML4's, but not quite: {{&lang;}} and {{&rang;}} expand to different 
characters.  I don't think this blocks switching, just something that needs to 
be documented.  Some background here: 
https://www.w3.org/Bugs/Public/show_bug.cgi?id=14429

> HTMLStripCharFilter += HTML5 
> -
>
> Key: LUCENE-5763
> URL: https://issues.apache.org/jira/browse/LUCENE-5763
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Minor
>
> HTMLStripCharFilter knows some specific things about HTML4 (like named 
> character entities, which are converted to the corresponding characters), but 
> not about HTML5.
> HTML5 has way more named character entities: 2,231 vs 259 by my count.
> There's probably other stuff to do, e.g. there are new tags.






[jira] [Commented] (LUCENE-5763) HTMLStripCharFilter += HTML5

2014-06-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031891#comment-14031891
 ] 

Steve Rowe commented on LUCENE-5763:


Apparently the HTML5 named character entity set is almost a superset of 
HTML4's, but not quite: {{&lang;}} and {{&rang;}} expand to different 
characters.  I don't think this blocks switching, just something that needs to 
be documented.  Some background here: 
https://www.w3.org/Bugs/Public/show_bug.cgi?id=14429
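
The difference can be shown with a small lookup sketch for the two entities in question ({{&lang;}} and {{&rang;}}). The code points are those given in the HTML 4.01 entity list (U+2329, U+232A) and the HTML5 named character reference table (U+27E8, U+27E9). The class and method names are hypothetical and unrelated to how HTMLStripCharFilter stores its tables.

```java
import java.util.HashMap;
import java.util.Map;

final class EntityTablesSketch {
  static final Map<String, Integer> HTML4 = new HashMap<>();
  static final Map<String, Integer> HTML5 = new HashMap<>();

  static {
    // The two names that map to different characters in the two specs.
    HTML4.put("lang", 0x2329); // LEFT-POINTING ANGLE BRACKET
    HTML4.put("rang", 0x232A); // RIGHT-POINTING ANGLE BRACKET
    HTML5.put("lang", 0x27E8); // MATHEMATICAL LEFT ANGLE BRACKET
    HTML5.put("rang", 0x27E9); // MATHEMATICAL RIGHT ANGLE BRACKET
    // An entity that is identical in both, for contrast.
    HTML4.put("amp", 0x26);
    HTML5.put("amp", 0x26);
  }

  /** Expands a named entity using the given table; unknown names are left untouched. */
  static String decode(Map<String, Integer> table, String name) {
    Integer codePoint = table.get(name);
    return codePoint == null ? "&" + name + ";" : new String(Character.toChars(codePoint));
  }
}
```

Switching tables therefore changes the output for existing documents that use these two entities, which is the behaviour that would need documenting.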

> HTMLStripCharFilter += HTML5 
> -
>
> Key: LUCENE-5763
> URL: https://issues.apache.org/jira/browse/LUCENE-5763
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Minor
>
> HTMLStripCharFilter knows some specific things about HTML4 (like named 
> character entities, which are converted to the corresponding characters), but 
> not about HTML5.
> HTML5 has way more named character entities: 2,231 vs 259 by my count.
> There's probably other stuff to do, e.g. there are new tags.






[jira] [Created] (LUCENE-5763) HTMLStripCharFilter += HTML5

2014-06-15 Thread Steve Rowe (JIRA)
Steve Rowe created LUCENE-5763:
--

 Summary: HTMLStripCharFilter += HTML5 
 Key: LUCENE-5763
 URL: https://issues.apache.org/jira/browse/LUCENE-5763
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Steve Rowe
Priority: Minor


HTMLStripCharFilter knows some specific things about HTML4 (like named 
character entities, which are converted to the corresponding characters), but 
not about HTML5.

HTML5 has way more named character entities: 2,231 vs 259 by my count.

There's probably other stuff to do, e.g. there are new tags.






[jira] [Commented] (LUCENE-5755) Explore alternative build systems

2014-06-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031871#comment-14031871
 ] 

Steve Rowe commented on LUCENE-5755:


Over on the ASF Incubator list there is an ongoing discussion about source 
releases and bootstrapping gradle that may be relevant here: 
http://markmail.org/message/l7bqbuiwlmqub2rd

> Explore alternative build systems
> -
>
> Key: LUCENE-5755
> URL: https://issues.apache.org/jira/browse/LUCENE-5755
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> I am dissatisfied with how ANT and submodules currently work in Lucene/Solr. 
> It's not even the tool's fault; it seems Lucene builds just hit the limits 
> of what it can do, especially in terms of submodule dependencies etc.
> I don't think Maven will help much either, given certain things I'd like to have 
> in the build (for example, collecting all tests globally for a single execution 
> phase at the end of the build, to support better load-balancing).
> I'd like to explore Gradle as an alternative. This task is a notepad for 
> thoughts and experiments.
> An example of a complex (?) Gradle build is JavaFX:
> http://hg.openjdk.java.net/openjfx/8/master/rt/file/f89b7dc932af/build.gradle






[jira] [Updated] (LUCENE-5761) Remove DiskDocValuesFormat

2014-06-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5761:


Attachment: LUCENE-5761.patch

> Remove DiskDocValuesFormat
> --
>
> Key: LUCENE-5761
> URL: https://issues.apache.org/jira/browse/LUCENE-5761
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5761.patch
>
>
> I see users using this; I think they are unaware of the horrible tradeoffs it 
> makes.
> We don't e.g. have codecs that have the term dictionary entirely on disk or 
> other stupid things in Lucene, so we shouldn't be stupid here either. 






[jira] [Created] (LUCENE-5762) Disable old codecs as much as possible

2014-06-15 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5762:
---

 Summary: Disable old codecs as much as possible
 Key: LUCENE-5762
 URL: https://issues.apache.org/jira/browse/LUCENE-5762
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


Currently, because of updatable docvalues, ancient codecs are not really 
read-only... this is a real problem because we can get confused about backwards 
compatibility or even introduce bugs.

It's only necessary to make BINARY and NUMERIC work here; we should throw UOE 
in every other possible place and prevent use of old codecs to the greatest extent 
possible.






[jira] [Commented] (LUCENE-5761) Remove DiskDocValuesFormat

2014-06-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031865#comment-14031865
 ] 

Michael McCandless commented on LUCENE-5761:


+1

> Remove DiskDocValuesFormat
> --
>
> Key: LUCENE-5761
> URL: https://issues.apache.org/jira/browse/LUCENE-5761
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> I see users using this; I think they are unaware of the horrible tradeoffs it 
> makes.
> We don't e.g. have codecs that have the term dictionary entirely on disk or 
> other stupid things in Lucene, so we shouldn't be stupid here either. 






[jira] [Created] (LUCENE-5761) Remove DiskDocValuesFormat

2014-06-15 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5761:
---

 Summary: Remove DiskDocValuesFormat
 Key: LUCENE-5761
 URL: https://issues.apache.org/jira/browse/LUCENE-5761
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


I see users using this; I think they are unaware of the horrible tradeoffs it 
makes.

We don't e.g. have codecs that have the term dictionary entirely on disk or 
other stupid things in Lucene, so we shouldn't be stupid here either. 







[jira] [Commented] (LUCENE-5627) Positional joins

2014-06-15 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031828#comment-14031828
 ] 

Paul Elschot commented on LUCENE-5627:
--

I have started on code for a field schema for the positional joins.
So far this affects only the test code here; it involves replacing a lot of 
constants with references to the schema.

The idea is to post this schema here when it can also provide positional join 
queries to the extended SpanQueryParser.


> Positional joins
> 
>
> Key: LUCENE-5627
> URL: https://issues.apache.org/jira/browse/LUCENE-5627
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Paul Elschot
>Priority: Minor
>
> Prototype of analysis and search for labeled fragments






[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-06-15 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031826#comment-14031826
 ] 

Paul Elschot commented on LUCENE-5205:
--

bq.  I'm wondering if I should add FieldMaskingSpanQueries to the SpanOnlyParser

When two fields are indexed to allow a FieldMaskingSpanQuery (see LUCENE-1494 
for an example), such an addition makes sense:

field1:[ v1 field2:v2]

Here the masking should be from field2 to field1.

There is a scoring issue for FieldMaskingSpanQuery, LUCENE-3723. So far I have 
avoided scoring in the label module...

For querying labeled fragments, a FieldMaskingSpanQuery should be used between 
two fragment fields that share their labeled positions, or when each fragment 
in one field consists of a single token. The first case happens in the label 
module for XML attribute names and attribute values. For the single-token 
fragment case there is no special provision in the label module.




> [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
> classic QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Fix For: 4.9
>
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: jakarta~1 (OSA) vs jakarta~>1 (Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
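
As an illustration of the "not near" operator described in the list above ("fever bieber"!~3,10), here is a plain token-position sketch. It is not how SpanQueryParser or the underlying span queries implement it. The class and method names are hypothetical, and the inclusive window boundaries are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class NotNearSketch {

  /**
   * Returns the positions of {@code include} that have no {@code exclude}
   * token within {@code pre} tokens before or {@code post} tokens after them.
   */
  static List<Integer> notNear(String[] tokens, String include, String exclude, int pre, int post) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      if (!tokens[i].equals(include)) {
        continue;
      }
      // Clamp the exclusion window to the bounds of the token array.
      int from = Math.max(0, i - pre);
      int to = Math.min(tokens.length - 1, i + post);
      boolean blocked = false;
      for (int j = from; j <= to && !blocked; j++) {
        blocked = tokens[j].equals(exclude);
      }
      if (!blocked) {
        hits.add(i);
      }
    }
    return hits;
  }
}
```

For the token stream "bieber a b c d fever" with pre = 3 and post = 10, "fever" at position 5 is kept because "bieber" lies five tokens before it, outside the three-token window. For "fever a b c d bieber" it is dropped, since "bieber" falls inside the ten-token window after it.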


