[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2017-02-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857800#comment-15857800
 ] 

ASF subversion and git services commented on LUCENE-6914:
-

Commit 4355335153bacdcf3d81052e07164f2158d5e587 in lucene-solr's branch 
refs/heads/branch_5_5 from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4355335 ]

LUCENE-6914: Fix ID in the change log.


> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Fix For: master (7.0), 6.3, 5.5.4
>
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2016-10-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584915#comment-15584915
 ] 

ASF subversion and git services commented on LUCENE-6914:
-

Commit 4498cc727f5a2f2db5ea8683b36a821b9b529ebb in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4498cc7 ]

LUCENE-6914: Fix issue ID in the change log.


> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Fix For: master (7.0), 6.3
>
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2016-10-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584914#comment-15584914
 ] 

ASF subversion and git services commented on LUCENE-6914:
-

Commit 6df4b4d7284e649bc92fc49cb92e0b2efd0fdc2c in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6df4b4d ]

LUCENE-6914: Fix issue ID in the change log.


> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Fix For: master (7.0), 6.3
>
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2016-10-17 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583280#comment-15583280
 ] 

Hoss Man commented on LUCENE-6914:
--

go for it [~jpountz] ... like i said before...

bq. Since i'm way out of my depth here i don't intend on commiting ... Anybody 
who understands this and thinks my patch looks good is welcome to run with it 
... no need to wait for me.

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2016-10-14 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575706#comment-15575706
 ] 

Adrien Grand commented on LUCENE-6914:
--

[~hossman] Let's commit this patch?

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2016-07-20 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386052#comment-15386052
 ] 

Adrien Grand commented on LUCENE-6914:
--

bq. now that we know the problem and can trivially reproduce with randomized 
tests, i'm not sure it's really relevant

I find explicit tests helpful because they are easier to dig into if they fail, 
so obvious problems can be more easily identified/fixed.

I think we should commit this patch, this filter is used in several of our 
language analyzers so this bug might be hitting quite a lot of users?

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2016-06-30 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356618#comment-15356618
 ] 

Adrien Grand commented on LUCENE-6914:
--

+1 to the patch

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2015-12-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035905#comment-15035905
 ] 

Robert Muir commented on LUCENE-6914:
-

looks buggy indeed. Can we also add a simple test (ퟙxퟡxퟠxퟜ -> 1x9x8x4) ?

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2015-12-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035911#comment-15035911
 ] 

Robert Muir commented on LUCENE-6914:
-

or maybe the doubleStruck test is good enough? i guess the only oddity is, it 
actually has whitespace in between the characters, but at a glance its hard to 
tell its not e.g. full-width digits. Maybe we should just replace the spaces 
with x's.

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2015-12-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036156#comment-15036156
 ] 

Hoss Man commented on LUCENE-6914:
--

i wrote it thinking it would be helpful to first demonstrate what worked (x, or 
space, or some non candidate char in between digits as a "gap"), but then when 
you replace those "gaps" when empty space it breaks.  now that we know the 
problem and can trivially reproduce with randomized tests, i'm not sure it's 
really relevant -- but you could always just clone those asserts and do a bunch 
of different varieties for the "gap" (single space, x, some wide supplemental 
character that isn't a digit, etc...)

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)

2015-11-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032559#comment-15032559
 ] 

Hoss Man commented on LUCENE-6914:
--

failure produced by attached test patch...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestDecimalDigitFilter -Dtests.method=testDoubleStruck 
-Dtests.seed=3126DECB8CE805E -Dtests.slow=true -Dtests.locale=ga 
-Dtests.timezone=Africa/Juba -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 0.03s | TestDecimalDigitFilter.testDoubleStruck <<<
   [junit4]> Throwable #1: org.junit.ComparisonFailure: term 0 
expected:<1[984]> but was:<1[ퟡ8ퟜ]>
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([3126DECB8CE805E:92961DD9D4C68E38]:0)
   [junit4]>at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:186)
   [junit4]>at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:301)
   [junit4]>at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:305)
   [junit4]>at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:309)
   [junit4]>at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:359)
   [junit4]>at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:368)
   [junit4]>at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkOneTerm(BaseTokenStreamTestCase.java:429)
   [junit4]>at 
org.apache.lucene.analysis.core.TestDecimalDigitFilter.testDoubleStruck(TestDecimalDigitFilter.java:74)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
{noformat}

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -
>
> Key: LUCENE-6914
> URL: https://issues.apache.org/jira/browse/LUCENE-6914
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.4
>Reporter: Hoss Man
> Attachments: LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter. 
> With input like "ퟙퟡퟠퟜ" ("Double Struck" 1984) the filter produces "1ퟡ8ퟜ" (1, 
> double struck 9, 8, double struck 4)  add some non-decimal characters in 
> between the digits (ie: "ퟙxퟡxퟠxퟜ") and you get the expected output 
> ("1x9x8x4").  This doesn't affect all decimal characters though, as evident 
> by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, 
> shrink the string" code path?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org