[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-27 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415110#comment-16415110
 ] 

Adrien Grand commented on LUCENE-8175:
--

Thanks Robert!

> ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J
> 
>
> Key: LUCENE-8175
> URL: https://issues.apache.org/jira/browse/LUCENE-8175
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Priority: Critical
> Fix For: trunk, 7.4
>
> Attachments: LUCENE-8175.patch
>
>
> I was digging into some test failures with {{testRandomHugeStrings}} that have 
> occurred since the upgrade to ICU4J 60.2. They boil down to this bug: 
> http://bugs.icu-project.org/trac/ticket/13512 (fixed but not yet released).
> In short, an int[] is shared across several threads when it shouldn't be. As a 
> consequence, ICUTokenizer may sometimes return corrupt tokens. I pinged on 
> the issue to find out when a release fixing this bug is expected.
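
To make the failure mode in the description concrete, here is a minimal sketch of how a shared int[] scratch buffer can corrupt results. It is not ICU4J or Lucene code: the class and method names (SharedBufferDemo, BoundaryFinder, firstResultAfterInterleaving) are hypothetical, and the interleaving of two users is simulated in a single thread so that the outcome is deterministic rather than dependent on thread scheduling.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not ICU4J or Lucene code. */
final class SharedBufferDemo {

    /** A toy boundary finder that records the offsets of spaces in a scratch int[]. */
    static final class BoundaryFinder {
        private final int[] scratch; // shared or private, depending on what the caller passes in
        private int count;

        BoundaryFinder(int[] scratch) {
            this.scratch = scratch;
        }

        /** Step 1: scan the text and store the offset of every space in the scratch buffer. */
        void scan(String text) {
            count = 0;
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == ' ') {
                    scratch[count++] = i;
                }
            }
        }

        /** Step 2: read the stored offsets back. */
        int[] boundaries() {
            return Arrays.copyOf(scratch, count);
        }
    }

    /** Interleaves two finders the way two threads might, and returns what the first one reports. */
    static int[] firstResultAfterInterleaving(int[] bufferA, int[] bufferB) {
        BoundaryFinder a = new BoundaryFinder(bufferA);
        BoundaryFinder b = new BoundaryFinder(bufferB);
        a.scan("ab cd ef");   // spaces at offsets 2 and 5
        b.scan("abcd efg h"); // spaces at offsets 4 and 8; overwrites a's data if the buffer is shared
        return a.boundaries();
    }

    public static void main(String[] args) {
        int[] shared = new int[16];
        // Shared buffer: the first finder reports the second finder's offsets.
        System.out.println(Arrays.toString(firstResultAfterInterleaving(shared, shared)));
        // Private buffers: the first finder reports its own offsets.
        System.out.println(Arrays.toString(firstResultAfterInterleaving(new int[16], new int[16])));
    }
}
```

With real threads the same overwrite happens only when the two scans overlap in time, which would be consistent with a failure that shows up only occasionally in randomized tests.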



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414805#comment-16414805
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit 43796e516932881da7abbc8cc379ec2661020f7e in lucene-solr's branch 
refs/heads/branch_7x from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=43796e5 ]

LUCENE-8175: upgrade icu4j to 61.1 which fixes concurrency issue





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414737#comment-16414737
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit 40c8792dbfe70e09ad6b4c1fae2cdcf62da9637e in lucene-solr's branch 
refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=40c8792 ]

LUCENE-8175: upgrade icu4j to 61.1 which fixes concurrency issue





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414731#comment-16414731
 ] 

Uwe Schindler commented on LUCENE-8175:
---

+1




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414728#comment-16414728
 ] 

Robert Muir commented on LUCENE-8175:
-

Attached is a patch with the upgrade to 61.1.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414693#comment-16414693
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit 9272f758ac2c8d7b127526d5dc5da8faa7aa3f9c in lucene-solr's branch 
refs/heads/branch_7x from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9272f75 ]

LUCENE-8175: move CHANGES entry to next release





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414691#comment-16414691
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit cf1a08ff5fdef084f1666aa402d90b4de268c4b2 in lucene-solr's branch 
refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cf1a08f ]

LUCENE-8175: move CHANGES entry to next release





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414689#comment-16414689
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit c0b92e279423dbc6852ca2f9cce681604b44d19b in lucene-solr's branch 
refs/heads/branch_7x from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c0b92e2 ]

LUCENE-8175: un-revert "LUCENE-8125: ICUTokenizer support for emoji/emoji 
sequence tokens""

This was a casualty of war because it relied on new unicode stuff





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414680#comment-16414680
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit 23bff7dbc207083af2ccb1b308c121ac18c36508 in lucene-solr's branch 
refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=23bff7d ]

LUCENE-8175: un-revert "LUCENE-8125: ICUTokenizer support for emoji/emoji 
sequence tokens""

This was a casualty of war because it relied on new unicode stuff





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414674#comment-16414674
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit bdfe1e69e68ed584ea00fa22dbc4744fcb2451ac in lucene-solr's branch 
refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bdfe1e6 ]

LUCENE-8175: un-revert "LUCENE-8122: Updata autogenerated code after update to 
ICU4J 60.2"





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414677#comment-16414677
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit d25c18ea4788b8ff642dee939c4dc0edc5729fb4 in lucene-solr's branch 
refs/heads/branch_7x from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d25c18e ]

LUCENE-8175: un-revert "LUCENE-8122: Updata autogenerated code after update to 
ICU4J 60.2"





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414671#comment-16414671
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit 2dcf263b5207243f6854c0e48d2496036f678eee in lucene-solr's branch 
refs/heads/branch_7x from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2dcf263 ]

LUCENE-8175: un-revert "LUCENE-8122: Upgrade analysis/icu to ICU 60.2"

the new icu version has been released that fixes the concurrency issue.





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414669#comment-16414669
 ] 

ASF subversion and git services commented on LUCENE-8175:
-

Commit 4522e45bdadd4268a9270135130fc28a7f46c627 in lucene-solr's branch 
refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4522e45 ]

LUCENE-8175: un-revert "LUCENE-8122: Upgrade analysis/icu to ICU 60.2"

the new icu version has been released that fixes the concurrency issue.





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414665#comment-16414665
 ] 

Robert Muir commented on LUCENE-8175:
-

The new version is released. I will attempt to fight with these merge conflicts 
and revert the reverts. Then I'll make a patch for this issue to upgrade to the 
new version.





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400899#comment-16400899
 ] 

Adrien Grand commented on LUCENE-8175:
--

I just tried with the rc of icu4j-61 and I can't reproduce the bug. (y)




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-13 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396932#comment-16396932
 ] 

Robert Muir commented on LUCENE-8175:
-

Yes, that is good. FWIW I plan to upgrade to the new version regardless, even if 
this test sometimes fails.

At the end of the day, StandardAnalyzer is always an option for users who want 
more stability and backwards compatibility. The ICU integration is instead for 
the latest Unicode capabilities. I think it's OK to hold them back for a few 
months because of rare bugs, but there's a limit.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-03-13 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396904#comment-16396904
 ] 

Adrien Grand commented on LUCENE-8175:
--

A release candidate is available for testing: 
https://sourceforge.net/p/icu/mailman/message/36259676/. We should give it a 
try.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372206#comment-16372206
 ] 

Robert Muir commented on LUCENE-8175:
-

ICU responded to Adrien's email about release plans: 
https://sourceforge.net/p/icu/mailman/message/36233218/





[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-20 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370081#comment-16370081
 ] 

Adrien Grand commented on LUCENE-8175:
--

I just reverted LUCENE-8122 and LUCENE-8125.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365276#comment-16365276
 ] 

Adrien Grand commented on LUCENE-8175:
--

This sounds good to me. I'll wait a couple days in case they push out a release 
soon and revert otherwise.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365075#comment-16365075
 ] 

Robert Muir commented on LUCENE-8175:
-

Since it will easily trip with threads on CJK/Thai and other common scripts, it 
would be good to address it before the next release.

I think we should first give ICU a chance. But if we want to release before 
they have released a fix, would it be best to just revert LUCENE-8122, 
LUCENE-8125, and Uwe's regenerate commit 
(https://github.com/apache/lucene-solr/commit/b3677c1a091209409590de3ec6bafde089323598#diff-1a83715f3cabfb71b96f435072789417) 
in the 7.x branch? We could always backport them for a future 7.x release in 
such a case.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364577#comment-16364577
 ] 

Robert Muir commented on LUCENE-8175:
-

Because we will still do break iteration for scripts such as Thai in such a 
case.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-14 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364575#comment-16364575
 ] 

Adrien Grand commented on LUCENE-8175:
--

Scratch that: it does reproduce, it just takes more iterations.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-14 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364554#comment-16364554
 ] 

Adrien Grand commented on LUCENE-8175:
--

This is what it looks like indeed, but I'm not familiar enough with the ICU4J 
code to be 100% sure. The bug doesn't reproduce if I construct 
{{DefaultICUTokenizerConfig}} with {{false}} for both constructor arguments.




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364529#comment-16364529
 ] 

Robert Muir commented on LUCENE-8175:
-

Thanks for debugging. I saw a Jenkins failure too; it seems to impact 
languages where dictionary-based break iteration is used (CJK, Thai, etc.)?




[jira] [Commented] (LUCENE-8175) ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

2018-02-14 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364363#comment-16364363
 ] 

Adrien Grand commented on LUCENE-8175:
--

For the record, I can reproduce the issue all the time with {{ant beast 
-Dbeast.iters=100 -Dtestcase=TestICUTokenizer 
-Dtests.method=testRandomHugeStrings}}.
