[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737386#comment-16737386
 ] 

ASF subversion and git services commented on LUCENE-8527:
-

Commit 283b19a8da6ab9e0b7e9a75b132d3067218d5502 in lucene-solr's branch 
refs/heads/master from Steven Rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=283b19a ]

LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and 
UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji 
tokenization with the '<EMOJI>' token type.


> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-8527.patch, LUCENE-8527.patch, LUCENE-8527.patch
>
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737384#comment-16737384
 ] 

ASF subversion and git services commented on LUCENE-8527:
-

Commit e8c65da6bb8be626242cfba18989e497180e82aa in lucene-solr's branch 
refs/heads/branch_7x from Steven Rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e8c65da ]

LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and 
UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji 
tokenization with the '<EMOJI>' token type.





[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737385#comment-16737385
 ] 

ASF subversion and git services commented on LUCENE-8527:
-

Commit 0e903cab47e98c75d4fe0bb2a33a84e8f3c648ff in lucene-solr's branch 
refs/heads/branch_8x from Steven Rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0e903ca ]

LUCENE-8527: Upgrade JFlex to 1.7.0. StandardTokenizer and 
UAX29URLEmailTokenizer now support Unicode 9.0, and provide UTS#51 v11.0 Emoji 
tokenization with the '<EMOJI>' token type.





[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2019-01-02 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732330#comment-16732330
 ] 

Steve Rowe commented on LUCENE-8527:


bq. \[W\]ith the default skeleton, JFlex 1.7.0 generates scanners that 
misbehave when given a spoon-feeding reader (i.e. a reader that returns at 
least one char but fewer than the requested number of chars) \[...\] I'll make 
a JFlex issue for this bug.

I created an issue: https://github.com/jflex-de/jflex/issues/538
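
For readers unfamiliar with the term, a spoon-feeding reader of the kind described in the quote can be sketched as a thin wrapper over another reader. This is only an illustration under stated assumptions: the names {{SpoonFeedingReader}}, {{SpoonFeedingDemo}} and {{drain}} are hypothetical and are not taken from JFlex or Lucene. A consumer that loops until {{read()}} returns -1 should see the same text regardless of chunk size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: returns at least one char but at most maxChunk chars per read() call. */
class SpoonFeedingReader extends Reader {
  private final Reader in;
  private final int maxChunk;

  SpoonFeedingReader(Reader in, int maxChunk) {
    if (maxChunk < 1) {
      throw new IllegalArgumentException("maxChunk must be >= 1");
    }
    this.in = in;
    this.maxChunk = maxChunk;
  }

  @Override
  public int read(char[] cbuf, int off, int len) throws IOException {
    if (len == 0) {
      return 0;
    }
    // Ask the wrapped reader for fewer chars than the caller requested.
    return in.read(cbuf, off, Math.min(len, maxChunk));
  }

  @Override
  public void close() throws IOException {
    in.close();
  }
}

class SpoonFeedingDemo {
  /** Reads to end of stream, tolerating short reads as the Reader contract requires. */
  static String drain(Reader reader) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[16];
    int n;
    while ((n = reader.read(buf, 0, buf.length)) != -1) {
      sb.append(buf, 0, n);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    String text = "spoon-fed input";
    // A well-behaved consumer reassembles the original text from 3-char chunks.
    System.out.println(drain(new SpoonFeedingReader(new StringReader(text), 3)));
  }
}
```

Feeding a generated scanner through a wrapper like this, and comparing its tokens with those produced from an unwrapped reader, is one way such a bug could be exposed.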

bq. \[I\]nvocations of JFlex's Ant target inherit options used by previous 
invocations in the same Ant session. I'll make a JFlex issue for this bug.

There is an ongoing effort to fix this; see 
https://github.com/jflex-de/jflex/pull/258, which will likely be included in 
the next JFlex release.


> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-8527.patch, LUCENE-8527.patch
>
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.






[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2018-12-09 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714304#comment-16714304
 ] 

Steve Rowe commented on LUCENE-8527:


FYI the patch does not include generated files, since that would make it much 
larger.  Run {{ant jflex}} in {{lucene/core}} and {{lucene/analysis/common}} to 
do the generation.

> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-8527.patch
>
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.






[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2018-12-07 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713510#comment-16713510
 ] 

Robert Muir commented on LUCENE-8527:
-

It would be really nice. I don't think the tricky part is segmentation itself 
(finding the breaks), but rather assigning the proper "label" to the token 
(tagging it with an emoji type). 

The ICU tokenizer uses some properties to tag the "stuff between breaks" as 
the emoji token type versus something else. I looked at the latest JFlex, and 
it seems it would need those properties? And it's a little tricky: e.g. the 
ordinary ASCII digit 7 is [:Emoji:] in Unicode. That's why the isEmoji there 
is a bit crazy.
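
The labelling problem above can be illustrated with a small sketch. It is not the ICU module's code: the class and method names are hypothetical, and {{hasEmojiProperty}} is a toy stand-in for the real Unicode Emoji property (it only knows the keycap bases and the Emoticons block). The point is that a segment starting with a keycap base such as the digit 7 is labelled as emoji only when an emoji presentation selector or an enclosing keycap follows.

```java
class EmojiLabelSketch {
  static final int EMOJI_PRESENTATION_SELECTOR = 0xFE0F; // VARIATION SELECTOR-16
  static final int COMBINING_ENCLOSING_KEYCAP = 0x20E3;

  /** '#', '*' and ASCII digits: ordinary characters that still carry the Emoji property. */
  static boolean isKeycapBase(int cp) {
    return cp == '#' || cp == '*' || (cp >= '0' && cp <= '9');
  }

  /** ASSUMPTION: toy approximation of [:Emoji:], not real Unicode data. */
  static boolean hasEmojiProperty(int cp) {
    return isKeycapBase(cp) || (cp >= 0x1F600 && cp <= 0x1F64F);
  }

  /** Decides whether an already-segmented token should get an emoji label. */
  static boolean isEmojiToken(String token) {
    if (token.isEmpty()) {
      return false;
    }
    int first = token.codePointAt(0);
    if (!hasEmojiProperty(first)) {
      return false;
    }
    if (isKeycapBase(first)) {
      // A bare "7" is not emoji; require evidence that it starts an emoji sequence.
      int trailer = Character.charCount(first);
      if (trailer >= token.length()) {
        return false;
      }
      int next = token.codePointAt(trailer);
      return next == EMOJI_PRESENTATION_SELECTOR || next == COMBINING_ENCLOSING_KEYCAP;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isEmojiToken("7"));             // plain digit
    System.out.println(isEmojiToken("7\uFE0F\u20E3")); // keycap sequence
    System.out.println(isEmojiToken("\uD83D\uDE00"));  // U+1F600
  }
}
```

With these toy definitions the three calls in {{main}} print false, true and true, which is the distinction a tokenizer needs when the segmentation rules alone do not say what kind of token was found.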


> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Priority: Minor
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.






[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2018-12-07 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713437#comment-16713437
 ] 

Steve Rowe commented on LUCENE-8527:


[~rcmuir] mentioned on LUCENE-8125 that StandardTokenizer should give emoji 
sequences the {{<EMOJI>}} token type - see the logic in the {{icu}} module's 
{{BreakIteratorWrapper}}.

JFlex 1.7.0 supports Unicode 9.0, which, if I'm interpreting the discussion at 
http://www.unicode.org/L2/L2016/16315r-handling-seg-emoji.pdf properly, does 
not (fully) include Emoji sequence support (though customized rules that would 
do that properly in Unicode 9.0 are listed in that doc).

Should we include the (post-9.0) customized rules for Unicode 9.0?


> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Priority: Minor
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.






[jira] [Commented] (LUCENE-8527) Upgrade JFlex to 1.7.0

2018-12-07 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713295#comment-16713295
 ] 

Uwe Schindler commented on LUCENE-8527:
---

+1

> Upgrade JFlex to 1.7.0
> --
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build, modules/analysis
>Reporter: Steve Rowe
>Priority: Minor
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently: 
> [http://jflex.de/changelog.html#jflex-1.7.0].  We should upgrade.


