[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-05-11 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281171#comment-15281171 ] Steve Rowe commented on LUCENE-6993: Hi Mike, my review of your latest patch: * All the on-or-after

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-05-04 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271376#comment-15271376 ] Steve Rowe commented on LUCENE-6993: Thanks for persisting Mike. I (and other JFlex community

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-05-04 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271363#comment-15271363 ] Mike Drob commented on LUCENE-6993: --- [~steve_rowe] - I see no movement coming from the JFlex community.

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-04-08 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232236#comment-15232236 ] Steve Rowe commented on LUCENE-6993: Hi Mike, sorry I haven't had the bandwidth to engage on this

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-04-08 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232225#comment-15232225 ] Mike Drob commented on LUCENE-6993: --- [~steve_rowe] - I pinged the JFlex list about getting the release

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-22 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207199#comment-15207199 ] Mike Drob commented on LUCENE-6993: --- Any updates here? I'm not sure if there is anything I need to be

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-14 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193824#comment-15193824 ] Robert Muir commented on LUCENE-6993: - My mistake. Thanks for the reminder. I have been working to

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-14 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193818#comment-15193818 ] Mike Drob commented on LUCENE-6993: --- [~rcmuir] - did you get a chance to look at this? Should I wait to

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-05 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181701#comment-15181701 ] Robert Muir commented on LUCENE-6993: - OK, thanks for the work! I will try to review all the changes

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-04 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180171#comment-15180171 ] Mike Drob commented on LUCENE-6993: --- There's a clean-jflex-legacy target that takes care of the

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-04 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180161#comment-15180161 ] Robert Muir commented on LUCENE-6993: - Sorry, that just affects the cleaning part. But still, it

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-04 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180146#comment-15180146 ] Robert Muir commented on LUCENE-6993: - I don't understand this change: {code} - + {code}

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179311#comment-15179311 ] Robert Muir commented on LUCENE-6993: - If that test really took 50 minutes, there may be some issue

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-03 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178886#comment-15178886 ] Steve Rowe commented on LUCENE-6993: Mike, can you please exclude generated files from your patch?

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178765#comment-15178765 ] Robert Muir commented on LUCENE-6993: - {quote} Had issues with

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-03-02 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176520#comment-15176520 ] Robert Muir commented on LUCENE-6993: - I wouldn't change any of the ClassicTokenizer ranges, it should

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-29 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172324#comment-15172324 ] Mike Drob commented on LUCENE-6993: --- I think I am getting to a good place here, just a few more issues

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-26 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169658#comment-15169658 ] Robert Muir commented on LUCENE-6993: - Yeah, it's tricky. I kinda view ClassicTokenizer as a tokenizer

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-26 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169638#comment-15169638 ] Steve Rowe commented on LUCENE-6993: {{ClassicTokenizer}} does have direct Unicode version

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-26 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169499#comment-15169499 ] Mike Drob commented on LUCENE-6993: --- Yeah, Uwe understood my question. I wasn't planning on removing it,

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-26 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169481#comment-15169481 ] Uwe Schindler commented on LUCENE-6993: --- bq. Uwe Schindler has written that he still recommends

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-26 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168956#comment-15168956 ] Steve Rowe commented on LUCENE-6993: bq. Steve, did you get a chance to look at the buffer

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-25 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167852#comment-15167852 ] Mike Drob commented on LUCENE-6993: --- Steve, did you get a chance to look at the buffer adjustment? I'm

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-19 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155160#comment-15155160 ] Steve Rowe commented on LUCENE-6993: Yeah, the generated code underwent some changes there, so the

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-19 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155154#comment-15155154 ] Mike Drob commented on LUCENE-6993: --- Using a newer version of JFlex breaks our existing macros... {code}

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-19 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155062#comment-15155062 ] Steve Rowe commented on LUCENE-6993: +1 > Update UAX29URLEmailTokenizer TLDs to latest list, and

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-19 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155035#comment-15155035 ] Robert Muir commented on LUCENE-6993: - I think we should be ok. As far as I understand it, JFlex will

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-19 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154949#comment-15154949 ] Mike Drob commented on LUCENE-6993: --- bq. I think we need to regenerate still, because there are new

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-19 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154936#comment-15154936 ] Mike Drob commented on LUCENE-6993: --- Question about what is proper behaviour in terms of backwards

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-19 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154933#comment-15154933 ] Robert Muir commented on LUCENE-6993: - I think we need to regenerate still, because there are new

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-18 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153306#comment-15153306 ] Steve Rowe commented on LUCENE-6993: I think you're right, Mike, I don't see any default word break

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-18 Thread Mike Drob (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153258#comment-15153258 ] Mike Drob commented on LUCENE-6993: --- Looking at http://unicode.org/reports/tr29/#Modifications I see

[jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0

2016-02-18 Thread Steve Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153209#comment-15153209 ] Steve Rowe commented on LUCENE-6993: [~mdrob], I haven't looked at your patch yet but there is a