[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]

2020-01-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019454#comment-17019454
 ] 

ASF subversion and git services commented on LUCENE-9053:
-

Commit 8147e491ce3905bb3543f2c7e34a4ecb60382b49 in lucene-solr's branch 
refs/heads/gradle-master from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8147e49 ]

LUCENE-9053: improve FST's package-info.java comment to clarify required 
(Unicode code point) sort order for FST.Builder


> java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 
> 8b] vs input=[ef ac 81 67 75 72 65]
> ---
>
> Key: LUCENE-9053
> URL: https://issues.apache.org/jira/browse/LUCENE-9053
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: gitesh
>Priority: Minor
>
> Even if the inputs are sorted in Unicode order, I get the following exception 
> while creating the FST:
>  
> {code:java}
> // Input values (keys). These must be provided to Builder in Unicode sorted order!
> String inputValues[] = {"𝜋", "ﬁgure", "ﬂagship"};
> long outputValues[] = {5, 7, 12};
> PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
> Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
> BytesRefBuilder scratchBytes = new BytesRefBuilder();
> IntsRefBuilder scratchInts = new IntsRefBuilder();
> for (int i = 0; i < inputValues.length; i++) {
>   scratchBytes.copyChars(inputValues[i]);
>   builder.add(Util.toIntsRef(scratchBytes.get(), scratchInts), outputValues[i]);
> }
> FST<Long> fst = builder.finish();
> Long value = Util.get(fst, new BytesRef("figure"));
> System.out.println(value);
> {code}
> Please note that ﬁgure and ﬂagship are using ligature characters (ﬁ and ﬂ) above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]

2020-01-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018237#comment-17018237
 ] 

ASF subversion and git services commented on LUCENE-9053:
-

Commit 8147e491ce3905bb3543f2c7e34a4ecb60382b49 in lucene-solr's branch 
refs/heads/master from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8147e49 ]

LUCENE-9053: improve FST's package-info.java comment to clarify required 
(Unicode code point) sort order for FST.Builder








[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]

2020-01-02 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006894#comment-17006894
 ] 

Michael McCandless commented on LUCENE-9053:


+1 to improve {{package-info.java}} in 
{{lucene/core/src/java/org/apache/lucene/util}}!

Maybe we should just say Unicode code point order, and note that this is not the 
UTF-16 order that Java's {{String.compareTo}} uses?
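
One way to get keys into code point order before adding them might look like the sketch below. It uses only the JDK; the class name CodePointOrder and its members are hypothetical and are not part of Lucene. For well-formed strings, code point order is the same as unsigned UTF-8 byte order, which is what the assertion in the issue title compares.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Lucene: orders strings by Unicode code point.
final class CodePointOrder {
    static final Comparator<String> COMPARATOR = CodePointOrder::compare;

    /** Compares two strings code point by code point instead of char by char. */
    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Advance by one or two chars, depending on whether a surrogate pair was read.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        // U+1D70B, then keys starting with the ligatures U+FB01 and U+FB02.
        String[] keys = {"\uD835\uDF0B", "\uFB01gure", "\uFB02agship"};
        Arrays.sort(keys, COMPARATOR);
        for (String key : keys) {
            System.out.println(String.format("U+%X", key.codePointAt(0)));
        }
        // prints U+FB01, U+FB02, U+1D70B (one per line)
    }
}
```

Note that sorting the keys alone is not enough for the example in the issue: each output value has to move together with its key, for instance by sorting key/value pairs.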







[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]

2019-11-19 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978060#comment-16978060
 ] 

Robert Muir commented on LUCENE-9053:
-

I think the package-info.java example (which is where that comment comes from) 
could be improved; "Unicode sorted order" is too vague for Java :)

The example strings here are out of order by Unicode code point (UTF-8/UTF-32), 
but they are sorted correctly in UTF-16 order (i.e. sorted according to Java's 
String.compareTo). So it's probably confusing to a Java developer who sorts 
their keys with Java's String class and is then surprised to get an 
exception.
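
The difference can be checked with plain JDK calls. The sketch below is only an illustration: the class SortOrderDemo and the helper compareUtf8 are hypothetical names, and the two strings are rebuilt from the bytes in the issue title (f0 9d 9c 8b is U+1D70B, ef ac 81 is the ligature U+FB01).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene.
final class SortOrderDemo {
    /** Compares two strings by their UTF-8 bytes, treating each byte as unsigned. */
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String pi = "\uD835\uDF0B";    // U+1D70B, UTF-8 bytes f0 9d 9c 8b
        String figure = "\uFB01gure";  // starts with U+FB01, UTF-8 bytes ef ac 81
        // UTF-16 order: the high surrogate 0xD835 is smaller than 0xFB01.
        System.out.println(Integer.signum(pi.compareTo(figure)));       // prints -1
        // UTF-8 byte order: the lead byte 0xF0 is larger than 0xEF.
        System.out.println(Integer.signum(compareUtf8(pi, figure)));    // prints 1
    }
}
```

The two orders disagree only when a supplementary character (encoded as a surrogate pair in UTF-16) is compared against a BMP character in the range U+E000 to U+FFFF, which is exactly the case in the reporter's input.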








[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]

2019-11-19 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977831#comment-16977831
 ] 

Michael Sokolov commented on LUCENE-9053:
-

Seems like a self-inflicted homograph attack: 
https://en.wikipedia.org/wiki/IDN_homograph_attack



