[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
[ https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019454#comment-17019454 ]

ASF subversion and git services commented on LUCENE-9053:
-

Commit 8147e491ce3905bb3543f2c7e34a4ecb60382b49 in lucene-solr's branch refs/heads/gradle-master from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8147e49 ]

LUCENE-9053: improve FST's package-info.java comment to clarify required (Unicode code point) sort order for FST.Builder

> java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
> ---
>
> Key: LUCENE-9053
> URL: https://issues.apache.org/jira/browse/LUCENE-9053
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: gitesh
> Priority: Minor
>
> Even if the inputs are sorted in Unicode order, I get the following exception while creating the FST:
>
> {code:java}
> // Input values (keys). These must be provided to Builder in Unicode sorted order!
> String inputValues[] = {"𝐴", "ﬁgure", "ﬂagship"};
> long outputValues[] = {5, 7, 12};
> PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
> Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
> BytesRefBuilder scratchBytes = new BytesRefBuilder();
> IntsRefBuilder scratchInts = new IntsRefBuilder();
> for (int i = 0; i < inputValues.length; i++) {
>   scratchBytes.copyChars(inputValues[i]);
>   builder.add(Util.toIntsRef(scratchBytes.get(), scratchInts), outputValues[i]);
> }
> FST<Long> fst = builder.finish();
> Long value = Util.get(fst, new BytesRef("figure"));
> System.out.println(value);
> {code}
>
> Please note that ﬁgure and ﬂagship above use the ligature characters ﬁ (U+FB01) and ﬂ (U+FB02).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
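A minimal sketch, in plain Java with no Lucene dependency, of putting keys into the order the commit describes before they are added: with BYTE1 input filled via copyChars, the builder sees UTF-8 bytes (as the byte arrays in the error message suggest), and unsigned UTF-8 byte order coincides with Unicode code point order. The class FstKeyOrder and its methods are hypothetical names for illustration. The first key is U+1D70B, the character whose UTF-8 bytes f0 9d 9c 8b appear as lastInput in the error.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FstKeyOrder {
  /** Lexicographic comparison of byte arrays, treating each byte as unsigned (0x00..0xFF). */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length); // a proper prefix sorts first
  }

  /** Returns a copy of the keys sorted by their UTF-8 encoding, compared as unsigned bytes. */
  static List<String> sortByUtf8Bytes(String... keys) {
    List<String> sorted = new ArrayList<>(Arrays.asList(keys));
    sorted.sort((x, y) -> compareUnsigned(
        x.getBytes(StandardCharsets.UTF_8), y.getBytes(StandardCharsets.UTF_8)));
    return sorted;
  }

  public static void main(String[] args) {
    // First key: U+1D70B (UTF-8 f0 9d 9c 8b). Then words starting with U+FB01 and U+FB02.
    List<String> sorted = sortByUtf8Bytes("\uD835\uDF0B", "\uFB01gure", "\uFB02agship");
    for (String key : sorted) {
      // Prints the first code point of each key in the order the builder would accept.
      System.out.printf("U+%X%n", key.codePointAt(0));
    }
  }
}
```

Running main prints U+FB01, U+FB02, U+1D70B: the supplementary character moves to the end, the opposite of where String.compareTo puts it.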
[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
[ https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018237#comment-17018237 ]

ASF subversion and git services commented on LUCENE-9053:
-

Commit 8147e491ce3905bb3543f2c7e34a4ecb60382b49 in lucene-solr's branch refs/heads/master from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8147e49 ]

LUCENE-9053: improve FST's package-info.java comment to clarify required (Unicode code point) sort order for FST.Builder
[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
[ https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006894#comment-17006894 ]

Michael McCandless commented on LUCENE-9053:
-

+1 to improve {{package-info.java}} in {{lucene/core/src/java/org/apache/lucene/util}}!

Maybe we should just say Unicode code point order, not UTF16 as Java's {{String.compareTo}} sorts?
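To make the distinction concrete, here is a hedged sketch of a comparator that orders strings by Unicode code point rather than by UTF-16 code unit, next to String.compareTo. CodePointOrder is a hypothetical name and not part of Lucene; the two keys are taken from the error message (U+1D70B and a word starting with U+FB01).

```java
import java.util.Comparator;
import java.util.PrimitiveIterator;

final class CodePointOrder {
  /** Compares strings code point by code point instead of UTF-16 unit by unit. */
  static final Comparator<String> CODE_POINT_ORDER = (a, b) -> {
    PrimitiveIterator.OfInt ia = a.codePoints().iterator();
    PrimitiveIterator.OfInt ib = b.codePoints().iterator();
    while (ia.hasNext() && ib.hasNext()) {
      int cmp = Integer.compare(ia.nextInt(), ib.nextInt());
      if (cmp != 0) {
        return cmp;
      }
    }
    // Equal so far: the string with code points left over sorts last.
    return Boolean.compare(ia.hasNext(), ib.hasNext());
  };

  public static void main(String[] args) {
    String supplementary = "\uD835\uDF0B"; // U+1D70B, stored as a surrogate pair in UTF-16
    String ligatureWord = "\uFB01gure";    // U+FB01 (fi ligature) followed by "gure"

    // UTF-16 units: 0xD835 < 0xFB01, so String.compareTo sorts the supplementary key first.
    System.out.println(Integer.signum(supplementary.compareTo(ligatureWord)));                 // -1
    // Code points: 0x1D70B > 0xFB01, so code point order sorts it last.
    System.out.println(Integer.signum(CODE_POINT_ORDER.compare(supplementary, ligatureWord))); // 1
  }
}
```

The two orders only disagree when a supplementary character (surrogate pair) is compared against a BMP character in the range U+E000..U+FFFF, which is exactly the case in this report.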
[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
[ https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978060#comment-16978060 ]

Robert Muir commented on LUCENE-9053:
-

I think the package-info.java example (which is where that comment comes from) could be improved: "unicode sorted order" is too vague for Java :)

The example strings here are out of order for Unicode code points (UTF-8/32), but they are sorted correctly in UTF-16 order (i.e. sorted according to Java's String.compareTo). So it's probably confusing to a Java developer, who will sort their stuff according to Java's String class and then be surprised to get an exception.
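The byte arrays in the error message can be decoded directly to check this explanation. The sketch below (ErrorMessageBytes is a hypothetical helper, not Lucene code) turns lastInput and input back into strings and compares them both ways.

```java
import java.nio.charset.StandardCharsets;

final class ErrorMessageBytes {
  /** Decodes bytes written as unsigned hex values (as in the error message) from UTF-8. */
  static String decodeUtf8(int... unsignedBytes) {
    byte[] bytes = new byte[unsignedBytes.length];
    for (int i = 0; i < unsignedBytes.length; i++) {
      bytes[i] = (byte) unsignedBytes[i];
    }
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // lastInput and input, copied from the AssertionError in the issue title.
    String last = decodeUtf8(0xf0, 0x9d, 0x9c, 0x8b);
    String next = decodeUtf8(0xef, 0xac, 0x81, 0x67, 0x75, 0x72, 0x65);

    System.out.printf("lastInput: U+%X%n", last.codePointAt(0));                         // U+1D70B
    System.out.printf("input:     U+%X + %s%n", next.codePointAt(0), next.substring(1)); // U+FB01 + gure

    // UTF-16 (String.compareTo): high surrogate 0xD835 < 0xFB01, so the pair looks sorted.
    System.out.println("sorted by String.compareTo: " + (last.compareTo(next) < 0));            // true
    // Code points (same as UTF-8 byte order): 0x1D70B > 0xFB01, so the pair is out of order.
    System.out.println("sorted by code point: " + (last.codePointAt(0) < next.codePointAt(0))); // false
  }
}
```

This matches the comment above: the leading byte 0xf0 of the 4-byte UTF-8 sequence is greater than 0xef, while in UTF-16 the surrogate 0xD835 is smaller than 0xFB01.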
[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]
[ https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977831#comment-16977831 ]

Michael Sokolov commented on LUCENE-9053:
-

Seems like a self-homograph attack: https://en.wikipedia.org/wiki/IDN_homograph_attack
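The ligature keys only look like plain ASCII words. As a hypothetical illustration (not something proposed in this thread), java.text.Normalizer with NFKC folds U+FB01 into the two letters f and i, so a check like the one below can flag keys that are distinct strings but render like ordinary words. LigatureHomograph and isCompatibilityLookalike are made-up names.

```java
import java.text.Normalizer;

final class LigatureHomograph {
  /** True if the strings differ as written but are equal after NFKC (compatibility) normalization. */
  static boolean isCompatibilityLookalike(String a, String b) {
    return !a.equals(b)
        && Normalizer.normalize(a, Normalizer.Form.NFKC)
            .equals(Normalizer.normalize(b, Normalizer.Form.NFKC));
  }

  public static void main(String[] args) {
    String ligature = "\uFB01gure"; // U+FB01 LATIN SMALL LIGATURE FI followed by "gure"
    String ascii = "figure";
    System.out.println(ligature.equals(ascii));                    // false
    System.out.println(isCompatibilityLookalike(ligature, ascii)); // true
  }
}
```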