Github user phegstrom commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214926165 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java --- @@ -394,12 +394,14 @@ public void substringSQL() { @Test public void split() { - assertTrue(Arrays.equals(fromString("ab,def,ghi").split(fromString(","), -1), - new UTF8String[]{fromString("ab"), fromString("def"), fromString("ghi")})); - assertTrue(Arrays.equals(fromString("ab,def,ghi").split(fromString(","), 2), - new UTF8String[]{fromString("ab"), fromString("def,ghi")})); - assertTrue(Arrays.equals(fromString("ab,def,ghi").split(fromString(","), 2), - new UTF8String[]{fromString("ab"), fromString("def,ghi")})); + UTF8String[] negativeAndZeroLimitCase = + new UTF8String[]{fromString("ab"), fromString("def"), fromString("ghi"), fromString("")}; + assertTrue(Arrays.equals(fromString("ab,def,ghi,").split(fromString(","), 0), + negativeAndZeroLimitCase)); + assertTrue(Arrays.equals(fromString("ab,def,ghi,").split(fromString(","), -1), --- End diff -- @HyukjinKwon the last two were duplicates: ``` new UTF8String[]{fromString("ab"), fromString("def,ghi")})); assertTrue(Arrays.equals(fromString("ab,def,ghi").split(fromString(","), 2), new UTF8String[]{fromString("ab"), fromString("def,ghi")})); ``` And I also thought it better to include the case where you do get an empty string (adding one more instance of the regex at the end). Want me to revert? My view is it's more exhaustive of the expected behavior, and also easier to see that limit = -1 should behave exactly like limit = 0.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org