[jira] [Commented] (SPARK-23603) When the length of the json is in a range,get_json_object will result in missing tail data
[ https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386239#comment-16386239 ]

Apache Spark commented on SPARK-23603:
--------------------------------------

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/20739

> When the length of the json is in a range,get_json_object will result in
> missing tail data
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23603
>                 URL: https://issues.apache.org/jira/browse/SPARK-23603
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.2.0, 2.3.0
>            Reporter: dzcxzl
>            Priority: Major
>
> Jackson(>=2.7.7) fixes the possibility of missing tail data when the length
> of the value is in a range
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
> [https://github.com/FasterXML/jackson-core/issues/307]
>
> spark-shell:
>
> {code:java}
> val value = "x" * 3000
> val json = s"""{"big": "$value"}"""
> spark.sql("select length(get_json_object(\'"+json+"\','$.big'))" ).collect
> res0: Array[org.apache.spark.sql.Row] = Array([2991])
> {code}
> correct result : 3000
>
> There are two solutions
> One is
> bump jackson version to 2.7.7
> The other one is
> Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23603) When the length of the json is in a range,get_json_object will result in missing tail data
[ https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386235#comment-16386235 ]

Apache Spark commented on SPARK-23603:
--------------------------------------

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/20738