Cazen Lee created SPARK-12537:
---------------------------------

             Summary: Add option to accept quoting of all character backslash quoting mechanism
                 Key: SPARK-12537
                 URL: https://issues.apache.org/jira/browse/SPARK-12537
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.5.2
            Reporter: Cazen Lee


This adds an option that controls whether the JSON parser accepts backslash 
quoting of any character, not only the escapes defined by the JSON specification.

For example, if a JSON file contains a backslash escape that the JSON 
specification does not define (here \$), that record cannot be parsed and is 
returned in the _corrupt_record column.

JSON File
<code>
{"name": "Cazen Lee", "price": "$10"}
{"name": "John Doe", "price": "\$20"}
{"name": "Tracy", "price": "$10"}
<code>

<code>
scala> df.show
+--------------------+---------+-----+
|     _corrupt_record|     name|price|
+--------------------+---------+-----+
|                null|Cazen Lee|  $10|
|{"name": "John Do...|     null| null|
|                null|    Tracy|  $10|
+--------------------+---------+-----+
<code>
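As a rough illustration of why the second record is rejected: the JSON specification (RFC 8259, section 7) only defines the escapes \", \\, \/, \b, \f, \n, \r, \t and the four-hex-digit \u form, so \$ is invalid. The helper below is a hypothetical sketch (JsonEscapes and nonSpecEscapes are illustrative names, not part of Spark or Jackson) that scans the body of a string literal and lists each character that follows a backslash without forming a spec-defined escape.

```scala
// Hypothetical helper, not part of Spark or Jackson: reports backslash escapes
// that the JSON specification does not define.
object JsonEscapes {
  // Characters allowed after a backslash by the JSON spec (RFC 8259, section 7).
  // 'u' introduces a four-hex-digit escape; this sketch does not check the digits.
  val specEscapes: Set[Char] = Set('"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u')

  /** Returns, in order, each character that follows a backslash but is not a spec escape. */
  def nonSpecEscapes(body: String): List[Char] = {
    val bad = List.newBuilder[Char]
    var i = 0
    while (i < body.length) {
      if (body.charAt(i) == '\\' && i + 1 < body.length) {
        val next = body.charAt(i + 1)
        if (!specEscapes.contains(next)) bad += next
        i += 2 // skip the backslash and the character it escapes
      } else {
        i += 1
      }
    }
    bad.result()
  }
}
```

For the "price" value of the John Doe record, JsonEscapes.nonSpecEscapes("\\$20") returns List('$'), while the values of the other two records ("$10") yield an empty list.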

With this patch applied, the allowBackslashEscapingAnyCharacter option can be 
enabled as below, and the same file parses without errors:

<code>
scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
df: org.apache.spark.sql.DataFrame = [name: string, price: string]

scala> df.show
+---------+-----+
|     name|price|
+---------+-----+
|Cazen Lee|  $10|
| John Doe|  $20|
|    Tracy|  $10|
+---------+-----+
<code>
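To make the effect of the option concrete, here is a minimal pure-Scala model of decoding a JSON string body with and without lenient escaping. It is a hypothetical sketch, not Spark's or Jackson's implementation: LenientUnescape, unescape and the allowAnyChar flag are illustrative names. Spark's JSON source is built on Jackson, whose parser feature JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER provides this lenient behaviour; the sketch only mimics the idea that an unknown escape such as \$ decodes to the escaped character instead of failing.

```scala
// Hypothetical sketch, not Spark's or Jackson's implementation: decodes the body
// of a JSON string literal, with a flag modelled on allowBackslashEscapingAnyCharacter.
object LenientUnescape {
  // Single-character escapes defined by the JSON spec and the characters they denote.
  private val simple: Map[Char, Char] = Map(
    '"' -> '"', '\\' -> '\\', '/' -> '/',
    'b' -> '\b', 'f' -> '\f', 'n' -> '\n', 'r' -> '\r', 't' -> '\t')

  /** Right(decoded text), or Left(reason) when an escape is rejected. */
  def unescape(body: String, allowAnyChar: Boolean): Either[String, String] = {
    val sb = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c != '\\') {
        sb += c
        i += 1
      } else if (i + 1 >= body.length) {
        return Left("dangling backslash at end of string")
      } else {
        val next = body.charAt(i + 1)
        if (next == 'u') {
          // Four-hex-digit escape, valid in both modes.
          if (i + 6 > body.length) return Left("truncated unicode escape")
          val hex = body.substring(i + 2, i + 6)
          if (!hex.forall(ch => Character.digit(ch, 16) >= 0)) return Left(s"bad unicode escape: $hex")
          sb += Integer.parseInt(hex, 16).toChar
          i += 6
        } else {
          simple.get(next) match {
            case Some(decoded) =>
              sb += decoded
            case None if allowAnyChar =>
              sb += next // lenient mode: drop the backslash, keep the character
            case None =>
              return Left(s"unrecognized escape: \\$next")
          }
          i += 2
        }
      }
    }
    Right(sb.toString)
  }
}
```

Called as LenientUnescape.unescape("\\$20", allowAnyChar = true), it returns Right("$20"), matching the John Doe row above; with allowAnyChar = false it returns a Left, which corresponds to the _corrupt_record row in the first output.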

This issue is similar to HIVE-11825 and HIVE-12717.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
