[jira] [Updated] (SPARK-23428) Revert [SPARK-23094] Fix invalid character handling in JsonDataSource

2018-02-14 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-23428:

Summary: Revert [SPARK-23094]  Fix invalid character handling in 
JsonDataSource  (was: Revert )

> Revert [SPARK-23094]  Fix invalid character handling in JsonDataSource
> --
>
> Key: SPARK-23428
> URL: https://issues.apache.org/jira/browse/SPARK-23428
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Blocker
>
> {noformat}
>   test("invalid json with leading nulls - from dataset") {
> import testImplicits._
> withTempDir { tempDir =>
>   val path = tempDir.getAbsolutePath
>   Seq("""{"firstName":"Chris", "lastName":"Baird"}""",
> """{"firstName":"Doug", 
> "lastName":"Rood"}""").toDS().write.mode("overwrite").text(path)
>   val schema = new StructType().add("a", 
> IntegerType).add("_corrupt_record", StringType)
>   val jsonDF = spark.read.schema(schema).option("mode", 
> "DROPMALFORMED").json(path)
>   checkAnswer(jsonDF, Seq(
> Row("Chris", "Baird"), Row("Doug", "Rood")
>   ))
> }
>   }
> {noformat}
> After this PR it returns a wrong answer. 
> {noformat}
> [null,null]
> [null,null]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23428) Revert

2018-02-14 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-23428:

Description: 
{noformat}
  test("invalid json with leading nulls - from dataset") {
import testImplicits._
withTempDir { tempDir =>
  val path = tempDir.getAbsolutePath
  Seq("""{"firstName":"Chris", "lastName":"Baird"}""",
"""{"firstName":"Doug", 
"lastName":"Rood"}""").toDS().write.mode("overwrite").text(path)
  val schema = new StructType().add("a", 
IntegerType).add("_corrupt_record", StringType)
  val jsonDF = spark.read.schema(schema).option("mode", 
"DROPMALFORMED").json(path)
  checkAnswer(jsonDF, Seq(
Row("Chris", "Baird"), Row("Doug", "Rood")
  ))
}
  }

{noformat}

After this PR it returns a wrong answer. 

{noformat}
[null,null]
[null,null]
{noformat}

  was:
  test("invalid json with leading nulls - from dataset") {
import testImplicits._
withTempDir { tempDir =>
  val path = tempDir.getAbsolutePath
  Seq("""{"firstName":"Chris", "lastName":"Baird"}""",
"""{"firstName":"Doug", 
"lastName":"Rood"}""").toDS().write.mode("overwrite").text(path)
  val schema = new StructType().add("a", 
IntegerType).add("_corrupt_record", StringType)
  val jsonDF = spark.read.schema(schema).option("mode", 
"DROPMALFORMED").json(path)
  checkAnswer(jsonDF, Seq(
Row("Chris", "Baird"), Row("Doug", "Rood")
  ))
}
  }

Now it returns 
{noformat}
[null,null]
[null,null]
{noformat}


> Revert 
> ---
>
> Key: SPARK-23428
> URL: https://issues.apache.org/jira/browse/SPARK-23428
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Blocker
>
> {noformat}
>   test("invalid json with leading nulls - from dataset") {
> import testImplicits._
> withTempDir { tempDir =>
>   val path = tempDir.getAbsolutePath
>   Seq("""{"firstName":"Chris", "lastName":"Baird"}""",
> """{"firstName":"Doug", 
> "lastName":"Rood"}""").toDS().write.mode("overwrite").text(path)
>   val schema = new StructType().add("a", 
> IntegerType).add("_corrupt_record", StringType)
>   val jsonDF = spark.read.schema(schema).option("mode", 
> "DROPMALFORMED").json(path)
>   checkAnswer(jsonDF, Seq(
> Row("Chris", "Baird"), Row("Doug", "Rood")
>   ))
> }
>   }
> {noformat}
> After this PR it returns a wrong answer. 
> {noformat}
> [null,null]
> [null,null]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org