bersprockets commented on a change in pull request #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV tables
URL: https://github.com/apache/spark/pull/23336#discussion_r242220925
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala
 ##########
 @@ -40,14 +40,20 @@ class FailureSafeParser[IN](
   // set the bad record to this field, and set other fields according to the partial result or null.
   private val toResultRow: (Option[InternalRow], () => UTF8String) => InternalRow = {
     (row, badRecord) => {
-      var i = 0
-      while (i < actualSchema.length) {
-        val from = actualSchema(i)
 -        resultRow(schema.fieldIndex(from.name)) = row.map(_.get(i, from.dataType)).orNull
-        i += 1
 +      // save the value. Some implementations of badRecord do not like to be called twice
+      val badRec = badRecord()
+      if (badRec != null || corruptFieldIndex.isDefined) {
 
 Review comment:
   @MaxGekk 
   >I think partial result should be returned even the corrupt column is not 
defined.
   
   It works that way (I hope). For example, id2 is a DateType field that 
contains bad data:
   <pre>
   scala> sql("select id1, id2 from csvtbl").show
   +-----+----+
   |  id1| id2|
   +-----+----+
   |276.0|null|
   |176.0|null|
   |  2.0|null|
   |355.0|null|
   |263.0|null|
   |172.0|null|
   |196.0|null|
   |352.0|null|
   ....etc...
   </pre>
   No corrupt record field was specified in this case.
   
   I tried not to change the semantics of the existing partial-result behavior.
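
To make the behavior described above concrete, here is a minimal, Spark-free sketch. It is a simplified model and not the code in `FailureSafeParser`: the object name `PartialResultSketch`, the method `makeToResultRow`, and the use of `Seq[Any]` and `String` in place of `InternalRow` and `UTF8String` are all assumptions made for illustration. It shows two things: the `badRecord` thunk is evaluated only once, and the partial result is copied into the output row whether or not a corrupt-record column exists.

```scala
// Hypothetical, simplified model of the pattern discussed in this comment.
// It does not use Spark and is not the actual FailureSafeParser code.
object PartialResultSketch {

  // fieldNames: names of the output columns, in order.
  // corruptFieldName: name of the corrupt-record column, if one was requested.
  // Returns a function that turns (partial row, bad-record thunk) into an output row.
  def makeToResultRow(
      fieldNames: Seq[String],
      corruptFieldName: Option[String]
  ): (Option[Seq[Any]], () => String) => Seq[Any] = {
    // Position of the corrupt-record column in the output, if it is present.
    val corruptFieldIndex: Option[Int] =
      corruptFieldName.map(name => fieldNames.indexOf(name)).filter(_ >= 0)
    // Output positions that hold ordinary data columns.
    val dataIndices: Seq[Int] =
      fieldNames.indices.filterNot(i => corruptFieldIndex.contains(i))

    (row: Option[Seq[Any]], badRecord: () => String) => {
      // Evaluate the thunk exactly once and reuse the saved value.
      val badRec = badRecord()
      val result = Array.fill[Any](fieldNames.length)(null)
      // Copy the partial result (or nulls when there is none) into the data columns.
      dataIndices.zipWithIndex.foreach { case (outIdx, inIdx) =>
        result(outIdx) = row match {
          case Some(values) => values(inIdx)
          case None => null
        }
      }
      // Fill the corrupt-record column only when one was requested.
      corruptFieldIndex.foreach(i => result(i) = badRec)
      result.toVector
    }
  }
}
```

In this model, calling the function built by `makeToResultRow(Seq("id1", "id2"), None)` with a partial row whose `id2` could not be parsed still returns the parsed `id1` value alongside a null `id2`, which mirrors the output shown above.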

