Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19249#discussion_r139301428
--- Diff: python/pyspark/sql/types.py ---
@@ -619,7 +621,8 @@ def fromInternal(self, obj):
# it's already converted by pickler
return obj
if self._needSerializeAnyField:
- values = [f.fromInternal(v) for f, v in zip(self.fields, obj)]
+ values = [f.fromInternal(v) if n else v
--- End diff ---
Could we run a benchmark for the worst case, when all columns need to be
converted? Here we basically pay an extra `if` and an extra element in the zip
to avoid the function call. This looks okay in practice, but I think we should
also identify the downside.
Also, let's add a comment here describing what we are doing, and add a link to
this PR so that others can refer to the benchmarks.
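A minimal sketch of such a worst-case micro-benchmark, using plain lists to
stand in for the `StructType` internals. The names `fields`, `row`,
`need_conversion`, and `from_internal` are illustrative stand-ins, not code
from the PR; every flag is `True`, so the conditional version pays the extra
`if` and zip element on every field:

```python
import timeit

# Stand-ins for StructType internals. All flags are True, i.e. the
# worst case for the patched version: no call is ever skipped.
fields = list(range(20))           # pretend field objects
row = list(range(20))              # pretend internal row values
need_conversion = [True] * 20      # all columns "need" conversion

def from_internal(v):
    # trivial converter standing in for f.fromInternal
    return v

def unconditional():
    # original shape: always call the converter
    return [from_internal(v) for f, v in zip(fields, row)]

def conditional():
    # patched shape: extra `if` and a third zip element to skip calls
    return [from_internal(v) if n else v
            for f, v, n in zip(fields, row, need_conversion)]

t_old = timeit.timeit(unconditional, number=100_000)
t_new = timeit.timeit(conditional, number=100_000)
print(f"unconditional: {t_old:.3f}s  conditional: {t_new:.3f}s")
```

Both versions produce the same output; the timings would quantify the overhead
of the extra test when no call can actually be skipped.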
---