Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19249#discussion_r139301428
--- Diff: python/pyspark/sql/types.py ---
@@ -619,7 +621,8 @@ def fromInternal(self, obj):
# it's already converted by pickler
return obj
if self._needSerializeAnyField:
- values = [f.fromInternal(v) for f, v in zip(self.fields, obj)]
+ values = [f.fromInternal(v) if n else v
--- End diff ---
Could we run a benchmark for the worst case, when all columns need to be
converted? Here we basically pay an extra `if` and an extra element in the zip
to avoid the function call. This looks okay in practice, but I think we should
also identify the downside.
Also, let's add a comment here describing what we are doing, and add a link to
this PR so that others can refer to the benchmarks.
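A minimal sketch of such a worst-case micro-benchmark, using plain lists to
stand in for the `StructType` internals. The names `fields`, `row`,
`need_conversion`, and `from_internal` are illustrative stand-ins, not code
from the PR; every flag is `True`, so the conditional version pays the extra
`if` and zip element on every field:

```python
import timeit

# Stand-ins for StructType internals. All flags are True, i.e. the
# worst case for the patched version: no call is ever skipped.
fields = list(range(20))           # pretend field objects
row = list(range(20))              # pretend internal row values
need_conversion = [True] * 20      # all columns "need" conversion

def from_internal(v):
    # trivial converter standing in for f.fromInternal
    return v

def unconditional():
    # original shape: always call the converter
    return [from_internal(v) for f, v in zip(fields, row)]

def conditional():
    # patched shape: extra `if` and a third zip element to skip calls
    return [from_internal(v) if n else v
            for f, v, n in zip(fields, row, need_conversion)]

t_old = timeit.timeit(unconditional, number=100_000)
t_new = timeit.timeit(conditional, number=100_000)
print(f"unconditional: {t_old:.3f}s  conditional: {t_new:.3f}s")
```

Both versions produce the same output; the timings would quantify the overhead
of the extra test when no call can actually be skipped.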
---