[GitHub] [spark] Tagar commented on pull request #26783: [SPARK-30153][PYTHON][WIP] Extend data exchange options for vectorized UDF functions with vanilla Arrow serialization

GitBox Sat, 25 Jul 2020 21:53:07 -0700


Tagar commented on pull request #26783:
URL: https://github.com/apache/spark/pull/26783#issuecomment-663935674



   Sorry, I'm uninitiated here.
   Just out of curiosity, was that 3x performance improvement measured for CPU execution?
   Reading a little about `awkward_array`, it can use CUDA kernels too:
   https://awkward-array.readthedocs.io/en/latest/index.html#more-documentation
   It would be great to see what that improvement would be on GPUs.
   IMO this would be a great use case for PySpark UDF execution directly on GPUs,
   and it deserves a separate `@numpy_udf` designation, just like there is `@pandas_udf`.
   Piggybacking on the PandasUDF interface is confusing, as this PR actually tries to avoid using Pandas.
   Numba is another example: it supports just-in-time compilation of NumPy logic to be executed on GPUs:
   https://numba.pydata.org/numba-doc/latest/cuda/index.html
   My 2 cents. I think it would be a great improvement!





