Tagar commented on pull request #26783:
URL: https://github.com/apache/spark/pull/26783#issuecomment-663935674

Sorry for the uninitiated question here. Just out of curiosity, was that 3x performance improvement measured for CPU execution? Reading a little about `awkward_array`, it can use CUDA kernels too: https://awkward-array.readthedocs.io/en/latest/index.html#more-documentation

It would be great to see what that improvement would be on GPUs. IMO this would be a great use case for PySpark UDF execution directly on GPUs, and it deserves a separate `@numpy_udf` designation, just like there is `@pandas_udf`. Piggybacking on the PandasUDF interface is confusing, since this PR actually tries to avoid using Pandas.

Numba is another example: it supports just-in-time compilation of NumPy logic to be executed on GPUs: https://numba.pydata.org/numba-doc/latest/cuda/index.html

My 2 cents. I think it would be a great improvement!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
