Markus Tretzmüller created SPARK-31704:
------------------------------------------

             Summary: PandasUDFType.GROUPED_AGG with Java 11
                 Key: SPARK-31704
                 URL: https://issues.apache.org/jira/browse/SPARK-31704
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.0.0
         Environment: Java JDK: 11
                      Python: 3.7
            Reporter: Markus Tretzmüller


Running the example from the 
[docs|https://spark.apache.org/docs/3.0.0-preview2/api/python/pyspark.sql.html#module-pyspark.sql.functions]
 fails with Java 11 but works with Java 8.


{code:python}
import findspark
findspark.init('/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7')
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql import Window
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName('test') \
        .getOrCreate()

    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
        ("id", "v"))

    @pandas_udf("double", PandasUDFType.GROUPED_AGG)
    def mean_udf(v):
        return v.mean()

    w = (Window.partitionBy('id')
         .orderBy('v')
         .rowsBetween(-1, 0))
    df.withColumn('mean_v', mean_udf(df['v']).over(w)).show()
{code}


{noformat}
File "/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o81.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 7.0 failed 1 times, most recent failure: Lost task 44.0 in stage 7.0 (TID 37, 131.130.32.15, executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
        at io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473)
        at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
        at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
        at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
        at org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222)
        at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:240)
        at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:132)
        at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:120)
        at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:94)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
        at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:101)
        at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:373)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
        at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:213)
{noformat}
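The {{UnsupportedOperationException}} comes from Netty (used by the Arrow library) being unable to allocate direct buffers reflectively under Java 11's stronger encapsulation. The Spark 3.0 documentation notes that running with Java 11 requires passing {{-Dio.netty.tryReflectionSetAccessible=true}} to the JVM for Arrow. A possible workaround, sketched below and not verified against this exact setup, is to set that flag on both the driver and executor JVMs when creating the session:

```python
# Hedged workaround sketch: enable Netty's reflective access to direct
# buffers under Java 11, which Arrow's allocator depends on. The flag must
# reach the JVM before it starts, so set it via the session builder (or
# equivalently via spark-submit --conf / spark-defaults.conf).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('test')
         .config('spark.driver.extraJavaOptions',
                 '-Dio.netty.tryReflectionSetAccessible=true')
         .config('spark.executor.extraJavaOptions',
                 '-Dio.netty.tryReflectionSetAccessible=true')
         .getOrCreate())
```

Note that driver-side JVM options only take effect at JVM launch, so in an already-running interpreter (e.g. after {{findspark.init}}) the flag may need to be supplied through {{spark-submit}} or {{spark-defaults.conf}} instead.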





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
