Hi Spark users,

Can somebody explain the following WARN and its exception? I am running
Spark 1.5.0 and the job completed successfully, but I am wondering whether
it is safe to keep using Spark SQL window functions.

15/09/24 06:31:49 WARN TaskSetManager: Lost task 17.0 in stage 4.0 (TID 18907, rt-mesos14-sjc1.prod.uber.internal): java.lang.NegativeArraySizeException
    at org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:223)
    at org.apache.spark.unsafe.types.UTF8String.clone(UTF8String.java:827)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.Window$$anonfun$8$$anon$1.next(Window.scala:325)
    at org.apache.spark.sql.execution.Window$$anonfun$8$$anon$1.next(Window.scala:252)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:48)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEsWithMeta$1.apply(EsSpark.scala:86)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEsWithMeta$1.apply(EsSpark.scala:86)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The window functions I used are:

        lead(ts) over (partition by A, B, C order by ts) as next_ts,
        lead(status) over (partition by A, B, C order by ts) as next_status,
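
In case it helps with reproducing this, here is a rough sketch of the same
logic via the DataFrame API (the DataFrame name `events` and the surrounding
code are my own illustration, not the exact job; only the window spec matches
the query above):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lead

    // Hypothetical DataFrame `events` with columns A, B, C, ts, status.
    // Same window as the SQL: partition by A, B, C, order by ts.
    val w = Window.partitionBy("A", "B", "C").orderBy("ts")

    val withNext = events
      .withColumn("next_ts", lead("ts", 1).over(w))
      .withColumn("next_status", lead("status", 1).over(w))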

Thank you
Best, Jae
