Hi Spark users,

Can somebody explain the following WARN with exception? I am running Spark 1.5.0 and the job was successful, but I am wondering whether it is totally OK to keep using the Spark SQL window function.
15/09/24 06:31:49 WARN TaskSetManager: Lost task 17.0 in stage 4.0 (TID 18907, rt-mesos14-sjc1.prod.uber.internal): java.lang.NegativeArraySizeException
    at org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:223)
    at org.apache.spark.unsafe.types.UTF8String.clone(UTF8String.java:827)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.Window$$anonfun$8$$anon$1.next(Window.scala:325)
    at org.apache.spark.sql.execution.Window$$anonfun$8$$anon$1.next(Window.scala:252)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:48)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEsWithMeta$1.apply(EsSpark.scala:86)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEsWithMeta$1.apply(EsSpark.scala:86)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The window functions I used are:

    lead(ts) over (partition by A, B, C order by ts) as next_ts,
    lead(status) over (partition by A, B, C order by ts) as next_status

Thank you.

Best,
Jae
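P.S. For reference, `lead(col) over (partition by ... order by ts)` simply gives each row the value of `col` from the next row in the same partition (ordered by `ts`), or NULL on the last row. A minimal, Spark-free sketch of that semantics in plain Python (data and column names here are hypothetical, not from the failing job):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; "k" plays the role of the partition columns A, B, C.
rows = [
    {"k": "a", "ts": 1, "status": "open"},
    {"k": "a", "ts": 2, "status": "closed"},
    {"k": "b", "ts": 1, "status": "open"},
]

def lead(rows, part_key, order_key, col):
    """Attach each row's next value of `col` within its partition, ordered by `order_key`."""
    out = []
    rows = sorted(rows, key=lambda r: (r[part_key], r[order_key]))
    for _, grp in groupby(rows, key=itemgetter(part_key)):
        grp = list(grp)
        for i, r in enumerate(grp):
            # Last row of a partition has no successor -> None (NULL in SQL).
            nxt = grp[i + 1][col] if i + 1 < len(grp) else None
            out.append({**r, "next_" + col: nxt})
    return out

result = lead(rows, "k", "ts", "status")
```

The query itself is standard, so the exception looks like a problem inside Spark's code-generated projection rather than in the SQL.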