Hi all: I tested spark-1.5.*-bin-hadoop2.6 and found this problem; it's easy to reproduce.
Environment:

OS: CentOS release 6.5 (Final), kernel 2.6.32-431.el6.x86_64
JVM: java version "1.7.0_60"
     Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
     Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

When both spark.unsafe.offHeap and spark.sql.tungsten.enabled are enabled, the query "select distinct name from people" fails with java.lang.NullPointerException. When either spark.unsafe.offHeap or spark.sql.tungsten.enabled is disabled, the query works fine.

$ pwd
/data1/spark-1.5.2-bin-hadoop2.6

$ cat conf/spark-defaults.conf
spark.driver.memory         16g
spark.unsafe.offHeap        true
spark.sql.tungsten.enabled  true

$ bin/beeline
0: jdbc:hive2://192.168.1.19:10000/default> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
+------------+--------------+--+
No rows selected (0.66 seconds)

0: jdbc:hive2://192.168.1.19:10000/default> CREATE TABLE people USING org.apache.spark.sql.json OPTIONS (path "examples/src/main/resources/people.json");
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.378 seconds)

0: jdbc:hive2://192.168.1.19:10000/default> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| people     | false        |
+------------+--------------+--+
1 row selected (0.039 seconds)

0: jdbc:hive2://192.168.1.19:10000/default> select * from people;
+-------+----------+--+
|  age  |   name   |
+-------+----------+--+
| NULL  | Michael  |
| 30    | Andy     |
| 19    | Justin   |
+-------+----------+--+
3 rows selected (1.515 seconds)

0: jdbc:hive2://192.168.1.19:10000/default> select distinct name from people;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 5, localhost): java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator$$anonfun$generateResultProjection$3.apply(TungstenAggregationIterator.scala:306)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator$$anonfun$generateResultProjection$3.apply(TungstenAggregationIterator.scala:305)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.next(TungstenAggregationIterator.scala:666)
    at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.next(TungstenAggregationIterator.scala:76)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace: (state=,code=0)
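FYI, since the failure only occurs when both flags are on, a workaround that worked for me (until the root cause is fixed) is to turn off spark.unsafe.offHeap in conf/spark-defaults.conf, keeping the rest of my config unchanged:

$ cat conf/spark-defaults.conf
spark.driver.memory         16g
spark.unsafe.offHeap        false
spark.sql.tungsten.enabled  true

(This is just a sketch of my own config with one value flipped; turning off spark.sql.tungsten.enabled instead also avoids the NPE, but that disables Tungsten entirely rather than just off-heap mode.)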