The Spark version is 2.2, and I think I am running into this issue: 
https://issues.apache.org/jira/browse/SPARK-18016 — the dataset schema is 
very large and deeply nested.
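
If this is indeed SPARK-18016 (generated code blowing past JVM class-file limits on very wide or nested schemas), one workaround that is commonly suggested — untested for this particular job — is to disable whole-stage code generation so Spark emits smaller expression-level code instead. A sketch, reusing the class and jar names from the job below:

```shell
# Hypothetical invocation; only the --conf flag is the point here.
# spark.sql.codegen.wholeStage is the standard Spark 2.x switch for
# whole-stage codegen. Disabling it avoids one giant generated class,
# at some cost in per-row performance.
spark-submit \
  --class com.homeaway.omnihub.OmnitaskApp \
  --conf spark.sql.codegen.wholeStage=false \
  omnitask-spark-compaction-0.0.1.jar
```

Note that SPARK-18016 is marked as fixed in later releases, so upgrading past 2.2 may also resolve it.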

From: ARAVIND SETHURATHNAM <asethurath...@homeaway.com.INVALID>
Date: Monday, June 18, 2018 at 4:00 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Spark batch job: failed to compile: java.lang.NullPointerException


Hi,
We have a Spark job that reads Avro data from an S3 location, does some 
processing, and writes it back to S3. Of late it has been failing with the 
exception below:


Application application_1529346471665_0020 failed 1 times due to AM Container 
for appattempt_1529346471665_0020_000001 exited with exitCode: -104
For more detailed output, check application tracking 
page:http://10.122.49.134:8088/proxy/application_1529346471665_0020/Then, click 
on links to logs of each attempt.
Diagnostics: Container 
[pid=14249,containerID=container_1529346471665_0020_01_000001] is running 
beyond physical memory limits. Current usage: 23.4 GB of 22 GB physical memory 
used; 28.7 GB of 46.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1529346471665_0020_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) 
VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 14255 14249 14249 14249 (java) 23834 8203 30684336128 6142485 
/usr/java/default/bin/java -server -Xmx20480m 
-Djava.io.tmpdir=/media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/tmp
 -Dspring.profiles.active=stage 
-Dspark.yarn.app.container.log.dir=/media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001
 -XX:MaxPermSize=512m org.apache.spark.deploy.yarn.ApplicationMaster --class 
com.homeaway.omnihub.OmnitaskApp --jar 
/tmp/spark-9f42e005-e1b4-47c2-a6e8-ac0bc9fa595b/omnitask-spark-compaction-0.0.1.jar
 --arg 
--PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-hourly/entityEventLodgingRate-2/
 --arg 
--OUTPUT_PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2/
 --arg --DB_NAME=tier1_landingzone --arg 
--TABLE_NAME=entityeventlodgingrate_2_daily --arg --TABLE_DESCRIPTION=data in: 
's3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2'
 --arg --FORMAT=AVRO --arg --PARTITION_COLUMNS=dateid --arg --HOURLY=false 
--arg --START_DATE=20180616 --arg --END_DATE=20180616 --properties-file 
/media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/__spark_conf__/__spark_conf__.properties
|- 14249 14247 14249 14249 (bash) 0 1 115826688 704 /bin/bash -c 
LD_LIBRARY_PATH=/usr/lib/hadoop2/lib/native::/usr/lib/qubole/packages/hadoop2-2.6.0/hadoop2/lib/native:/usr/lib/qubole/packages/hadoop2-2.6.0/hadoop2/lib/native
 /usr/java/default/bin/java -server -Xmx20480m 
-Djava.io.tmpdir=/media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/tmp
 '-Dspring.profiles.active=stage' 
-Dspark.yarn.app.container.log.dir=/media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001
 -XX:MaxPermSize=512m org.apache.spark.deploy.yarn.ApplicationMaster --class 
'com.homeaway.omnihub.OmnitaskApp' --jar 
/tmp/spark-9f42e005-e1b4-47c2-a6e8-ac0bc9fa595b/omnitask-spark-compaction-0.0.1.jar
 --arg 
'--PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-hourly/entityEventLodgingRate-2/'
 --arg 
'--OUTPUT_PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2/'
 --arg '--DB_NAME=tier1_landingzone' --arg 
'--TABLE_NAME=entityeventlodgingrate_2_daily' --arg '--TABLE_DESCRIPTION=data 
in: 
'\''s3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2'\'''
 --arg '--FORMAT=AVRO' --arg '--PARTITION_COLUMNS=dateid' --arg 
'--HOURLY=false' --arg '--START_DATE=20180616' --arg '--END_DATE=20180616' 
--properties-file 
/media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/__spark_conf__/__spark_conf__.properties
 1> 
/media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001/stdout
 2> 
/media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
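
As an aside on the diagnostics above: exit code -104/143 means YARN killed the AM container for exceeding its 22 GB physical memory limit (20 GB heap plus the default ~2 GB overhead). A common mitigation — assuming cluster mode on Spark 2.2, where the AM hosts the driver — is to raise the off-heap overhead allowance rather than the heap itself. A sketch, with a placeholder value:

```shell
# Sketch, not a verified fix: -Xmx stays at 20g, but YARN allows more
# headroom for off-heap usage (metaspace, direct buffers, etc.).
# On Spark 2.2 the setting is spark.yarn.driver.memoryOverhead (in MiB);
# 4096 here is an illustrative value, not a recommendation.
spark-submit \
  --conf spark.yarn.driver.memoryOverhead=4096 \
  ...
```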





In one of the executor logs with a failed task I see the trace below. Can 
someone please let me know what is causing the exception and the task 
failures, and what the generated class shown below is?


18/06/18 20:34:05 dispatcher-event-loop-6 INFO BlockManagerInfo: Added 
broadcast_0_piece0 in memory on 10.122.51.238:42797 (size: 30.7 KB, free: 10.5 
GB)
18/06/18 20:34:06 dispatcher-event-loop-0 INFO BlockManagerInfo: Added 
broadcast_0_piece0 in memory on 10.122.51.238:43173 (size: 30.7 KB, free: 10.5 
GB)
18/06/18 20:34:41 task-result-getter-0 WARN TaskSetManager: Lost task 1.0 in 
stage 0.0 (TID 1, 10.122.48.122, executor 2): org.apache.spark.SparkException: 
Task failed while writing rows
        at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:204)
        at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
        at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Error while encoding: 
java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
compile: java.lang.NullPointerException
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificUnsafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificUnsafeProjection extends 
org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private int argValue;
/* 009 */   private Object[] values;
/* 010 */   private int argValue1;
/* 011 */   private boolean isNull21;
/* 012 */   private boolean value21;
/* 013 */   private boolean isNull22;
/* 014 */   private long value22;
/* 015 */   private boolean isNull23;
/* 016 */   private long value23;
/* 017 */   private int argValue2;
/* 018 */   private java.lang.String argValue3;
/* 019 */   private boolean isNull39;
/* 020 */   private boolean value39;
/* 021 */   private boolean isNull40;
/* 022 */   private UTF8String value40;
/* 023 */   private boolean isNull41;
/* 024 */   private UTF8String value41;
/* 025 */   private int argValue4;
/* 026 */   private java.lang.String argValue5;
/* 027 */   private boolean isNull57;
/* 028 */   private boolean value57;
/* 029 */   private boolean isNull58;


Regards
aravind
