Hi,

We're facing a situation where simple queries against Parquet files stored in Swift and accessed through a Hive Metastore sometimes fail with this exception:
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 58.0 failed 4 times, most recent failure: Lost task 6.3 in stage 58.0 (TID 412, agent-1.mesos.private): org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException: Missing mandatory configuration option: fs.swift.service.######.auth.url
        at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:219)
        (...)

Queries that require a full table scan, like SELECT COUNT(*), fail with the exception above, while smaller chunks of work like SELECT * FROM ... LIMIT 5 succeed.

The problem seems to be related to the number of tasks scheduled: if we force the number of tasks down to 1, the job succeeds. dataframe.rdd.coalesce(1).count() returns a correct result, while dataframe.count() fails with the exception above.

To me it looks like the credentials are lost somewhere along the serialization path when the tasks are submitted to the cluster, although I have not yet found an explanation for why a job that requires only one task succeeds.

We are running Apache Zeppelin against Swift and Spark Notebook against S3. Both show an equivalent exception within their specific Hadoop filesystem implementation when a task fails:

Zeppelin + Swift:

    org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException: Missing mandatory configuration option: fs.swift.service.######.auth.url

Spark Notebook + S3:

    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:70)

In both cases, valid credentials are being set programmatically through sc.hadoopConfiguration.
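For completeness, this is roughly what we run in the notebook. It is a minimal sketch: the service name "myprovider", the auth URL, the credential values, and the table name "events" are placeholders, not our real values (the real service name is the one masked as ###### above), and sc / sqlContext are the instances provided by the notebook.

    // Swift (Zeppelin) -- keys follow the hadoop-openstack naming from the exception
    sc.hadoopConfiguration.set("fs.swift.service.myprovider.auth.url", "https://identity.example.com/auth/v1.0")
    sc.hadoopConfiguration.set("fs.swift.service.myprovider.username", "<user>")
    sc.hadoopConfiguration.set("fs.swift.service.myprovider.password", "<password>")
    sc.hadoopConfiguration.set("fs.swift.service.myprovider.tenant", "<tenant>")

    // S3 (Spark Notebook) -- keys taken from the s3n exception message
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<access-key>")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

    // Parquet-backed table registered in the Hive Metastore
    val dataframe = sqlContext.table("events")

    dataframe.rdd.coalesce(1).count()  // succeeds: single task
    dataframe.count()                  // fails with the exceptions shown above

The configuration is only set on the driver's sc.hadoopConfiguration, which is why we suspect it is not reaching the executors for multi-task jobs.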
Our setup: Zeppelin or Spark Notebook with Spark 1.5.1 running on Docker, Docker running on Mesos, Hadoop 2.4.0. One environment runs on SoftLayer (Swift) and the other on Amazon EC2 (S3), both of similar size.

Any ideas on how to address this issue or how to figure out what's going on?

Thanks,
Gerard.