On 9 Sep 2016, at 17:56, Daniel Lopes <[email protected]> wrote:
Hi, can someone help?

I'm trying to use Parquet in IBM Block Storage from Spark, but when I load the
data I get the error below. I'm using this config:
credentials = {
    "name": "keystone",
    "auth_url": "https://identity.open.softlayer.com",
    "project": "object_storage_23f274c1_d11XXXXXXXXXXXXXXXe634",
    "projectId": "XXXXXXd9c4aa39b7c7eCCCCCCCCb",
    "region": "dallas",
    "userId": "XXXXX64087180b40XXXXX2b909",
    "username": "admin_XXXX9dd810f8901d48778XXXXXX",
    "password": "chXXXXXXXXXXXXX6_",
    "domainId": "c1ddad17cfcXXXXXXXXX41",
    "domainName": "10XXXXXX",
    "role": "admin"
}
def set_hadoop_config(credentials):
    """This function sets the Hadoop configuration with given credentials,
    so it is possible to access data using SparkContext."""
    prefix = "fs.swift.service." + credentials['name']
    hconf = sc._jsc.hadoopConfiguration()
    hconf.set(prefix + ".auth.url", credentials['auth_url'] + '/v3/auth/tokens')
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
    hconf.set(prefix + ".tenant", credentials['projectId'])
    hconf.set(prefix + ".username", credentials['userId'])
    hconf.set(prefix + ".password", credentials['password'])
    hconf.setInt(prefix + ".http.port", 8080)
    hconf.set(prefix + ".region", credentials['region'])
    hconf.setBoolean(prefix + ".public", True)

set_hadoop_config(credentials)
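For reference, a minimal sketch of the kind of read that then fails; the
container name and object path here are placeholders I've assumed, not from
the original message, and sqlContext is the notebook's pre-created SQLContext:

# Hypothetical example: "mycontainer" and the object path are assumptions.
# The service name after the dot ("keystone") must match credentials['name'].
train = sqlContext.read.parquet("swift://mycontainer.keystone/train.parquet")

# Spark reads lazily, so the Swift options are only exercised when an action
# such as this one runs:
train.groupby('Acordo').count().show()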
-------------------------------------------------
Py4JJavaErrorTraceback (most recent call last)
<ipython-input-55-5a14928215eb> in <module>()
----> 1 train.groupby('Acordo').count().show()
Py4JJavaError: An error occurred while calling o406.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 60 in
stage 30.0 failed 10 times, most recent failure: Lost task 60.9 in stage 30.0
(TID 2556, yp-spark-dal09-env5-0039):
org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException: Missing
mandatory configuration option: fs.swift.service.keystone.auth.url
        at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:223)
        at org.apache.hadoop.fs.swift.http.RestClientBindings.bind(RestClientBindings.java:147)

In my own code, I'd first assume that the value of credentials['name'] didn't
match the service name in the URL you are reading from, assuming you have
something like swift://bucket.keystone . Failing that: the options were set
too late, after the swift:// filesystem had already been instantiated.

Instead of asking for the Hadoop config and editing that, set the options in
your Spark configuration, before the context is launched, with the prefix
"spark.hadoop." in front of each fs.swift.* key.
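A minimal sketch of that approach, assuming a standalone PySpark script where
you control context creation (in a hosted notebook where sc is pre-created,
the same keys would go into the environment's Spark configuration instead);
credentials is the dict from the original message:

from pyspark import SparkConf, SparkContext

# Keys prefixed with "spark.hadoop." are copied into the Hadoop configuration
# when the context starts, so they exist before any filesystem is instantiated.
prefix = "spark.hadoop.fs.swift.service." + credentials['name']

conf = (SparkConf()
        .set(prefix + ".auth.url", credentials['auth_url'] + '/v3/auth/tokens')
        .set(prefix + ".auth.endpoint.prefix", "endpoints")
        .set(prefix + ".tenant", credentials['projectId'])
        .set(prefix + ".username", credentials['userId'])
        .set(prefix + ".password", credentials['password'])
        .set(prefix + ".http.port", "8080")
        .set(prefix + ".region", credentials['region'])
        .set(prefix + ".public", "true"))

sc = SparkContext(conf=conf)

The point is ordering: the options are in place before anything asks for the
swift:// filesystem, which is what the SwiftConfigurationException above is
complaining about.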
Daniel Lopes
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br