[jira] [Commented] (SPARK-23338) Spark unable to run on HDP deployed Azure Blob File System

2018-02-05 Thread Subhankar (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352421#comment-16352421 ]

Subhankar commented on SPARK-23338:
---

Thanks for your response, Sean. Could you please suggest a workaround for this?
Should we raise a concern with Azure regarding this?

> Spark unable to run on HDP deployed Azure Blob File System
> --
>
> Key: SPARK-23338
> URL: https://issues.apache.org/jira/browse/SPARK-23338
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.2.0
> Environment: HDP 2.6.0.3
> Spark2 2.2.0
> HDFS 2.7.3
> CentOS 7.1
>Reporter: Subhankar
>Priority: Major
>  Labels: Azure, BLOB, HDP, azureblob, hadoop, hive, spark

[jira] [Created] (SPARK-23338) Spark unable to run on HDP deployed Azure Blob File System

2018-02-05 Thread Subhankar (JIRA)
Subhankar created SPARK-23338:
-

 Summary: Spark unable to run on HDP deployed Azure Blob File System
 Key: SPARK-23338
 URL: https://issues.apache.org/jira/browse/SPARK-23338
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Shell
Affects Versions: 2.2.0
 Environment: HDP 2.6.0.3
Spark2 2.2.0
HDFS 2.7.3
CentOS 7.1
Reporter: Subhankar


Hello,

I am unable to run Spark on the BLOB storage file system deployed on HDP: it 
fails with errors related to HiveSessionState, HiveExternalCatalog, and 
various Azure file storage exceptions.
Please suggest a way to address this if you have one, or let me know whether 
the exercise is futile because Spark cannot run on BLOB storage after all.

Thanks in advance.

Detailed Description:

h5. *We are unable to access spark/spark2 when we change the file system 
storage from HDFS to WASB. We are using the HDP 2.6 platform and running 
Hadoop 2.7.3. All other services are working fine.*

I have set the following configurations:

*HDFS*:

core-site:

fs.defaultFS = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net

fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb

fs.AbstractFileSystem.wasbs.impl = org.apache.hadoop.fs.azure.Wasbs

fs.azure.selfthrottling.read.factor = 1.0

fs.azure.selfthrottling.write.factor = 1.0

fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY

spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
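
For reference, these settings correspond to a core-site.xml fragment like the sketch below. The fs.wasb.impl entry is an assumption on my part (it is not among the settings above); I include it because the wasb scheme needs a FileSystem implementation mapping from somewhere:

{code:xml}
<!-- Hypothetical core-site.xml fragment; values mirror the settings above. -->
<property>
  <name>fs.defaultFS</name>
  <value>wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net</value>
</property>
<property>
  <name>fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net</name>
  <value>KEY</value>
</property>
<!-- Assumption: explicit mapping of the wasb scheme to its FileSystem class. -->
<property>
  <name>fs.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
</property>
{code}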


*SPARK2:*

spark.eventLog.dir = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/

spark.history.fs.logDirectory = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
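
For quick testing outside Ambari, the same settings can be passed on the command line. This is only a sketch: the jar paths are hypothetical HDP locations and the azure-storage version is a placeholder, not taken from the cluster:

{code:bash}
spark-shell \
  --conf spark.eventLog.dir=wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/ \
  --conf spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net=KEY \
  --jars /usr/hdp/current/hadoop-client/hadoop-azure.jar,/usr/hdp/current/hadoop-client/lib/azure-storage-5.4.0.jar
{code}

The --jars entries put the WASB FileSystem classes on Spark's classpath in case the distribution does not add them already.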

Despite multiple attempts and alternative configurations, the *spark-shell* 
command yields the results below:

$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState
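
One way to narrow this down (a sketch of what I would try next, not output from the cluster) is to check whether the Hadoop layer alone can reach the container, independent of Spark's Hive catalog:

{code:bash}
# If this listing also fails, the problem is in the Hadoop/WASB setup
# (hadoop-azure and azure-storage jars on the classpath, account key),
# not in Spark itself.
hadoop fs -ls wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/
{code}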