[jira] [Commented] (SPARK-23338) Spark unable to run on HDP deployed Azure Blob File System
[ https://issues.apache.org/jira/browse/SPARK-23338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356771#comment-16356771 ]

Marco Gaido commented on SPARK-23338:
--------------------------------------

[~Subham] Questions should be sent to the user mailing list; JIRA is for reporting bugs and feature requests. Anyway, your problem seems related to this: https://kitmenke.com/blog/2017/08/05/classcastexception-submitting-spark-apps-to-hdinsight/. Hope this helps.

> Spark unable to run on HDP deployed Azure Blob File System
> -----------------------------------------------------------
>
>                 Key: SPARK-23338
>                 URL: https://issues.apache.org/jira/browse/SPARK-23338
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 2.2.0
>        Environment: HDP 2.6.0.3
>                     Spark2 2.2.0
>                     HDFS 2.7.3
>                     CentOS 7.1
>           Reporter: Subhankar
>           Priority: Major
>             Labels: Azure, BLOB, HDP, azureblob, hadoop, hive, spark
>
> Hello,
> We are unable to run Spark on the BLOB storage file system deployed on HDP: it fails with errors related to HiveSessionState, HiveExternalCatalog, and various Azure file storage exceptions.
> Please let me know if you have a suggestion to address this, or whether the exercise is futile because Spark cannot run on BLOB storage at all.
> Thanks in advance.
>
> Detailed Description:
>
> h5. *We are unable to access spark/spark2 when we change the file system storage from HDFS to WASB. We are using the HDP 2.6 platform and running Hadoop 2.7.3. All other services are working fine.*
> I have set the following configurations:
> *HDFS*:
> core-site:
> fs.defaultFS = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net
> fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb
> fs.AbstractFileSystem.wasbs.impl = org.apache.hadoop.fs.azure.Wasbs
> fs.azure.selfthrottling.read.factor = 1.0
> fs.azure.selfthrottling.write.factor = 1.0
> fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
> spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
> *SPARK2:*
> spark.eventLog.dir = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
> spark.history.fs.logDirectory = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
> Despite multiple attempts and alternative configurations, the *spark-shell* command yields the results below:
> $ spark-shell
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
>   at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
>   at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
>   at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
>   at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
>   at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
>   at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
>   ... 47 elided
> Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
>   ... 58 more
> Caused by:
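If it helps to isolate the failure, the same account key can be supplied to a session programmatically, without relying on core-site.xml. The spark-shell sketch below is purely illustrative, not taken from the ticket: STORAGE_ACCOUNT_NAME, CONTAINER, KEY, and the sample path are placeholders from the report, and it assumes the hadoop-azure and azure-storage jars are on the classpath.

    // Hypothetical reproduction sketch (placeholders as in the report above).
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("wasb-smoke-test")
      // "spark.hadoop."-prefixed keys are copied into the Hadoop Configuration
      // that Spark hands to the wasb:// filesystem implementation.
      .config("spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net", "KEY")
      .getOrCreate()

    // Reading any wasb:// path forces the Azure filesystem to be instantiated;
    // with fs.defaultFS set to wasb://, session creation alone already does so.
    spark.read.textFile("wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/tmp/sample.txt").count()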
[jira] [Commented] (SPARK-23338) Spark unable to run on HDP deployed Azure Blob File System
[ https://issues.apache.org/jira/browse/SPARK-23338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352421#comment-16352421 ]

Subhankar commented on SPARK-23338:
------------------------------------

Thanks for your response, Sean. Could you please suggest a workaround for this? Should we raise a concern with Azure regarding this?
[jira] [Commented] (SPARK-23338) Spark unable to run on HDP deployed Azure Blob File System
[ https://issues.apache.org/jira/browse/SPARK-23338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352410#comment-16352410 ]

Sean Owen commented on SPARK-23338:
------------------------------------

This all shows an error from Azure APIs, and ultimately a failure from the Azure blob store. This doesn't sound Spark-related.
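One way to test this diagnosis is to take Spark out of the picture entirely and exercise wasb:// through the bare Hadoop client. A hedged sketch, assuming core-site.xml and the hadoop-azure jars are on the classpath (CONTAINER and STORAGE_ACCOUNT_NAME remain placeholders):

    // Hypothetical isolation test, not from the ticket: list the container root
    // with no Hive or Spark SQL involved.
    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // new Configuration() loads core-site.xml from the classpath, so the
    // fs.azure.account.key.* credential set by the reporter is picked up.
    val conf = new Configuration()
    val fs = FileSystem.get(
      new URI("wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net"), conf)

    // If this listing throws the same Azure storage exception, the failure is
    // in the Azure driver or account configuration rather than in Spark.
    fs.listStatus(new Path("/")).foreach(status => println(status.getPath))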