Please find the attached error.

From: Roland Johann <roland.joh...@phenetic.io>
Sent: 23 August 2019 10:51 AM
To: Krishna Chandran Nair <kcn...@qatarairways.com.qa>
Cc: user@spark.apache.org
Subject: [External] Re: error while connecting to azure blob storage
Hi Krishna,

there seems to be no attachment. In addition, you should NEVER post private credentials to public forums. Please renew the credentials of your storage account as soon as possible!

Best Regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobil: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann

Am 23.08.2019 um 08:33 schrieb Krishna Chandran Nair <kcn...@qatarairways.com.qa>:

Hi Team,

I have written a small piece of code to connect to Azure Blob Storage but got an error. I have attached the error log. Please help.

Calling command:

./spark-submit stg.py --jars /home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-azure-3.2.0.jar,/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/azure-storage-8.4.0.jar

Code (vi ~/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py):

from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()

#session.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
#session.conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
#session.conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")
#session.conf.set(
#    "fs.azure.sas.snowflakestrg.blob.core.windows.net/test",
#    "?sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:37:05Z&st=2019-08-13T08:37:05Z&spr=https&sig=BgTl8mibE%2B%2BTTIMG4dKR17NnGinMWEVTtn888MD8PT4%3D"
#)
session.conf.set(
    "fs.azure.account.key.snowflakestrg.blob.core.windows.net",
    "LIWCYzrJOS4hs0DiQH6fAzjuBnuj/F8myVmJImomEqOqlAV4pSt7KWfr24mj0saaOTVNZkGTKUn41k4e9hqKSA==")
df = session.read.csv("wasbs://t...@snowflakestrg.blob.core.windows.net/users.csv")
df.show(5)

Qatar Airways - Going Places Together
Error
-----

citus@azcitusclient:~/spark/spark-2.3.3-bin-hadoop2.7/bin$ ./spark-submit stg.py --jars /home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-azure-3.2.0.jar,/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/azure-storage-8.4.0.jar
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/08/19 08:55:31 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
19/08/19 08:55:31 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path: [/usr/java/packages/lib, /usr/lib/x86_64-linux-gnu/jni, /lib/x86_64-linux-gnu, /usr/lib/x86_64-linux-gnu, /usr/lib/jni, /lib, /usr/lib]
19/08/19 08:55:31 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
19/08/19 08:55:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/08/19 08:55:31 INFO SparkContext: Running Spark version 2.3.3
19/08/19 08:55:31 INFO SparkContext: Submitted application: stg.py
19/08/19 08:55:31 INFO SecurityManager: Changing view acls to: citus
19/08/19 08:55:31 INFO SecurityManager: Changing modify acls to: citus
19/08/19 08:55:31 INFO SecurityManager: Changing view acls groups to:
19/08/19 08:55:31 INFO SecurityManager: Changing modify acls groups to:
19/08/19 08:55:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(citus); groups with view permissions: Set(); users with modify permissions: Set(citus); groups with modify permissions: Set()
19/08/19 08:55:32 INFO Utils: Successfully started service 'sparkDriver' on port 45463.
19/08/19 08:55:32 INFO SparkEnv: Registering MapOutputTracker
19/08/19 08:55:32 INFO SparkEnv: Registering BlockManagerMaster
19/08/19 08:55:32 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/08/19 08:55:32 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/08/19 08:55:32 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5c785fe1-8875-45c8-ba1f-0a0a4e04c4b9
19/08/19 08:55:32 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/08/19 08:55:32 INFO SparkEnv: Registering OutputCommitCoordinator
19/08/19 08:55:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/08/19 08:55:32 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://azcitusclient:4040
19/08/19 08:55:33 INFO SparkContext: Added file file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py at file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py with timestamp 1566204933258
19/08/19 08:55:33 INFO Utils: Copying /home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py to /tmp/spark-fa4efcd6-393e-48e1-9f66-58e5e76c7a8d/userFiles-82f72b0a-5cc1-4884-a641-ac59b00d2217/stg.py
19/08/19 08:55:33 INFO Executor: Starting executor ID driver on host localhost
19/08/19 08:55:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36155.
19/08/19 08:55:33 INFO NettyBlockTransferService: Server created on azcitusclient:36155
19/08/19 08:55:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/08/19 08:55:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO BlockManagerMasterEndpoint: Registering block manager azcitusclient:36155 with 434.4 MB RAM, BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/spark-warehouse/').
19/08/19 08:55:33 INFO SharedState: Warehouse path is 'file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/spark-warehouse/'.
19/08/19 08:55:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/08/19 08:55:34 WARN FileStreamSink: Error while looking for metadata directory.
Traceback (most recent call last):
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py", line 24, in <module>
    df=session.read.csv("wasbs://t...@snowflakestrg.blob.core.windows.net/users.csv")
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 441, in csv
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o22.csv.
: java.io.IOException: No FileSystem for scheme: wasbs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:709)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:390)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:390)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:389)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:596)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Thread.java:834)
19/08/19 08:55:34 INFO SparkContext: Invoking stop() from shutdown hook
19/08/19 08:55:34 INFO SparkUI: Stopped Spark web UI at http://azcitusclient:4040
19/08/19 08:55:34 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/08/19 08:55:34 INFO MemoryStore: MemoryStore cleared
19/08/19 08:55:34 INFO BlockManager: BlockManager stopped
19/08/19 08:55:34 INFO BlockManagerMaster: BlockManagerMaster stopped
19/08/19 08:55:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/08/19 08:55:34 INFO SparkContext: Successfully stopped SparkContext
19/08/19 08:55:34 INFO ShutdownHookManager: Shutdown hook called
19/08/19 08:55:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-fa4efcd6-393e-48e1-9f66-58e5e76c7a8d/pyspark-f6c990dd-4a77-44d4-beb5-f12951b8d75f
19/08/19 08:55:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-605f41ed-58c7-4824-9d2c-90479e36ccd1
19/08/19 08:55:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-fa4efcd6-393e-48e1-9f66-58e5e76c7a8d
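[Editor's note] Two things in the log above are worth flagging. First, in the command shown, `--jars` appears *after* `stg.py`; spark-submit treats everything after the application script as arguments to the script, so the Azure JARs were never added to the classpath, which is consistent with the `java.io.IOException: No FileSystem for scheme: wasbs` failure. A corrected invocation might look like the sketch below; the `fs.wasbs.impl` setting and the exact class name for the secure scheme are assumptions about this particular Hadoop build, not something confirmed in the thread:

```shell
# Flags must come BEFORE the application script; anything after stg.py
# is passed to the Python script as its own argv and silently ignored here.
# Note also that hadoop-azure-3.2.0 against a Spark build for Hadoop 2.7
# may hit class incompatibilities; a hadoop-azure 2.7.x JAR is the safer match.
./spark-submit \
  --jars /home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-azure-3.2.0.jar,/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/azure-storage-8.4.0.jar \
  --conf spark.hadoop.fs.wasbs.impl=org.apache.hadoop.fs.azure.NativeAzureFileSystem \
  stg.py
```

Second, the `spark.hadoop.` prefix is how Hadoop filesystem properties are passed through spark-submit configuration; depending on the Hadoop version, the `wasbs` scheme may instead need the secure variant of the filesystem class, so treat the class name above as a starting point rather than a definitive fix.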