Please find the attached error.

From: Roland Johann <roland.joh...@phenetic.io>
Sent: 23 August 2019 10:51 AM
To: Krishna Chandran Nair <kcn...@qatarairways.com.qa>
Cc: user@spark.apache.org
Subject: [External] Re: error while connecting to azure blob storage
Hi Krishna,

there seems to be no attachment. In addition, you should NEVER post private credentials to public forums. Please renew the credentials of your storage account as soon as possible!

Best Regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobil: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann

Am 23.08.2019 um 08:33 schrieb Krishna Chandran Nair <kcn...@qatarairways.com.qa>:

Hi Team,

I have written a small piece of code to connect to Azure Blob Storage but got an error. I have attached the error log. Please help.

Calling command:

./spark-submit stg.py --jars /home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-azure-3.2.0.jar,/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/azure-storage-8.4.0.jar

Code (vi ~/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py):

from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()

#session.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
#session.conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
#session.conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")
#session.conf.set(
#    "fs.azure.sas.snowflakestrg.blob.core.windows.net/test",
#    "?sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:37:05Z&st=2019-08-13T08:37:05Z&spr=https&sig=BgTl8mibE%2B%2BTTIMG4dKR17NnGinMWEVTtn888MD8PT4%3D"
#)
session.conf.set(
    "fs.azure.account.key.snowflakestrg.blob.core.windows.net",
    "LIWCYzrJOS4hs0DiQH6fAzjuBnuj/F8myVmJImomEqOqlAV4pSt7KWfr24mj0saaOTVNZkGTKUn41k4e9hqKSA==")
df = session.read.csv("wasbs://t...@snowflakestrg.blob.core.windows.net/users.csv")
df.show(5)

Qatar Airways - Going Places Together
Error
-----

citus@azcitusclient:~/spark/spark-2.3.3-bin-hadoop2.7/bin$ ./spark-submit stg.py --jars /home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-azure-3.2.0.jar,/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/azure-storage-8.4.0.jar
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/08/19 08:55:31 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
19/08/19 08:55:31 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path: [/usr/java/packages/lib, /usr/lib/x86_64-linux-gnu/jni, /lib/x86_64-linux-gnu, /usr/lib/x86_64-linux-gnu, /usr/lib/jni, /lib, /usr/lib]
19/08/19 08:55:31 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
19/08/19 08:55:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/08/19 08:55:31 INFO SparkContext: Running Spark version 2.3.3
19/08/19 08:55:31 INFO SparkContext: Submitted application: stg.py
19/08/19 08:55:31 INFO SecurityManager: Changing view acls to: citus
19/08/19 08:55:31 INFO SecurityManager: Changing modify acls to: citus
19/08/19 08:55:31 INFO SecurityManager: Changing view acls groups to:
19/08/19 08:55:31 INFO SecurityManager: Changing modify acls groups to:
19/08/19 08:55:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(citus); groups with view permissions: Set(); users with modify permissions: Set(citus); groups with modify permissions: Set()
19/08/19 08:55:32 INFO Utils: Successfully started service 'sparkDriver' on port 45463.
19/08/19 08:55:32 INFO SparkEnv: Registering MapOutputTracker
19/08/19 08:55:32 INFO SparkEnv: Registering BlockManagerMaster
19/08/19 08:55:32 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/08/19 08:55:32 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/08/19 08:55:32 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5c785fe1-8875-45c8-ba1f-0a0a4e04c4b9
19/08/19 08:55:32 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/08/19 08:55:32 INFO SparkEnv: Registering OutputCommitCoordinator
19/08/19 08:55:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/08/19 08:55:32 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://azcitusclient:4040
19/08/19 08:55:33 INFO SparkContext: Added file file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py at file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py with timestamp 1566204933258
19/08/19 08:55:33 INFO Utils: Copying /home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py to /tmp/spark-fa4efcd6-393e-48e1-9f66-58e5e76c7a8d/userFiles-82f72b0a-5cc1-4884-a641-ac59b00d2217/stg.py
19/08/19 08:55:33 INFO Executor: Starting executor ID driver on host localhost
19/08/19 08:55:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36155.
19/08/19 08:55:33 INFO NettyBlockTransferService: Server created on azcitusclient:36155
19/08/19 08:55:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/08/19 08:55:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO BlockManagerMasterEndpoint: Registering block manager azcitusclient:36155 with 434.4 MB RAM, BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, azcitusclient, 36155, None)
19/08/19 08:55:33 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/spark-warehouse/').
19/08/19 08:55:33 INFO SharedState: Warehouse path is 'file:/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/spark-warehouse/'.
19/08/19 08:55:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/08/19 08:55:34 WARN FileStreamSink: Error while looking for metadata directory.
Traceback (most recent call last):
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/bin/stg.py", line 24, in <module>
    df=session.read.csv("wasbs://t...@snowflakestrg.blob.core.windows.net/users.csv")
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 441, in csv
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/citus/spark/spark-2.3.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o22.csv.
: java.io.IOException: No FileSystem for scheme: wasbs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:709)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:390)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:390)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:389)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:596)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Thread.java:834)
19/08/19 08:55:34 INFO SparkContext: Invoking stop() from shutdown hook
19/08/19 08:55:34 INFO SparkUI: Stopped Spark web UI at http://azcitusclient:4040
19/08/19 08:55:34 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/08/19 08:55:34 INFO MemoryStore: MemoryStore cleared
19/08/19 08:55:34 INFO BlockManager: BlockManager stopped
19/08/19 08:55:34 INFO BlockManagerMaster: BlockManagerMaster stopped
19/08/19 08:55:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/08/19 08:55:34 INFO SparkContext: Successfully stopped SparkContext
19/08/19 08:55:34 INFO ShutdownHookManager: Shutdown hook called
19/08/19 08:55:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-fa4efcd6-393e-48e1-9f66-58e5e76c7a8d/pyspark-f6c990dd-4a77-44d4-beb5-f12951b8d75f
19/08/19 08:55:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-605f41ed-58c7-4824-9d2c-90479e36ccd1
19/08/19 08:55:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-fa4efcd6-393e-48e1-9f66-58e5e76c7a8d
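[Editor's note] Two things in the log above are worth flagging. First, in the command shown, `--jars` appears *after* `stg.py`; spark-submit treats everything after the application script as arguments to the script, so the Azure JARs were never added to the classpath, which is consistent with the `java.io.IOException: No FileSystem for scheme: wasbs` failure. A corrected invocation might look like the sketch below; the `fs.wasbs.impl` setting and the exact class name for the secure scheme are assumptions about this particular Hadoop build, not something confirmed in the thread:

```shell
# Flags must come BEFORE the application script; anything after stg.py
# is passed to the Python script as its own argv and silently ignored here.
# Note also that hadoop-azure-3.2.0 against a Spark build for Hadoop 2.7
# may hit class incompatibilities; a hadoop-azure 2.7.x JAR is the safer match.
./spark-submit \
  --jars /home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/hadoop-azure-3.2.0.jar,/home/citus/spark/spark-2.3.3-bin-hadoop2.7/jars/azure-storage-8.4.0.jar \
  --conf spark.hadoop.fs.wasbs.impl=org.apache.hadoop.fs.azure.NativeAzureFileSystem \
  stg.py
```

Second, the `spark.hadoop.` prefix is how Hadoop filesystem properties are passed through spark-submit configuration; depending on the Hadoop version, the `wasbs` scheme may instead need the secure variant of the filesystem class, so treat the class name above as a starting point rather than a definitive fix.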