Pretty much what it says: you are creating a table over a path that already has data in it. You can't do that unless you overwrite the existing data (or clear the location first), if that's what you intend.
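If the goal really is to rebuild the table from scratch, one way (a sketch only, assuming `./output/delta/` is safe to delete and `clear_location` is a hypothetical helper, not part of any Spark API) is to empty the location before running the CREATE TABLE, or to let a DataFrame write overwrite it:

```python
import shutil
from pathlib import Path

OUTPUT_DELTA_PATH = './output/delta/'

def clear_location(path: str) -> None:
    """Remove any leftover files so CREATE TABLE sees an empty location."""
    p = Path(path)
    if p.exists():
        shutil.rmtree(p)          # delete the stale Delta files
    p.mkdir(parents=True, exist_ok=True)

clear_location(OUTPUT_DELTA_PATH)

# After this, the original CREATE TABLE ... LOCATION should succeed.
# Alternatively, skip the SQL DDL and let a DataFrame write replace the
# existing data in place (df is assumed to exist with the right schema):
# (df.write.format('delta')
#    .mode('overwrite')
#    .partitionBy('worked_date')
#    .save(OUTPUT_DELTA_PATH))
```

Deleting the directory is the blunt option; `.mode('overwrite')` is usually preferable because Delta records the overwrite in the transaction log instead of discarding history.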
On Mon, Aug 1, 2022 at 7:29 PM Kumba Janga <kyja...@gmail.com> wrote:
>
> - Component: Spark Delta, Spark SQL
> - Level: Beginner
> - Scenario: Debug, How-to
>
> *Python in Jupyter:*
>
> import pyspark
> import pyspark.sql.functions
>
> from pyspark.sql import SparkSession
> spark = (
>     SparkSession
>     .builder
>     .appName("programming")
>     .master("local")
>     .config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0")
>     .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
>     .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
>     .config('spark.ui.port', '4050')
>     .getOrCreate()
> )
> from delta import *
>
> string_20210609 = '''worked_date,worker_id,delete_flag,hours_worked
> 2021-06-09,1001,Y,7
> 2021-06-09,1002,Y,3.75
> 2021-06-09,1003,Y,7.5
> 2021-06-09,1004,Y,6.25'''
>
> rdd_20210609 = spark.sparkContext.parallelize(string_20210609.split('\n'))
>
> # FILES WILL SHOW UP ON THE LEFT UNDER THE FOLDER ICON IF YOU WANT TO BROWSE THEM
> OUTPUT_DELTA_PATH = './output/delta/'
>
> spark.sql('CREATE DATABASE IF NOT EXISTS EXERCISE')
>
> spark.sql('''
> CREATE TABLE IF NOT EXISTS EXERCISE.WORKED_HOURS(
>     worked_date date
>     , worker_id int
>     , delete_flag string
>     , hours_worked double
> ) USING DELTA
> PARTITIONED BY (worked_date)
> LOCATION "{0}"
> '''.format(OUTPUT_DELTA_PATH)
> )
>
> *Error Message:*
>
> AnalysisException                         Traceback (most recent call last)
> <ipython-input-13-e0469b5852dd> in <module>
>       4 spark.sql('CREATE DATABASE IF NOT EXISTS EXERCISE')
>       5
> ----> 6 spark.sql('''
>       7 CREATE TABLE IF NOT EXISTS EXERCISE.WORKED_HOURS(
>       8     worked_date date
>
> /Users/kyjan/spark-3.0.3-bin-hadoop2.7\python\pyspark\sql\session.py in sql(self, sqlQuery)
>     647         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
>     648         """
> --> 649         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>     650
>     651     @since(2.0)
>
> \Users\kyjan\spark-3.0.3-bin-hadoop2.7\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py in __call__(self, *args)
>    1302
>    1303         answer = self.gateway_client.send_command(command)
> -> 1304         return_value = get_return_value(
>    1305             answer, self.gateway_client, self.target_id, self.name)
>    1306
>
> /Users/kyjan/spark-3.0.3-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
>     132                 # Hide where the exception came from that shows a non-Pythonic
>     133                 # JVM exception message.
> --> 134                 raise_from(converted)
>     135             else:
>     136                 raise
>
> /Users/kyjan/spark-3.0.3-bin-hadoop2.7\python\pyspark\sql\utils.py in raise_from(e)
>
> AnalysisException: Cannot create table ('`EXERCISE`.`WORKED_HOURS`'). The associated location ('output/delta') is not empty.;
>
>
> --
> Best Wishes,
> Kumba Janga
>
> "The only way of finding the limits of the possible is by going beyond
> them into the impossible"
> -Arthur C. Clarke