Can you please send the full error message and stack trace? It would be very helpful for getting to the root cause.
On Sun, Jul 19, 2020 at 10:57 PM anbutech <anbutec...@outlook.com> wrote:
> Hi Team,
>
> I'm seeing odd behavior with a PySpark DataFrame (Databricks, Delta,
> Spark 3.0.0). I have tried the two options below to write the processed
> DataFrame into a Delta table partitioned on the columns listed. Option 1
> completely overwrites the whole table, and I can't figure out why the
> DataFrame fully overwrites it here.
>
> I'm also getting the following error while testing option 2:
>
> Predicate references non-partition column 'json_feeds_flatten_data'. Only
> the partition columns may be referenced: [table_name, y, m, d, h];
>
> Could you please tell me why PySpark behaves like this? It would be very
> helpful to understand the mistake here.
>
> Sample partition column values:
> -------------------------------
>
> table_name='json_feeds_flatten_data'
> y=2020
> m=7
> d=19
> h=0
>
> Option 1:
>
> partition_keys = ['table_name', 'y', 'm', 'd', 'h']
>
> (final_df
>  .withColumn('y', lit(y).cast('int'))
>  .withColumn('m', lit(m).cast('int'))
>  .withColumn('d', lit(d).cast('int'))
>  .withColumn('h', lit(h).cast('int'))
>  .write
>  .partitionBy(partition_keys)
>  .format("delta")
>  .mode('overwrite')
>  .saveAsTable(target_table)
> )
>
> Option 2:
>
> rep_wh = 'table_name={} AND y={} AND m={} AND d={} AND h={}'.format(
>     table_name, y, m, d, h)
>
> (final_df
>  .withColumn('y', lit(y).cast('int'))
>  .withColumn('m', lit(m).cast('int'))
>  .withColumn('d', lit(d).cast('int'))
>  .withColumn('h', lit(h).cast('int'))
>  .write
>  .format("delta")
>  .mode('overwrite')
>  .option('replaceWhere', rep_wh)
>  .saveAsTable(target_table)
> )
>
> Thanks
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
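A couple of thoughts based on the quoted error, as a guess rather than a confirmed diagnosis. For option 1: a plain `mode('overwrite')` without `replaceWhere` replaces the entire Delta table, which would explain the full overwrite. For option 2: `replaceWhere` takes a SQL predicate string, and since `table_name` holds a string value, the unquoted format produces `table_name=json_feeds_flatten_data`, which Spark parses as a comparison against a *column* named `json_feeds_flatten_data` — matching the error text. A minimal sketch of the predicate construction only (plain Python, no Spark needed; variable values taken from the thread):

```python
# Partition values from the original message.
table_name = 'json_feeds_flatten_data'
y, m, d, h = 2020, 7, 19, 0

# Unquoted string value: 'json_feeds_flatten_data' is parsed as a column
# reference, which would trigger "Predicate references non-partition column".
bad = 'table_name={} AND y={} AND m={} AND d={} AND h={}'.format(
    table_name, y, m, d, h)

# Quoting the string value makes it a SQL string literal, so the predicate
# references only the partition columns.
good = "table_name='{}' AND y={} AND m={} AND d={} AND h={}".format(
    table_name, y, m, d, h)

print(good)
# -> table_name='json_feeds_flatten_data' AND y=2020 AND m=7 AND d=19 AND h=0
```

If that is the cause, passing the quoted predicate to `.option('replaceWhere', good)` together with `mode('overwrite')` should replace only the matching partition instead of the whole table. Worth verifying against your exact Delta version, though.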