I have a Spark job that writes data to S3 as below.
source_data_df_to_write.select(target_columns_list) \
    .write.partitionBy(target_partition_cols_list) \
    .format("ORC") \
    .save(self.table_location_prefix + self.target_table, mode="append")

My DataFrame can sometimes have NULL values for some columns. Writing the
DataFrame with these NULL attributes fails the job with an
IllegalArgumentException like the one below.
Caused by: java.lang.IllegalArgumentException: Error: type expected at the
position 14 of
'double:string:null:string:string:string:double:bigint:null:null:null:null:string:null:string:null:null:null:null:string:string:string:null:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string'
but 'null' is found.


A sample DataFrame is built like this:
columns_with_default = "col1, NULL as col2, col3, col4, NULL as col5, " \
                       "partition_col1, partition_col2"
source_data_df_to_write = self.session.sql(
    "SELECT %s FROM TEMP_VIEW" % columns_with_default)

So, is there a way to make the Spark job write a DataFrame with NULL
attributes to S3?
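
One direction that occurred to me (a rough sketch only, assuming the untyped
NULL columns are the cause; the CAST target types below are just examples,
not my real ones) is casting each NULL literal to an explicit type so that
nothing in the schema is NullType:

columns_with_default = (
    "col1, CAST(NULL AS STRING) AS col2, col3, col4, "
    "CAST(NULL AS DOUBLE) AS col5, partition_col1, partition_col2"
)
source_data_df_to_write = self.session.sql(
    "SELECT %s FROM TEMP_VIEW" % columns_with_default)

Is that the expected way to handle this, or is there a cleaner option?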


