I have a Spark job that writes data to S3 as below:

source_data_df_to_write.select(target_columns_list) \
    .write.partitionBy(target_partition_cols_list) \
    .format("ORC") \
    .save(self.table_location_prefix + self.target_table, mode="append")
My dataframe can sometimes have null values for columns. Writing a dataframe with NULL attributes fails the job with an IllegalArgumentException:

Caused by: java.lang.IllegalArgumentException: Error: type expected at the position 14 of 'double:string:null:string:string:string:double:bigint:null:null:null:null:string:null:string:null:null:null:null:string:string:string:null:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:null:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string' but 'null' is found.

A sample dataframe is built like this:

columns_with_default = "col1, NULL as col2, col3, col4, NULL as col5, partition_col1, partition_col2"
source_data_df_to_write = self.session.sql(
    "SELECT %s FROM TEMP_VIEW" % columns_with_default)

So, is there a way to make the Spark job write a dataframe with NULL attributes to S3?
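For what it's worth, the 'null' entries in that type string correspond to the bare NULL literals: Spark SQL types a bare NULL as NullType, and the ORC writer has no mapping for NullType. A minimal workaround sketch, assuming the intended types of the defaulted columns are known (STRING and DOUBLE below are illustrative assumptions, not from the original job): cast each NULL to an explicit type so no column in the schema ends up as NullType.

# Sketch: cast each NULL default to a concrete type.
# The target types (STRING, DOUBLE) are assumptions for illustration.
columns_with_default = (
    "col1, CAST(NULL AS STRING) AS col2, col3, col4, "
    "CAST(NULL AS DOUBLE) AS col5, partition_col1, partition_col2"
)
source_data_df_to_write = self.session.sql(
    "SELECT %s FROM TEMP_VIEW" % columns_with_default)

# Equivalent DataFrame-side fix for a column that already has NullType
# (NullType casts to any other type):
from pyspark.sql import functions as F
source_data_df_to_write = source_data_df_to_write.withColumn(
    "col2", F.col("col2").cast("string"))

After either variant, source_data_df_to_write.printSchema() should show no NullType columns before the ORC write.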