Had the same issue myself. I was surprised at first as well, but I found
it useful: the amount of data saved for each partition decreases, because
the partition column values are no longer repeated in every record.
When I load the data from each partition, I add the partition columns back
with the lit function before I merge the frames from the
different partitions.
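A minimal sketch of that approach, assuming the data was written with partitionBy("bucket") under a hypothetical /tmp/output directory and that the bucket values are known up front (the path, the object name and the bucket list are illustrative assumptions, not from the original post):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object MergePartitionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("merge-partitions-sketch")
      .master("local[*]")
      .getOrCreate()

    // Assumption: output root previously written with partitionBy("bucket")
    val basePath = "/tmp/output"
    // Assumption: the partition values are known in advance
    val buckets = Seq("B01", "B02", "B03")

    // Read each partition folder on its own; the files do not contain the
    // "bucket" column, so add it back as a literal before merging.
    val frames: Seq[DataFrame] = buckets.map { b =>
      spark.read.json(s"$basePath/bucket=$b").withColumn("bucket", lit(b))
    }

    // Merge the per-partition frames, matching columns by name
    val merged = frames.reduce(_ unionByName _)
    merged.show(false)

    spark.stop()
  }
}
```

Note that reading the root directory itself (spark.read.json(basePath)) lets Spark's partition discovery add the bucket column automatically, so the lit step is only needed when each partition folder is loaded separately.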
On Tue
The partitionBy clause is used to create Hive-style partition folders so
that you can point a Hive partitioned table at the data.
What are you using partitionBy for? What is the use case?
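To illustrate the folder layout, here is a small sketch using the sample records from the original question. The output path /tmp/output and the object name are hypothetical, and the part file names in the comments are elided because Spark generates them:

```scala
import org.apache.spark.sql.SparkSession

object PartitionLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-layout-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows mirroring the JSON in the original question
    val df = Seq(
      ("B01", "A1", "NULL", "NULL"),
      ("B02", "A2", "NULL", "NULL"),
      ("B03", "A3", "NULL", "NULL")
    ).toDF("bucket", "actionType", "preaction", "postaction")

    // Hypothetical output path; one folder is created per bucket value:
    //   /tmp/output/bucket=B01/part-...json
    //   /tmp/output/bucket=B02/part-...json
    //   /tmp/output/bucket=B03/part-...json
    // The JSON records inside those files no longer carry "bucket".
    df.write.mode("overwrite").partitionBy("bucket").json("/tmp/output")

    // Reading from the root path restores "bucket" from the folder names
    val restored = spark.read.json("/tmp/output")
    restored.printSchema()

    spark.stop()
  }
}
```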
On Mon 4 Jun, 2018, 4:59 PM purna pradeep, wrote:
> im reading below json in spark
>
> {"bucket": "B01", "action
Purna,
This behavior is by design. If you provide partitionBy, Spark removes the
columns from the data
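If the goal is to keep the value inside the JSON records as well, one common workaround (this is an assumption about what is wanted here, not something Spark does for you) is to partition by a copy of the column. A minimal sketch, where the bucket_part column name, the object name and the output path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KeepPartitionColumnSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("keep-partition-column-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows mirroring the JSON in the original question
    val df = Seq(
      ("B01", "A1", "NULL", "NULL"),
      ("B02", "A2", "NULL", "NULL"),
      ("B03", "A3", "NULL", "NULL")
    ).toDF("bucket", "actionType", "preaction", "postaction")

    // Partition by a duplicate column so that "bucket" itself stays in the
    // written records; folders are named bucket_part=B01, bucket_part=B02, ...
    df.withColumn("bucket_part", col("bucket"))
      .write
      .mode("overwrite")
      .partitionBy("bucket_part")
      .json("/tmp/output_with_bucket") // hypothetical path

    spark.stop()
  }
}
```

The trade-off is that the value is stored twice (in the folder name and in every record), which gives up the size saving mentioned earlier in the thread.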
From: purna pradeep
Date: Monday, June 4, 2018 at 8:00 PM
To: "user@spark.apache.org"
Subject: spark partitionBy with partitioned column in json output
I'm reading the JSON below in Spark:
{"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"}
val df=