Re: DataFrame API: how to partition by a "virtual" column, or by a nested column?

2016-10-13 Thread Samy Dindane


This partially answers the question: http://stackoverflow.com/a/35449563/604041

On 10/04/2016 03:10 PM, Samy Dindane wrote:

Hi,

I have the following schema:

-root
 |-timestamp
 |-date
   |-year
   |-month
   |-day
 |-some_column
 |-some_other_column

I'd like to achieve either of these:

1) Use the timestamp field to partition by year, month and day.
This looks weird though, as Spark wouldn't magically know how to load the data 
back since the year, month and day columns don't exist in the schema.

2) If 1) is not possible, partition data by date.year, date.month and date.day.
`df.write.partitionBy('date.year')` does not work, since `date.year` is a nested 
field rather than a top-level column of the schema.
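
For option 1), one possible approach is to derive the three columns from the timestamp just before writing. The sketch below is not taken from the thread: the function and parameter names are hypothetical, and it assumes `df` is a PySpark DataFrame whose `timestamp` column has a timestamp type. The two small helpers only illustrate the Hive-style `column=value` directory layout that `partitionBy` produces. Because the values live in the directory names, Spark's partition discovery can turn them back into columns when the data is read, even though they are not stored in the data files.

```python
from datetime import datetime


def partition_dir(ts):
    """Relative directory a Hive-style layout would use for this timestamp."""
    return "year={}/month={}/day={}".format(ts.year, ts.month, ts.day)


def parse_partition_dir(path):
    """Recover the partition values from the directory names alone."""
    pairs = (part.split("=", 1) for part in path.split("/"))
    return {name: int(value) for name, value in pairs}


def write_partitioned_by_day(df, out_path, ts_col="timestamp"):
    """Derive year/month/day from ts_col and partition the output by them.

    Assumption: df is a PySpark DataFrame and ts_col is a timestamp column.
    """
    # Imported lazily so the two helpers above run without Spark installed.
    from pyspark.sql import functions as F

    derived = (
        df.withColumn("year", F.year(F.col(ts_col)))
        .withColumn("month", F.month(F.col(ts_col)))
        .withColumn("day", F.dayofmonth(F.col(ts_col)))
    )
    derived.write.partitionBy("year", "month", "day").parquet(out_path)
```

With this layout, a row stamped 2016-10-04 would land under `year=2016/month=10/day=4` below the output path, and reading the output path back should expose `year`, `month` and `day` as ordinary columns.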

If 2) isn't possible either, I'll just move year, month and day to the root of 
the schema, which I'd rather avoid, as it bloats the schema.
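
The fallback described above can be limited to the write itself, so the nested schema is kept everywhere else. This is a minimal sketch under assumptions: the helper names are hypothetical, `df` is assumed to be a PySpark DataFrame with the `date` struct shown earlier, and Parquet is only an example output format.

```python
def top_level_name(nested_path):
    """Map a nested field path such as 'date.year' to a flat name, 'year'."""
    return nested_path.split(".")[-1]


def write_partitioned_by_nested(
    df, out_path, nested_paths=("date.year", "date.month", "date.day")
):
    """Copy nested fields to top-level columns, then partition by them.

    Assumption: df is a PySpark DataFrame containing the nested fields.
    """
    # Imported lazily so top_level_name can be used without Spark installed.
    from pyspark.sql import functions as F

    for path in nested_paths:
        # F.col resolves the dotted path to the field inside the struct.
        df = df.withColumn(top_level_name(path), F.col(path))

    names = [top_level_name(path) for path in nested_paths]
    df.write.partitionBy(*names).parquet(out_path)
```

The extra columns exist only on the DataFrame that is written, so the in-memory schema used by the rest of the job is unchanged. When the output is read back, `year`, `month` and `day` would appear at the root next to the original `date` struct, which could be dropped before writing if the duplication matters.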

Do you know if any of these is possible?

Thank you,

Samy

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




