: 2016-01-20 18:31
To: fightf...@163.com
CC: user
Subject: Re: Re: spark dataframe jdbc read/write using dbcp connection pool
Hi,
I think you can check the Spark job UI to find out whether the partitioning works
or not; pay attention to the Storage page for the partition sizes, and to which stage
/ task fails
> ; are there any other
> alternatives I can choose to tune this?
>
> Best,
> Sun.
>
> --
> fightf...@163.com
>
>
> *From:* fightf...@163.com
> *Date:* 2016-01-20 15:06
> *To:* 刘虓
> *CC:* user
> *Subject:* Re: Re: spark dataframe jdbc read/write using dbcp connection pool
sfully. Do I need to increase the partitions? Or are there any other
alternatives I can choose to tune this?
Best,
Sun.
fightf...@163.com
From: fightf...@163.com
Date: 2016-01-20 15:06
To: 刘虓
CC: user
Subject: Re: Re: spark dataframe jdbc read/write using dbcp connection pool
Hi,
Thanks a lot
4")
The added_year column in the mysql table contains a range of (1985-2015), and I pass
the numPartitions property to get the partitioning I want. Is this what you recommend? Can you advise a
little more on the implementation?
Best,
Sun.
fightf...@163.com
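For illustration, here is a rough sketch (plain Python, not Spark's exact code) of how a JDBC reader can turn a partition column, its lower/upper bounds, and numPartitions into one WHERE clause per concurrent query; the column name added_year, the bounds 1985/2015, and 4 partitions are taken from this thread:

```python
def jdbc_partition_clauses(column, lower, upper, num_partitions):
    """Split [lower, upper] into num_partitions WHERE clauses,
    one per concurrent JDBC query (sketch of the general idea)."""
    stride = (upper - lower) // num_partitions
    clauses = []
    bound = lower
    for i in range(num_partitions):
        if i == 0:
            # First partition also picks up NULLs so no row is dropped.
            clauses.append(f"{column} < {bound + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended so values above upper still match.
            clauses.append(f"{column} >= {bound}")
        else:
            clauses.append(f"{bound} <= {column} AND {column} < {bound + stride}")
        bound += stride
    return clauses

for clause in jdbc_partition_clauses("added_year", 1985, 2015, 4):
    print(clause)
```

Note that with integer strides the last partition can cover a wider slice of the range (here 2006 and up), so if the data is skewed toward recent years, increasing numPartitions or choosing tighter bounds can help balance the per-task load.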
From: 刘虓
Date: 2016-01-20 11:26
Hi,
I suggest you partition the JDBC reading on an indexed column of the mysql
table.
2016-01-20 10:11 GMT+08:00 fightf...@163.com :
> Hi ,
> I want to load really large-volume datasets from mysql using the spark
> dataframe api, and then save them as
> parquet or orc files to facilitate use with hive
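A minimal sketch of the option set such a partitioned JDBC read needs; the URL, table name, and credentials below are placeholders, and added_year is the indexed column discussed elsewhere in this thread:

```python
# Options for a partitioned JDBC read; everything except the option
# names themselves is a placeholder for this cluster/table.
jdbc_opts = {
    "url": "jdbc:mysql://db-host:3306/mydb",  # placeholder host/db
    "dbtable": "my_table",                    # placeholder table name
    "user": "spark",                          # placeholder credentials
    "password": "secret",
    "partitionColumn": "added_year",          # indexed column to split on
    "lowerBound": "1985",
    "upperBound": "2015",
    "numPartitions": "4",
}

# With a live SQLContext/SparkSession this dict would be used as, e.g.:
#   df = sqlContext.read.format("jdbc").options(**jdbc_opts).load()
#   df.write.mode("overwrite").parquet("/warehouse/my_table_parquet")
# after which the parquet directory can be registered as a hive table.
```

The partitioned read issues numPartitions concurrent queries, each fetching one slice of the added_year range, instead of pulling the whole table through a single connection.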