", "USER_DIM_USER_ID")
> .withColumnRenamed("USER_CNTRY_ID","USER_DIM_COUNTRY_ID")
> .as("userdim")
> , userAndRetailDates("USER_ID") <=> $"userdim.USER_DIM_USER_ID"
> && userAndRetailDates("US
R_ID")
.withColumnRenamed("USER_CNTRY_ID","USER_DIM_COUNTRY_ID")
.as("userdim")
, userAndRetailDates("USER_ID") <=> $"userdim.USER_DIM_USER_ID"
&& userAndRetailDates("USER_CNTRY_ID") <=> $"us
Can you try the latest 1.6.0 RC which includes SPARK-1 ?
Cheers
On Fri, Dec 18, 2015 at 7:38 AM, Prasad Ravilla wrote:
> Hi,
>
> I am running into a performance issue when joining data frames created from
> Avro files using the spark-avro library.
>
> The data frames are created from 120K avro f
Hi,
I am running into a performance issue when joining data frames created from Avro
files using the spark-avro library.
The data frames are created from 120K Avro files, and the total size is around
1.5 TB.
Both data frames are very large, with billions of records.
The join for these two DataFrames