…convert a DataFrame to an
RDD and then invoke reduceByKey
Ningjun
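A minimal sketch of the reduceByKey idea. It is written in plain Python rather than Spark so it runs on its own. Rows are keyed by id, and values sharing an id are reduced by keeping the one with the larger timestamp. The `(id, value, timeStamp)` layout follows the original question. The helpers `newer` and `reduce_by_key` are hypothetical, and `reduce_by_key` only imitates locally what `RDD.reduceByKey` does on a cluster.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")

# A row as described in the question: (id, value, timeStamp).
Row = Tuple[str, str, int]


def newer(a: Tuple[str, int], b: Tuple[str, int]) -> Tuple[str, int]:
    """Reduce function: keep the (value, timeStamp) pair with the later timestamp."""
    return a if a[1] >= b[1] else b


def reduce_by_key(pairs: Iterable[Tuple[K, V]], fn: Callable[[V, V], V]) -> Dict[K, V]:
    """Hypothetical local stand-in for RDD.reduceByKey: fold values sharing a key with fn."""
    out: Dict[K, V] = {}
    for key, val in pairs:
        out[key] = fn(out[key], val) if key in out else val
    return out


def latest_by_id(rows: Iterable[Row]) -> List[Row]:
    # Map each row to (id, (value, timeStamp)), then reduce per id.
    keyed = ((rid, (val, ts)) for rid, val, ts in rows)
    reduced = reduce_by_key(keyed, newer)
    return sorted((rid, val, ts) for rid, (val, ts) in reduced.items())


if __name__ == "__main__":
    rows = [("a", "v1", 1), ("b", "v2", 5), ("a", "v3", 7), ("b", "v4", 2)]
    print(latest_by_id(rows))  # [('a', 'v3', 7), ('b', 'v2', 5)]
```

On a real RDD, the same `newer` function would be passed to `reduceByKey` after a `map` to `(id, (value, timeStamp))` pairs. Spark does not guarantee the order in which values are combined, so when two records share the latest timestamp, which one survives is effectively arbitrary.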
From: ayan guha [mailto:guha.a...@gmail.com]
Sent: Thursday, April 30, 2015 3:41 AM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: RE: How can I merge multiple DataFrame and remove duplicated key
1. Do a group
…it using DataFrame? Can you
give an example code snippet?
Thanks
Ningjun
*From:* ayan guha [mailto:guha.a...@gmail.com]
*Sent:* Wednesday, April 29, 2015 5:54 PM
*To:* Wang, Ningjun (LNG-NPV)
*Cc:* user@spark.apache.org
*Subject:* Re: How can I merge multiple DataFrame and remove duplicated key
It's no different: you would use a group by and an aggregate function to do so.
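Applied to this question, "group by and aggregate" most plausibly means two steps: aggregate the maximum timestamp per id, then join that result back to the rows to recover each id's value. The plain-Python sketch below only mirrors that logic and assumes rows are `(id, value, timeStamp)` tuples. In a DataFrame, the first step would be roughly `df.groupBy("id").agg(max("timeStamp"))`, followed by a join on id and timestamp. That reading of the suggestion is an assumption, not code from the thread.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (id, value, timeStamp)


def latest_via_group_and_join(rows: List[Row]) -> List[Row]:
    # Step 1 ("group by + aggregate"): maximum timeStamp per id.
    max_ts: Dict[str, int] = {}
    for rid, _, ts in rows:
        if rid not in max_ts or ts > max_ts[rid]:
            max_ts[rid] = ts
    # Step 2 ("join back"): keep rows whose (id, timeStamp) matches the aggregate.
    # If two rows of one id share the max timestamp, both survive, as in a join.
    return sorted(r for r in rows if r[2] == max_ts[r[0]])


if __name__ == "__main__":
    rows = [("a", "v1", 1), ("a", "v3", 7), ("b", "v2", 5)]
    print(latest_via_group_and_join(rows))
```

The join step is what brings `value` back, since an aggregate over `timeStamp` alone discards it. Because of the join, ties on the latest timestamp produce more than one row per id, so a further de-duplication may be needed if ids must be unique.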
On 30 Apr 2015 02:15, "Wang, Ningjun (LNG-NPV)" <ningjun.w...@lexisnexis.com> wrote:
I have multiple DataFrame objects, each stored in a Parquet file. Each DataFrame
contains just three columns (id, value, timeStamp). I need to union all the
DataFrame objects together, but for a duplicated id keep only the record with
the latest timestamp. How can I do that?
I can do this for RDDs b
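The whole task (union every input, then keep the newest row per id) can also be sketched as "sort by timestamp descending and keep the first row seen for each id". This is the same idea as a window function with `row_number()` over rows partitioned by id and ordered by timestamp descending, which later Spark releases offer through `Window.partitionBy`. The plain-Python version below is a sketch under that assumption. The inputs stand in for the per-file DataFrames, and the name `union_keep_latest` is hypothetical.

```python
from itertools import chain
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, str, int]  # (id, value, timeStamp)


def union_keep_latest(*frames: Iterable[Row]) -> List[Row]:
    # "Union": concatenate all inputs; duplicates are kept at this stage (UNION ALL semantics).
    all_rows = list(chain.from_iterable(frames))
    # Order by timeStamp descending so the newest row of each id comes first.
    # Python's sort is stable, so on equal timestamps the earlier input wins.
    all_rows.sort(key=lambda r: r[2], reverse=True)
    seen: Set[str] = set()
    result: List[Row] = []
    for row in all_rows:
        # Keep only the first (newest) row per id: the analogue of row_number() == 1.
        if row[0] not in seen:
            seen.add(row[0])
            result.append(row)
    return sorted(result)


if __name__ == "__main__":
    part1 = [("a", "v1", 1), ("b", "v2", 5)]
    part2 = [("a", "v3", 7), ("c", "v5", 3)]
    print(union_keep_latest(part1, part2))
```

In the Spark 1.x DataFrame API the concatenation step corresponds to `unionAll`. Unlike the group-and-join approach, this one always returns exactly one row per id, with ties broken by input order here, but by no guaranteed order in a distributed sort.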