Re: Can Spark Dataframes preserve order when joining?

2016-06-30 Thread Takeshi Yamamuro
Hi,

Most join strategies do not preserve the ordering of the input DataFrames
(a sort-merge join only retains the ordering of its left input).
So, as said earlier, you need to sort explicitly if you want ordered
output.
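
One way to make the ordering explicit is to carry a position column through the join and sort on it afterwards. The following is only a minimal sketch, assuming Spark 2.x or later (for SparkSession) and a local master; the object name, the `pos`, `key` and `label` columns and the sample rows are all made up for illustration and do not come from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OrderedJoinSketch {
  // Local session purely for illustration (assumption); a real job would reuse its own.
  val spark: SparkSession = SparkSession.builder()
    .master("local[2]")
    .appName("ordered-join-sketch")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical left input: `pos` records the row order we want to keep.
  val left: DataFrame = Seq("b", "a", "c", "a").zipWithIndex
    .map { case (key, pos) => (pos, key) }
    .toDF("pos", "key")

  // Hypothetical right (lookup) input.
  val right: DataFrame = Seq(("a", "alpha"), ("b", "beta"), ("c", "gamma"))
    .toDF("key", "label")

  // Join, then restore the left-hand order explicitly with orderBy
  // instead of relying on whatever join strategy the planner picks.
  val joined: DataFrame = left.join(right, "key").orderBy("pos")

  // Collects the labels in the order produced by the explicit sort.
  def labelsInOrder(): Seq[String] =
    joined.select("label").collect().map(_.getString(0)).toSeq
}
```

The position column costs one extra sort, but the result no longer depends on which join strategy is chosen.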

// maropu



-- 
---
Takeshi Yamamuro


Re: Can Spark Dataframes preserve order when joining?

2016-06-29 Thread Mich Talebzadeh
Hi,

Well, I would not assume anything myself. If you want the result ordered,
order it explicitly.

Let us take a simple case by creating three DFs based on existing tables:

// Assumes HiveContext is an instance created earlier in the shell, e.g.
// val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val s = HiveContext.table("sales").select("AMOUNT_SOLD","TIME_ID","CHANNEL_ID")
val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")

Now let us join these tables and aggregate:

// sum() comes from the SQL functions object
import org.apache.spark.sql.functions.sum

val rs = s.join(t, "time_id").join(c, "channel_id").
  groupBy("calendar_month_desc", "channel_desc").
  agg(sum("amount_sold").as("TotalSales"))

And then order explicitly:

// Keep the sorted DataFrame, then print its first five rows
val rs1 = rs.orderBy("calendar_month_desc", "channel_desc")
rs1.take(5).foreach(println)
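
For anyone without those Hive tables, the same join, aggregate and order-by pattern can be tried on a few in-memory rows. This is only a sketch, assuming Spark 2.x or later and a local master; the object name and every row value below are invented stand-ins, not data from the real sales, channels or times tables.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

object SalesOrderingSketch {
  // Local session for illustration only (assumption).
  val spark: SparkSession = SparkSession.builder()
    .master("local[2]")
    .appName("sales-ordering-sketch")
    .getOrCreate()
  import spark.implicits._

  // Made-up rows standing in for the sales, channels and times tables.
  val s: DataFrame = Seq((10.0, 1, 3), (5.0, 2, 3), (7.5, 1, 4), (2.5, 1, 3))
    .toDF("amount_sold", "time_id", "channel_id")
  val c: DataFrame = Seq((3, "Direct Sales"), (4, "Internet"))
    .toDF("channel_id", "channel_desc")
  val t: DataFrame = Seq((1, "1998-01"), (2, "1998-02"))
    .toDF("time_id", "calendar_month_desc")

  // Join, aggregate, and only then impose the order we actually want.
  val rs: DataFrame = s.join(t, "time_id").join(c, "channel_id")
    .groupBy("calendar_month_desc", "channel_desc")
    .agg(sum("amount_sold").as("TotalSales"))
    .orderBy("calendar_month_desc", "channel_desc")

  // Collects (month, channel, total) triples in the sorted order.
  def rows(): Seq[(String, String, Double)] =
    rs.collect().map(r => (r.getString(0), r.getString(1), r.getDouble(2))).toSeq
}
```

Calling `SalesOrderingSketch.rows()` returns the aggregated rows sorted by month and then by channel description, regardless of how the joins were executed.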


HTH

Dr Mich Talebzadeh








Can Spark Dataframes preserve order when joining?

2016-06-29 Thread Jestin Ma
If it’s not too much trouble, could I get some pointers/help on this? (see link)
http://stackoverflow.com/questions/38085801/can-dataframe-joins-in-spark-preserve-order

Also, as a side question, do DataFrames support easy reordering of columns?

Thank you!
Jestin