Dataframe filter based on another Dataframe

2015-04-29 Thread Olivier Girardot
Hi everyone,
What is the most efficient way to filter a DataFrame on a column from
another DataFrame's column? The best idea I had was to join the two
DataFrames:

 // df1 is the DataFrame to filter; df2 holds the ids to keep
 val df1: DataFrame
 val df2: DataFrame
 df1.join(df2, df1("id") === df2("id"), "inner")

But I end up (obviously) with the id column twice.
Another approach would be to filter df1, but I can't seem to get this to
work using df2's column as a base.

Any ideas?

Regards,

Olivier.


Re: Dataframe filter based on another Dataframe

2015-04-29 Thread ayan guha
You can use .select after the join to project only the columns you need.
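
A minimal sketch of that suggestion, assuming a Spark version that provides SparkSession (the thread predates it) and a made-up schema: the "value" column, the sample rows and the helper name keepMatching are illustrative only and not from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectAfterJoin {
  // Inner-join on id, then project only df1's columns so that the
  // duplicated id column coming from df2 is dropped.
  // keepMatching is a hypothetical helper name, not a Spark API.
  def keepMatching(df1: DataFrame, df2: DataFrame): DataFrame =
    df1
      .join(df2, df1("id") === df2("id"), "inner")
      .select(df1.columns.map(c => df1(c)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-after-join")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data (assumption): df1 is filtered by the ids in df2.
    val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
    val df2 = Seq(1, 3).toDF("id")

    keepMatching(df1, df2).show()
    spark.stop()
  }
}
```

Note that with an inner join, an id that appears several times in df2 repeats the matching df1 row once per occurrence, so df2 may need a distinct first.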

-- 
Best Regards,
Ayan Guha


Re: Dataframe filter based on another Dataframe

2015-04-29 Thread Olivier Girardot
You mean after joining? Sure. My question was more whether there is a best
practice that is preferable to joining the other DataFrame just for filtering.
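
For illustration, one candidate is a left semi join, which returns only df1's columns and keeps each df1 row at most once. This is a sketch under assumptions rather than a confirmed best practice: it uses SparkSession from later Spark versions, and the "value" column, the sample rows and the helper name filterByIds are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SemiJoinFilter {
  // Keep the rows of df1 whose id also appears in df2.
  // A left semi join outputs only the left side's columns, so the id
  // column is not duplicated and no select is needed afterwards.
  // filterByIds is a hypothetical helper name, not a Spark API.
  def filterByIds(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("id") === df2("id"), "leftsemi")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("semi-join-filter")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data (assumption); df2 contains a repeated id on purpose.
    val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
    val df2 = Seq(1, 3, 3).toDF("id")

    filterByIds(df1, df2).show()
    spark.stop()
  }
}
```

Unlike the inner join, repeated ids in df2 do not multiply the rows of df1 here.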

Regards,

Olivier.
