Re: How to avoid duplicate column names after join with multiple conditions

2018-07-12 Thread Prem Sure
Yes Nirav, we can probably ask the devs for a config parameter that takes
care of this automatically (internally). Users need to take additional care
when specifying column names and joining.

Thanks,
Prem

On Thu, Jul 12, 2018 at 10:53 PM Nirav Patel  wrote:

> Hi Prem, dropping column, renaming column are working for me as a
> workaround. I thought it just nice to have generic api that can handle that
> for me. or some intelligence that since both columns are same it shouldn't
> complain in subsequent Select clause that it doesn't know if I mean a#12 or
> a#81. They are both same just pick one.
>
> On Thu, Jul 12, 2018 at 9:38 AM, Prem Sure  wrote:
>
>> Hi Nirav, did you try
>> .drop(df1(a) after join
>>
>> Thanks,
>> Prem
>>
>> On Thu, Jul 12, 2018 at 9:50 PM Nirav Patel 
>> wrote:
>>
>>> Hi Vamshi,
>>>
>>> That api is very restricted and not generic enough. It imposes that all
>>> conditions of joins has to have same column on both side and it also has to
>>> be equijoin. It doesn't serve my usecase where some join predicates don't
>>> have same column names.
>>>
>>> Thanks
>>>
>>> On Sun, Jul 8, 2018 at 7:39 PM, Vamshi Talla 
>>> wrote:
>>>
>>>> Nirav,
>>>>
>>>> Spark does not create a duplicate column when you use the below join
>>>> expression,  as an array of column(s) like below but that requires the
>>>> column name to be same in both the data frames.
>>>>
>>>> Example: *df1.join(df2, [‘a’])*
>>>>
>>>> Thanks.
>>>> Vamshi Talla
>>>>
>>>> On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D
>>>> wrote:
>>>>
>>>> Nirav,
>>>>
>>>> withColumnRenamed() API might help but it does not different column and
>>>> renames all the occurrences of the given column. either use select() API
>>>> and rename as you want.
>>>>
>>>> Thanks & Regards,
>>>> Gokula Krishnan* (Gokul)*
>>>>
>>>> On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel
>>>> wrote:
>>>>
>>>>> Expr is `df1(a) === df2(a) and df1(b) === df2(c)`
>>>>>
>>>>> How to avoid duplicate column 'a' in result? I don't see any api that
>>>>> combines both. Rename manually?






Re: How to avoid duplicate column names after join with multiple conditions

2018-07-12 Thread Nirav Patel
Hi Prem, dropping or renaming the column works for me as a workaround. I
just thought it would be nice to have a generic API that handles this for
me, or some intelligence so that, since both columns are the same, a
subsequent select does not complain that it doesn't know whether I mean
a#12 or a#81. They are both the same, so just pick one.

On Thu, Jul 12, 2018 at 9:38 AM, Prem Sure  wrote:

> Hi Nirav, did you try
> .drop(df1(a) after join
>
> Thanks,
> Prem
>
> On Thu, Jul 12, 2018 at 9:50 PM Nirav Patel  wrote:
>
>> Hi Vamshi,
>>
>> That api is very restricted and not generic enough. It imposes that all
>> conditions of joins has to have same column on both side and it also has to
>> be equijoin. It doesn't serve my usecase where some join predicates don't
>> have same column names.
>>
>> Thanks
>>
>> On Sun, Jul 8, 2018 at 7:39 PM, Vamshi Talla 
>> wrote:
>>
>>> Nirav,
>>>
>>> Spark does not create a duplicate column when you use the below join
>>> expression,  as an array of column(s) like below but that requires the
>>> column name to be same in both the data frames.
>>>
>>> Example: *df1.join(df2, [‘a’])*
>>>
>>> Thanks.
>>> Vamshi Talla
>>>
>>> On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D 
>>> wrote:
>>>
>>> Nirav,
>>>
>>> withColumnRenamed() API might help but it does not different column and
>>> renames all the occurrences of the given column. either use select() API
>>> and rename as you want.
>>>
>>>
>>>
>>> Thanks & Regards,
>>> Gokula Krishnan* (Gokul)*
>>>
>>> On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel 
>>> wrote:
>>>
>>>> Expr is `df1(a) === df2(a) and df1(b) === df2(c)`
>>>>
>>>> How to avoid duplicate column 'a' in result? I don't see any api that
>>>> combines both. Rename manually?


Re: How to avoid duplicate column names after join with multiple conditions

2018-07-12 Thread Prem Sure
Hi Nirav, did you try
.drop(df1(a)) after the join?
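
A minimal Scala sketch of this suggestion, assuming the join expression from
the original question. The sample data, the payload columns v1 and v2, and the
names DropAfterJoin and joinAndDrop are made up for illustration. It drops
df2's copy of "a" by Column reference; dropping df1("a") instead would work
the same way and keep df2's copy.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropAfterJoin {
  // Join on the expression from the thread, then drop df2's copy of "a".
  // Dropping by Column reference (df2("a")) rather than by name keeps df1's "a".
  def joinAndDrop(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("a") === df2("a") && df1("b") === df2("c"))
      .drop(df2("a"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-after-join")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; v1 and v2 are made-up payload columns.
    val df1 = Seq((1, "x", 10.0)).toDF("a", "b", "v1")
    val df2 = Seq((1, "x", "payload")).toDF("a", "c", "v2")

    joinAndDrop(df1, df2).printSchema() // a single column named "a" remains
    spark.stop()
  }
}
```

This relies on df1 and df2 being distinct DataFrames. In a self-join, where
both sides come from the same DataFrame, the Column reference itself can be
ambiguous and aliasing the two sides first is the safer route.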

Thanks,
Prem

On Thu, Jul 12, 2018 at 9:50 PM Nirav Patel  wrote:

> Hi Vamshi,
>
> That api is very restricted and not generic enough. It imposes that all
> conditions of joins has to have same column on both side and it also has to
> be equijoin. It doesn't serve my usecase where some join predicates don't
> have same column names.
>
> Thanks
>
> On Sun, Jul 8, 2018 at 7:39 PM, Vamshi Talla  wrote:
>
>> Nirav,
>>
>> Spark does not create a duplicate column when you use the below join
>> expression,  as an array of column(s) like below but that requires the
>> column name to be same in both the data frames.
>>
>> Example: *df1.join(df2, [‘a’])*
>>
>> Thanks.
>> Vamshi Talla
>>
>> On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D 
>> wrote:
>>
>> Nirav,
>>
>> withColumnRenamed() API might help but it does not different column and
>> renames all the occurrences of the given column. either use select() API
>> and rename as you want.
>>
>>
>>
>> Thanks & Regards,
>> Gokula Krishnan* (Gokul)*
>>
>> On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel 
>> wrote:
>>
>>> Expr is `df1(a) === df2(a) and df1(b) === df2(c)`
>>>
>>> How to avoid duplicate column 'a' in result? I don't see any api that
>>> combines both. Rename manually?


Re: How to avoid duplicate column names after join with multiple conditions

2018-07-12 Thread Nirav Patel
Hi Vamshi,

That API is very restrictive and not generic enough. It requires every join
condition to use the same column name on both sides, and it has to be an
equi-join. It doesn't serve my use case, where some join predicates don't
have the same column names.

Thanks

On Sun, Jul 8, 2018 at 7:39 PM, Vamshi Talla  wrote:

> Nirav,
>
> Spark does not create a duplicate column when you use the below join
> expression,  as an array of column(s) like below but that requires the
> column name to be same in both the data frames.
>
> Example: *df1.join(df2, [‘a’])*
>
> Thanks.
> Vamshi Talla
>
> On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D  wrote:
>
> Nirav,
>
> withColumnRenamed() API might help but it does not different column and
> renames all the occurrences of the given column. either use select() API
> and rename as you want.
>
>
>
> Thanks & Regards,
> Gokula Krishnan* (Gokul)*
>
> On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel  wrote:
>
>> Expr is `df1(a) === df2(a) and df1(b) === df2(c)`
>>
>> How to avoid duplicate column 'a' in result? I don't see any api that
>> combines both. Rename manually?


Re: How to avoid duplicate column names after join with multiple conditions

2018-07-08 Thread Vamshi Talla
Nirav,

Spark does not create a duplicate column when you pass the join columns as an
array of column name(s), as in the example below, but that requires the column
name to be the same in both data frames.

Example: df1.join(df2, ['a'])
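
The example above is PySpark syntax. A rough Scala equivalent, as a sketch
only: the object and method names are made up for illustration. The second
method is one possible way (an assumption, not something proposed in this
thread) to fit a mixed condition such as df1(b) === df2(c) into this API, by
renaming df2's "c" to "b" before the join so that both join keys share a name.

```scala
import org.apache.spark.sql.DataFrame

object JoinUsingColumns {
  // Equi-join on the shared column name "a"; the result has a single "a".
  def joinOnA(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, Seq("a"))

  // Assumption: df2 has a column "c" that should match df1's "b".
  // Renaming it first lets both keys go through the column-name API,
  // at the cost of no longer having a separate "c" column in the result.
  def joinOnAAndB(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2.withColumnRenamed("c", "b"), Seq("a", "b"))
}
```

Both variants only cover equi-joins, which is the restriction of this API.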

Thanks.
Vamshi Talla

On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D <email2...@gmail.com> wrote:

Nirav,

withColumnRenamed() API might help but it does not different column and renames 
all the occurrences of the given column. either use select() API and rename as 
you want.



Thanks & Regards,
Gokula Krishnan (Gokul)

On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel <npa...@xactlycorp.com> wrote:
Expr is `df1(a) === df2(a) and df1(b) === df2(c)`

How to avoid duplicate column 'a' in result? I don't see any api that combines 
both. Rename manually?







Re: How to avoid duplicate column names after join with multiple conditions

2018-07-06 Thread Gokula Krishnan D
Nirav,

The withColumnRenamed() API might help, but it does not differentiate
between the columns and renames all occurrences of the given column.
Alternatively, use the select() API and rename the columns as you want.
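
A minimal Scala sketch of the select() approach, assuming the join expression
from the original question. The aliases "l" and "r", the output name
"a_right", and the names SelectAfterJoin and joinAndSelect are hypothetical.
Aliasing each side lets the select name exactly which "a" it means.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object SelectAfterJoin {
  def joinAndSelect(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Alias both sides so each "a" can be referenced unambiguously.
    val l = df1.as("l")
    val r = df2.as("r")

    l.join(r, col("l.a") === col("r.a") && col("l.b") === col("r.c"))
      .select(
        col("l.a"),               // keep the left "a" under its own name
        col("l.b"),
        col("r.a").as("a_right"), // rename the right "a" (or leave it out)
        col("r.c")
      )
  }
}
```

Leaving col("r.a") out of the select list removes the duplicate instead of
renaming it.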



Thanks & Regards,
Gokula Krishnan* (Gokul)*

On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel  wrote:

> Expr is `df1(a) === df2(a) and df1(b) === df2(c)`
>
> How to avoid duplicate column 'a' in result? I don't see any api that
> combines both. Rename manually?


How to avoid duplicate column names after join with multiple conditions

2018-07-02 Thread Nirav Patel
The join expression is `df1(a) === df2(a) and df1(b) === df2(c)`.

How can I avoid the duplicate column 'a' in the result? I don't see any API
that combines both. Do I have to rename manually?

--