Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Axel Dahl
created as SPARK-8685 https://issues.apache.org/jira/browse/SPARK-8685 @Yin, thx, have fixed sample code with the correct names. On Sat, Jun 27, 2015 at 1:56 PM, Yin Huai wrote: > Axel, > > Can you file a jira and attach your code in the description of the jira? > This looks like a bug. > > Fo

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Yin Huai
Axel, Can you file a jira and attach your code in the description of the jira? This looks like a bug. For the third row of df1, the name is "alice" instead of "carol", right? Otherwise, "carol" should appear in the expected output. Btw, to get rid of those columns with the same name after the jo

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
I would test it against 1.3 to be sure, because it could -- though unlikely -- be a regression. For example, I recently stumbled upon this issue which was specific to 1.4. On Sat, Jun 27, 2015 at 12:28 PM Axel Dahl wrote: > I've only tested on 1

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Axel Dahl
I've only tested on 1.4, but imagine 1.3 is the same or a lot of people's code would be failing right now. On Saturday, June 27, 2015, Nicholas Chammas wrote: > Yeah, you shouldn't have to rename the columns before joining them. > > Do you see the same behavior on 1.3 vs 1.4? > > Nick > June 2015

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs 1.4? Nick On Sat, Jun 27, 2015 at 2:51 AM, Axel Dahl wrote: > still feels like a bug to have to create unique names before a join. > > On Fri, Jun 26, 2015 at 9:51 PM, ayan guha wrote: > >> You c

Re: dataframe left joins are not working as expected in pyspark

2015-06-26 Thread Axel Dahl
still feels like a bug to have to create unique names before a join. On Fri, Jun 26, 2015 at 9:51 PM, ayan guha wrote: > You can declare the schema with unique names before creation of df. > On 27 Jun 2015 13:01, "Axel Dahl" wrote: > >> >> I have the following code: >> >> from pyspark import SQ

Re: dataframe left joins are not working as expected in pyspark

2015-06-26 Thread ayan guha
You can declare the schema with unique names before creation of df. On 27 Jun 2015 13:01, "Axel Dahl" wrote: > > I have the following code: > > from pyspark import SQLContext > > d1 = [{'name':'bob', 'country': 'usa', 'age': 1}, {'name':'alice', > 'country': 'jpn', 'age': 2}, {'name':'carol', 'co
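
A minimal sketch of the workaround suggested above, written in plain Python so it runs without a Spark installation: give every field of the second record list a unique name before the DataFrame is created. The helper `with_suffix` and the `_2` suffix are hypothetical names chosen for illustration; only the first `d2` row quoted in the thread is used.

```python
def with_suffix(records, suffix):
    """Return copies of the dict records with `suffix` appended to every key."""
    return [{key + suffix: value for key, value in row.items()} for row in records]


# First row of d2 as quoted in the original post.
d2 = [{'name': 'bob', 'country': 'usa', 'colour': 'red'}]

# Hypothetical suffix; any string that makes the names unique would do.
d2_unique = with_suffix(d2, '_2')
print(d2_unique)
```

The renamed list could then be passed to `SQLContext.createDataFrame`, so that the two DataFrames share no column names at join time. Whether this should be necessary at all is the open question in the thread.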

dataframe left joins are not working as expected in pyspark

2015-06-26 Thread Axel Dahl
I have the following code: from pyspark import SQLContext d1 = [{'name':'bob', 'country': 'usa', 'age': 1}, {'name':'alice', 'country': 'jpn', 'age': 2}, {'name':'carol', 'country': 'ire', 'age': 3}] d2 = [{'name':'bob', 'country': 'usa', 'colour':'red'}, {'name':'alice', 'country': 'ire', 'colou
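
For reference, the result a left outer join on `name` and `country` is generally expected to produce can be sketched in plain Python, independent of Spark. The function `left_join` below is a hypothetical illustration, not a PySpark API, and it uses only the rows that are fully visible in the post above (all of `d1`, and the first row of `d2`).

```python
def left_join(left, right, keys):
    """Left outer join of two lists of dicts on the given key fields."""
    # Index right-hand rows by their key tuple.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    # Non-key fields of the right side, used to pad unmatched rows with None.
    extra = sorted({f for row in right for f in row if f not in keys})

    joined = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys))
        if matches:
            for match in matches:
                joined.append({**row, **{f: match.get(f) for f in extra}})
        else:
            # Every left row is kept, even when nothing matches on the right.
            joined.append({**row, **{f: None for f in extra}})
    return joined


d1 = [{'name': 'bob', 'country': 'usa', 'age': 1},
      {'name': 'alice', 'country': 'jpn', 'age': 2},
      {'name': 'carol', 'country': 'ire', 'age': 3}]
# Only the first row of d2 is used here; the rest is cut off in the archive.
d2 = [{'name': 'bob', 'country': 'usa', 'colour': 'red'}]

result = left_join(d1, d2, ['name', 'country'])
for row in result:
    print(row)
```

Under these assumptions every row of `d1` appears in the output, with `colour` set to `None` where no row of `d2` has the same `name` and `country`.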