If you change it to ```val numbers2 = numbers```, then it has the same problem.

On Tue, Jun 23, 2015 at 2:54 PM, Ignacio Blasco <[email protected]> wrote:
> It seems that it doesn't happen in the Scala API. The example is not exactly
> the same as in Python, but pretty close.
>
> https://gist.github.com/elnopintan/675968d2e4be68958df8
>
> 2015-06-23 23:11 GMT+02:00 Davies Liu <[email protected]>:
>>
>> I think it also happens in the DataFrame API of all languages.
>>
>> On Tue, Jun 23, 2015 at 9:16 AM, Ignacio Blasco <[email protected]>
>> wrote:
>> > Does that issue happen only in the Python DSL?
>> >
>> > On 23/6/2015 5:05 PM, "Bob Corsaro" <[email protected]> wrote:
>> >>
>> >> Thanks! The solution:
>> >>
>> >> https://gist.github.com/dokipen/018a1deeab668efdf455
>> >>
>> >> On Mon, Jun 22, 2015 at 4:33 PM Davies Liu <[email protected]>
>> >> wrote:
>> >>>
>> >>> Right now, we cannot figure out which column you referenced in
>> >>> `select` if there are multiple columns with the same name in the joined
>> >>> DataFrame (for example, two `value` columns).
>> >>>
>> >>> A workaround could be:
>> >>>
>> >>> numbers2 = numbers.select(numbers.name, numbers.value.alias('other'))
>> >>> rows = numbers.join(numbers2,
>> >>>                     (numbers.name == numbers2.name) & (numbers.value != numbers2.other),
>> >>>                     how="inner") \
>> >>>               .select(numbers.name, numbers.value, numbers2.other) \
>> >>>               .collect()
>> >>>
>> >>> On Mon, Jun 22, 2015 at 12:53 PM, Ignacio Blasco
>> >>> <[email protected]>
>> >>> wrote:
>> >>> > Sorry, I thought it was Scala/Spark.
>> >>> >
>> >>> > On 22/6/2015 9:49 PM, "Bob Corsaro" <[email protected]>
>> >>> > wrote:
>> >>> >>
>> >>> >> That's invalid syntax. I'm pretty sure PySpark is using a DSL to
>> >>> >> create a query here and not actually doing an equality operation.
>> >>> >>
>> >>> >> On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco
>> >>> >> <[email protected]>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Probably you should use === instead of == and !== instead of !=
>> >>> >>>
>> >>> >>> Can anyone explain why the DataFrame API doesn't work as I expect
>> >>> >>> it to here? It seems like the column identifiers are getting confused.
>> >>> >>>
>> >>> >>> https://gist.github.com/dokipen/4b324a7365ae87b7b0e5
>
>
