Re: equalTo isin not working as expected with a constructed column with DataFrames

2016-02-19 Thread Michael Armbrust
Can you include the output of explain(true) on the DataFrame in question?
It would also be really helpful to see a small code fragment that
reproduces the issue.
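
For reference, a reproduction of the kind requested here might look like the sketch below. It is not the original poster's code: the sample rows, the object name TheOneRepro, the helper buildDf, and the use of functions.concat to build "theone" are all assumptions made for illustration (the thread does not say how the concatenation was done). It targets the Spark 1.5 SQLContext API and prints the plan and the counts rather than asserting what they should be.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, concat}

object TheOneRepro {
  // Hypothetical stand-in for the poster's data: three string columns
  // plus a derived column "theone" built with concat (an assumption).
  def buildDf(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    val base = sqlContext.sparkContext
      .parallelize(Seq(("d1", "p1", "t1"), ("d2", "p2", "t2")))
      .toDF("dealId", "ptf", "ts")
    base.withColumn("theone", concat(col("dealId"), col("ptf"), col("ts")))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("theone-repro").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val df = buildDf(sqlContext)
    df.printSchema()

    // Filter on the derived column, as described in the report.
    val byDerived = df.where(df.col("theone").equalTo("nonExistantValue"))
    byDerived.explain(true) // parsed, analyzed, optimized and physical plans
    println(s"count via theone (equalTo): ${byDerived.count()}")

    // Same filter expressed with isin.
    val byIsin = df.where(df.col("theone").isin("nonExistantValue"))
    println(s"count via theone (isin): ${byIsin.count()}")

    // Same filter on one of the original columns, for comparison.
    val byOriginal = df.where(df.col("dealId").equalTo("nonExistantValue"))
    println(s"count via dealId: ${byOriginal.count()}")

    sc.stop()
  }
}
```

Posting the printed schema, the explain(true) output, and the three counts together makes it easier to see whether the filter on the derived column appears in the optimized plan at all.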

On Thu, Feb 18, 2016 at 9:10 AM, Mehdi Ben Haj Abbes 
wrote:

> Hi,
> I forgot to mention that I'm using the 1.5.1 version.
> Regards,
>
> On Thu, Feb 18, 2016 at 4:20 PM, Mehdi Ben Haj Abbes <
> mehdi.ab...@gmail.com> wrote:
>
>> Hi folks,
>>
>> I have a DataFrame with, let's say, this schema:
>> -dealId,
>> -ptf,
>> -ts
>> From it I derive another DataFrame (let's call it df), to which I add an
>> extra column (using withColumn) that is the concatenation of the 3 existing
>> columns. I call the new column "theone".
>>
>> When I print the schema of the new DataFrame, the "theone" column has a
>> String type. But when I run
>> df.where(df.col("theone").equalTo("nonExistantValue")).toJavaRDD.count
>> I get the initial size of df, as if the filtering did not work. If I run
>> the same query but filter on one of the original columns, I get the
>> expected count, which is 0.
>>
>> The same goes for isin.
>>
>> Any help will be more than appreciated.
>>
>> Best regards,
>>
>>
>> --
>> Mehdi BEN HAJ ABBES
>>
>>
>
>
> --
> Mehdi BEN HAJ ABBES
>
>


2016-02-18 Thread Mehdi Ben Haj Abbes
Hi,
I forgot to mention that I'm using the 1.5.1 version.
Regards,

On Thu, Feb 18, 2016 at 4:20 PM, Mehdi Ben Haj Abbes 
wrote:

> Hi folks,
>
> I have a DataFrame with, let's say, this schema:
> -dealId,
> -ptf,
> -ts
> From it I derive another DataFrame (let's call it df), to which I add an
> extra column (using withColumn) that is the concatenation of the 3 existing
> columns. I call the new column "theone".
>
> When I print the schema of the new DataFrame, the "theone" column has a
> String type. But when I run
> df.where(df.col("theone").equalTo("nonExistantValue")).toJavaRDD.count
> I get the initial size of df, as if the filtering did not work. If I run
> the same query but filter on one of the original columns, I get the
> expected count, which is 0.
>
> The same goes for isin.
>
> Any help will be more than appreciated.
>
> Best regards,
>
>
> --
> Mehdi BEN HAJ ABBES
>
>


-- 
Mehdi BEN HAJ ABBES