Re: Using a variable (a column name) in an IF statement in Spark SQL

2015-10-09 Thread Michael Armbrust
I'm thinking there must be a typo somewhere else as this works for me on
Spark 1.4:

Seq(("1231234", 1)).toDF("barcode", "items").registerTempTable("goods")

sql("SELECT barcode, IF(items IS NULL, 0, items) FROM goods").collect()

res1: Array[org.apache.spark.sql.Row] = Array([1231234,1])

I'll also note that you are essentially doing a coalesce here (i.e.
coalesce(items,
0))

Spark 1.5 improved error message here a bunch, you might try upgrading to
see what is wrong.

On Thu, Oct 8, 2015 at 7:28 PM, Maheshakya Wijewardena 
wrote:

> Spark version: 1.4.1
> The schema is "barcode STRING, items INT"
>
> On Thu, Oct 8, 2015 at 10:48 PM, Michael Armbrust 
> wrote:
>
>> Hmm, that looks like it should work to me.  What version of Spark?  What
>> is the schema of goods?
>>
>> On Thu, Oct 8, 2015 at 6:13 AM, Maheshakya Wijewardena <
>> mahesha...@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> Suppose there is data frame called goods with columns "barcode" and
>>> "items". Some of the values in the column "items" can be null.
>>>
>>> I want to the barcode and the respective items from the table adhering
>>> the following rules:
>>>
>>>- If "items" is null -> output 0
>>>- else -> output "items" ( the actual value in the column)
>>>
>>> I would write a query like:
>>>
>>> *SELECT barcode, IF(items is null, 0, items) FROM goods*
>>>
>>> But this query fails with the error:
>>>
>>> *unresolved operator 'Project [if (IS NULL items#1) 0 else items#1 AS
>>> c0#132]; *
>>>
>>> It seems I can only use numerical values inside this IF statement, but
>>> when a column name is used, it fails.
>>>
>>> Is there any workaround to do this?
>>>
>>> Best regards.
>>> --
>>> Pruthuvi Maheshakya Wijewardena
>>> Software Engineer
>>> WSO2 : http://wso2.com/
>>> Email: mahesha...@wso2.com
>>> Mobile: +94711228855
>>>
>>>
>>>
>>
>
>
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 : http://wso2.com/
> Email: mahesha...@wso2.com
> Mobile: +94711228855
>
>
>


Re: Using a variable (a column name) in an IF statement in Spark SQL

2015-10-08 Thread Michael Armbrust
Hmm, that looks like it should work to me.  What version of Spark?  What is
the schema of goods?

On Thu, Oct 8, 2015 at 6:13 AM, Maheshakya Wijewardena 
wrote:

> Hi,
>
> Suppose there is data frame called goods with columns "barcode" and
> "items". Some of the values in the column "items" can be null.
>
> I want to the barcode and the respective items from the table adhering the
> following rules:
>
>- If "items" is null -> output 0
>- else -> output "items" ( the actual value in the column)
>
> I would write a query like:
>
> *SELECT barcode, IF(items is null, 0, items) FROM goods*
>
> But this query fails with the error:
>
> *unresolved operator 'Project [if (IS NULL items#1) 0 else items#1 AS
> c0#132]; *
>
> It seems I can only use numerical values inside this IF statement, but
> when a column name is used, it fails.
>
> Is there any workaround to do this?
>
> Best regards.
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 : http://wso2.com/
> Email: mahesha...@wso2.com
> Mobile: +94711228855
>
>
>


Re: Using a variable (a column name) in an IF statement in Spark SQL

2015-10-08 Thread Maheshakya Wijewardena
Spark version: 1.4.1
The schema is "barcode STRING, items INT"

On Thu, Oct 8, 2015 at 10:48 PM, Michael Armbrust 
wrote:

> Hmm, that looks like it should work to me.  What version of Spark?  What
> is the schema of goods?
>
> On Thu, Oct 8, 2015 at 6:13 AM, Maheshakya Wijewardena <
> mahesha...@wso2.com> wrote:
>
>> Hi,
>>
>> Suppose there is data frame called goods with columns "barcode" and
>> "items". Some of the values in the column "items" can be null.
>>
>> I want to the barcode and the respective items from the table adhering
>> the following rules:
>>
>>- If "items" is null -> output 0
>>- else -> output "items" ( the actual value in the column)
>>
>> I would write a query like:
>>
>> *SELECT barcode, IF(items is null, 0, items) FROM goods*
>>
>> But this query fails with the error:
>>
>> *unresolved operator 'Project [if (IS NULL items#1) 0 else items#1 AS
>> c0#132]; *
>>
>> It seems I can only use numerical values inside this IF statement, but
>> when a column name is used, it fails.
>>
>> Is there any workaround to do this?
>>
>> Best regards.
>> --
>> Pruthuvi Maheshakya Wijewardena
>> Software Engineer
>> WSO2 : http://wso2.com/
>> Email: mahesha...@wso2.com
>> Mobile: +94711228855
>>
>>
>>
>


-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 : http://wso2.com/
Email: mahesha...@wso2.com
Mobile: +94711228855