Re: dataframe can not find fields after loading from hive

2015-04-19 Thread Yin Huai
Hi Cesar,

Can you try 1.3.1 (
https://spark.apache.org/releases/spark-release-1-3-1.html) and see if it
still shows the error?

Thanks,

Yin

On Fri, Apr 17, 2015 at 1:58 PM, Reynold Xin  wrote:

> This is strange. cc the dev list since it might be a bug.
>
>
>
> On Thu, Apr 16, 2015 at 3:18 PM, Cesar Flores  wrote:
>
>> Never mind. I found the solution:
>>
>> val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd,
>> hiveLoadedDataFrame.schema)
>>
>> which converts the data frame to an RDD and back again to a data frame.
>> Not the prettiest solution, but at least it solves my problem.
>>
>>
>> Thanks,
>> Cesar Flores
>>
>>
>>
>> On Thu, Apr 16, 2015 at 11:17 AM, Cesar Flores  wrote:
>>
>>>
>>> I have a data frame that I load from a Hive table. My issue is that the
>>> data frame seems to be missing the columns that I need to query.
>>>
>>> For example:
>>>
>>> val newdataset = dataset.where(dataset("label") === 1)
>>>
>>> gives me an error like the following:
>>>
>>> ERROR yarn.ApplicationMaster: User class threw exception: resolved
>>> attributes label missing from label, user_id, ... (the rest of the fields
>>> of my table)
>>> org.apache.spark.sql.AnalysisException: resolved attributes label
>>> missing from label, user_id, ... (the rest of the fields of my table)
>>>
>>> where we can see that the label field actually exists. I managed to solve
>>> this issue by updating my syntax to:
>>>
>>> val newdataset = dataset.where($"label" === 1)
>>>
>>> which works. However, I cannot use this trick in all my queries. For
>>> example, when I try to do a unionAll of two subsets of the same data
>>> frame, the error I get is that all my fields are missing.
>>>
>>> Can someone tell me if I need to do some post-processing after loading
>>> from Hive in order to avoid this kind of error?
>>>
>>>
>>> Thanks
>>> --
>>> Cesar Flores
>>>
>>
>>
>>
>> --
>> Cesar Flores
>>
>
>
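[Editor's note] For readers hitting the same AnalysisException, here is a minimal sketch of the two approaches discussed in the thread: the `$"colName"` column syntax and the RDD round-trip workaround. The table name `my_table` and the context setup are illustrative assumptions, not from the original thread; this targets the Spark 1.3-era API mentioned above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveColumnResolutionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sketch"))
    val hc = new HiveContext(sc)
    import hc.implicits._ // enables the $"colName" column syntax

    // Hypothetical table name; substitute your own Hive table.
    val hiveLoadedDataFrame = hc.table("my_table")

    // Variant 1: reference the column with the $ syntax, which is resolved
    // by the analyzer against the current plan rather than a captured
    // attribute reference.
    val filtered = hiveLoadedDataFrame.where($"label" === 1)

    // Variant 2 (the round-trip workaround from the thread): rebuild the
    // DataFrame from its RDD and schema, which produces fresh attribute
    // references so that rebuilt("label") resolves again.
    val rebuilt = hc.createDataFrame(hiveLoadedDataFrame.rdd,
                                     hiveLoadedDataFrame.schema)
    val filtered2 = rebuilt.where(rebuilt("label") === 1)

    // unionAll of two subsets of the same (rebuilt) data frame, the case
    // that failed for the original poster before the workaround.
    val union = filtered2.unionAll(rebuilt.where(rebuilt("label") === 0))
    union.show()
  }
}
```

This code requires a running Spark cluster (or local mode) with Hive support compiled in, so it is a sketch rather than a standalone runnable snippet.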

