I'm not sure. Using a virtual environment with Hortonworks' version (2.6.1)
and HDFS instead of S3, it works:

hive> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC
    > LOCATION 'hdfs://nn.example.com/user/vagrant/country/';
OK
Time taken: 4.073 seconds
hive> Select * from Table1;
OK
1 Singapore
2 Malaysia
3 India
4 Hong Kong
5 Macau
6 Thailand
7 Indonesia
8 Philippines
9 Dubai
10 Vietnam
Time taken: 0.76 seconds, Fetched: 10 row(s)


If you want to create a virtual environment, you can use
https://github.com/hortonworks/structor. The 1node-nonsecure.profile
should be enough unless you need multiple nodes or security.

Based on that, it is either a problem with EMR or the binding to S3.

.. Owen

On Wed, Oct 25, 2017 at 12:04 AM, Oleg Ruchovets <oruchov...@gmail.com>
wrote:

> Yes, that is exactly my point. Since the file has the data (the ORC file is
> valid), why does Hive return NULLs?
> I tested it with S3, HDFS, Hive, and Beeline; the behavior is the same:
>
>     select count(*) returns 10.
>     select * returns NULLs ...
>
> How can I debug this problem? Is there any configuration or logging I should
> look at? I am using the EMR defaults.
>
> Please advise.
> Thanks, Oleg.
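One thing worth ruling out while debugging (an assumption, not something established in this thread): the `orc-tools data` dump shows the file's fields as `Id` and `Name`, while Hive reports the table columns in lower case (`Table1.id`, `Table1.name`). If the reader on EMR matched table columns to file columns by exact name, nothing would match. The sketch below parses a flat struct type string such as `struct<Id:int,Name:string>` and lists columns that agree only when case is ignored. That schema string is assumed from the dump's field names; paste whatever `orc-tools meta` or `hive --orcfiledump` actually prints for the file. The helper names `file_columns` and `case_only_matches` are hypothetical.

```python
import re


def file_columns(type_str: str) -> list[tuple[str, str]]:
    """Hypothetical helper: parse a flat ORC struct type string such as
    "struct<Id:int,Name:string>" into (name, type) pairs.
    Assumes no nested types, which holds for this two-column file."""
    m = re.fullmatch(r"struct<(.*)>", type_str.strip())
    if m is None:
        raise ValueError(f"not a struct type: {type_str!r}")
    pairs = []
    for field in m.group(1).split(","):
        name, _, typ = field.partition(":")
        pairs.append((name.strip(), typ.strip()))
    return pairs


def case_only_matches(file_names: list[str],
                      table_names: list[str]) -> list[tuple[str, str]]:
    """Return (table column, file column) pairs whose names are equal
    only when case is ignored."""
    exact = set(file_names)
    by_lower = {n.lower(): n for n in file_names}
    return [
        (t, by_lower[t.lower()])
        for t in table_names
        if t not in exact and t.lower() in by_lower
    ]


if __name__ == "__main__":
    schema = "struct<Id:int,Name:string>"  # assumed; replace with real output
    file_names = [name for name, _ in file_columns(schema)]
    table_names = ["id", "name"]  # Hive keeps column names in lower case
    print(case_only_matches(file_names, table_names))
    # [('id', 'Id'), ('name', 'Name')]
```

An empty list means the names agree exactly, so case can be crossed off as a cause.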
>
> On Wed, Oct 25, 2017 at 2:30 PM, Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> The file has the data. I'm not sure what Hive is doing wrong.
>>
>> owen@laptop> java -jar ../tools/target/orc-tools-1.5.0-SNAPSHOT-uber.jar data ~/Downloads/Country.orc
>> Processing data file /Users/owen/Downloads/Country.orc [length: 392]
>> {"Id":1,"Name":"Singapore"}
>> {"Id":2,"Name":"Malaysia"}
>> {"Id":3,"Name":"India"}
>> {"Id":4,"Name":"Hong Kong"}
>> {"Id":5,"Name":"Macau"}
>> {"Id":6,"Name":"Thailand"}
>> {"Id":7,"Name":"Indonesia"}
>> {"Id":8,"Name":"Philippines"}
>> {"Id":9,"Name":"Dubai"}
>> {"Id":10,"Name":"Vietnam"}
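To pin down exactly which values Hive loses, the `orc-tools data` dump can be converted into the layout the Hive CLI prints and compared row by row with the `select *` output. A minimal sketch, assuming the JSON-lines format shown in the dump and the CLI's default of tab-separated columns with `NULL` for missing values; `dump_to_rows` and `diff_rows` are illustrative names, not part of any tool.

```python
import json


def dump_to_rows(dump: str, columns: list[str]) -> list[str]:
    """Convert JSON-lines output from `orc-tools data` into Hive CLI style
    rows: columns joined by tabs, null or missing values shown as NULL."""
    rows = []
    for line in dump.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip "Processing data file ..." and other non-JSON lines
        record = json.loads(line)
        rows.append("\t".join(
            "NULL" if record.get(c) is None else str(record[c])
            for c in columns))
    return rows


def diff_rows(expected: list[str],
              actual: list[str]) -> list[tuple[int, str, str]]:
    """Return (1-based row number, expected, actual) for each differing row."""
    if len(expected) != len(actual):
        raise ValueError(f"row counts differ: {len(expected)} vs {len(actual)}")
    return [(i, e, a)
            for i, (e, a) in enumerate(zip(expected, actual), start=1)
            if e != a]


if __name__ == "__main__":
    dump = (
        "Processing data file /Users/owen/Downloads/Country.orc [length: 392]\n"
        '{"Id":1,"Name":"Singapore"}\n'
        '{"Id":2,"Name":"Malaysia"}'
    )
    hive_out = "NULL\tNULL\nNULL\tNULL"  # what the EMR table returned
    print(diff_rows(dump_to_rows(dump, ["Id", "Name"]), hive_out.splitlines()))
```

Matching row counts with every column differing reproduces the symptom in this thread; a diff limited to one column would point somewhere else.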
>>
>>
>>  .. Owen
>>
>> On Tue, Oct 24, 2017 at 11:11 PM, Oleg Ruchovets <oruchov...@gmail.com>
>> wrote:
>>
>>> I am creating a Hive external table in ORC format (the ORC file is located on S3).
>>>
>>> *Command*
>>>
>>> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC
>>> LOCATION 's3://bucket_name';
>>>
>>> *After running the query*:
>>>
>>> Select * from Table1;
>>>
>>> *Result is*:
>>>
>>> +------------+--------------+
>>> | Table1.id  | Table1.name  |
>>> +------------+--------------+
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> +------------+--------------+
>>>
>>> Interestingly, the number of returned records is 10, which is correct, but
>>> every value in every record is NULL. What is wrong, and why does the query
>>> return only NULLs? I am using EMR instances on AWS. Is there something I
>>> should configure or check so that Hive supports the ORC format?
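Why could `count(*)` be right while every column is NULL? One possible explanation (an assumption, not established in this thread) is that the reader resolves the table's columns against the file's columns by exact name: the number of rows does not depend on that lookup, but every projected value does. The toy model below is not Hive's actual reader; it only shows the effect using the names from this file (`Id`/`Name` in the file, `id`/`name` in Hive). The function names are hypothetical.

```python
def project_by_name(rows, file_cols, table_cols):
    """Toy model (NOT Hive's implementation): map each table column to the
    file column with exactly the same name; unmatched columns become None."""
    index = {name: i for i, name in enumerate(file_cols)}
    return [tuple(row[index[c]] if c in index else None for c in table_cols)
            for row in rows]


def project_by_position(rows, file_cols, table_cols):
    """Toy model: map the i-th table column to the i-th file column,
    ignoring names entirely."""
    n = min(len(file_cols), len(table_cols))
    return [tuple(row[i] if i < n else None for i in range(len(table_cols)))
            for row in rows]


rows = [(1, "Singapore"), (2, "Malaysia"), (3, "India")]
file_cols = ["Id", "Name"]   # field names in the orc-tools dump
table_cols = ["id", "name"]  # column names as Hive reports them

by_name = project_by_name(rows, file_cols, table_cols)
by_pos = project_by_position(rows, file_cols, table_cols)
print(len(by_name), by_name[0])  # 3 (None, None)
print(len(by_pos), by_pos[0])    # 3 (1, 'Singapore')
```

Both mappings return the same number of rows; only the name-based one loses the values, which would fit "count is right, columns are NULL" if the assumption holds.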
>>>
>>> ORC file attached
>>>
>>
>>
>
