I'm not sure. Using a virtual environment with Hortonworks' version (2.6.1) and HDFS instead of S3, it works:

> hive> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC
> LOCATION 'hdfs://nn.example.com/user/vagrant/country/';
> OK
> Time taken: 4.073 seconds
> hive> Select * from Table1;
> OK
> 1 Singapore
> 2 Malaysia
> 3 India
> 4 Hong Kong
> 5 Macau
> 6 Thailand
> 7 Indonesia
> 8 Philippines
> 9 Dubai
> 10 Vietnam
> Time taken: 0.76 seconds, Fetched: 10 row(s)

If you want to create a virtual environment, you can use
https://github.com/hortonworks/structor . You can use the
1node-nonsecure.profile unless you want multiple nodes or security.

Based on that, it is either a problem with EMR or the binding to S3.

.. Owen

On Wed, Oct 25, 2017 at 12:04 AM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

> Yes, that is exactly my point. Since the file has the data (the ORC file is
> valid), why does Hive return NULLs?
> I tested it with S3, HDFS, Hive, and Beeline; the behavior is the same:
>
> select count (*) returns 10.
> select * returns NULLs ...
>
> What is the way to debug this problem? Any configuration or logging? I am
> using the EMR defaults.
>
> Please advise.
> Thanks, Oleg.
>
> On Wed, Oct 25, 2017 at 2:30 PM, Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> The file has the data. I'm not sure what Hive is doing wrong.
>>
>>> owen@laptop> java -jar ../tools/target/orc-tools-1.5.0-SNAPSHOT-uber.jar data ~/Downloads/Country.orc
>>> Processing data file /Users/owen/Downloads/Country.orc [length: 392]
>>> {"Id":1,"Name":"Singapore"}
>>> {"Id":2,"Name":"Malaysia"}
>>> {"Id":3,"Name":"India"}
>>> {"Id":4,"Name":"Hong Kong"}
>>> {"Id":5,"Name":"Macau"}
>>> {"Id":6,"Name":"Thailand"}
>>> {"Id":7,"Name":"Indonesia"}
>>> {"Id":8,"Name":"Philippines"}
>>> {"Id":9,"Name":"Dubai"}
>>> {"Id":10,"Name":"Vietnam"}
>>> ____________________________________________________________
>>> ____________________________________________________________
>>
>> .. Owen
>>
>> On Tue, Oct 24, 2017 at 11:11 PM, Oleg Ruchovets <oruchov...@gmail.com>
>> wrote:
>>
>>> I am creating a Hive external ORC table (the ORC file is located on S3).
>>>
>>> *Command*
>>>
>>> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC
>>> LOCATION 's3://bucket_name'
>>>
>>> *After running the query*:
>>>
>>> Select * from Table1;
>>>
>>> *Result is*:
>>>
>>> +------------+--------------+
>>> | Table1.id  | Table1.name  |
>>> +------------+--------------+
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> | NULL       | NULL         |
>>> +------------+--------------+
>>>
>>> Interestingly, the number of returned records is 10, which is correct, but
>>> all records are NULL. What is wrong? Why does the query return only NULLs?
>>> I am using EMR instances on AWS. Should I configure or check something to
>>> support the ORC format in Hive?
>>>
>>> ORC file attached