Can you run some other queries against the jobs1 table and see if any of
them return data?

I am guessing your query "select jobs.values.position.title from jobs1;"
may have an issue, since "values" is an array. Maybe it should be:

select jobs.values[0].position.title from jobs1;
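
If it is the array access, you can also check all the elements rather than
just the first (a sketch, assuming the jobs1 table from your earlier mail)
by exploding the array:

select v.position.title
from jobs1
lateral view explode(jobs.values) t as v;

If this also returns null, the serde is probably not populating the struct
at all.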


Regards,
Ramki.


On Wed, Jun 19, 2013 at 8:24 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:

> Thanks Stephen,
>
> That's just what I tried with the try_parsed table. It is exactly the same
> data with less nesting in the structure and fewer entries.
> Do you mean to say that highly nested JSONs can lead to issues? What are
> the typical solutions to such issues? Writing UDFs in Hive, or parsing the
> JSON into a delimited file?
> I have heard of custom SerDes also. Not sure if UDFs and custom SerDes are
> one and the same.
>
> regards
> Sunita
>
>
> On Wed, Jun 19, 2013 at 10:38 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>
>> I think you might have to start small here instead of going for the home
>> run on the first swing.  When all else fails, start with a trivial json
>> object and build up from there to see which additional step breaks it.
>> That way, if the trivial example fails, you know the problem is something
>> fundamental and not the complexity of your json object throwing things
>> off.
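>>
>> For example, something like this as a first step (a sketch; the file path
>> and contents here are made up for illustration):
>>
>> -- /user/sunita/tables/trivial/trivial.json holds one line: {"title": "engineer"}
>> create external table trivial (title STRING)
>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>> LOCATION '/user/sunita/tables/trivial';
>>
>> select title from trivial;
>>
>> If that returns "engineer", the serde itself is fine and you can add one
>> level of nesting at a time until something breaks.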
>>
>>
>> On Wed, Jun 19, 2013 at 4:34 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>>
>>> Thanks for sharing your experience Richa.
>>> I do have timestamp-like fields, but in the form year : INT, day : INT,
>>> month : INT.
>>> As per your suggestion, I changed them all to string, but I still get null
>>> as the output.
>>>
>>> regards
>>> Sunita
>>>
>>>
>>> On Wed, Jun 19, 2013 at 2:17 AM, Richa Sharma <
>>> mailtorichasha...@gmail.com> wrote:
>>>
>>>> Do you have any timestamp fields in the table that might contain null
>>>> values?
>>>>
>>>> I faced a similar situation some time back - changing the data type to
>>>> string made it work.
>>>>
>>>> But I was working on delimited text files.
>>>> Not sure if it applies to JSON, but it's still worth a try!
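>>>>
>>>> In your case that would mean declaring the date structs with string
>>>> fields, for example (a sketch against the schema in your first mail):
>>>>
>>>> postingDate : STRUCT< year : STRING, day : STRING, month : STRING>,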
>>>>
>>>> Richa
>>>>
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind 
>>>> <sunitarv...@gmail.com> wrote:
>>>>
>>>>> Having a column name that is the same as the table name is a problem;
>>>>> because of it I was not able to reference jobs.values.id from jobs.
>>>>> Changing the table name to jobs1 resolved the semantic error.
>>>>> However, the query still returns null:
>>>>>
>>>>> hive> select jobs.values.position.title from jobs1;
>>>>> Total MapReduce jobs = 1
>>>>> Launching Job 1 out of 1
>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>> Starting Job = job_201306080116_0036, Tracking URL =
>>>>> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
>>>>> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill
>>>>> job_201306080116_0036
>>>>> Hadoop job information for Stage-1: number of mappers: 1; number of
>>>>> reducers: 0
>>>>> 2013-06-18 18:55:52,381 Stage-1 map = 0%,  reduce = 0%
>>>>> 2013-06-18 18:55:56,394 Stage-1 map = 100%,  reduce = 0%, Cumulative
>>>>> CPU 0.88 sec
>>>>> 2013-06-18 18:55:57,400 Stage-1 map = 100%,  reduce = 0%, Cumulative
>>>>> CPU 0.88 sec
>>>>> 2013-06-18 18:55:58,407 Stage-1 map = 100%,  reduce = 100%, Cumulative
>>>>> CPU 0.88 sec
>>>>> MapReduce Total cumulative CPU time: 880 msec
>>>>> Ended Job = job_201306080116_0036
>>>>> MapReduce Jobs Launched:
>>>>> Job 0: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 35374 HDFS
>>>>> Write: 3 SUCCESS
>>>>> Total MapReduce CPU Time Spent: 880 msec
>>>>> OK
>>>>> null
>>>>> Time taken: 9.591 seconds
>>>>>
>>>>>
>>>>> regards
>>>>> Sunita
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind 
>>>>> <sunitarv...@gmail.com> wrote:
>>>>>
>>>>>> Ok.
>>>>>> The data files are quite small. Around 35 KB and 1 KB each.
>>>>>>
>>>>>> [sunita@node01 tables]$  hadoop fs -ls /user/sunita/tables/jobs
>>>>>> Found 1 items
>>>>>> -rw-r--r--   3 sunita hdfs      35172 2013-06-18 18:31
>>>>>> /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>>>>>
>>>>>>
>>>>>> [sunita@node01 tables]$ hadoop fs -text
>>>>>> /user/sunita/tables/jobs/jobs_noSite_parsed.json |more
>>>>>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
>>>>>> [{"company": {"i
>>>>>> d": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013,
>>>>>> "day": 10, "mo
>>>>>> nth": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS,
>>>>>> HBase, Pig- Ve
>>>>>> rtica Analytics Senior Hadoop Engineer - Skills Required - Hadoop,
>>>>>> HDFS, HBase,
>>>>>>  Pig, SQL, Industrial Software Development, System Integration, Java,
>>>>>> high perf
>>>>>> ormance, multi-threading, VerticaWe are a well known consumer product
>>>>>> developme
>>>>>> nt company and we are looking to add a Hadoop Engineer to our
>>>>>> Engineering team.
>>>>>>   You will be working with the latest ", "expirationDate": {"year":
>>>>>> 2013, "day"
>>>>>>
>>>>>> It's a single line, so I used 'more' rather than 'head'. But
>>>>>> effectively, the file exists and contains the data.
>>>>>>
>>>>>> regards
>>>>>> Sunita
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague 
>>>>>> <sprag...@gmail.com> wrote:
>>>>>>
>>>>>>> As Nitin alluded to, it's best to first confirm the data is
>>>>>>> definitely in HDFS using HDFS semantics rather than Hive.
>>>>>>>
>>>>>>> 1. how big is it?   hadoop fs -ls <your hdfs dir>
>>>>>>> 2. cat a bit of it and see if anything is there.   hadoop fs -text
>>>>>>> <your hdfs dir>/<filename> | head -10
>>>>>>>
>>>>>>> do you see any data from step #2?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <
>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I ran some complex queries, something to the extent of
>>>>>>>>                     select jobs from jobs;
>>>>>>>> which triggers MapReduce jobs but shows no errors and produces the
>>>>>>>> same output, "null". If I try referencing the struct elements, I get
>>>>>>>> an error, which seems to be the root cause.
>>>>>>>>
>>>>>>>> Attached are the select statement outputs with the corresponding
>>>>>>>> hive logs.
>>>>>>>>
>>>>>>>> I have also attached my usage details for another table, try_parsed,
>>>>>>>> which has a subset of the same data and seems to work fine. Also
>>>>>>>> attached is the input file for this table - try_parsed.json.
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> Sunita
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <
>>>>>>>> nitinpawar...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Can you run a little more complex query?
>>>>>>>>>
>>>>>>>>> Select distinct values across columns or do some math, so we know
>>>>>>>>> what happens when it fires up a MapReduce job.
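>>>>>>>>>
>>>>>>>>> For example (a sketch; even a plain count is enough to force a full
>>>>>>>>> MapReduce job):
>>>>>>>>>
>>>>>>>>> select count(*) from jobs;
>>>>>>>>>
>>>>>>>>> If the count comes back as 0, the serde never produced a row; if it
>>>>>>>>> is nonzero, the rows exist and the nulls come from deserializing the
>>>>>>>>> columns.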
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <
>>>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for responding, Nitin. Yes, I am sure the SerDe is working
>>>>>>>>>> fine and the JSON file is being picked up, based on all the errors
>>>>>>>>>> that showed up before this stage. What sort of error are you
>>>>>>>>>> suspecting - the file not being present, or the SerDe not parsing
>>>>>>>>>> it?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>>>>>
>>>>>>>>>>> select * from table is as good as hdfs -cat
>>>>>>>>>>>
>>>>>>>>>>> are you sure there is any data in the table?
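>>>>>>>>>>>
>>>>>>>>>>> In other words (a sketch, using the location from your create
>>>>>>>>>>> table statement):
>>>>>>>>>>>
>>>>>>>>>>> hadoop fs -cat /user/sunita/tables/jobs/*
>>>>>>>>>>>
>>>>>>>>>>> reads the same bytes that "select * from jobs" deserializes. If
>>>>>>>>>>> the cat shows data but the select shows null, the files are fine
>>>>>>>>>>> and the serde is the thing to look at.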
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <
>>>>>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am able to parse the input JSON file and load it into Hive. I
>>>>>>>>>>>> do not see any errors with create table, so I am assuming it
>>>>>>>>>>>> worked. But when I try to read the data, I get null:
>>>>>>>>>>>>
>>>>>>>>>>>> hive> select * from jobs;
>>>>>>>>>>>> OK
>>>>>>>>>>>> null
>>>>>>>>>>>>
>>>>>>>>>>>> I have validated the JSON with JSONLint and the Notepad++ JSON
>>>>>>>>>>>> plugin, and it is valid JSON. Here is my create table statement;
>>>>>>>>>>>> attached is the JSON input file.
>>>>>>>>>>>>
>>>>>>>>>>>> create external table jobs (
>>>>>>>>>>>>   jobs STRUCT<
>>>>>>>>>>>>     values : ARRAY<STRUCT<
>>>>>>>>>>>>       company : STRUCT<
>>>>>>>>>>>>         id : STRING,
>>>>>>>>>>>>         name : STRING>,
>>>>>>>>>>>>       postingDate : STRUCT<
>>>>>>>>>>>>         year : INT,
>>>>>>>>>>>>         day : INT,
>>>>>>>>>>>>         month : INT>,
>>>>>>>>>>>>       descriptionSnippet : STRING,
>>>>>>>>>>>>       expirationDate : STRUCT<
>>>>>>>>>>>>         year : INT,
>>>>>>>>>>>>         day : INT,
>>>>>>>>>>>>         month : INT>,
>>>>>>>>>>>>       position : STRUCT<
>>>>>>>>>>>>         title : STRING,
>>>>>>>>>>>>         jobFunctions : ARRAY<STRUCT<
>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>         industries : ARRAY<STRUCT<
>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>           id : STRING,
>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>         jobType : STRUCT<
>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>           name : STRING>,
>>>>>>>>>>>>         experienceLevel : STRUCT<
>>>>>>>>>>>>           code : STRING,
>>>>>>>>>>>>           name : STRING>>,
>>>>>>>>>>>>       id : STRING,
>>>>>>>>>>>>       customerJobCode : STRING,
>>>>>>>>>>>>       skillsAndExperience : STRING,
>>>>>>>>>>>>       salary : STRING,
>>>>>>>>>>>>       jobPoster : STRUCT<
>>>>>>>>>>>>         id : STRING,
>>>>>>>>>>>>         firstName : STRING,
>>>>>>>>>>>>         lastName : STRING,
>>>>>>>>>>>>         headline : STRING>,
>>>>>>>>>>>>       referralBonus : STRING,
>>>>>>>>>>>>       locationDescription : STRING>>>
>>>>>>>>>>>> )
>>>>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>>>>>
>>>>>>>>>>>> The table creation works fine, but when I attempt to query, I
>>>>>>>>>>>> get null as the result.
>>>>>>>>>>>> I tried adding input/output formats and SerDe properties;
>>>>>>>>>>>> nothing seems to have any impact.
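>>>>>>>>>>>>
>>>>>>>>>>>> For reference, adding explicit formats looks roughly like this (a
>>>>>>>>>>>> sketch, not the exact statement I ran; the class names are the
>>>>>>>>>>>> stock Hive text formats):
>>>>>>>>>>>>
>>>>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>>>>>> STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
>>>>>>>>>>>> OUTPUTFORMAT
>>>>>>>>>>>> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>>>>>>>>>>>> LOCATION '/user/sunita/tables/jobs';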
>>>>>>>>>>>>
>>>>>>>>>>>> I am of the opinion that the libraries cannot handle this level
>>>>>>>>>>>> of nesting, and I will probably have to write a custom SerDe or
>>>>>>>>>>>> a parser myself. I just wanted to seek guidance before I get
>>>>>>>>>>>> into that. I appreciate your help and guidance.
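>>>>>>>>>>>>
>>>>>>>>>>>> One lighter-weight fallback I am considering before a full custom
>>>>>>>>>>>> SerDe (a sketch using Hive's built-in get_json_object; jobs_raw is
>>>>>>>>>>>> a hypothetical table name):
>>>>>>>>>>>>
>>>>>>>>>>>> create external table jobs_raw (json STRING)
>>>>>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>>>>>
>>>>>>>>>>>> select get_json_object(json, '$.jobs.values[0].position.title')
>>>>>>>>>>>> from jobs_raw;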
>>>>>>>>>>>>
>>>>>>>>>>>> regards
>>>>>>>>>>>> Sunita
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Nitin Pawar
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
