ok. so i just learned of a perl script called "json_pp" (for json pretty_prrint) that is a binary included in the distro for the perl JSON module. You gotta figure there are analogous tools in other languages as well but given this is a binary it doesn't matter much what language its written in - it just works.
so doing this: $ hive -e 'select <your_json_column> from <your_table>' | json_pp you'd actually be able to see what your json looks like in a human readable fashion. get the binary (and the module) here: https://metacpan.org/module/JSON On Wed, Jun 26, 2013 at 4:38 PM, Sunita Arvind <sunitarv...@gmail.com>wrote: > Ok. Thanks Stephen. I will try that out. > Will update the group if I am able to get this to work. For now, I will > continue with non-partitioned table. > > regards > Sunita > > > On Wed, Jun 26, 2013 at 7:11 PM, Stephen Sprague <sprag...@gmail.com>wrote: > >> it would appear to be that you may partition only by non-nested columns. >> I would recommend transforming your original dataset into one where the >> first column is YYYYMM and the rest is your json object. During this >> transformation you may also wish to make further optimizations as well >> since you'll be scanning every record. >> >> as always my 2 cents only. >> >> >> On Wed, Jun 26, 2013 at 3:47 PM, Sunita Arvind <sunitarv...@gmail.com>wrote: >> >>> Hi, >>> >>> I am unable to create a partitioned table. >>> The error I get is: >>> FAILED: ParseException line 37:16 mismatched input >>> '"jobs.values.postingDate.year"' expecting Identifier near '(' in column >>> specification >>> >>> I tried referring to the columns in various ways, >>> S.jobs.values.postingDate.year, with quotes, without quotes, get the same >>> error. Also tried creating a partition by year alone. Still get the same >>> error. >>> >>> Here is the create table statement: >>> >>> create external table linkedin_JobSearch ( >>> jobs STRUCT< >>> values : ARRAY<STRUCT< >>> company : STRUCT< >>> id : STRING, >>> name : STRING>, >>> postingDate : STRUCT< >>> day : STRING>, >>> descriptionSnippet : STRING, >>> expirationDate : STRUCT< >>> ...... >>> ....... >>> locationDescription : STRING>>> >>> ) >>> PARTITIONED BY ("jobs.values.postingDate.year" STRING, >>> "jobs.values.postingDate.month" STRING) >>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' >>> WITH SERDEPROPERTIES ( >>> "company"="$.jobs.values.company.name", >>> "position"="$.jobs.values.position.title", >>> "customerJobCode"="$.jobs.values.customerJobCode", >>> "locationDescription"="$.jobs.values.locationDescription", >>> "jobPoster"="$.jobs.values.jobposter.headline" >>> ) >>> LOCATION '/user/sunita/Linkedin/JobSearch'; >>> >>> I need to be able to partition this information. Please help. >>> >>> regards >>> Sunita >>> >> >> >