Is this a Parquet file? It looks more like a JSON document. What schema does parquet-tools report for it?
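If parquet-tools is not at hand, a quick first check is to look at the file's magic bytes. A Parquet file begins and ends with the 4 bytes `PAR1`, while a gzip stream begins with `1f 8b`. The sketch below is only a rough sniff test: the function name `sniff_format` and the labels it returns are made up for illustration, and it does not validate the footer or the schema.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"   # Parquet files start and end with these 4 bytes
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def sniff_format(path):
    """Guess the container format of a file from its magic bytes.

    Hypothetical helper: it only inspects the first and last 4 bytes,
    so "parquet" means "carries the Parquet magic", not "valid Parquet".
    """
    data_path = Path(path)
    size = data_path.stat().st_size
    with data_path.open("rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 8:
            f.seek(-4, 2)  # 2 = seek relative to end of file
            tail = f.read(4)

    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    if head.lstrip()[:1] in (b"{", b"["):
        return "json-like text"
    return "unknown"
```

Calling `sniff_format` on one of the .gz.parquet files should return "parquet" if the file is Parquet with gzip used only as the internal page codec. A result of "gzip" would suggest the whole file is a gzip-compressed stream of something else, for example JSON.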
________________________________
From: PROJJWAL SAHA <[email protected]>
Sent: Thursday, March 9, 2017 4:36:06 AM
To: [email protected]
Subject: Query on .gz.parquet files

All,

One question: I am querying .gz.parquet files. select * from xxx returns data like

+---------+
| current |
+---------+
| {"vendor_id":"VTS","pickup_datetime":"ACj75+tEAAAvfSUA","payment_type":"CSH","fare_amount":12.0,"mta_tax":0.5,"tip_amount":0.0,"tolls_amount":5.33,"total_amount":18.33,"ratecodeid":1.0,"dropoff_datetime":"AEhTi5NFAAAvfSUA","passenger_count":1,"trip_distance":2.93,"extra":0.5,"pickup_geocode":{"Latitude":40.743677,"Longitude":-73.953802},"dropoff_geocode":{"Latitude":40.740917,"Longitude":-73.989298},"PRIMARY_KEY":"8589934600","pickup_geocode_geo_city":"Long Island City","pickup_geocode_geo_country":"US","pickup_geocode_geo_postcode":"11109","pickup_geocode_geo_region":"New York","pickup_geocode_geo_subregion":"Queens County","pickup_geocode_geo_regionid":"5128638","pickup_geocode_geo_subregionid":"5133268","dropoff_geocode_geo_city":"New York City","dropoff_geocode_geo_country":"US","dropoff_geocode_geo_postcode":"10007","dropoff_geocode_geo_region":"New York","dropoff_geocode_geo_regionid":"5128638"} |
.....

It doesn't return the data in tabular format with headers at the top. Also, select count(*) works fine, whereas select count(vendor_id) doesn't work: it returns 0. It looks like the header names are not detected.

I have tried adding extractheaders: true for parquet. I also tried adding gz.parquet as an extension; it doesn't work. I also have defaultInputFormat set to parquet for the workspace.

Any suggestions?

Regards,
Projjwal
