how to increase parquet file size via hive

2015-05-22 Thread ey-chih chow
Hi, I used a Hive insert/select statement to convert our log files into the Parquet format. I found that each Parquet file generated is under 400 MB. Is there any way I can increase the size of the Parquet files generated? Thanks. Ey-Chih Chow

Beeline command issues

2015-05-22 Thread Nenad Samardzic
Hi all, I am new to Hive and hopefully this is going to be an easy thing to solve for someone with more experience, but I am having trouble doing it on my own. On my EC2 app server I am running the following command with no error: *beeline -u jdbc:hive2://master* This is working on Hive 13 which

Re: Hive on Spark VS Spark SQL

2015-05-22 Thread Xuefu Zhang
Hi Cheolsoo, Thanks for the correction. I took that for granted and didn't actually check the code to verify. Yes, from the Spark version (1.2), I did see their parser etc. Below is a portion of the README from Spark's sql package for reference. Thanks, Xuefu Spark SQL is broken up into four

Re: how to increase parquet file size via hive

2015-05-22 Thread Grant Overby (groverby)
I don’t understand the question. Why do you want them larger? Are you looking to merge parquet files? Are you looking to append to parquet files? Are you concerned about the small size? Grant Overby

Re: Malformed Orc file Invalid postscript length 0

2015-05-22 Thread Owen O'Malley
Bhavana, Could you send me (omal...@apache.org) the incorrect ORC file? Which file system were you using? hdfs? Which version of Hadoop and Hive? Thanks, Owen On Fri, May 22, 2015 at 9:37 AM, Grant Overby (groverby) grove...@cisco.com wrote: I’m getting the following exception when

Malformed Orc file Invalid postscript length 0

2015-05-22 Thread Grant Overby (groverby)
I’m getting the following exception when Hive executes a query on an external table. It seems the postscript isn’t written even though .close() is called and returns normally. Any thoughts? java.io.IOException: Malformed ORC file
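
Editor's note: in the ORC layout, the final byte of the file holds the length of the postscript, so "Invalid postscript length 0" suggests the file's last byte is zero, as if the footer never reached disk. The sketch below is one way to check this on a local copy of the file (for example, one fetched with `hdfs dfs -get`). The function names are hypothetical, and the check is a rough heuristic modeled on the error message, not the reader's actual validation logic.

```python
import os

# ORC files carry the magic string "ORC"; a valid postscript must at least
# be long enough to hold it.
ORC_MAGIC = b"ORC"


def read_postscript_length(path):
    """Return the value of the file's final byte (the ORC postscript length)."""
    if os.path.getsize(path) == 0:
        raise ValueError("file is empty: " + path)
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)  # position on the last byte
        return f.read(1)[0]


def looks_like_complete_orc(path):
    """Heuristic: the postscript length must exceed the magic's length,
    and the file must be large enough to contain that postscript."""
    size = os.path.getsize(path)
    if size == 0:
        return False
    ps_len = read_postscript_length(path)
    return ps_len > len(ORC_MAGIC) and size > ps_len
```

A result of 0 from `read_postscript_length` matches the reported exception and points at the writer side (for instance a file that was read before it was fully flushed and closed) rather than at the query.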

RE: how to increase parquet file size via hive

2015-05-22 Thread ey-chih chow
The reason I want them larger is to improve the performance of downstream map/reduce and Spark jobs. The larger the files are, the better the downstream jobs perform. I would like to know if there is any configuration parameter that I can set for Hive to generate

Re: Malformed Orc file Invalid postscript length 0

2015-05-22 Thread Grant Overby (groverby)
I sent a link to you. File system is hdfs. Versions: hdp HDP-2.2.4.2-2, hdfs 2.6.0.2.2, MapReduce2 2.6.0.2.2, YARN 2.6.0.2.2, hive 0.14.0.2.2, tez 0.5.2.2.2. It was a tez query that caused the exception, but I doubt that’s relevant.