Parquet is available as a storage option for HAWQ internal tables. HAWQ implements column-oriented storage with one file per column.
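To make the file-count arithmetic concrete, here is a minimal shell sketch using the figures from the example in this message (20 segments, 1000 columns, 500 partitions). The variable names are made up for illustration, and the parquet count simply repeats the 20*1000 figure as quoted in the message rather than deriving it.

```shell
#!/bin/sh
# Figures from the example in this message (illustrative variable names).
segments=20
columns=1000
partitions=500

# orientation=column: one file per column, per partition, per segment.
column_files=$((segments * columns * partitions))

# orientation=parquet: figure as quoted in the message (20*1000), taken as given.
parquet_files=$((segments * columns))

echo "orientation=column:  $column_files files"
echo "orientation=parquet: $parquet_files files"
echo "ratio: $((column_files / parquet_files))x"
```

Changing the three variables shows how the column-oriented count grows in proportion to the number of partitions.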
E.g., storing a table with orientation=column in HAWQ: with 20 segments, 1000 columns, and 500 partitions, it will generate about 20*1000*500 files in HDFS in total. With orientation=parquet, you only have 20*1000 files. HDFS is not good at handling a huge number of small files.

On Sun, Feb 28, 2016 at 9:47 PM Michael André Pearce <[email protected]> wrote:

> Hi Lei,
>
> How come in latest versions of hive they achieve and advocate using column
> orientated tables with orc or parquet, and this isn’t suffered as much?
> Isn’t this how some of the more recent performance improvements have even
> been achieved in hive by using such formats as hive.
>
> Surely having columnar tables is more efficient and would bring
> performance benefits to hawq for analytics workloads which is what in my
> experience the key workload of sql users on hadoop.
>
> Using something like ORC files with compactions would also enable HAWQ to
> support transactions e.g. delete and update operations as is now available
> in Hive.
>
> Cheers
> Mike
>
>
> On 29 Feb 2016, at 01:19, Lei Chang <[email protected]> wrote:
>
> Hi, if column oriented tables are not used properly, it may overwhelm hdfs
> since it might lead to too many files. So it is disabled by default.
>
> Cheers
> Lei
>
>
> On Sun, Feb 28, 2016 at 10:39 PM, [email protected] <[email protected]> wrote:
>
>> hi,all:
>> this days i am testing hawq(1.3.1) ,I got some questions:
>> by default,hawq off the column_orientied_table,why?
>>
>> [gpadmin@stars1 test]$
>> [gpadmin@stars1 test]$ psql -U gpadmin -d hawq -f create_table.sql
>>
>> psql:create_table.sql:48: ERROR: Column oriented tables are deprecated. To
>> enable it, set GUC gp_enable_column_oriented_table on.
>> [gpadmin@stars1 test]$ gpconfig -s gp_enabled_column_orientied_table
>>
>> 20160228:21:45:40:026806 gpconfig:stars1:gpadmin-[ERROR]:-Failed to retrieve
>> GUC information, guc does not exist: gp_enabled_column_orientied_table
>> [gpadmin@stars1 test]$ gpconfig -s gp_enable_column_oriented_table
>> Values on all segments are consistent
>> GUC : gp_enable_column_oriented_table
>> Master value: off
>> Segment value: off
>> [gpadmin@stars1 test]$
>>
>> ------------------------------
>> [email protected]
