Some more info: I'm putting the compression values in hive-site.xml and running the Spark job. hc.sql("set ****") returns the expected compression configuration, but looking at the logs, the tables are created without compression (note compressed:false in the StorageDescriptor):

15/04/21 13:14:19 INFO metastore.HiveMetaStore: 0: create_table: Table(tableName:core_secm_instr_21042015_131411_tmp, dbName:default, owner:hadoop, createTime:1429622059, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{serialization.format=1, path=hdfs://10.166.157.97:9000/user/hive/warehouse/core_secm_instr_21042015_131411_tmp}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"sid","type":"integer","nullable":true,"metadata":{}},{"name":"typeid","type":"integer","nullable":true,"metadata":{}},{"name":"symbol","type":"string","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"beginDT","type":"long","nullable":true,"metadata":{}},{"name":"endDT","type":"long","nullable":true,"metadata":{}}]}, EXTERNAL=FALSE, spark.sql.sources.schema.numParts=1, spark.sql.sources.provider=org.apache.spark.sql.parquet}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

15/04/21 13:14:19 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=create_table: Table(tableName:core_secm_instr_21042015_131411_tmp, dbName:default, owner:hadoop, createTime:1429622059, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{serialization.format=1, path=hdfs://10.166.157.97:9000/user/hive/warehouse/core_secm_instr_21042015_131411_tmp}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"sid","type":"integer","nullable":true,"metadata":{}},{"name":"typeid","type":"integer","nullable":true,"metadata":{}},{"name":"symbol","type":"string","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"beginDT","type":"long","nullable":true,"metadata":{}},{"name":"endDT","type":"long","nullable":true,"metadata":{}}]}, EXTERNAL=FALSE, spark.sql.sources.schema.numParts=1, spark.sql.sources.provider=org.apache.spark.sql.parquet}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
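For reference, a minimal sketch of what the job does (the case class, sample row and table name below are placeholders I made up for the sketch, not the real schema):

import org.apache.spark.sql.hive.HiveContext

// Toy schema standing in for the real one (placeholder).
case class Instr(sid: Int, symbol: String)

val hc = new HiveContext(sc) // sc: the existing SparkContext

// The session sees the compression settings from hive-site.xml:
hc.sql("SET hive.exec.compress.output").collect().foreach(println)

// ...yet the table saveAsTable creates is registered as a Parquet
// data-source table (spark.sql.sources.provider=org.apache.spark.sql.parquet
// in the metastore log above), with compressed:false.
val df = hc.createDataFrame(sc.parallelize(Seq(Instr(1, "AAPL"))))
df.saveAsTable("instr_compression_repro") // placeholder table name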
On Tue, Apr 21, 2015 at 12:40 PM, Ophir Cohen <oph...@gmail.com> wrote:
> Sadly I'm encountering too many issues migrating my code to Spark 1.3.
>
> I wrote about one problem in another mail, but my main problem is that I can't set the right compression type.
> In Spark 1.2.1, setting the following values was enough:
>
> hc.setConf("hive.exec.compress.output", "true")
> hc.setConf("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec")
> hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
>
> Running it on the new cluster:
> 1. The files come out uncompressed and named *.parquet.
> 2. When trying to explore the table using the Hive CLI I get the following exception:
>
> Failed with exception java.io.IOException:java.io.IOException: hdfs://10.166.157.97:9000/user/hive/warehouse/core_equity_corp_splits_divs/part-r-00001.parquet not a SequenceFile
> Time taken: 0.538 seconds
>
> 3. Running the same query from the Spark shell yields empty results.
>
> Please advise.
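PS: one thing I still need to verify, so treat it as an assumption on my part: saveAsTable in 1.3 seems to go through the Parquet data source rather than a Hive SerDe, so the Hive/MapReduce compression settings above may never be consulted, and the codec would instead come from Spark SQL's own Parquet setting. A minimal sketch of that assumption:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

// Assumption: for Parquet data-source tables this is the codec knob that
// applies ("uncompressed", "snappy", "gzip" or "lzo"), not
// hive.exec.compress.output / mapreduce.output.fileoutputformat.compress.*.
hc.setConf("spark.sql.parquet.compression.codec", "snappy")

If that's right, it would also explain the Hive CLI error: the metastore entry advertises SequenceFile input/output formats (see the create_table log above), so the Hive CLI tries to read the Parquet files as SequenceFiles.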