Some more info: I'm putting the compression values in hive-site.xml and running the Spark job. hc.sql("set ****") returns the expected compression configuration, but looking at the logs, the tables are created without compression (note compressed:false in the StorageDescriptor):

15/04/21 13:14:19 INFO metastore.HiveMetaStore: 0: create_table: Table(tableName:core_secm_instr_21042015_131411_tmp, dbName:default, owner:hadoop, createTime:1429622059, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{serialization.format=1, path=hdfs://10.166.157.97:9000/user/hive/warehouse/core_secm_instr_21042015_131411_tmp}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"sid","type":"integer","nullable":true,"metadata":{}},{"name":"typeid","type":"integer","nullable":true,"metadata":{}},{"name":"symbol","type":"string","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"beginDT","type":"long","nullable":true,"metadata":{}},{"name":"endDT","type":"long","nullable":true,"metadata":{}}]}, EXTERNAL=FALSE, spark.sql.sources.schema.numParts=1, spark.sql.sources.provider=org.apache.spark.sql.parquet}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

15/04/21 13:14:19 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=create_table: Table(tableName:core_secm_instr_21042015_131411_tmp, dbName:default, owner:hadoop, createTime:1429622059, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{serialization.format=1, path=hdfs://10.166.157.97:9000/user/hive/warehouse/core_secm_instr_21042015_131411_tmp}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"sid","type":"integer","nullable":true,"metadata":{}},{"name":"typeid","type":"integer","nullable":true,"metadata":{}},{"name":"symbol","type":"string","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"beginDT","type":"long","nullable":true,"metadata":{}},{"name":"endDT","type":"long","nullable":true,"metadata":{}}]}, EXTERNAL=FALSE, spark.sql.sources.schema.numParts=1, spark.sql.sources.provider=org.apache.spark.sql.parquet}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
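For reference, a minimal sketch of what the job does (the case class, sample row and table name below are placeholders I made up for the sketch, not the real schema):

import org.apache.spark.sql.hive.HiveContext

// Toy schema standing in for the real one (placeholder).
case class Instr(sid: Int, symbol: String)

val hc = new HiveContext(sc) // sc: the existing SparkContext

// The session sees the compression settings from hive-site.xml:
hc.sql("SET hive.exec.compress.output").collect().foreach(println)

// ...yet the table saveAsTable creates is registered as a Parquet
// data-source table (spark.sql.sources.provider=org.apache.spark.sql.parquet
// in the metastore log above), with compressed:false.
val df = hc.createDataFrame(sc.parallelize(Seq(Instr(1, "AAPL"))))
df.saveAsTable("instr_compression_repro") // placeholder table name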
On Tue, Apr 21, 2015 at 12:40 PM, Ophir Cohen <oph...@gmail.com> wrote:
> Sadly I'm encountering too many issues migrating my code to Spark 1.3.
>
> I wrote about one problem in another mail, but my main problem is that I can't set the right compression type.
> In Spark 1.2.1, setting the following values was enough:
>
> hc.setConf("hive.exec.compress.output", "true")
> hc.setConf("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec")
> hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
>
> Running it on the new cluster:
> 1. The files come out uncompressed and named *.parquet.
> 2. When trying to explore the table using the Hive CLI I get the following exception:
>
> Failed with exception java.io.IOException:java.io.IOException: hdfs://10.166.157.97:9000/user/hive/warehouse/core_equity_corp_splits_divs/part-r-00001.parquet not a SequenceFile
> Time taken: 0.538 seconds
>
> 3. Running the same query from the Spark shell yields empty results.
>
> Please advise.
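PS: one thing I still need to verify, so treat it as an assumption on my part: saveAsTable in 1.3 seems to go through the Parquet data source rather than a Hive SerDe, so the Hive/MapReduce compression settings above may never be consulted, and the codec would instead come from Spark SQL's own Parquet setting. A minimal sketch of that assumption:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

// Assumption: for Parquet data-source tables this is the codec knob that
// applies ("uncompressed", "snappy", "gzip" or "lzo"), not
// hive.exec.compress.output / mapreduce.output.fileoutputformat.compress.*.
hc.setConf("spark.sql.parquet.compression.codec", "snappy")

If that's right, it would also explain the Hive CLI error: the metastore entry advertises SequenceFile input/output formats (see the create_table log above), so the Hive CLI tries to read the Parquet files as SequenceFiles.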