set mapreduce.input.fileinputformat.split.maxsize=20480000000;
set mapreduce.input.fileinputformat.split.minsize.per.node=20480000000;
set mapreduce.input.fileinputformat.split.minsize.per.rack=20480000000;


Increasing the parameters above can decrease the number of tasks launched (the split sizes control how many input splits, and hence map tasks, are created), which I think will in turn decrease the number of output files.
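To make the suggestion concrete, a full session might look like the sketch below. The table and partition names are placeholders I have invented, not taken from the original job, and the ~20 GB split values are aggressive and should be tuned per cluster:

```sql
-- Raise the input split sizes so fewer splits (and hence fewer tasks
-- and output files) are created. Values are in bytes.
set mapreduce.input.fileinputformat.split.maxsize=20480000000;
set mapreduce.input.fileinputformat.split.minsize.per.node=20480000000;
set mapreduce.input.fileinputformat.split.minsize.per.rack=20480000000;

-- Hypothetical insert mirroring the ORC/ZLIB load described below.
INSERT OVERWRITE TABLE orc_table PARTITION (dt='2015-10-21')
SELECT * FROM staging_table;
```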



r7raul1...@163.com
 
From: Jagat Singh
Date: 2015-10-21 08:49
To: user
Subject: Hive ORC file exception "serious problem"
We are using Hive 0.14.
Our input file size is around 100 GB uncompressed.
We are inserting this data into a Hive table that is ORC-based with ZLIB compression.
While inserting we are also using the following two parameters:
SET hive.exec.reducers.max=10;
SET mapred.reduce.tasks=5;

The output ORC file produced is about 10 GB compressed.
Questions:
How do we control the number of output ORC files?
How do we control the size of the ORC files generated?
When we get very big ORC files like 10 GB and try to query the table, we get an exception in Hive (query and exception below).
Will setting hive.exec.orc.default.block.size or hive.exec.orc.default.stripe.size to some lower value help to control the output file size?
Is there any limitation in ORC on file size?
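If you want to experiment with those two properties, they can be set per session; the byte values below are purely illustrative on my part, not recommendations:

```sql
-- Illustrative values only; both properties take sizes in bytes.
SET hive.exec.orc.default.stripe.size=67108864;   -- 64 MB stripes
SET hive.exec.orc.default.block.size=134217728;   -- 128 MB blocks
```

Note that these mainly affect how data is laid out inside each ORC file; the number of output files is driven primarily by how many reducers (or mappers, for map-only inserts) write them.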
We have the following Hive properties set in Ambari:
hive.merge.size.per.task 256000000
hive.merge.orcfile.stripe.level true
hive.merge.mapfiles true
hive.merge.mapredfiles true


Reading Query
Select * from table where partition="big_file_size"

Exception
P-524264982-127.0.0.1-1429020129249:blk_1091744762_18097939): PathInfo{path=, state=UNUSABLE} is not usable for short circuit; giving up on BlockReaderLocal.
15/10/21 11:30:02 [LeaseRenewer:d760770@tdcdv2]: DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [] with renew id 1 executed
15/10/21 11:30:04 [ORC_GET_SPLITS #1]: ERROR orc.OrcInputFormat: Unexpected Exception
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setIncludedColumns(OrcInputFormat.java:260)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:779)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem
15/10/21 11:30:04 [main]: ERROR CliDriver: Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem
java.io.IOException: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1623)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:949)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:974)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:442)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:588)
... 15 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setIncludedColumns(OrcInputFormat.java:260)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:779)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

15/10/21 11:30:04 [main]: INFO exec.TableScanOperator: 0 finished. closing...
15/10/21 11:30:04 [main]: DEBUG exec.TableScanOperator: Closing child = SEL[2]
15/10/21 11:30:04 [main]: DEBUG exec.SelectOperator: allInitializedParentsAreClosed? parent.state = CLOSE
15/10/21 11:30:04 [main]: INFO exec.SelectOperator: 2 finished. closing...
15/10/21 11:30:04 [main]: DEBUG exec.SelectOperator: Closing child = LIM[3]
