Hi there,
We've been encountering the following exception on a very small dataset (180 lines):
Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal
error occurred when node tried to create too many dynamic partitions. The
maximum number of dynamic partitions is controlled by
hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode.
Maximum was set to: 100
This happens with the following setup:
CREATE TABLE enriched_data (
enriched_json_data string
)
PARTITIONED BY (yyyy string, mm string, dd string, identifier string,
sub_identifier string, unique_run_id string)
CLUSTERED BY (enriched_json_data) INTO 128 BUCKETS
LOCATION "${OUTDIR}";
INSERT OVERWRITE TABLE enriched_data PARTITION (yyyy, mm, dd, identifier,
sub_identifier, unique_run_id)
SELECT …
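For reference, here is a rough sketch of how rows in a bucketed table end up in bucket files. It is not Hive's code. It assumes a plain String.hashCode() in place of Hive's own bucketing hash, and it assumes, based on file names like 000047_0 in the output paths, that each bucket file is named after its zero-padded bucket number plus an _0 attempt suffix. BucketFileNames, bucketFor and fileNameFor are hypothetical names.

```java
// Hypothetical sketch of bucket assignment for CLUSTERED BY (enriched_json_data) INTO 128 BUCKETS.
// ASSUMPTION: String.hashCode() stands in for Hive's real bucketing hash.
// ASSUMPTION: bucket files are named <zero-padded bucket number>_0, as the output paths suggest.
class BucketFileNames {
    static final int NUM_BUCKETS = 128;

    // Clear the sign bit so the modulo result is always in [0, NUM_BUCKETS).
    static int bucketFor(String enrichedJsonData) {
        return (enrichedJsonData.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
    }

    // Zero-pads the bucket number to six digits and appends the attempt suffix.
    static String fileNameFor(int bucket) {
        return String.format("%06d_0", bucket);
    }

    public static void main(String[] args) {
        // Hypothetical rows; the real SELECT is elided above.
        String[] rows = {"{\"id\":1}", "{\"id\":2}", "{\"id\":3}"};
        for (String row : rows) {
            int bucket = bucketFor(row);
            System.out.println(row + " -> " + fileNameFor(bucket));
        }
    }
}
```

The point of the sketch is only the shape "hash mod 128, one file per bucket": distinct rows scatter across many bucket files even when they all share the same partition values.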
We haven’t seen this issue before (our datasets are normally billions of lines),
but in this case a very small amount of data is triggering it.
After looking at the code, it appears that this condition is failing:
https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L745
I downloaded and rebuilt the branch, adding some debug output to print the
contents of the valToPaths map, and it fails because the map contains 101 entries.
All the entries look like this:
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000047_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000048_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000049_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000051_0
….
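To make the failing check concrete, here is a simplified model of it. It is a sketch, not the actual FileSinkOperator logic, and DynamicPartitionLimit, register and size are hypothetical names. It assumes, as the printed valToPaths entries suggest, that every distinct "partition directory + bucket file" key counts towards hive.exec.max.dynamic.partitions.pernode, so one partition directory with 101 bucket files trips a limit of 100.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simplification of the per-node dynamic partition limit.
// This is NOT Hive's FileSinkOperator code; it only models the behaviour observed above.
// ASSUMPTION: each distinct "partition directory + bucket file" key counts towards the limit.
class DynamicPartitionLimit {
    private final int maxPartitionsPerNode; // cf. hive.exec.max.dynamic.partitions.pernode
    private final Set<String> outputKeys = new HashSet<>();

    DynamicPartitionLimit(int maxPartitionsPerNode) {
        this.maxPartitionsPerNode = maxPartitionsPerNode;
    }

    // Records an output key and fails once the number of distinct keys exceeds the limit.
    void register(String partitionDir, String bucketFile) {
        outputKeys.add(partitionDir + "/" + bucketFile);
        if (outputKeys.size() > maxPartitionsPerNode) {
            throw new IllegalStateException("too many dynamic partitions: "
                    + outputKeys.size() + " > " + maxPartitionsPerNode);
        }
    }

    int size() {
        return outputKeys.size();
    }

    public static void main(String[] args) {
        String dir = "yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345";
        DynamicPartitionLimit limit = new DynamicPartitionLimit(100);
        try {
            // A single partition directory, but 101 distinct bucket files (000000_0 .. 000100_0).
            for (int bucket = 0; bucket <= 100; bucket++) {
                limit.register(dir, String.format("%06d_0", bucket));
            }
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "too many dynamic partitions: 101 > 100"
        }
    }
}
```

Under this model, the partition values never change, yet the count still exceeds 100 purely because of the bucket file suffixes.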
We’re confused as to why Hive treats the final component of the output path
(e.g. 000047_0) as a “dynamic partition”, since it is not in our PARTITIONED
BY clause.
The only explanation I can think of is the CLUSTERED BY (enriched_json_data)
INTO 128 BUCKETS clause: because the dataset is so small (180 lines), everything
is loaded into a single reducer task, but the hash of each line distributes the
rows fairly uniformly, so that one reducer ends up with more than 100 buckets to write to.
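A back-of-the-envelope check of this hypothesis: if 180 rows were hashed uniformly at random into 128 buckets, the expected number of non-empty buckets is 128 * (1 - (1 - 1/128)^180), which works out to roughly 97, so a single reducer exceeding 100 bucket files is plausible rather than surprising. The sketch computes that expectation and also estimates by simulation how often more than 100 buckets are hit. Uniform, independent hashing is an assumption that Hive's real hash only approximates, and ExpectedBuckets, expectedNonEmpty and fractionExceeding are hypothetical names.

```java
import java.util.Random;

// Back-of-the-envelope check: can 180 rows from one reducer land in more than 100 of 128 buckets?
// ASSUMPTION: rows hash independently and uniformly across buckets (real hashes only approximate this).
class ExpectedBuckets {
    // Expected number of non-empty buckets for n rows over b buckets: b * (1 - (1 - 1/b)^n).
    static double expectedNonEmpty(int rows, int buckets) {
        return buckets * (1.0 - Math.pow(1.0 - 1.0 / buckets, rows));
    }

    // Monte Carlo estimate of how often the number of distinct buckets exceeds the limit.
    static double fractionExceeding(int rows, int buckets, int limit, int trials, long seed) {
        Random rnd = new Random(seed);
        int exceeded = 0;
        for (int t = 0; t < trials; t++) {
            boolean[] used = new boolean[buckets];
            int distinct = 0;
            for (int r = 0; r < rows; r++) {
                int b = rnd.nextInt(buckets);
                if (!used[b]) {
                    used[b] = true;
                    distinct++;
                }
            }
            if (distinct > limit) {
                exceeded++;
            }
        }
        return (double) exceeded / trials;
    }

    public static void main(String[] args) {
        System.out.printf("expected non-empty buckets: %.1f%n", expectedNonEmpty(180, 128));
        System.out.printf("fraction of trials over 100: %.3f%n",
                fractionExceeding(180, 128, 100, 10_000, 42L));
    }
}
```

If the simulated fraction is clearly non-zero, a 180-row run landing in 101 buckets on one reducer is consistent with the single-reducer explanation.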
Any help would be greatly appreciated.
With thanks,
Daniel Harper
Software Engineer, OTG ANT
BC5 A5