Re: Issue: Max block location exceeded for split error when running hive

Harsh J Wed, 18 Sep 2013 18:35:12 -0700

Do your input files carry a replication factor above 10? That could be
one cause of this.
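Here is a rough way to see why a high replication factor matters. In a simplified model, a split's location list is the set of distinct hosts that hold a replica of the data it covers. Under that model, a block replicated on 15 hosts yields 15 locations, which is over the default limit of 10. The Java below is a hypothetical, self-contained illustration. The class and method names are made up and it does not call Hadoop. Real `InputSplit.getLocations()` results also depend on the input format in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Hadoop code: a split's locations are approximated as
// the distinct hosts that hold a replica of any block the split covers.
final class ReplicaLocations {

    // Returns the distinct hosts for a split, preserving first-seen order.
    static String[] splitLocations(List<String[]> blockReplicaHosts) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String[] replicas : blockReplicaHosts) {
            for (String host : replicas) {
                hosts.add(host);
            }
        }
        return hosts.toArray(new String[0]);
    }

    // Builds a fake block replicated on `replication` distinct hosts named host0..host(N-1).
    static String[] fakeBlock(int replication) {
        String[] hosts = new String[replication];
        for (int i = 0; i < replication; i++) {
            hosts[i] = "host" + i;
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<String[]> blocks = new ArrayList<>();
        blocks.add(fakeBlock(15)); // one block with a replication factor of 15
        String[] locations = splitLocations(blocks);
        System.out.println(locations.length + " locations; default limit is 10");
    }
}
```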


On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor wrote:
> Folks,
>
> Has anyone run into this issue before?
> java.io.IOException: Max block location exceeded for split: Paths:
> "/foo/bar...."
> ....
> InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> splitsize: 15 maxsize: 10
> at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>
> When we set the property to something higher, as suggested:
> mapreduce.job.max.split.locations = <a value above the one it failed on>
> the job runs successfully.
>
> I am trying to dig up more documentation on this, since the default
> seems to be 10 and I am not sure how that limit was chosen.
> Additionally, what is the recommended value, and what factors does it
> depend on?
>
> We are running YARN; the query itself runs on Hive 0.10 on CDH 4.3.
>
> Any pointers would be helpful.
>
> Regards,
> md
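The property mentioned above can be modeled as a simple submit-time check. The job reads `mapreduce.job.max.split.locations` (default 10, per this thread) and rejects any split whose location count exceeds it. In the error above, `splitsize: 15 maxsize: 10` appears to mean 15 locations against a limit of 10. The sketch below is a hypothetical, simplified model in plain Java. A `Map` stands in for the Hadoop `Configuration`, and the class name is made up. It mirrors the message format in the trace but is not the actual `JobSplitWriter` source.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified model (hypothetical class, not Hadoop code) of the submit-time check
// that produced "Max block location exceeded for split".
final class SplitLocationCheck {

    static final String MAX_SPLIT_LOCATIONS_KEY = "mapreduce.job.max.split.locations";
    static final int DEFAULT_MAX_SPLIT_LOCATIONS = 10; // default reported in the thread

    // Reads the limit from a plain key/value map, falling back to the default.
    static int maxSplitLocations(Map<String, String> conf) {
        String value = conf.get(MAX_SPLIT_LOCATIONS_KEY);
        return value == null ? DEFAULT_MAX_SPLIT_LOCATIONS : Integer.parseInt(value.trim());
    }

    // Throws if a split reports more locations than the configured limit.
    static void checkSplit(String[] locations, Map<String, String> conf) throws IOException {
        int max = maxSplitLocations(conf);
        if (locations.length > max) {
            throw new IOException("Max block location exceeded for split: splitsize: "
                    + locations.length + " maxsize: " + max);
        }
    }

    public static void main(String[] args) {
        String[] locations = new String[15]; // 15 locations, as in the reported error
        Map<String, String> conf = new HashMap<>();
        try {
            checkSplit(locations, conf);
        } catch (IOException e) {
            System.out.println("default limit: " + e.getMessage());
        }
        conf.put(MAX_SPLIT_LOCATIONS_KEY, "15"); // raise the limit, as in the thread
        try {
            checkSplit(locations, conf);
            System.out.println("limit 15: split accepted");
        } catch (IOException e) {
            System.out.println("unexpected: " + e.getMessage());
        }
    }
}
```

Raising the property only relaxes this comparison. It does not change how many locations each split reports.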



-- 
Harsh J
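On the question of a recommended value, consider when the error fires. It fires only when a single split reports more locations than `mapreduce.job.max.split.locations`. So the smallest setting that avoids it is the largest per-split location count: 15 in the reported trace, assuming `splitsize` there is the location count. If that count tracks the input files' replication factor, as suggested in the reply, the setting needs to be at least the highest replication factor among the inputs. The following is a hypothetical helper in plain Java (not a Hadoop API) that computes this lower bound, keeping the default of 10 as a floor.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: returns the largest per-split location count,
// or defaultLimit if that is larger. Any limit at or above this value
// passes a "locations.length > limit" check for every split given.
final class SplitLimitAdvisor {

    static int minimumLimit(List<String[]> splitLocations, int defaultLimit) {
        int needed = defaultLimit;
        for (String[] locations : splitLocations) {
            needed = Math.max(needed, locations.length);
        }
        return needed;
    }

    public static void main(String[] args) {
        // Three splits reporting 3, 15 and 8 locations respectively.
        List<String[]> splits = Arrays.asList(new String[3], new String[15], new String[8]);
        System.out.println(minimumLimit(splits, 10)); // prints 15
    }
}
```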
