Are you using a CombineFileInputFormat or similar input format then, perhaps?
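
In case it helps while you check: here is a minimal sketch of raising that limit programmatically at submit time. The property name is the one from your stack trace; the value 20 and the class/job names are made up purely for illustration, not a recommendation.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class RaiseSplitLocationLimit {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // The default limit is 10. A split that spans many files/blocks
          // (e.g. a combined split) can report more replica locations than
          // that, which triggers the "Max block location exceeded for
          // split" IOException at job submission.
          conf.setInt("mapreduce.job.max.split.locations", 20); // illustrative value only
          Job job = Job.getInstance(conf, "raise-split-location-limit");
          // ... configure input/output formats, mapper and reducer as usual ...
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }

From Hive, the equivalent should be a plain "set mapreduce.job.max.split.locations=20;" before the query, which matches what you already tried.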
On Thu, Sep 19, 2013 at 1:29 PM, Murtaza Doctor <[email protected]> wrote:
> We are using the default replication factor of 3. When new files are put on
> HDFS, we never override the replication factor. When there is more data
> involved, it fails at a larger split size.
>
> On Wed, Sep 18, 2013 at 6:34 PM, Harsh J <[email protected]> wrote:
>>
>> Do your input files carry a replication factor of 10+? That could be
>> one cause behind this.
>>
>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <[email protected]> wrote:
>> > Folks,
>> >
>> > Has anyone run into this issue before?
>> >
>> > java.io.IOException: Max block location exceeded for split: Paths:
>> > "/foo/bar...."
>> > ....
>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>> > splitsize: 15 maxsize: 10
>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>> >     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>> >     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>> >
>> > When we set the property to a value higher than the one it failed on, as
>> > suggested:
>> > mapreduce.job.max.split.locations = <more than the reported splitsize>
>> > then the job runs successfully.
>> >
>> > I am trying to dig up additional documentation on this, since the default
>> > seems to be 10 and I am not sure how that limit was chosen.
>> > Additionally, what is the recommended value, and what factors does it
>> > depend on?
>> >
>> > We are running YARN; the actual query is Hive on CDH 4.3, with Hive
>> > version 0.10.
>> >
>> > Any pointers in this direction will be helpful.
>> >
>> > Regards,
>> > md
>>
>> --
>> Harsh J

--
Harsh J
