What are the ramifications of setting a hard-coded value in our scripts and then changing parameters that influence the input data size? For example, one day I want to run across 1 day's worth of data, and another day I want to run against 30 days.
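For what it's worth, one way to avoid baking a single number into every script is to set the property per run, next to the date predicate, so a 30-day run can carry a larger value than a 1-day run. A minimal Hive-script sketch of that idea (the table name, partition column, and both values below are illustrative, not taken from this thread):

    -- Per-run override instead of one hard-coded value shared by all scripts.
    -- Table name, partition column, and the value 40 are placeholders.
    SET mapreduce.job.max.split.locations=40;

    SELECT dt, COUNT(*) AS events
    FROM raw_events
    WHERE dt BETWEEN '2013-08-20' AND '2013-09-18'  -- 30-day window; a 1-day run could use a smaller override
    GROUP BY dt;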
On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <[email protected]> wrote:
> I am assuming you have looked at this already:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>
> You do have a workaround here to increase the
> mapreduce.job.max.split.locations value in the Hive configuration, or do
> we need more than that here?
>
> -Rahul
>
>
> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <[email protected]> wrote:
>
>> It used to throw a warning in 1.0.3 and has now become an IOException. I
>> was more trying to figure out why it is exceeding the limit even though
>> the replication factor is 3. Also, Hive may use CombineInputSplit or some
>> version of it; are we saying it will always exceed the limit of 10?
>>
>>
>> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <[email protected]> wrote:
>>
>>> We have this job submit property buried in Hive that defaults to 10. We
>>> should make that configurable.
>>>
>>>
>>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <[email protected]> wrote:
>>>
>>>> Do your input files carry a replication factor of 10+? That could be
>>>> one cause behind this.
>>>>
>>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <[email protected]> wrote:
>>>> > Folks,
>>>> >
>>>> > Anyone run into this issue before:
>>>> >
>>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>>> > "/foo/bar...."
>>>> > ....
>>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>>> > splitsize: 15 maxsize: 10
>>>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>>> >     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>>> >
>>>> > When we set the property to something higher, as suggested, i.e.
>>>> > mapreduce.job.max.split.locations = more than the value it failed on,
>>>> > then the job runs successfully.
>>>> >
>>>> > I am trying to dig up additional documentation on this, since the
>>>> > default seems to be 10; I am not sure how that limit was set.
>>>> > Additionally, what is the recommended value, and what factors does it
>>>> > depend on?
>>>> >
>>>> > We are running YARN; the actual query is Hive on CDH 4.3, with Hive
>>>> > version 0.10.
>>>> >
>>>> > Any pointers in this direction will be helpful.
>>>> >
>>>> > Regards,
>>>> > md
>>>>
>>>>
>>>> --
>>>> Harsh J
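On the earlier question of how splitsize: 15 can show up when the replication factor is only 3: one plausible explanation (my assumption, not confirmed anywhere in this thread) is that Hive's combine-style input format merges several blocks into one split, so the split's location list is drawn from all of the underlying blocks; five blocks with three replicas each on largely distinct nodes would already account for 15 hosts. Under that assumption, a quick check of the input's replication plus a per-session override just above the reported split size would look roughly like this in the Hive CLI (the path is a placeholder and the value 16 is illustrative, chosen only because the trace reported 15):

    -- List the input files; the second column of the -ls output is the
    -- replication factor Harsh asked about above.
    dfs -ls /path/to/input;

    -- Raise the limit just above the largest split location count the
    -- job submitter reported (15 in the trace above).
    SET mapreduce.job.max.split.locations=16;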
