Patai,

My bad - that was on my mind but I missed noting it down in my earlier reply. Yes, you'd have to control that as well; 2 should be fine for smaller clusters.
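For reference, this is roughly what that change looks like in mapred-site.xml on the submitting side (a sketch of the property Patai describes below; pick a value no larger than the cluster's dfs.replication.max):

<!-- Replication factor for job submission files such as job.jar.
     The stock default of 10 is tuned for large clusters. -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>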
On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum <[email protected]> wrote:
> Just want to share & check if this makes sense.
>
> Jobs failed to run after I restarted the namenode, and the cluster
> stopped complaining about under-replication.
>
> This is what I found in the log file:
>
> Requested replication 10 exceeds maximum 2
> java.io.IOException: file
> /tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
> Requested replication 10 exceeds maximum 2
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
>   at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143
>
> So I scanned through the xml config files, guessed I should change
> <name>mapred.submit.replication</name> from 10 to 2, and restarted again.
>
> That's when jobs started running again.
> Hopefully that change makes sense.
>
> Thanks,
> Patai
>
> On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
> <[email protected]> wrote:
>> Thanks Harsh, dfs.replication.max does do the magic!!
>>
>> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <[email protected]> wrote:
>>> Thank you, Harsh. I did not know about dfs.replication.max.
>>>
>>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <[email protected]> wrote:
>>>>
>>>> Hey Chris,
>>>>
>>>> The dfs.replication param is an exception to the <final> config
>>>> feature. If one uses the FileSystem API, one can pass in any short
>>>> value they want the replication to be. This bypasses the
>>>> configuration, and the configuration (being per-file) is also client
>>>> sided.
>>>>
>>>> The right way for an administrator to enforce a "max" replication
>>>> value at the create/setRep level is to set
>>>> dfs.replication.max to a desired value at the NameNode and restart
>>>> it.
>>>>
>>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>>> <[email protected]> wrote:
>>>> > Hello Patai,
>>>> >
>>>> > Has your configuration file change been copied to all nodes in the
>>>> > cluster?
>>>> >
>>>> > Are there applications connecting from outside of the cluster? If so,
>>>> > then those clients could have separate configuration files or code
>>>> > setting dfs.replication (and other configuration properties). These
>>>> > would not be limited by final declarations in the cluster's
>>>> > configuration files. <final>true</final> controls configuration file
>>>> > resource loading, but it does not necessarily block different nodes
>>>> > or different applications from running with completely different
>>>> > configurations.
>>>> >
>>>> > Hope this helps,
>>>> > --Chris
>>>> >
>>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>>> > <[email protected]> wrote:
>>>> >>
>>>> >> Hi Hadoopers,
>>>> >>
>>>> >> I have
>>>> >>
>>>> >> <property>
>>>> >>   <name>dfs.replication</name>
>>>> >>   <value>2</value>
>>>> >>   <final>true</final>
>>>> >> </property>
>>>> >>
>>>> >> set in hdfs-site.xml in the staging environment cluster. While the
>>>> >> staging cluster is running the code that will later be deployed to
>>>> >> production, that code is trying to use a dfs.replication of 3, 10, 50,
>>>> >> anything other than 2; whatever number the developers thought would
>>>> >> fit the production environment.
>>>> >>
>>>> >> Even though I already made the dfs.replication property final in the
>>>> >> staging cluster, every time I run fsck on the staging cluster it
>>>> >> still reports under-replication.
>>>> >> I thought the final keyword meant it would not honor the value in the
>>>> >> job config, but that doesn't seem to be the case when I run fsck.
>>>> >>
>>>> >> I am on cdh3u4.
>>>> >>
>>>> >> Please suggest.
>>>> >> Patai
>>>> >
>>>>
>>>> --
>>>> Harsh J
>>>
>>

--
Harsh J
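For anyone who lands on this thread later: the combination that resolved it looks roughly like the following in the NameNode's hdfs-site.xml (a sketch using the values from this thread; dfs.replication.max is enforced server-side and requires a NameNode restart to take effect).

<!-- Default replication requested when a client does not override it. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <final>true</final>
</property>

<!-- Hard ceiling the NameNode enforces on create()/setReplication(),
     regardless of what any client's configuration or code asks for. -->
<property>
  <name>dfs.replication.max</name>
  <value>2</value>
</property>

Note that the MapReduce client's mapred.submit.replication (see the top of the thread) must also stay at or below dfs.replication.max, otherwise job submission fails with the "Requested replication ... exceeds maximum" error shown above.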
