Thank you so much for confirming that.
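For anyone who finds this thread in the archives: the whole thing came down to two properties. Below is roughly what we ended up with, a sketch of our settings rather than a recommendation; 2 happened to fit our small staging cluster, so pick what fits yours.

In hdfs-site.xml on the NameNode (per Harsh, this is the server-side cap, and it needs a NameNode restart to take effect):

<property>
  <name>dfs.replication.max</name>
  <value>2</value>
</property>

And in mapred-site.xml, because job submission files (job.jar etc.) default to a replication of 10, which the cap above rejects with the "Requested replication 10 exceeds maximum 2" error quoted below:

<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>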
On Mon, Oct 15, 2012 at 9:25 PM, Harsh J <[email protected]> wrote:
> Patai,
>
> My bad - that was on my mind but I missed noting it down in my earlier
> reply. Yes, you'd have to control that as well. 2 should be fine for
> smaller clusters.
>
> On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum
> <[email protected]> wrote:
>> Just want to share & check if this makes sense.
>>
>> Jobs failed to run after I restarted the namenode, and the cluster
>> stopped complaining about under-replication.
>>
>> This is what I found in the log file:
>>
>> Requested replication 10 exceeds maximum 2
>> java.io.IOException: file
>> /tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
>> Requested replication 10 exceeds maximum 2
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
>> at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
>> at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143
>>
>> So I scanned through the XML config files, guessed to change
>> <name>mapred.submit.replication</name> from 10 to 2, and restarted again.
>>
>> That's when jobs started running again.
>> Hopefully that change makes sense.
>>
>> Thanks
>> Patai
>>
>> On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
>> <[email protected]> wrote:
>>> Thanks Harsh, dfs.replication.max does do the magic!!
>>>
>>> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <[email protected]>
>>> wrote:
>>>> Thank you, Harsh. I did not know about dfs.replication.max.
>>>>
>>>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <[email protected]> wrote:
>>>>> Hey Chris,
>>>>>
>>>>> The dfs.replication param is an exception to the <final> config
>>>>> feature. If one uses the FileSystem API, one can pass in any short
>>>>> value as the replication factor. This bypasses the configuration,
>>>>> and the configuration (being per-file) is also client-side.
>>>>>
>>>>> The right way for an administrator to enforce a "max" replication
>>>>> value at the create/setRep level would be to set dfs.replication.max
>>>>> to the desired value at the NameNode and restart it.
>>>>>
>>>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>>>> <[email protected]> wrote:
>>>>> > Hello Patai,
>>>>> >
>>>>> > Has your configuration file change been copied to all nodes in the
>>>>> > cluster?
>>>>> >
>>>>> > Are there applications connecting from outside of the cluster? If
>>>>> > so, then those clients could have separate configuration files or
>>>>> > code setting dfs.replication (and other configuration properties).
>>>>> > These would not be limited by final declarations in the cluster's
>>>>> > configuration files. <final>true</final> controls configuration
>>>>> > file resource loading, but it does not necessarily block different
>>>>> > nodes or different applications from running with completely
>>>>> > different configurations.
>>>>> >
>>>>> > Hope this helps,
>>>>> > --Chris
>>>>> >
>>>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>>>> > <[email protected]> wrote:
>>>>> >> Hi Hadoopers,
>>>>> >>
>>>>> >> I have
>>>>> >>
>>>>> >> <property>
>>>>> >>   <name>dfs.replication</name>
>>>>> >>   <value>2</value>
>>>>> >>   <final>true</final>
>>>>> >> </property>
>>>>> >>
>>>>> >> set in hdfs-site.xml in the staging environment cluster. While the
>>>>> >> staging cluster is running code that will later be deployed in
>>>>> >> production, that code tries to set dfs.replication to 3, 10, 50,
>>>>> >> anything other than 2, whatever number the developers thought
>>>>> >> would fit the production environment.
>>>>> >>
>>>>> >> Even though I have already marked dfs.replication final in the
>>>>> >> staging cluster, every time I run fsck on the staging cluster I
>>>>> >> still see it report under-replication.
>>>>> >> I thought the final keyword would stop values in the job config
>>>>> >> from being honored, but that doesn't seem to be the case when I
>>>>> >> run fsck.
>>>>> >>
>>>>> >> I am on cdh3u4.
>>>>> >>
>>>>> >> Please suggest.
>>>>> >> Patai
>>>>>
>>>>> --
>>>>> Harsh J
>>>>
>
> --
> Harsh J
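For reference, Harsh's point above (that a client can bypass even a final dfs.replication) is easy to reproduce from the FileSystem API. A minimal, untested sketch against the CDH3-era API; the path and the replication factor of 10 here are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationBypass {
    public static void main(String[] args) throws Exception {
        // Loads the *client's* core-site.xml/hdfs-site.xml; a final
        // dfs.replication only constrains config resource loading on
        // whichever machine runs this code.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path and replication factor, for illustration only.
        Path p = new Path("/tmp/replication-demo");
        short requested = 10;

        // create() accepts an explicit replication factor, so a job can
        // ask for 10 no matter what the cluster's hdfs-site.xml says.
        FSDataOutputStream out = fs.create(p, true,
                conf.getInt("io.file.buffer.size", 4096),
                requested, fs.getDefaultBlockSize());
        out.writeBytes("hello");
        out.close();

        // Same story after the fact.
        fs.setReplication(p, requested);
    }
}

With dfs.replication.max unset, both calls succeed, and on a small cluster fsck then reports the file as under-replicated, which matches what the staging cluster was showing. With dfs.replication.max set to 2 on the NameNode, the create() call already fails with the "Requested replication 10 exceeds maximum 2" error quoted earlier in the thread.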
