Re: HDFS replication factor
This is solved in Hadoop 3, so stay tuned.

Best,

On Feb 2, 2018 6:26 AM, "李立伟" wrote:
> Hi:
> It's my understanding that an HDFS write operation is not considered
> complete until all of the replicas have been successfully written. If so,
> does the replication factor affect write latency? Will MapReduce/Spark
> tasks be affected?
> Is there a way to have HDFS write the first replica synchronously and
> return, with the other replicas written asynchronously?
> Thanks in advance.
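For reference, until then the replication factor can be tuned per write without changing the cluster-wide default, which is one way to trade durability for write latency. A sketch using standard HDFS shell commands (the paths are hypothetical):

```shell
# Write a file with replication 1 instead of the cluster default
# (the client write pipeline then waits on only one DataNode):
hdfs dfs -D dfs.replication=1 -put local.txt /tmp/local.txt

# Raise the replication afterwards; -w blocks until replication completes:
hdfs dfs -setrep -w 3 /tmp/local.txt
```

These are cluster-side commands shown as a configuration sketch, not a runnable sample.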
Re: HDFS Shell tool
Superb, fantastic, and a really needed one. I was halfway through something similar; let me try to merge my snippets if necessary.

Best,
Ravion

On Feb 9, 2017 10:12 AM, "Vitásek, Ladislav" wrote:
> Hello Hadoop fans,
> I would like to inform you about a tool we want to share.
>
> We created a new utility - HDFS Shell - for working with HDFS faster.
>
> https://github.com/avast/hdfs-shell
>
> *Feature highlights*
> - HDFS DFS initiates a JVM for each command call; HDFS Shell does it only
>   once - a great speed enhancement when you need to work with HDFS often
> - Commands can be used in a short form - e.g. *hdfs dfs -ls /* and *ls /*
>   both work
> - *HDFS path completion using the TAB key*
> - You can easily add any other HDFS manipulation function
> - Command history is persisted to a log (~/.hdfs-shell/hdfs-shell.log)
> - Support for relative directories plus the *cd* and *pwd* commands
> - It can also be launched as a daemon (using UNIX domain sockets)
> - 100% Java, and it's open source
>
> Your suggestions are welcome.
>
> -L. Vitasek aka Vity
No Reducer scenarios
Dear all,

1) When we don't set a reducer class in the driver program, IdentityReducer is invoked.
2) When we call setNumReduceTasks(0), no reducer - not even IdentityReducer - is invoked.

In the second scenario, we observed that the output files are in part-m-xx format (instead of part-r-xx format) and contain the map output. But we know that map output is normally written to the intermediate local file system. So which class is responsible for taking these intermediate map outputs from the local file system and writing them to HDFS? And does that class perform this write operation only when setNumReduceTasks is set to zero?

Best,
Ravion
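For anyone reproducing the second scenario, a minimal map-only driver sketch is below. With zero reduce tasks, the map task writes its output straight through the job's OutputFormat to the configured output path, which is why the files come out as part-m-xx. MyMapper is a hypothetical mapper class; the rest is the standard Hadoop MapReduce API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(MyMapper.class);   // hypothetical mapper
        job.setNumReduceTasks(0);             // no reduce phase: map output
                                              // goes directly to HDFS as part-m-xx
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is a job-configuration sketch; it needs a Hadoop cluster (or local runner) and a real mapper class to execute.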
Re: Combiner and KeyComposite
Are you checking the logs in the correct place?

On Sun, Oct 4, 2015, 4:39 PM paco wrote:
> I am doing a secondary sort in Hadoop 2.6.0, following this tutorial:
> https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/
>
> I have the exact same code, but now I am trying to improve performance, so
> I have decided to add a combiner. I have made two modifications:
>
> Main file:
>
> job.setCombinerClass(CombinerK.class);
>
> Combiner file:
>
> public class CombinerK extends Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {
>
>     public void reduce(KeyWritable key, Iterator<KeyWritable> values, Context context)
>             throws IOException, InterruptedException {
>
>         Iterator<KeyWritable> it = values;
>         System.err.println("combiner " + key);
>
>         KeyWritable first_value = it.next();
>         System.err.println("va: " + first_value);
>
>         long sum = 0;
>         while (it.hasNext()) {
>             sum += it.next().getSs();
>         }
>         first_value.setS(sum);
>         context.write(key, first_value);
>     }
> }
>
> But it seems that it is not run, because I can't find any log file
> containing the word "combiner". When I looked at the counters after
> running, I saw:
>
> Combine input records=404
> Combine output records=404
>
> So the combiner does seem to be executed, but it looks as if it receives
> one call per record, which would explain why the input count equals the
> output count.
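One detail worth checking in the code above: in the new Hadoop API, Reducer.reduce takes an Iterable, not an Iterator. A method declared with an Iterator parameter does not override the base class's reduce, so the framework silently runs the inherited identity implementation - one output record per input record, which matches the 404-in/404-out counters. The self-contained sketch below uses toy stand-in classes (not the real Hadoop API) to demonstrate the pitfall:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for Hadoop's Reducer, just to show the override pitfall.
class MiniReducer<K, V> {
    // Default behaviour is identity: pass every value through unchanged,
    // like the inherited Reducer.reduce in Hadoop.
    public void reduce(K key, Iterable<V> values, StringBuilder out) {
        for (V v : values) out.append(key).append('=').append(v).append(';');
    }
}

class SummingCombiner extends MiniReducer<String, Integer> {
    // BUG: the parameter is Iterator, not Iterable, so this does NOT
    // override MiniReducer.reduce - the identity version runs instead.
    public void reduce(String key, Iterator<Integer> values, StringBuilder out) {
        int sum = 0;
        while (values.hasNext()) sum += values.next();
        out.append(key).append('=').append(sum).append(';');
    }
}

public class CombinerPitfall {
    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        StringBuilder out = new StringBuilder();
        // The List is an Iterable, so this dispatches to the inherited
        // identity reduce, not to SummingCombiner's version.
        new SummingCombiner().reduce("k", values, out);
        System.out.println(out);  // prints "k=1;k=2;k=3;" - identity, not "k=6;"
    }
}
```

Changing the parameter type to Iterable (and adding @Override so the compiler checks the signature) makes the summing version actually run.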
Re: Chaining MapReduce
Hi,

The mappers depend on the source data only. But the data definitely goes through all the mappers, so I should see that number of map tasks reported, right? Instead I am getting only one.

Thanks and regards,
Ravion

On Fri, Aug 21, 2015 at 1:35 PM, ☼ R Nair (रविशंकर नायर) ravishankar.n...@gmail.com wrote:
> All,
>
> I have three mappers, followed by a reducer. I executed the MapReduce job
> successfully. The reported output shows that the number of mappers
> executed is 1, and the number of reducers is also 1. Though the number of
> reducers is correct, shouldn't we get 3 as the number of mappers, since I
> have three mapper classes connected by ChainMapper?
>
> Output given below (snippet):
>
> Job Counters
>     Launched map tasks=1
>     Launched reduce tasks=1
>     Data-local map tasks=1
>     Total time spent by all maps in occupied slots (ms)=8853
>     Total time spent by all reduces in occupied slots (ms)=9900
>     Total time spent by all map tasks (ms)=8853
>     Total time spent by all reduce tasks (ms)=9900
>     Total vcore-seconds taken by all map tasks=8853
>     Total vcore-seconds taken by all reduce tasks=9900
>     Total megabyte-seconds taken by all map tasks=9065472
>     Total megabyte-seconds taken by all reduce tasks=10137600
>
> What I guess is, since the output is passing through Context, the
> internally connected mappers are not counted by the job counters - am I
> correct?
>
> Best,
> Ravion
Chaining MapReduce
All,

I have three mappers, followed by a reducer. I executed the MapReduce job successfully. The reported output shows that the number of mappers executed is 1, and the number of reducers is also 1. Though the number of reducers is correct, shouldn't we get 3 as the number of mappers, since I have three mapper classes connected by ChainMapper?

Output given below (snippet):

Job Counters
    Launched map tasks=1
    Launched reduce tasks=1
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=8853
    Total time spent by all reduces in occupied slots (ms)=9900
    Total time spent by all map tasks (ms)=8853
    Total time spent by all reduce tasks (ms)=9900
    Total vcore-seconds taken by all map tasks=8853
    Total vcore-seconds taken by all reduce tasks=9900
    Total megabyte-seconds taken by all map tasks=9065472
    Total megabyte-seconds taken by all reduce tasks=10137600

What I guess is, since the output is passing through Context, the internally connected mappers are not counted by the job counters - am I correct?

Best,
Ravion
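For context, ChainMapper composes the mappers inside each map task: the "Launched map tasks" counter counts task launches (one per input split here), not mapper classes, so 1 is expected. A driver sketch under that assumption, using the real ChainMapper.addMapper API - AMapper, BMapper, CMapper, and MyReducer are hypothetical classes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

public class ChainDriver {
    public static void configure(Job job) throws Exception {
        // All three mappers run back to back inside the SAME map task,
        // each one's output piped to the next in-process:
        ChainMapper.addMapper(job, AMapper.class, LongWritable.class, Text.class,
                Text.class, Text.class, new Configuration(false));
        ChainMapper.addMapper(job, BMapper.class, Text.class, Text.class,
                Text.class, Text.class, new Configuration(false));
        ChainMapper.addMapper(job, CMapper.class, Text.class, Text.class,
                Text.class, Text.class, new Configuration(false));
        job.setReducerClass(MyReducer.class);
        // "Launched map tasks" will still be 1 for a single input split:
        // the chain is an in-task pipeline, not three separate task launches.
    }
}
```

This is a job-configuration sketch rather than a runnable sample; it needs the hypothetical mapper/reducer classes and a Hadoop runtime.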