Re: HDFS replication factor

2018-02-02 Thread रविशंकर नायर
This is solved in Hadoop 3, so stay tuned.

Best,

On Feb 2, 2018 6:26 AM, "李立伟"  wrote:

> Hi:
>   It's my understanding that an HDFS write operation is not considered
> complete until all of the replicas have been successfully written. If so,
> does the replication factor affect the write latency? Will MapReduce/Spark
> tasks be affected?
>   Is there a way to make HDFS write the first replica synchronously and
> return, writing the other replicas asynchronously?
>   Thanks in advance.
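
For context: an HDFS client waits for an acknowledgement from every DataNode
in the write pipeline, so write latency does grow with the replication
factor. A common workaround is to create latency-sensitive files with
replication 1 and raise the factor afterwards, because the NameNode
re-replicates under-replicated blocks asynchronously in the background -
roughly the "first replica synchronous, the rest asynchronous" behaviour
asked about. A minimal sketch against the standard org.apache.hadoop.fs API
(the path and payload are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationLatencyDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/tmp/low-latency-file");  // placeholder path

        // Write with a per-file replication factor of 1: the pipeline
        // contains a single DataNode, so the client waits for one ack only.
        try (FSDataOutputStream out = fs.create(path, (short) 1)) {
            out.writeUTF("payload");  // placeholder payload
        }

        // Raise the target replication afterwards; the NameNode schedules
        // the extra copies asynchronously, off the client's write path.
        fs.setReplication(path, (short) 3);
    }
}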


Re: HDFS Shell tool

2017-02-09 Thread रविशंकर नायर
Superb, fantastic, and really needed. I was halfway there myself; now let me
try to merge my snippets if necessary.

Best, Ravion

On Feb 9, 2017 10:12 AM, "Vitásek, Ladislav"  wrote:

> Hello Hadoop fans,
> I would like to inform you about our tool we want to share.
>
> We created a new utility - HDFS Shell - to make working with HDFS much faster.
>
> https://github.com/avast/hdfs-shell
>
> *Feature highlights*
> - The hdfs dfs command starts a new JVM for each command call; HDFS Shell
> starts it only once - which means a great speed improvement when you need
> to work with HDFS frequently
> - Commands can be used in a short way - e.g. *hdfs dfs -ls /* and *ls /* -
> both will work
> - *HDFS path completion using TAB key*
> - you can easily add any other HDFS manipulation function
> - command history is persisted in a history log
> (~/.hdfs-shell/hdfs-shell.log)
> - support for relative directories plus the *cd* and *pwd* commands
> - it can also be launched as a daemon (using UNIX domain sockets)
> - 100% Java, and it's open source
>
> Your suggestions are welcome.
>
> -L. Vitasek aka Vity


No Reducer scenarios

2017-01-29 Thread रविशंकर नायर
Dear all,


1) When we don't set the reducer class in the driver program, the
IdentityReducer is invoked.

2) When we call setNumReduceTasks(0), no reducer, not even the
IdentityReducer, is invoked.

Now, in the second scenario, we observed that the output is in part-m-xx
format (instead of part-r-xx format), which shows the map output. But we
know that the output of the map phase is always written to the intermediate
local file system. So who/which class is responsible for taking these
intermediate map outputs from the local file system and writing them to
HDFS? Does this particular class perform this write operation only when
setNumReduceTasks is set to zero?

Best, Ravion
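
For reference: when the number of reduce tasks is zero, there is no
intermediate local-disk write at all - each map task's context.write() goes
through the job's configured OutputFormat (its RecordWriter) straight to the
job output directory on HDFS, which is why the files are named part-m-xxxxx.
A minimal map-only driver sketch (MyMapper and the argument paths are
placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(MyMapper.class);   // hypothetical mapper class
        job.setNumReduceTasks(0);             // map-only: no shuffle, no reducer
        job.setOutputKeyClass(Text.class);    // must match MyMapper's output types
        job.setOutputValueClass(Text.class);
        // With zero reducers, each mapper's output bypasses the local-disk
        // shuffle path and is written directly as part-m-NNNNN files.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}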


Re: Combiner and KeyComposite

2015-10-04 Thread रविशंकर नायर
Are you checking the logs in the correct place?

On Sun, Oct 4, 2015, 4:39 PM paco  wrote:

> I am doing a secondary sort in Hadoop 2.6.0, following this tutorial:
> https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/
>
> I have the exact same code, but now I am trying to improve performance so
> I have decided to add a combiner. I have added two modifications:
>
> Main file:
>
> job.setCombinerClass(CombinerK.class);
>
> Combiner file:
>
> public class CombinerK extends
>         Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {
>
>     public void reduce(KeyWritable key, Iterator<KeyWritable> values,
>             Context context) throws IOException, InterruptedException {
>
>         Iterator<KeyWritable> it = values;
>
>         System.err.println("combiner " + key);
>
>         KeyWritable first_value = it.next();
>         System.err.println("va: " + first_value);
>
>         long sum = 0;
>         while (it.hasNext()) {
>             sum += it.next().getSs();
>         }
>         first_value.setS(sum);
>         context.write(key, first_value);
>     }
> }
>
> But it seems that it is not run, because I can't find any log file which
> contains the word "combiner". When I looked at the counters after running,
> I saw:
>
> Combine input records=404
> Combine output records=404
>
> The combiner does seem to be executed, but it appears to receive a
> separate call for each key, and for this reason it has the same number of
> input records as output records.
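
One likely explanation, for the record: in the new
(org.apache.hadoop.mapreduce) API, Reducer.reduce takes an Iterable, not an
Iterator. A reduce method declared with an Iterator parameter does not
override the framework's method, so the default identity implementation runs
instead - which would explain both the missing log lines and the identical
input/output record counts. A sketch of a signature that does override it
(KeyWritable and its getSs()/setS() accessors come from the code above; the
no-arg constructor is an assumption):

import java.io.IOException;
import org.apache.hadoop.mapreduce.Reducer;

public class CombinerK extends
        Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {

    @Override  // compile-time check: fails if the signature does not match
    protected void reduce(KeyWritable key, Iterable<KeyWritable> values,
            Context context) throws IOException, InterruptedException {
        long sum = 0;
        for (KeyWritable value : values) {  // Iterable, not Iterator
            sum += value.getSs();
        }
        KeyWritable result = new KeyWritable();  // assumed no-arg constructor
        result.setS(sum);
        context.write(key, result);
    }
}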


Re: Chaining MapReduce

2015-08-22 Thread रविशंकर नायर
Hi,

The mappers depend on the source data only. But the data definitely goes
through all the mappers, so I should see that number of map tasks in the
output, right? Instead I am getting only one.

Thanks and regards,
Ravion

On Fri, Aug 21, 2015 at 1:35 PM, ☼ R Nair (रविशंकर नायर) 
ravishankar.n...@gmail.com wrote:

 All,

 I have three mappers, followed by a reducer. I executed the MapReduce job
 successfully. The reported output shows that the number of mappers executed
 is 1 and the number of reducers is also 1. Though the number of reducers is
 correct, shouldn't we get 3 as the number of mappers, since I have three
 mapper classes connected by ChainMapper?

 Output given below (snippet):

 Job Counters
 Launched map tasks=1
 Launched reduce tasks=1
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=8853
 Total time spent by all reduces in occupied slots (ms)=9900
 Total time spent by all map tasks (ms)=8853
 Total time spent by all reduce tasks (ms)=9900
 Total vcore-seconds taken by all map tasks=8853
 Total vcore-seconds taken by all reduce tasks=9900
 Total megabyte-seconds taken by all map tasks=9065472
 Total megabyte-seconds taken by all reduce tasks=10137600


 My guess is that, since the output passes through the Context, the
 internally chained mappers are not counted by the job counters. Am I correct?

 Best, Ravion



Chaining MapReduce

2015-08-21 Thread रविशंकर नायर
All,

I have three mappers, followed by a reducer. I executed the MapReduce job
successfully. The reported output shows that the number of mappers executed
is 1 and the number of reducers is also 1. Though the number of reducers is
correct, shouldn't we get 3 as the number of mappers, since I have three
mapper classes connected by ChainMapper?

Output given below (snippet):

Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8853
Total time spent by all reduces in occupied slots (ms)=9900
Total time spent by all map tasks (ms)=8853
Total time spent by all reduce tasks (ms)=9900
Total vcore-seconds taken by all map tasks=8853
Total vcore-seconds taken by all reduce tasks=9900
Total megabyte-seconds taken by all map tasks=9065472
Total megabyte-seconds taken by all reduce tasks=10137600


My guess is that, since the output passes through the Context, the
internally chained mappers are not counted by the job counters. Am I correct?

Best, Ravion
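
For reference: with org.apache.hadoop.mapreduce.lib.chain.ChainMapper, all
chained mapper classes run inside the same map task, one after another on
each record, so the "Launched map tasks" counter reports the number of map
tasks (one per input split - here 1), not the number of mapper classes. A
minimal driver sketch (MapperA/MapperB/MapperC, MyReducer and the key/value
types are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "chained-mappers");
        job.setJarByClass(ChainDriver.class);

        // All three mappers execute inside ONE map task per input split,
        // each feeding its output to the next, so the job counters report
        // tasks, not mapper classes.
        ChainMapper.addMapper(job, MapperA.class, LongWritable.class,
                Text.class, Text.class, Text.class, new Configuration(false));
        ChainMapper.addMapper(job, MapperB.class, Text.class, Text.class,
                Text.class, Text.class, new Configuration(false));
        ChainMapper.addMapper(job, MapperC.class, Text.class, Text.class,
                Text.class, Text.class, new Configuration(false));

        job.setReducerClass(MyReducer.class);  // hypothetical reducer
        // ... set input/output formats and paths, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}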