Re: How to attach Intellij debugger to a running process to debug yarn mapreduce Application Master source code?

2017-02-09 Thread Tanvir Rahman
Thank you, Steven, for your quick feedback.
Setting yarn.app.mapreduce.am.command-opts works in a cluster, but it is
not working in my local Hadoop setup.
I am trying to find out the reason.
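One possible reason worth checking first (an assumption on my part, not
something confirmed yet): yarn.app.mapreduce.am.command-opts only takes effect
when the job actually runs on YARN, and with the default
mapreduce.framework.name=local there is no separate MRAppMaster JVM to attach
to at all. A minimal way to verify which framework the setup resolves to:

  # Should print "yarn"; if it prints "local", jobs run inside the client JVM
  # (LocalJobRunner) and no Application Master process is ever started.
  hdfs getconf -confKey mapreduce.framework.name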


On Thu, Feb 9, 2017 at 2:00 PM, Steven Rand  wrote:

> Have you tried setting yarn.app.mapreduce.am.command-opts? That should
> allow you to set those Java options on only the Application Master and no
> other processes.
>
> On Thu, Feb 9, 2017 at 11:52 AM, Tanvir Rahman 
> wrote:
>
>> Hello everyone,
>> I am currently working on a research project where I need to understand
>> the YARN MapReduce Application Master code in Hadoop 2.7.3. I have
>> downloaded the hadoop-2.7.3 source code, built it, and set up a local Hadoop
>> configuration on my machine. I can successfully run the wordcount application
>> in my Hadoop setup using the IntelliJ IDE.
>>
>> However, I cannot attach the IntelliJ debugger to a running process to debug
>> the Application Master code.
>> I can debug the wordcount application source code, but I have trouble
>> attaching to the application master (MRAppMaster.java). I tried to follow
>> suggestions that I found on the web about setting HADOOP_OPTS and/or
>> JAVA_OPTS to something like
>>
>> export JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
>>
>> but I ran into problems, since the same port can only be used once. However,
>> Hadoop starts multiple JVMs, and with these options set globally, all of them
>> try to bind to that port.
>>
>> Does anybody have experience with how to attach specifically to the MapReduce
>> Application Master?
>>
>> Thanks in advance
>> Tanvir
>>
>
>


Re: HDFS Shell tool

2017-02-09 Thread Ravi Prakash
Great job Vity!

Thanks a lot for sharing. Have you thought about using WebHDFS?
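For reference, WebHDFS exposes the same file system operations over plain REST,
so there is no per-command JVM start-up either. A minimal sketch (host, port,
and path are illustrative; 50070 is the default NameNode HTTP port in Hadoop
2.x):

  # Roughly the WebHDFS equivalent of "hdfs dfs -ls /tmp":
  curl -s "http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS"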

Thanks
Ravi

On Thu, Feb 9, 2017 at 7:12 AM, Vitásek, Ladislav  wrote:

> Hello Hadoop fans,
> I would like to inform you about our tool we want to share.
>
> We created a new utility - HDFS Shell - to make working with HDFS faster.
>
> https://github.com/avast/hdfs-shell
>
> *Feature highlights*
> - The hdfs dfs command starts a new JVM for each call; HDFS Shell does it
> only once - a great speed improvement when you need to work with HDFS
> often
> - Commands can be used in a short way - e.g. *hdfs dfs -ls /* and *ls /* -
> both will work
> - *HDFS path completion using the TAB key*
> - you can easily add any other HDFS manipulation function
> - command history is persisted in a history log
> (~/.hdfs-shell/hdfs-shell.log)
> - support for relative directories + the *cd* and *pwd* commands
> - it can also be launched as a daemon (using UNIX domain sockets)
> - 100% Java, it's open source
>
> Your suggestions are welcome.
>
> -L. Vitasek aka Vity
>
>


Re: How to attach Intellij debugger to a running process to debug yarn mapreduce Application Master source code?

2017-02-09 Thread Steven Rand
Have you tried setting yarn.app.mapreduce.am.command-opts? That should
allow you to set those Java options on only the Application Master and no
other processes.
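A minimal sketch of what that could look like on job submission (the jar name,
input/output paths, and debug port are illustrative; -Xmx1024m is repeated
because overriding the property replaces its default value, and suspend=y makes
the AM wait until the debugger attaches):

  # Pass the JDWP agent only to the MRAppMaster JVM via the per-job property:
  hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount \
    -D yarn.app.mapreduce.am.command-opts="-Xmx1024m -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
    /input /output
  # Then attach an IntelliJ "Remote" run configuration to port 5005 on the node
  # where the AM container was launched.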

On Thu, Feb 9, 2017 at 11:52 AM, Tanvir Rahman 
wrote:

> Hello everyone,
> I am currently working on a research project where I need to understand
> the YARN MapReduce Application Master code in Hadoop 2.7.3. I have
> downloaded the hadoop-2.7.3 source code, built it, and set up a local Hadoop
> configuration on my machine. I can successfully run the wordcount application
> in my Hadoop setup using the IntelliJ IDE.
>
> However, I cannot attach the IntelliJ debugger to a running process to debug
> the Application Master code.
> I can debug the wordcount application source code, but I have trouble
> attaching to the application master (MRAppMaster.java). I tried to follow
> suggestions that I found on the web about setting HADOOP_OPTS and/or
> JAVA_OPTS to something like
>
> export JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
>
> but I ran into problems, since the same port can only be used once. However,
> Hadoop starts multiple JVMs, and with these options set globally, all of them
> try to bind to that port.
>
> Does anybody have experience with how to attach specifically to the MapReduce
> Application Master?
>
> Thanks in advance
> Tanvir
>


How to attach Intellij debugger to a running process to debug yarn mapreduce Application Master source code?

2017-02-09 Thread Tanvir Rahman
Hello everyone,
I am currently working on a research project where I need to understand the
YARN MapReduce Application Master code in Hadoop 2.7.3. I have downloaded the
hadoop-2.7.3 source code, built it, and set up a local Hadoop configuration on
my machine. I can successfully run the wordcount application in my Hadoop setup
using the IntelliJ IDE.

However, I cannot attach the IntelliJ debugger to a running process to debug
the Application Master code.
I can debug the wordcount application source code, but I have trouble
attaching to the application master (MRAppMaster.java). I tried to follow
suggestions that I found on the web about setting HADOOP_OPTS and/or
JAVA_OPTS to something like

export JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"

but I ran into problems, since the same port can only be used once. However,
Hadoop starts multiple JVMs, and with these options set globally, all of them
try to bind to that port.

Does anybody have experience with how to attach specifically to the MapReduce
Application Master?

Thanks in advance
Tanvir
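P.S. A quick way to see the port clash in practice (a small sketch; jps ships
with the JDK and only lists JVMs owned by the current user):

  # List running JVMs with their main class and JVM arguments; every process
  # showing the jdwp agent competes for the same debug port. The AM appears as
  # org.apache.hadoop.mapreduce.v2.app.MRAppMaster only while a job is running.
  jps -lvm | grep agentlib:jdwp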


Re: HDFS Shell tool

2017-02-09 Thread Uwe Geercken

Hello,

that is very cool. Thanks for the contribution. It worked out of the box and makes working with HDFS a breeze.

Greetings,

Uwe

Sent: Thursday, 9 February 2017 at 16:12
From: "Vitásek, Ladislav" 
To: user@hadoop.apache.org
Subject: HDFS Shell tool

Hello Hadoop fans,
I would like to inform you about our tool we want to share.

We created a new utility - HDFS Shell - to make working with HDFS faster.

https://github.com/avast/hdfs-shell

Feature highlights
- The hdfs dfs command starts a new JVM for each call; HDFS Shell does it only once - a great speed improvement when you need to work with HDFS often
- Commands can be used in a short way - e.g. hdfs dfs -ls / and ls / - both will work
- HDFS path completion using the TAB key
- you can easily add any other HDFS manipulation function
- command history is persisted in a history log (~/.hdfs-shell/hdfs-shell.log)
- support for relative directories + the cd and pwd commands
- it can also be launched as a daemon (using UNIX domain sockets)
- 100% Java, it's open source
 
Your suggestions are welcome.

-L. Vitasek aka Vity


Re: Max Application Master Resources with Queue Elasticity

2017-02-09 Thread Sunil Govind
Hello Benson

If I view "Maximum Application Master Resources" on the ResourceManager Web
UI for QueueA, I should see 4096MB, correct?
> Yes. Are you seeing any different behavior? If so, please share your
> capacity-scheduler.xml.

shouldn't we be able to run 4 uber-mode jobs on QueueA without waiting or
using preemption?
> Yes, it should be.

are you saying that 20% of 5GB is 1GB, so we can only run 1 uber-mode job
even though 5GB is available?
> Ideally no. We take max(queue capacity, available limit) * maximum-am-resource-percent.

- Sunil
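To make the arithmetic in this thread concrete, a small sketch (the 20GB
cluster, 100% max-capacity, and 20% maximum-am-resource-percent are the numbers
from Benson's scenario; the scheduler's exact formula can vary between
releases):

  # AM resource limit when QueueA can elastically reach the whole 20GB cluster:
  awk 'BEGIN { cluster_mb = 20 * 1024; am_pct = 0.20; print cluster_mb * am_pct " MB" }'
  # prints: 4096 MB, i.e. headroom for four 1GB uber-mode AMs, but only if QueueB
  # has not already consumed the underlying resources (or preemption frees them).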


On Tue, Feb 7, 2017 at 11:51 PM Benson Qiu 
wrote:

> Hi Sunil,
>
> Thanks for your reply!
>
> I have some follow-up questions to make sure I fully understand the
> scenario you mentioned (QueueA has 50% capacity, 100% max-capacity, 20%
> maximum-am-resource-percent, cluster resource is 20GB, AM container size is
> 1GB, QueueB has taken over 15GB).
>
> Adding on, let's assume the following:
> - All jobs run in uber mode so we don't need to worry about additional
> resources for map and reduce containers.
> - root.QueueA and root.QueueB are the only two queues on the cluster.
> - user-limit-factor is high enough that a single user can use all of
> QueueA and QueueB's elasticity.
>
> Some questions:
> 1. If I view "Maximum Application Master Resources" on the ResourceManager
> Web UI for QueueA, I should see 4096MB, correct? (QueueA elastically can
> use 100% of the 20GB cluster. 20% of 20GB = 4096MB).
> 2. At the current point in time when QueueB is using 15GB, QueueA has 5GB
> available. Since "Maximum Application Master Resources" is 4096MB, and 5GB
> is available, shouldn't we be able to run 4 uber-mode jobs on QueueA
> without waiting or using preemption? Or are you saying that 20% of 5GB is
> 1GB, so we can only run 1 uber-mode job even though 5GB is available?
>
> Thanks,
> Benson
>
> On Mon, Feb 6, 2017 at 9:25 PM, Sunil Govind 
> wrote:
>
> Hello Benson
>
> I can help explain a little bit here.
>
> maximum-am-resource-percent can be configured at the per-queue level (and
> from the next release it can be configured per node-label as well). The
> default is 10%, so 10% of a queue's capacity can be used for running AMs.
> However, due to elasticity, a queue can hold resources above its configured
> capacity. In that case, "Max Application Master Resources" will consider the
> queue's max limit.
>
> To answer your question: yes, ideally this resource is available for
> running AMs. However, there are several reasons why it may not be. To list a
> few, assume QueueA has 50% capacity and 100% max-capacity, the AM resource
> percentage is 20%, and the cluster resource is 20GB.
> - Assume QueueB has taken over 15GB, and one app is running in QueueA with
> 1GB as its AM resource. By the calculation, 4GB could go to AM resources.
> However, we need to wait until some resources are freed from QueueB, or use
> preemption.
> - User limit. If user-limit-factor is <= 1, then you may not be able to get
> more resources through elasticity.
>
> If you tune all the parameters for your scenario, and there are enough
> resources in the cluster, you can use this resource for AMs.
>
> Thanks
> Sunil
>
> On Tue, Feb 7, 2017 at 9:20 AM Benson Qiu 
> wrote:
>
> Hi,
>
> I noticed that "Max Application Master Resources" on the ResourceManager
> UI (/cluster/scheduler) takes into account queue elasticity.
>
> AMResourceLimit and userAMResourceLimit on the ResourceManager API
> (/ws/v1/cluster/scheduler) also take into account queue elasticity.
>
> Are these AM resources always guaranteed? If a queue cannot grow because
> all of the other queues in the cluster are fully utilized, does the queue
> still have "Max Application Master Resources" available for AM containers?
>
> Thanks,
> Benson
>
>
>


Re: HDFS Shell tool

2017-02-09 Thread रविशंकर नायर
Superb, fantastic, and a really needed one. I was halfway there myself; now let
me try to merge my snippets in where necessary.

Best, Ravion

On Feb 9, 2017 10:12 AM, "Vitásek, Ladislav"  wrote:

> Hello Hadoop fans,
> I would like to inform you about our tool we want to share.
>
> We created a new utility - HDFS Shell - to make working with HDFS faster.
>
> https://github.com/avast/hdfs-shell
>
> *Feature highlights*
> - The hdfs dfs command starts a new JVM for each call; HDFS Shell does it
> only once - a great speed improvement when you need to work with HDFS
> often
> - Commands can be used in a short way - e.g. *hdfs dfs -ls /* and *ls /* -
> both will work
> - *HDFS path completion using the TAB key*
> - you can easily add any other HDFS manipulation function
> - command history is persisted in a history log
> (~/.hdfs-shell/hdfs-shell.log)
> - support for relative directories + the *cd* and *pwd* commands
> - it can also be launched as a daemon (using UNIX domain sockets)
> - 100% Java, it's open source
>
> Your suggestions are welcome.
>
> -L. Vitasek aka Vity
>
>


HDFS Shell tool

2017-02-09 Thread Vitásek, Ladislav
Hello Hadoop fans,
I would like to inform you about our tool we want to share.

We created a new utility - HDFS Shell - to make working with HDFS faster.

https://github.com/avast/hdfs-shell

*Feature highlights*
- The hdfs dfs command starts a new JVM for each call; HDFS Shell does it
only once - a great speed improvement when you need to work with HDFS
often
- Commands can be used in a short way - e.g. *hdfs dfs -ls /* and *ls /* -
both will work
- *HDFS path completion using the TAB key*
- you can easily add any other HDFS manipulation function
- command history is persisted in a history log
(~/.hdfs-shell/hdfs-shell.log)
- support for relative directories + the *cd* and *pwd* commands
- it can also be launched as a daemon (using UNIX domain sockets)
- 100% Java, it's open source

Your suggestions are welcome.

-L. Vitasek aka Vity
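For readers who want a quick feel for it, a short usage sketch (the launcher
script name is an assumption on my part; see the README in the repository for
the actual start command - the commands themselves are the short forms listed
above):

  ./hdfs-shell.sh          # start the interactive shell (one JVM for the whole session)
  ls /                     # same as: hdfs dfs -ls /
  cd /user                 # relative paths work from here on
  pwd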