Re: PendingDeletionBlocks immediately after Namenode failover

2017-11-13 Thread Ravi Prakash
Hi Michael!

Thank you for the report. I'm sorry I don't have anything beyond the
generic advice: please try a newer version of Hadoop (say
Hadoop-2.8.2). You seem to already know that the BlockManager is the place
to look.

If you find it to be a legitimate issue which affects Apache Hadoop
and still hasn't been fixed in trunk ( https://github.com/apache/hadoop ),
could you please create a new JIRA for it here:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=116=HDFS ?

Thanks
Ravi

On Wed, Nov 8, 2017 at 7:50 PM, Michael Parkin 
wrote:

> Hello,
>
> We're seeing some unusual behavior in our two HDFS 2.6.0
> (CDH5.11.1) clusters and were wondering if you could help. When we failover
> our Namenodes we observe a large number of PendingDeletionBlocks blocks -
> i.e., the metric is zero before failover and several thousand after.
>
> This seems different to the PostponedMisreplicatedBlocks [1] (expected
> before all the datanodes have sent their block reports to the new active
> namenode and the number of NumStaleStorages is zero) - we see that metric
> become zero once all the block reports have been received. What we're
> seeing is that PendingDeletionBlocks increases immediately after
> failover, when NumStaleStorages is ~equal to the number of datanodes in
> the cluster.
>
> The amount of extra space used is a problem as we have to increase our
> cluster size to accommodate these blocks until the Namenodes are
> failed-over.  We've checked the debug logs, metasave report, and other jmx
> metrics and everything appears fine before we fail-over - apart from the
> amount of dfs used growing then decreasing.
>
> We can't find anything obviously wrong with the HDFS configuration, HA
> setup, etc. Any help on where to look/debug next would be appreciated.
>
> Thanks,
>
> Michael.
>
> [1] https://github.com/cloudera/hadoop-common/blob/cdh5-2.6.0_5.
> 11.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apach
> e/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3047
>
> --
>
>


Re: Vulnerabilities to UserGroupInformation / credentials in a Spark Cluster

2017-10-31 Thread Ravi Prakash
Hi Blaze!

Thanks for the link, although it did not have anything I didn't already
know. I'm afraid I don't quite follow what your concern is here. The files
are protected using UNIX permissions on the worker nodes. Is that not what
you are seeing? Are you using the LinuxContainerExecutor? Are the yarn
containers running as the user who launched the yarn application? Are you
saying the permissions on the file should be different?

Ravi

On Mon, Oct 30, 2017 at 9:22 PM, Blaze Spinnaker <blazespinna...@gmail.com>
wrote:

> Ravi,
>
> The code and architecture is based on the Hadoop source code submitted
> through the Yarn Client. This is an issue for map reduce as well, e.g.:
> https://pravinchavan.wordpress.com/2013/04/25/223/
>
> On Mon, Oct 30, 2017 at 1:15 PM, Ravi Prakash <ravihad...@gmail.com>
> wrote:
>
>> Hi Blaze!
>>
>> Thanks for digging into this. I'm sure security related features could
>> use more attention. Tokens for one user should be isolated from other
>> users. I'm sorry I don't know how spark uses them.
>>
>> Would this question be more appropriate on the spark mailing list?
>> https://spark.apache.org/community.html
>>
>> Thanks
>> Ravi
>>
>> On Mon, Oct 30, 2017 at 12:43 PM, Blaze Spinnaker <
>> blazespinna...@gmail.com> wrote:
>>
>>> I looked at this a bit more and I see a container_tokens file in the spark
>>> directory. Does this contain the credentials which are added by
>>> addCredentials? Is this file accessible to the spark executors?
>>>
>>> It looks like just a clear text protobuf file.
>>>
>>> https://github.com/apache/hadoop/blob/82cb2a6497caa7c5e693aa
>>> 41ad18e92f1c7eb16a/hadoop-common-project/hadoop-common/src/
>>> main/java/org/apache/hadoop/security/Credentials.java#L221
>>>
>>> This means that anyone with access to the user can read credentials from
>>> any other user.  Correct?
>>>
>>> On Mon, Oct 30, 2017 at 12:28 PM, Blaze Spinnaker <
>>> blazespinna...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are submitting critical UserGroupInformation credentials and wanted
>>>> to know how these are protected in Spark Cluster.
>>>>
>>>> Questions:
>>>>
>>>> Are the credentials persisted to disk at any point?  If so, where?
>>>> If they are persisted, are they encrypted? Or just obfuscated? Is the
>>>> encryption key accessible?
>>>> Are they only protected by file permissions?
>>>>
>>>> Are they only in memory?
>>>>
>>>> How would you securely propagate UGI / credentials to spark executors?
>>>>
>>>> Regards,
>>>>
>>>> Tim
>>>>
>>>
>>>
>>
>


Re: Unable to append to a file in HDFS

2017-10-31 Thread Ravi Prakash
Hi Tarik!

I'm glad you were able to diagnose your issue. Thanks for sharing with the
user list. I suspect your writer may have set minimum replication to 3, and
since you have only 2 datanodes, the Namenode will not allow you to
successfully close the file. You could add another node or reduce the
minimum replication.
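As an illustration only (the path and numbers are made up, assuming the usual
org.apache.hadoop.fs imports): if the cluster-wide defaults can't be changed,
the writer can also pass an explicit replication factor when creating the file,
keeping it at or below the number of live datanodes:

FileSystem fs = FileSystem.get(conf);
Path p = new Path("/tmp/append-demo.txt");
// ask for 2 replicas so a 2-datanode cluster can satisfy the write pipeline
FSDataOutputStream out = fs.create(p, true, 4096, (short) 2,
    fs.getDefaultBlockSize(p));
out.writeBytes("first write\n");
out.close();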

HTH,
Ravi

On Mon, Oct 30, 2017 at 3:13 PM, Tarik Courdy <tarik.cou...@gmail.com>
wrote:

> Hello Ravi -
>
> I have pinpointed my issue a little more.  When I create a file with a
> dfs.replication factor of 3 I can never append.  However, if I create a
> file with a dfs.replication factor of 1 then I can append to the file all
> day long.
>
> Thanks again for your help regarding this.
>
> -Tarik
>
> On Mon, Oct 30, 2017 at 2:46 PM, Tarik Courdy <tarik.cou...@gmail.com>
> wrote:
>
>> Hello Ravi -
>>
>> I grepped the directory that has my logs and couldn't find any instance of
>> "NameNode.complete".
>>
>> I just created a new file in hdfs using hdfs -touchz and it is allowing
>> me to append to it with no problem.
>>
>> Not sure who is holding the eternal lease on my first file.
>>
>> Thanks again for your time.
>>
>> -Tarik
>>
>> On Mon, Oct 30, 2017 at 2:19 PM, Ravi Prakash <ravihad...@gmail.com>
>> wrote:
>>
>>> Hi Tarik!
>>>
>>> You're welcome! If you look at the namenode logs, do you see a "DIR*
>>> NameNode.complete: "  message ? It should have been written when the first
>>> client called close().
>>>
>>> Cheers
>>> Ravi
>>>
>>> On Mon, Oct 30, 2017 at 1:13 PM, Tarik Courdy <tarik.cou...@gmail.com>
>>> wrote:
>>>
>>>> Hello Ravi -
>>>>
>>>> Thank you for your response.  I have read about the soft and hard lease
>>>> limits, however no matter how long I wait I am never able to write again to
>>>> the file that I first created and wrote to the first time.
>>>>
>>>> Thanks again.
>>>>
>>>> -Tarik
>>>>
>>>> On Mon, Oct 30, 2017 at 2:08 PM, Ravi Prakash <ravihad...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Tarik!
>>>>>
>>>>> The lease is owned by a client. If you launch 2 client programs, they
>>>>> will be viewed as separate (even though the user is same). Are you sure you
>>>>> closed the file when you first wrote it? Did the client program which wrote
>>>>> the file, exit cleanly? In any case, after the namenode lease hard timeout
>>>>> <https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java#L82>,
>>>>> the lease will be recovered, and you ought to be able to append to it. Is
>>>>> that not what you are seeing?
>>>>>
>>>>> HTH
>>>>> Ravi
>>>>>
>>>>> On Mon, Oct 30, 2017 at 11:04 AM, Tarik Courdy <tarik.cou...@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Good morning -
>>>>>>
>>>>>> I have a file in hdfs that I can write to once but when I try to
>>>>>> append to it I receive an error stating that someone else owns the file
>>>>>> lease.
>>>>>>
>>>>>> I am the only one trying to append to this file.  I have also made
>>>>>> sure that dfs.support.append has been set to true.  Additionally, I have
>>>>>> also tried setting the dfs.replication to 1 since I read this had
>>>>>> helped someone else with this issue.
>>>>>>
>>>>>> However, neither of these have allowed me to append to the file.
>>>>>>
>>>>>> My HDFS setup consists of a name node, a secondary name node, and 2
>>>>>> data nodes.
>>>>>>
>>>>>> Any suggestions that you might be able to provide would be greatly
>>>>>> appreciated.
>>>>>>
>>>>>> Thank you for your time.
>>>>>>
>>>>>> -Tarik
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Unable to append to a file in HDFS

2017-10-30 Thread Ravi Prakash
Hi Tarik!

You're welcome! If you look at the namenode logs, do you see a "DIR*
NameNode.complete: "  message ? It should have been written when the first
client called close().

Cheers
Ravi

On Mon, Oct 30, 2017 at 1:13 PM, Tarik Courdy <tarik.cou...@gmail.com>
wrote:

> Hello Ravi -
>
> Thank you for your response.  I have read about the soft and hard lease
> limits, however no matter how long I wait I am never able to write again to
> the file that I first created and wrote to the first time.
>
> Thanks again.
>
> -Tarik
>
> On Mon, Oct 30, 2017 at 2:08 PM, Ravi Prakash <ravihad...@gmail.com>
> wrote:
>
>> Hi Tarik!
>>
>> The lease is owned by a client. If you launch 2 client programs, they
>> will be viewed as separate (even though the user is same). Are you sure you
>> closed the file when you first wrote it? Did the client program which wrote
>> the file, exit cleanly? In any case, after the namenode lease hard
>> timeout
>> <https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java#L82>,
>> the lease will be recovered, and you ought to be able to append to it. Is
>> that not what you are seeing?
>>
>> HTH
>> Ravi
>>
>> On Mon, Oct 30, 2017 at 11:04 AM, Tarik Courdy <tarik.cou...@gmail.com>
>> wrote:
>>
>>> Good morning -
>>>
>>> I have a file in hdfs that I can write to once but when I try to append
>>> to it I receive an error stating that someone else owns the file lease.
>>>
>>> I am the only one trying to append to this file.  I have also made sure
>>> that dfs.support.append has been set to true.  Additionally, I have also
>>> tried setting the dfs.replication to 1 since I read this had helped
>>> someone else with this issue.
>>>
>>> However, neither of these have allowed me to append to the file.
>>>
>>> My HDFS setup consists of a name node, a secondary name node, and 2 data
>>> nodes.
>>>
>>> Any suggestions that you might be able to provide would be greatly
>>> appreciated.
>>>
>>> Thank you for your time.
>>>
>>> -Tarik
>>>
>>
>>
>


Re: Vulnerabilities to UserGroupInformation / credentials in a Spark Cluster

2017-10-30 Thread Ravi Prakash
Hi Blaze!

Thanks for digging into this. I'm sure security related features could use
more attention. Tokens for one user should be isolated from other users.
I'm sorry I don't know how spark uses them.

Would this question be more appropriate on the spark mailing list?
https://spark.apache.org/community.html

Thanks
Ravi

On Mon, Oct 30, 2017 at 12:43 PM, Blaze Spinnaker 
wrote:

> I looked at this a bit more and I see a container_tokens file in spark
> directory.   Does this contain the credentials where are added by
> addCredentials?   Is this file accessible to the spark executors?
>
> It looks like just a clear text protobuf file.
>
> https://github.com/apache/hadoop/blob/82cb2a6497caa7c5e693aa41ad18e9
> 2f1c7eb16a/hadoop-common-project/hadoop-common/src/
> main/java/org/apache/hadoop/security/Credentials.java#L221
>
> This means that anyone with access to the user can read credentials from
> any other user.  Correct?
>
> On Mon, Oct 30, 2017 at 12:28 PM, Blaze Spinnaker <
> blazespinna...@gmail.com> wrote:
>
>> Hi,
>>
>> We are submitting critical UserGroupInformation credentials and wanted to
>> know how these are protected in Spark Cluster.
>>
>> Questions:
>>
>> Are the credentials persisted to disk at any point?  If so, where?
>> If they are persisted, are they encrypted? Or just obfuscated?  is the
>> encryption key accessible?
>> Are they only protected by file permissions?
>>
>> Are they only in memory?
>>
>> How would you securely propagate UGI / credentials to spark executors?
>>
>> Regards,
>>
>> Tim
>>
>
>


Re: Unable to append to a file in HDFS

2017-10-30 Thread Ravi Prakash
Hi Tarik!

The lease is owned by a client. If you launch 2 client programs, they will
be viewed as separate (even though the user is same). Are you sure you
closed the file when you first wrote it? Did the client program which wrote
the file, exit cleanly? In any case, after the namenode lease hard timeout,
the lease will be recovered, and you ought to be able to append to it. Is
that not what you are seeing?
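If a lease really is stuck (for example the first writer crashed without
closing the file), it can also be released explicitly before appending. A rough
sketch, with a made-up path and assuming the usual org.apache.hadoop.fs and
org.apache.hadoop.hdfs imports; only do this if you are sure the original
writer is gone:

DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path p = new Path("/tmp/append-demo.txt");
// ask the NameNode to recover the previous writer's lease;
// returns true once the file has been closed
boolean closed = dfs.recoverLease(p);
if (closed) {
  FSDataOutputStream out = dfs.append(p);
  out.writeBytes("appended line\n");
  out.close();
}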

HTH
Ravi

On Mon, Oct 30, 2017 at 11:04 AM, Tarik Courdy 
wrote:

> Good morning -
>
> I have a file in hdfs that I can write to once but when I try to append to
> it I receive an error stating that someone else owns the file lease.
>
> I am the only one trying to append to this file.  I have also made sure
> that dfs.support.append has been set to true.  Additionally, I have also
> tried setting the dfs.replication to 1 since I read this had helped
> someone else with this issue.
>
> However, neither of these have allowed me to append to the file.
>
> My HDFS setup consists of a name node, a secondary name node, and 2 data
> nodes.
>
> Any suggestions that you might be able to provide would be greatly
> appreciated.
>
> Thank you for your time.
>
> -Tarik
>


Re:

2017-10-30 Thread Ravi Prakash
And one of the good things about open-source projects like Hadoop is that you
can read all about why :-) : https://issues.apache.org/jira/browse/HADOOP-4952

Enjoy!
Ravi

On Mon, Oct 30, 2017 at 11:54 AM, Ravi Prakash <ravihad...@gmail.com> wrote:

> Hi Doris!
>
> FileContext was created to overcome some of the limitations that we
> learned FileSystem had after a lot of experience. Unfortunately, a lot of
> code (i'm guessing maybe even the majority) still uses FileSystem.
>
> I suspect FileContext is probably the interface you want to use.
>
> HTH,
> Ravi
>
> On Fri, Oct 27, 2017 at 8:14 AM, <gu.yiz...@zte.com.cn> wrote:
>
>> Hi All,
>> As an application over hadoop, is it recommended to use
>> "org.apache.hadoop.fs
>> Class FileContext" rather then "org.apache.hadoop.fs Class FileSystem"?
>> And why, or why not?
>> Besides, my target version will be Apache Hadoop V2.7.3, and the
>> application will be running over both HDFS HA and
>> Federation, I wish my application code could be more flexible.
>> Thanks a lot!
>> Doris
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: user-h...@hadoop.apache.org
>>
>
>


Re:

2017-10-30 Thread Ravi Prakash
Hi Doris!

FileContext was created to overcome some of the limitations that we learned
FileSystem had after a lot of experience. Unfortunately, a lot of code (I'm
guessing maybe even the majority) still uses FileSystem.

I suspect FileContext is probably the interface you want to use.
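For illustration, a minimal FileContext write (made-up path, assuming the usual
org.apache.hadoop.fs imports and that fs.defaultFS points at your HA
nameservice):

Configuration conf = new Configuration();
FileContext fc = FileContext.getFileContext(conf);
Path p = new Path("/tmp/filecontext-demo.txt");
// CREATE + OVERWRITE roughly mirrors FileSystem.create(path, true)
FSDataOutputStream out =
    fc.create(p, EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE));
out.writeBytes("hello from FileContext\n");
out.close();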

HTH,
Ravi

On Fri, Oct 27, 2017 at 8:14 AM,  wrote:

> Hi All,
> As an application over hadoop, is it recommended to use
> "org.apache.hadoop.fs
> Class FileContext" rather then "org.apache.hadoop.fs Class FileSystem"?
> And why, or why not?
> Besides, my target version will be Apache Hadoop V2.7.3, and the
> application will be running over both HDFS HA and
> Federation, I wish my application code could be more flexible.
> Thanks a lot!
> Doris
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>


Re: Hadoop 2.8.0: Job console output suggesting non-existent rmserver 8088:proxy URI

2017-09-13 Thread Ravi Prakash
Hi Kevin!

The ApplicationMaster doesn't really need any more configuration I think.
Here's something to try out. Launch a very long mapreduce job:

# A sleep job with 1 mapper and 1 reducer.  (All the mapper and reducer do
is sleep for the duration specified in -mt and -rt)
yarn jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar
sleep -m 1 -r 1 -mt 999 -rt 999

Once the job starts running, follow the proxy URL. It should be served by
the MapReduce ApplicationMaster. If you are able to attach debuggers to the
RM, you can set breakpoints in
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
.

I'm afraid the combinatorial space of all possible configuration is too
huge to determine what is wrong with your cluster :( .

HTH
Ravi


On Tue, Sep 12, 2017 at 10:27 PM, Kevin Buckley <
kevin.buckley.ecs.vuw.ac...@gmail.com> wrote:

> On 9 September 2017 at 05:17, Ravi Prakash <ravihad...@gmail.com> wrote:
>
> > I'm not sure my reply will be entirely helpful, but here goes.
>
> It sheds more light on things than I previously understood, Ravi, so cheers
>
> > The ResourceManager either proxies your request to the ApplicationMaster
> (if
> > the application is running), or (once the application is finished)
> serves it
> > itself if the job is in the "cache" (usually the last 1
> applications) or
> > redirects to the MapReduce JHS if its a MapReduce job.
>
> That suggests that I don't have the ApplicationMaster(s) setup correctly
> (or possibly at all!) or that the caching setup is wrong because the
> ResourceManager and JobHistoryServer clearly both have the Job Info
> once the Jobs have ended, as I listed before
>
> >  http://rmserver.ecs.vuw.ac.nz:8088/cluster/app/application_
> 1234567890123_4567/
> >
> > Similarly, over on the Job History Server, we can get to a page,
> > related to the job
> >
> >  http://jhserver.ecs.vuw.ac.nz:19888/jobhistory/job/job_
> 1234567890123_4567/
>
> however, trying to access those through the "proxy channel"
>
> http://rmserver.ecs.vuw.ac.nz:8088/proxy/
>
> URI doesn't take anyone anywhere.
>
> Thanks again for the insight: I think know where I need to look now,
> Kevin
>


Re: Apache ambari

2017-09-08 Thread Ravi Prakash
Hi Sidharth!

The question seems relevant to the Ambari list :
https://ambari.apache.org/mail-lists.html

Cheers
Ravi

On Fri, Sep 8, 2017 at 1:15 AM, sidharth kumar 
wrote:

> Hi,
>
> Apache ambari is open source. So,can we setup Apache ambari to manage
> existing Apache Hadoop cluster ?
>
> Warm Regards
>
> Sidharth Kumar | Mob: +91 8197 555 599 / 7892 192
> 367
> LinkedIn:www.linkedin.com/in/sidharthkumar2792
>
>
>
>


Re: Hadoop 2.8.0: Job console output suggesting non-existent rmserver 8088:proxy URI

2017-09-08 Thread Ravi Prakash
Hi Kevin!

I'm not sure my reply will be entirely helpful, but here goes.

The ResourceManager either proxies your request to the ApplicationMaster
(if the application is running), or (once the application is finished)
serves it itself if the job is in the "cache" (usually the last 1
applications) or redirects to the MapReduce JHS if its a MapReduce job.

I doubt that kerberization has any role to play other than the original
AuthenticationHandler. Things should work just as without kerberos.

HTH
Ravi

On Thu, Sep 7, 2017 at 6:37 PM, Kevin Buckley <
kevin.buckley.ecs.vuw.ac...@gmail.com> wrote:

> Hi again,
>
> my attempts to Kerberise our Hadoop instance seem to
> have things working OK, although one of the users has
> reported the following issue:
>
> The console output, from a running job, suggests following a link to
> the RM server's WebGUI, akin to
>
>   http://rmserver.ecs.vuw.ac.nz:8088/proxy/application_1234567890123_4567/
>
> but that URI doesn't appear to be being served by the RM server's WebGUI.
>
>
> However, starting from the RM server's WebGUI "About the Cluster"
> page's Applications view:
>
>   http://rmserver.ecs.vuw.ac.nz:8088/cluster/apps
>
> we can get to the following page, related to the job
>
>   http://rmserver.ecs.vuw.ac.nz:8088/cluster/app/application_
> 1234567890123_4567/
>
>
> Similarly, over on the Job History Server, we can get to a page,
> related to the job
>
>  http://jhserver.ecs.vuw.ac.nz:19888/jobhistory/job/job_
> 1234567890123_4567/
>
>
>
> What, if anything, is likely to be missing from the configuration,
> that produces the "8088/proxy" path in the URIs that the console
> output presents ?
>
> Kevin
>
> ---
> Kevin M. Buckley
>
> eScience Consultant
> School of Engineering and Computer Science
> Victoria University of Wellington
> New Zealand
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>


Re: When is an hdfs-* service restart required?

2017-09-07 Thread Ravi Prakash
Hi Kellen!

The first part of the configuration is a good indication of which service
you need to restart. Unfortunately the only way to be completely sure is to
read the codez. e.g. most hdfs configuration is mapped to variables in
DFSConfigKeys

$ find . -name *.java | grep -v test | xargs grep
"dfs.datanode.handler.count"
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java:
public static final String  DFS_DATANODE_HANDLER_COUNT_KEY =
"dfs.datanode.handler.count";

Looking at where this is used:
$ find . -name *.java | grep -v test | xargs grep
DFS_DATANODE_HANDLER_COUNT_KEY
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java:import
static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_HANDLER_COUNT_KEY;
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java:
getConf().getInt(DFS_DATANODE_HANDLER_COUNT_KEY,
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java:
public static final String  DFS_DATANODE_HANDLER_COUNT_KEY =
"dfs.datanode.handler.count";

Cheers,
Ravi


On Thu, Sep 7, 2017 at 10:46 AM, Kellen Arb  wrote:

> Hello,
>
> I have a seemingly simple question, to which I can't find a clear answer.
>
> Which services/node-types must be restarted for each of the configuration
> properties? For example, if I update the 'dfs.datanode.handler.count'
> property in the `hdfs-site.xml` configuration file, which services must be
> restarted? Can I get away with only restarting `datanodes`, or do I also
> need to restart `journalnodes` and/or `namenodes`, `zkfc` etc.
>
> Just looking for clarification on this issue, though it seems that
> generally the advice is "restart everything"?
>
> Thank you,
> Kellen Arb
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>


Re: unsubscribe

2017-08-29 Thread Ravi Prakash
Hi Corne!

Please send an email to user-unsubscr...@hadoop.apache.org as mentioned on
https://hadoop.apache.org/mailing_lists.html

Thanks

On Sun, Aug 27, 2017 at 10:25 PM, Corne Van Rensburg 
wrote:

>
> unsubscribe
>
>
>
> *Corne Van RensburgManaging Director Softsure*
> Tel: 044 805 3746
> Email: co...@softsure.co.za
> *Softsure (Pty) Ltd | Registration No. 2004/008528/07 | 127A York Street,
> George, 6530 *
>
> Disclaimer
> The views and opinions expressed in this email are those of the author and
> do not necessarily reflect the views and opinions of Softsure (Pty) Ltd,
> its directors or management. Softsure (Pty) Ltd expressly reserves the
> right to manage, monitor and intercept emails. Softsure (Pty) Ltd do not
> warrant that this email is free of viruses, worms, Trojan horses or other
> harmful programmes. This email is intended for the addressed recipient
> alone, and access, copying, distribution, acting or omitting to act
> pursuant to the receipt of the email may be unlawful. No liability is
> accepted if the information contained in this email is corrupted or fails
> to reach the addressee. The information contained in this email is
> confidential.
>
>


Re:

2017-08-29 Thread Ravi Prakash
Hi Dominique,

Please send an email to user-unsubscr...@hadoop.apache.org as mentioned on
https://hadoop.apache.org/mailing_lists.html

Thanks
Ravi

2017-08-26 10:49 GMT-07:00 Dominique Rozenberg :

> unsubscribe
>
>
>
>
>
>
> *Dominique Rozenberg*, Project Manager
>
> *Mobile*: 052-7722006 | *Office*: 08-6343595 | *Fax*: 08-9202801
>
> *d...@datacube.co.il *
>
> *www.datacube.co.il *
>
>
>
>
>


Re: Recommendation for Resourcemanager GC configuration

2017-08-23 Thread Ravi Prakash
Hi Puneet

Can you take a heap dump and see where most of the churn is? Is it lots of
small applications / few really large applications with small containers
etc. ?

Cheers
Ravi

On Wed, Aug 23, 2017 at 9:23 AM, Ravuri, Venkata Puneet 
wrote:

> Hello,
>
>
>
> I wanted to know if there is any recommendation for ResourceManager GC
> settings.
>
> Full GC (with Parallel GC, 8 threads) is sometimes taking more than 30 sec
> due to which state store sessions to Zookeeper time out resulting in FATAL
> errors.
>
> The YARN cluster is heavily used with 1000’s of applications launched per
> hour.
>
>
>
> Could you please share any documentation related to best practices for
> tuning resourcemanager GC?
>
>
>
> Thanks,
>
> Puneet
>


Re: Some Configs in hdfs-default.xml

2017-08-23 Thread Ravi Prakash
Hi Doris!

I'm not sure what the difference between lab / production use is. All
configuration affects some behavior of the Hadoop system. Usually the
defaults are good for small clusters. For larger clusters, it becomes
worthwhile to tune the configuration.

1. dfs.namenode.heartbeat.recheck-interval : This is more a function of how
busy your datanodes are (sometimes they are too busy to heartbeat) and how
robust your network is (dropping heartbeat packets). It doesn't really take
too long to *check* the last heartbeat time of datanodes, but it's a lot of
work to order re-replications, so I would err on the side of keeping it
long.
2. The clients gets an *ordered* list of datanodes from the namenode. It
has its own timeouts and mechanism for finding which one it wants to get /
send data from / to. Are live datanodes becoming stale too often in your
cluster? What's your concern?
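For reference on point 1, the 10:30 you mention falls out of the stock formula
(assuming the default heartbeat interval of 3 seconds; double-check the values
in your own hdfs-site.xml):

dead-node timeout = 2 * dfs.namenode.heartbeat.recheck-interval
                    + 10 * dfs.heartbeat.interval
                  = 2 * 5 min + 10 * 3 s
                  = 10 min 30 s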

Usually if your cluster is large enough, you *want* to spend time tuning
it. And that usually means, you will have to spend lots of time analyzing
the workload, finding the bottlenecks / wasted work and seeing what
configurations can help you remove that.

HTH
Ravi


On Wed, Aug 23, 2017 at 6:15 AM,  wrote:

> Hi All,
>
> There are default values of configs in hdfs-default.xml and
> core-default.xml, and I am wondering which situation they are for. Are they
> closer to lab use, or closer to a real production environment?
>
>
> Maybe it depends on different configs, then I have questions to these
> certain configs as follows:
>
>
>
>  Hadoop 2.7.3
>
> 1. dfs.namenode.heartbeat.recheck-interval: the default value is 5min,
> which makes the datanode be marked as dead by the namenode after 10:30
> minutes. I set it to 30s, and got a lot of removing and registering in the
> namenode's log. Is 5min too long, maybe 2.5min?
>
>
> 2. avoid stale: I notice there is a stale state of datanode but it is off
> by default. I feel it's good; is it advised to turn it on?
>
>
> Thanks in advance,
>
> Doris
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>


Re: Restoring Data to HDFS with distcp from standard input /dev/stdin

2017-08-16 Thread Ravi Prakash
Hi Heitor!

Welcome to the Hadoop community.

Think of the "hadoop distcp" command as a script which launches other JAVA
programs on the Hadoop worker nodes. The script collects the list of
sources, divides it among the several worker nodes and waits for the worker
nodes to actually do the copying from source to target. The sources could
be hdfs://hadoop2:54310/source-folder or perhaps s3a://some-bucket/somepath
or adl://somepath etc.

If you are trying to upload a file on the local file system to hdfs, please
take a look at the "hdfs dfs -put" command.
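If the data really only exists as a stream on stdin (as in the echo example),
a small client program can also pipe it straight into HDFS. A rough sketch,
reusing the hdfs://hadoop2:54310/a target from your mail and assuming the usual
org.apache.hadoop.conf / fs / io imports:

Configuration conf = new Configuration();
Path target = new Path("hdfs://hadoop2:54310/a");
FileSystem fs = target.getFileSystem(conf);
// stream stdin into a new HDFS file; the final 'true' closes both streams when done
IOUtils.copyBytes(System.in, fs.create(target), 4096, true);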

HTH
Ravi



On Wed, Aug 16, 2017 at 9:22 AM, Heitor Faria  wrote:

> Hello, List,
>
> I'm new here and I hope you are all very fine.
> I'm trying different combinations of distcp in order to restore data that
> I receive from standard input. Example:
>
> ===
> echo data | /etc/hadoop/bin/hadoop distcp file:///dev/stdin
> hdfs://hadoop2:54310/a
> ===
>
> I tried different options of distcp but the MapReduce always stalls. E.g.:
>
> 
> 2017-08-15 06:59:26,665 INFO [main] 
> org.apache.hadoop.metrics2.impl.MetricsConfig:
> loaded properties from hadoop-metrics2.properties
> 2017-08-15 06:59:26,802 INFO [main] 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
> Scheduled snapshot period at 10 second(s).
> 2017-08-15 06:59:26,802 INFO [main] 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
> MapTask metrics system started
> 2017-08-15 06:59:26,813 INFO [main] org.apache.hadoop.mapred.YarnChild:
> Executing with tokens:
> 2017-08-15 06:59:26,814 INFO [main] org.apache.hadoop.mapred.YarnChild:
> Kind: mapreduce.job, Service: job_1502794712113_0001, Ident:
> (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@467f0da4)
> 2017-08-15 06:59:26,996 INFO [main] org.apache.hadoop.mapred.YarnChild:
> Sleeping for 0ms before retrying again. Got null now.
> 2017-08-15 06:59:27,518 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /root/hdfs/hadoop-tmp-dir/nm-l
> ocal-dir/usercache/root/appcache/application_1502794712113_0001
> 2017-08-15 06:59:28,926 INFO [main] org.apache.hadoop.conf.Configu
> ration.deprecation: session.id is deprecated. Instead, use
> dfs.metrics.session-id
> 2017-08-15 06:59:29,783 INFO [main] 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
> File Output Committer Algorithm version is 1
> 2017-08-15 06:59:29,804 INFO [main] org.apache.hadoop.mapred.Task:  Using
> ResourceCalculatorProcessTree : [ ]
> 2017-08-15 06:59:30,139 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: /tmp/hadoop-yarn/staging/root/
> .staging/_distcp-298457134/fileList.seq:0+176
> 2017-08-15 06:59:30,145 INFO [main] 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
> File Output Committer Algorithm version is 1
> 2017-08-15 06:59:30,250 INFO [main] org.apache.hadoop.tools.mapred.CopyMapper:
> Copying file:/dev/stdin to hdfs://hadoop2:54310/aaa
> 2017-08-15 06:59:30,259 INFO [main] 
> org.apache.hadoop.tools.mapred.RetriableFileCopyCommand:
> Creating temp file: hdfs://hadoop2:54310/.distcp.t
> mp.attempt_1502794712113_0001_m_00_0
> 
>
> Regards,
> --
> 
> ===
> Heitor Medrado de Faria | CEO Bacula do Brasil | Visto EB-1 | LPIC-III |
> EMC 05-001 | ITIL-F
> • Don't be billed by the size of your backups; learn about Bacula
> Enterprise: http://www.bacula.com.br/enterprise/
> • I deliver in-company training and implementation of Bacula Community:
> http://www.bacula.com.br/in-company/
> +55 61 98268-4220 | www.bacula.com.br
> 
> 
> We also recommend these complementary training courses:
> • Basic Shell and Shell Programming  with
> Julio Neves.
> • Zabbix  with Adail Host.
> 
> 
>


Re: Forcing a file to update its length

2017-08-09 Thread Ravi Prakash
Hi David!

A FileSystem class is an abstraction for the file system. It doesn't make
sense to do an hsync on a file system (should the file system sync all
files currently open / just the user's etc.) . With appropriate flags maybe
you can make it make sense, but we don't have that functionality.

When you create() a file in the file system, you get back a
FSDataOutputStream on which you can call the hsync() method. Doesn't that
make more sense? On that method call, the buffer from the client is flushed
to the pipeline, and an RPC goes to the NameNode to update the length
(which is a fair amount of work). This is obviously not a scalable
solution. Perhaps you might want to look at Kafka? If you don't need it to
scale, and are fine with hammering the NameNode (really you shouldn't be),
then maybe HDFS inotify can help?
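To make the writer side concrete, a minimal sketch (names are illustrative,
and it assumes fs is backed by HDFS, i.e. a DistributedFileSystem, with the
usual org.apache.hadoop.fs and org.apache.hadoop.hdfs.client imports):

FSDataOutputStream out = fs.create(new Path("/video/stream.dat"));
out.write(frameBytes);
// flush the client-side buffer to the datanode pipeline and have the
// NameNode record the new length so readers opening the file now see it
((HdfsDataOutputStream) out).hsync(
    EnumSet.of(HdfsDataOutputStream.SyncFlag.UPDATE_LENGTH));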

HTH,
Ravi

On Wed, Aug 9, 2017 at 5:37 AM, David Robison 
wrote:

> I understand that, when writing to a file, I can force it to update its
> length on the namenode by using the following command:
>
>
>
> ((DFSOutputStream) imageWriter.getWrappedStream()
> ).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
>
>
>
> Is there a way to force the update without having to open a
> DFSOutputStream? Can I do this from the FileSystem class or some other Java
> class? The reason for this is that I am mostly writing to HDFS and only
> occasionally reading. However, when I go to read, I am most often reading
> the most recent data written (reading the end of the file not the
> beginning). If I could force the length update at the time of reading that
> would save time by not having to make sure I update the length every time I
> write to the file (which is about once per second).
>
>
>
> Thanks, David
>
>
>
> *David R Robison*
>
> *Senior Systems Engineer*
>
> O. +1 512 247 3700
>
> M. +1 757 286 0022
>
> david.robi...@psgglobal.net
>
> *www.psgglobal.net *
>
> *Prometheus Security Group Global, Inc.*
>
> 3019 Alvin Devane Boulevard
>
> Building 4, Suite 450
>
> Austin, TX 78741
>
>
>


Re: modify the MapTask.java but no change

2017-08-07 Thread Ravi Prakash
Hi DuanYu!

Most likely, the MapTask class loaded is not from your jar file. Here's a
look at how Oracle JAVA loads classes :
http://docs.oracle.com/javase/8/docs/technotes/tools/findingclasses.html .
Check the classpath that your MapTask is started with.
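One quick way to confirm which jar the class really came from is to print its
code source, e.g. a throwaway line you could drop into your modified code:

// prints the jar (or directory) this MapTask class was actually loaded from
System.err.println("MapTask loaded from: "
    + MapTask.class.getProtectionDomain().getCodeSource().getLocation());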

HTH
Ravi


On Fri, Aug 4, 2017 at 7:09 PM, duanyu teng  wrote:

> Hi,
>
> I modified the MapTask.java file in order to output more log information. I
> re-compiled the file and deployed the jar to the whole cluster, but I found
> that the output log has not changed, and I don't know why.
>


Re: Replication Factor Details

2017-08-02 Thread Ravi Prakash
Hi Hilmi!

The topology script / DNSToSwitchMapping tell the NameNode about the
topology of the cluster :
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html

You can trace through
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java#L805
to find out how re-replications are ordered. (If you start the Namenode
with environment variable "export HADOOP_NAMENODE_OPTS='-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1049' " set, you
can connect a debugger to it.)

You might want to set a breakpoint in
BlockManager.updateNeededReconstructions() (
https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4148)
and
BlockManager.computeDatanodeWork() (
https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4508
)

I suspect most of what you are looking for is here
BlockPlacementPolicyDefault.chooseTarget() (
https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L134
)
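On the client side, 'hdfs dfs -setrep' ultimately goes through
FileSystem#setReplication, so a quick way to trigger the same NameNode code
path from a test program is (made-up path, usual org.apache.hadoop.fs imports):

FileSystem fs = FileSystem.get(conf);
// same call the shell's -setrep makes; the NameNode then schedules the extra
// copies (or excess-replica deletions) asynchronously
fs.setReplication(new Path("/data/example.txt"), (short) 4);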

Also, please be aware that the code has changed a lot over different
versions thanks to incredible contributions from the community. If you're
trying to debug something, please make sure to find the right links in the
right branch.

HTH
Ravi

On Wed, Aug 2, 2017 at 4:31 AM, Hilmi Egemen Ciritoğlu <
hilmi.egemen.cirito...@gmail.com> wrote:

> Hi guys,
>
> I have spent a lot of time reading about setting the replication factor as
> well as block placement. But I still wonder how the setrep command works
> behind the scenes in the code.
>
> I am looking for answer to following questions:
>
> What if you have one rack and increase or decrease the replication factor:
> will the block distribution be randomised or based on disk usage etc.
> (except for, or after, the rack-awareness issue)?
>
> And what if I have 5 racks and replication factor 4? I am looking for
> corner cases to understand completely.
>
> I would really appreciate it if you could answer my questions and explain
> the code side a bit more too.
>
> Regards,
> Egemen
>
>


Re: Shuffle buffer size in presence of small partitions

2017-07-31 Thread Ravi Prakash
Hi Robert!

I'm sorry I do not have a Windows box and probably don't understand the
shuffle process well enough. Could you please create a JIRA in the
mapreduce proect if you would like this fixed upstream?
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=116=MAPREDUCE

Thanks
Ravi

On Mon, Jul 31, 2017 at 6:36 AM, Robert Schmidtke 
wrote:

> Hi all,
>
> I just ran into an issue, which likely resulted from my not very
> intelligent configuration, but nonetheless I'd like to share this with the
> community. This is all on Hadoop 2.7.3.
>
> In my setup, each reducer roughly fetched 65K from each mapper's spill
> file. I disabled transferTo during shuffle, because I wanted to have a look
> at the file system statistics, which miss mmap calls, which is what
> transferTo sometimes defaults to. I left the shuffle buffer size at 128K
> (not knowing about the parameter at the time). This had the effect that I
> observed roughly 100% more data being read during shuffle, since 128K were
> read for each 65K needed.
>
> I added a quick fix to Hadoop which chooses the minimum of the partition
> size and the shuffle buffer size: https://github.com/
> apache/hadoop/compare/branch-2.7.3...robert-schmidtke:
> adaptive-shuffle-buffer
> Benchmarking this version against transferTo.allowed=true yields the same
> runtime and roughly 10% more reads in YARN during the shuffle phase
> (compared to previous 100%).
> Maybe this is something that should be added to Hadoop? Or do users have
> to be more clever about their job configurations? I'd be happy to open a PR
> if this is deemed useful.
>
> Anyway, thanks for the attention!
>
> Cheers
> Robert
>
> --
> My GPG Key ID: 336E2680
>


Re: How to write a Job for importing Files from an external Rest API into Hadoop

2017-07-31 Thread Ravi Prakash
Hi Ralph!

Although not totally similar to your use case, DistCp may be the closest
thing to what you want.
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
. The client builds a file list, and then submits an MR job to copy over
all the files.

HTH
Ravi

On Sun, Jul 30, 2017 at 2:21 PM, Ralph Soika  wrote:

> Hi,
>
> I want to ask: what's the best way to implement a Job which imports
> files into HDFS?
>
> I have an external System offering data accessible through a Rest API. My
> goal is to have a job running in Hadoop which is periodical (maybe started
> by chron?) looking into the Rest API if new data is available.
>
> It would be nice if also this job could run on multiple data nodes. But in
> difference to all the MapReduce examples I found, is my job looking for new
> Data or changed data from an external interface and compares the data with
> existing one.
>
> This is a conceptual example of the job:
>
>1. The job asks the Rest API if there are new files
>2. if so, the job imports the first file in the list
>3. check if the file already exists
>   1. if not, the job imports the file
>   2. if yes, the job compares the data with the data already stored
>  1. if changed, the job updates the file
>  4. if more files exist, the job continues with 2 -
>5. otherwise it ends.
>
>
> Can anybody give me a little help on how to start (it's my first job I'm
> writing...)?
>
>
> ===
> Ralph
>
>
>
>
> --
>
>


Re: MapReduce and Spark jobs not starting

2017-07-28 Thread Ravi Prakash
Hi Nishant!

You should be able to look at the datanode and nodemanager log files to
find out why they died after you ran the 76 mappers. It is extremely
unusual (I haven't heard of a verified case for over 4-5 years) for a job
to kill nodemanagers unless your cluster is configured poorly. Which
container-executor do you use? Which user is running the nodemanager and
datanode process? Which user does a MapTask run as?

Are you sure the cluster is fine? How many resources do you see available
in the ResourceManager? Are you submitting the application to a queue with
enough resources?

Ravi

On Fri, Jul 28, 2017 at 5:19 AM, Nishant Verma 
wrote:

> Hello,
>
> In my 5 node Hadoop 2.7.3 AWS EC2 instance cluster, things were running
> smooth before I submitted one query. I tried to create an ORC table using
> below query:
>
> create table dummy_orc stored as orc tblproperties ("orc.compress"="Lz4")
> as select * from dummy;
>
> The job said, it would run 76 mappers and 0 reducers and job started.
> After some 10-12 minutes when the map % reached 100%, the job aborted and
> did not give output. Since number of records was large, I did not mind the
> large time it took initially.But then all my datanode daemons and
> nodemanager daemons died. The hdfs dfsadmin -report command gave 0 cluster
> capacity, 0 live datanodes, etc.
>
> I restarted the cluster completely. Restarted namenode, resource manager,
> datanode, nodemanager, zkfc services, quorumPeerMain, everything. After
> that the cluster capacity,etc is coming fine. I am able to fire normal
> non-mapreduce queries like select *.
>
> But mapreduce is not starting. Also, Spark jobs are not running now. They are
> stuck at ACCEPTED state like MR jobs.
>
> MR is stuck for select count(1) from dummy at:
>
> Query ID = hadoopuser_20170728093320_b1875223-801e-466b-997f-4b58f0e90041
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Starting Job = job_1501233326257_0003, Tracking URL =
> http://dev-bigdatamaster1:8088/proxy/application_1501233326257_0003/
> Kill Command = /home/hadoopuser/hadoop//bin/hadoop job  -kill
> job_1501233326257_0003
>
> Which log would give me better picture to resolve this error? And what
> went wrong?
>


Re: how to get info about which data in hdfs or file system that a MapReduce job visits?

2017-07-27 Thread Ravi Prakash
Hi Jaxon!

MapReduce is just an application (one of many including Tez, Spark, Slider
etc.) that runs on Yarn. Each YARN application decides to log whatever it
wants. For MapReduce,
https://github.com/apache/hadoop/blob/27a1a5fde94d4d7ea0ed172635c146d594413781/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java#L762
logs which split is being processed. Are you not seeing this message?
Perhaps check the log level of the MapTask.

For the other YARN applications, the logging may be different.

In any case, for all the frameworks, if the file is on HDFS, the hdfs audit
log should have a record.

HTH
Ravi



On Wed, Jul 26, 2017 at 11:27 PM, Jaxon Hu  wrote:

> Hi!
>
> I was trying to implement a Hadoop/Spark audit tool, but I met a problem:
> I can’t get the input file location and file name. I can get the
> username, IP address, time, and user command, all of this info, from
> hdfs-audit.log. But when I submit a MapReduce job, I can’t see the input file
> location either in the Hadoop logs or the Hadoop ResourceManager. Does Hadoop
> have an API or log that contains this info through some configuration? If it
> does, what should I configure?
>
> Thanks.
>


Re: Lots of Exception for "cannot assign requested address" in datanode logs

2017-07-27 Thread Ravi Prakash
Your replication numbers do seem to be on the high side. How did you arrive at
those numbers? If you swamp the datanode with more replication work
than it can do in an iteration (every 3 seconds), things will go bad.

I often check using `ps aux | grep java` all the java processes running
rather than relying on `service status datanode` or other scripts.

On Wed, Jul 26, 2017 at 10:46 PM, omprakash <ompraka...@cdac.in> wrote:

> Hi Ravi,
>
>
>
> The two datanodes are on different Machines. At the time when these errors
> were being generated I could see that DN1 was replicating under-replicated
> blocks to DN2.
>
>
>
> Can this be related to properties I added for increasing replication rate?
>
>
>
> Regards
>
> Om Prakash
>
>
>
> *From:* Ravi Prakash [mailto:ravihad...@gmail.com]
> *Sent:* 27 July 2017 01:26
> *To:* omprakash <ompraka...@cdac.in>
> *Cc:* user <user@hadoop.apache.org>
> *Subject:* Re: Lots of Exception for "cannot assign requested address" in
> datanode logs
>
>
>
> Hi Omprakash!
>
> DatanodeRegistration happens when the Datanode first heartbeats to the
> Namenode. In your case, it seems some other application has acquired the
> port 50010 . You can check this with the command "netstat -anp | grep
> 50010" . Are you trying to run 2 datanode processes on the same machine?
>
> HTH
>
> Ravi
>
>
>
> On Wed, Jul 26, 2017 at 5:46 AM, omprakash <ompraka...@cdac.in> wrote:
>
> Hi all,
>
>
>
> I am running a 4 node cluster with 2 Master node( NN1, NN2 with HA using
> QJM) and 2 Slave nodes(DN1, DN2). I am receiving lots of Exceptions in
> Datanode logs as shown below
>
>
>
> 2017-07-26 17:56:00,703 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.9.132:50010, 
> datanodeUuid=5a2e6721-3a9a-43f1-94cc-f58f24b5a15b,
> infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-
> 7aa9fcd4-36fc-4e7b-87cd-d20594774b85;nsid=1753301932;c=1500696043365):Failed
> to transfer BP-1085904515-192.168.9.116-1500696043365:blk_1078544770_4804082
> to 192.168.9.116:50010 got
>
> java.net.BindException: Cannot assign requested address
>
> at sun.nio.ch.Net.connect0(Native Method)
>
> at sun.nio.ch.Net.connect(Net.java:465)
>
> at sun.nio.ch.Net.connect(Net.java:457)
>
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.
> java:670)
>
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(
> SocketIOWithTimeout.java:192)
>
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>
> at org.apache.hadoop.hdfs.server.datanode.DataNode$
> DataTransfer.run(DataNode.java:2312)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> I have 10 million files in hdfs. All the nodes have same configurations.
> Above Exception started occurring when I changed the below parameters in
> *hdfs-site.xml* file. I made these changes to increase replication rate
> for under-replicated blocks.
>
>
>
> dfs.namenode.handler.count=5000
>
> dfs.namenode.replication.work.multiplier.per.iteration=1000
>
> dfs.namenode.replication.max-streams=2000  -> *not documented in
> hdfs.site.xml*
>
> dfs.namenode.replication.max-streams-hard-limit=4000  -> *not
> documented in hdfs.site.xml*
>
>
>
>
>
> The rate of replication of blocks increased but suddenly the Exception
> started to appear.
>
>
>
> Can anybody explain this  behavior?
>
>
>
>
>
> *Regards*
>
> *Omprakash Paliwal*
>
>
>
>
> 
> ---
> [ C-DAC is on Social-Media too. Kindly follow us at:
> Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
>
> This e-mail is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies and the original message. Any unauthorized review, use,
> disclosure, dissemination, forwarding, printing or copying of this email
> is strictly prohibited and appropriate legal action will be taken.
> 
> ---
>
>
>
> 
> ---
> [ C-DAC is on Social-Media too. Kindly

Re: Lots of Exception for "cannot assign requested address" in datanode logs

2017-07-26 Thread Ravi Prakash
Hi Omprakash!

DatanodeRegistration happens when the Datanode first heartbeats to the
Namenode. In your case, it seems some other application has acquired the
port 50010. You can check this with the command "netstat -anp | grep
50010". Are you trying to run 2 datanode processes on the same machine?

HTH
Ravi

On Wed, Jul 26, 2017 at 5:46 AM, omprakash  wrote:

> Hi all,
>
>
>
> I am running a 4 node cluster with 2 Master node( NN1, NN2 with HA using
> QJM) and 2 Slave nodes(DN1, DN2). I am receiving lots of Exceptions in
> Datanode logs as shown below
>
>
>
> 2017-07-26 17:56:00,703 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.9.132:50010, 
> datanodeUuid=5a2e6721-3a9a-43f1-94cc-f58f24b5a15b,
> infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-
> 7aa9fcd4-36fc-4e7b-87cd-d20594774b85;nsid=1753301932;c=1500696043365):Failed
> to transfer BP-1085904515-192.168.9.116-1500696043365:blk_1078544770_4804082
> to 192.168.9.116:50010 got
>
> java.net.BindException: Cannot assign requested address
>
> at sun.nio.ch.Net.connect0(Native Method)
>
> at sun.nio.ch.Net.connect(Net.java:465)
>
> at sun.nio.ch.Net.connect(Net.java:457)
>
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.
> java:670)
>
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(
> SocketIOWithTimeout.java:192)
>
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>
> at org.apache.hadoop.hdfs.server.datanode.DataNode$
> DataTransfer.run(DataNode.java:2312)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> I have 10 million files in hdfs. All the nodes have same configurations.
> Above Exception started occurring when I changed the below parameters in
> *hdfs-site.xml* file. I made these changes to increase replication rate
> for under-replicated blocks.
>
>
>
> dfs.namenode.handler.count=5000
>
> dfs.namenode.replication.work.multiplier.per.iteration=1000
>
> dfs.namenode.replication.max-streams=2000  -> *not documented in
> hdfs.site.xml*
>
> dfs.namenode.replication.max-streams-hard-limit=4000  -> *not
> documented in hdfs.site.xml*
>
>
>
>
>
> The rate of replication of blocks increased but suddenly the Exception
> started to appear.
>
>
>
> Can anybody explain this  behavior?
>
>
>
>
>
> *Regards*
>
> *Omprakash Paliwal*
>
>
>
> 
> ---
> [ C-DAC is on Social-Media too. Kindly follow us at:
> Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
>
> This e-mail is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies and the original message. Any unauthorized review, use,
> disclosure, dissemination, forwarding, printing or copying of this email
> is strictly prohibited and appropriate legal action will be taken.
> 
> ---
>


Re: Regarding Simulation of Hadoop

2017-07-24 Thread Ravi Prakash
Hi Vinod!

Could you please describe the "Hadoop Security Framework"? I am not sure
what you mean by it. What kind of tests do you want to run? You could try
Amazon / Azure / GCE instances fairly cheaply. Or you could use virtual
machines. In the past I have run 2 worker nodes isolated only by Docker
containers (of which you can run many more than virtual machines). If you
configure correctly, you might not even need any isolation mechanism and
run several worker nodes on the same computer.

What are you trying to simulate? There are lots of tools to simulate
several parts of the Hadoop eco-system. Which one are you interested in?
Here are a few:
* SLS : https://github.com/apache/hadoop/tree/trunk/hadoop-tools/hadoop-sls
* NNBench :
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/hdfs/NNBench.java
* MiniDFSCluster :
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* MiniYarnCluster :
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
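For example (third bullet above), a MiniDFSCluster lets you stand up a small
in-process HDFS for experiments without extra hardware. A rough sketch,
assuming the hadoop-hdfs test artifacts are on the classpath:

Configuration conf = new HdfsConfiguration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(3)   // three in-process datanodes
    .build();
cluster.waitActive();
FileSystem fs = cluster.getFileSystem();
// ... exercise HDFS here: create files, kill datanodes, etc. ...
cluster.shutdown();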

What part of Hadoop security interests you?

Hope this helps,
Ravi

On Mon, Jul 24, 2017 at 4:10 AM, vinod Saraswat <vndsr...@gmail.com> wrote:

> ​​
> Dear Ravi,
>
> I am working on a Hadoop security framework and I want to test it on a multi-node
> cluster. But due to a lack of resources I want to know whether I can simulate
> the Hadoop platform.
>
> My final conclusion is that Hadoop requires substantial processing resources, so
> simulation is not possible; please tell me whether I am going in the right direction
> or not.
>
> If simulation is possible then please share resources.
>
> By *simulation*, I am considering a virtual/mathematical environment for
> multi-node Hadoop clusters.
>
>
>
>
>
> *Vinod Sharma(Saraswat)*
> *Research Scholar*
> *Department of Computer Science,*
> *Career Point University of Kota, Rajasthan*
>
> On 17 July 2017 at 23:22, Ravi Prakash <ravihad...@gmail.com> wrote:
>
>> Hi Vinod!
>>
>> You can look at static code analysis tools. I'm sure there are ones
>> specific for security. I'd suggest you to set up a Kerberized hadoop
>> cluster first.
>>
>> HTH
>> Ravi
>>
>> On Sat, Jul 15, 2017 at 2:08 AM, vinod Saraswat <vndsr...@gmail.com>
>> wrote:
>>
>>> Dear Sir/Mam,
>>>
>>>
>>>
>>> I am Vinod Sharma (Research Scholar). My research is based on Hadoop
>>> security and want to perform simulation for Hadoop. This simulation checks
>>> current security on Hadoop. Please tell me that it is possible or not and
>>> how can I perform it.
>>>
>>>
>>>
>>> *Thanks and Regards*
>>> *Vinod Sharma(Saraswat)*
>>> *Research Scholar*
>>> *Department of Computer Science,*
>>> *Career Point University of Kota, Rajasthan*
>>>
>>
>>
>


Re: Regarding Simulation of Hadoop

2017-07-17 Thread Ravi Prakash
Hi Vinod!

You can look at static code analysis tools. I'm sure there are ones
specific for security. I'd suggest you to set up a Kerberized hadoop
cluster first.

HTH
Ravi

On Sat, Jul 15, 2017 at 2:08 AM, vinod Saraswat  wrote:

> Dear Sir/Mam,
>
>
>
> I am Vinod Sharma (Research Scholar). My research is based on Hadoop
> security and I want to perform a simulation of Hadoop. This simulation checks
> the current security of Hadoop. Please tell me whether it is possible or not and
> how I can perform it.
>
>
>
> *Thanks and Regards*
> *Vinod Sharma(Saraswat)*
> *Research Scholar*
> *Department of Computer Science,*
> *Career Point University of Kota, Rajasthan*
>


Re: Lots of warning messages and exception in namenode logs

2017-06-29 Thread Ravi Prakash
Hi Omprakash!

If both datanodes die at the same time, then yes, data will be lost. In
that case, you should increase dfs.replication to 3 (so that there will be
3 copies). This obviously adversely affects the total amount of data you
can store on HDFS.

However if only 1 datanode dies, the namenode notices that, and orders the
remaining replica to be replicated. The rate at which it orders
re-replication is determined by
dfs.namenode.replication.work.multiplier.per.iteration
and the number of nodes in your cluster. The more nodes you have in your
cluster (some companies run 1000s of nodes in 1 cluster), the faster the
lost replicas will be replicated. Let's say there were 2 million blocks on
each datanode, and you configured only 2 blocks to be re-replicated per
datanode heartbeat (usually 3 seconds). If there were 2 other datanodes, it
would take 2,000,000 / (2 * 2) * 3 seconds to re-replicate the data. Of course you can't
crank up the number of blocks re-replicated too high, because there's only
so much data that datanodes can transfer amongst themselves. You should
calculate how many blocks you have, how much bandwidth is available between
any two datanodes, how quickly you want replication (if your disks are only
re-replicating, jobs may not make progress), and set that configuration
accordingly. Depending on your datanode capacity it may take 1-2 days to
rereplicate all the data.

Also, I'd encourage you to read through more of the documentation
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
and become familiar with the system. There can be a *huge* difference
between a well-tuned Hadoop cluster and a poorly configured one.

HTH
Ravi


On Thu, Jun 29, 2017 at 4:50 AM, omprakash <ompraka...@cdac.in> wrote:

> Hi Sidharth,
>
>
>
> Thanks a lot for the clarification. May you suggest parameters that can
> improve the re-replication in case of failure.
>
>
>
> Regards
>
> Om
>
>
>
> *From:* Sidharth Kumar [mailto:sidharthkumar2...@gmail.com]
> *Sent:* 29 June 2017 16:06
> *To:* omprakash <ompraka...@cdac.in>
> *Cc:* Arpit Agarwal <aagar...@hortonworks.com>;
> common-u...@hadoop.apache.org <user@hadoop.apache.org>; Ravi Prakash <
> ravihad...@gmail.com>
>
> *Subject:* RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi,
>
>
>
> No, as no copy of that file will exist. You can increase the
> replication factor to 3 so that 3 copies are created; even if
> 2 datanodes go down you will still have one copy available, which will be
> replicated back to 3 by the namenode in due course of time.
>
>
> Warm Regards
>
> Sidharth Kumar | Mob: +91 8197 555 599 / 7892 192 367
> |  LinkedIn:www.linkedin.com/in/sidharthkumar2792
>
>
>
>
>
>
>
>
> On 29-Jun-2017 3:45 PM, "omprakash" <ompraka...@cdac.in> wrote:
>
> Hi Ravi,
>
>
>
> I have 5 nodes in Hadoop cluster and all have same configurations. After
> setting *dfs.replication=2 *, I did a clean start of hdfs.
>
>
>
> As per your suggestion, I added 2 more datanodes and clean all the data
> and metadata. The performance of the cluster has dramatically improved. I
> can see through logs that the files are randomly replicated to four
> datanodes (2 replica of each file).
>
>
>
> But here my problem arises. I want redundant datanodes such that if any two
> of the datanodes go down I am still able to get files from the other two. In
> the above case, suppose block-xyz gets stored on datanode1 and datanode2,
> and some day these two datanodes go down; will I be able to access
> block-xyz? This is what I am worried about.
>
>
>
>
>
> Regards
>
> Om
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihad...@gmail.com]
> *Sent:* 27 June 2017 22:36
> *To:* omprakash <ompraka...@cdac.in>
> *Cc:* Arpit Agarwal <aagar...@hortonworks.com>; user <
> user@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> This is *not* ok. Please go through the datanode logs of the inactive
> datanode and figure out why its inactive. If you set dfs.replication to 2,
> atleast as many datanodes (and ideally a LOT more datanodes) should be
> active and participating in the cluster.
>
> Do you have the hdfs-site.xml you posted to the mailing list on all the
> nodes (including the Namenode)? Was the file containing block
> *blk_1074074104_337394* created when you had the cluster misconfigured to
> dfs.replication=3 ? You can determine which file the block belongs to using
> this command:
>
> hdfs fsck -blockId blk_1074074104
>
> Once you have the file, you c

Re: Lots of warning messages and exception in namenode logs

2017-06-27 Thread Ravi Prakash
Hi Omprakash!

This is *not* ok. Please go through the datanode logs of the inactive
datanode and figure out why it's inactive. If you set dfs.replication to 2,
at least as many datanodes (and ideally a LOT more datanodes) should be
active and participating in the cluster.

Do you have the hdfs-site.xml you posted to the mailing list on all the
nodes (including the Namenode)? Was the file containing block
*blk_1074074104_337394* created when you had the cluster misconfigured to
dfs.replication=3 ? You can determine which file the block belongs to using
this command:

hdfs fsck -blockId blk_1074074104

Once you have the file, you can set its replication using
hdfs dfs -setrep 2 

I'm guessing that you probably have a lot of files with this replication,
in which case you should set it on / (This would overwrite the replication
on all the files)
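For example, to fix the whole tree from the root (setrep applied to a
directory changes every file underneath it; the single-file path below is
just a placeholder, and -w waits until the target replication is actually
reached, which can take a long time):

  hdfs dfs -setrep -w 2 /path/to/one/file
  hdfs dfs -setrep -w 2 /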

If the data on this cluster is important I would be very worried about the
condition it's in.

HTH
Ravi

On Mon, Jun 26, 2017 at 11:22 PM, omprakash <ompraka...@cdac.in> wrote:

> Hi all,
>
>
>
> I started the HDFS in DEBUG mode. After examining the logs I found below
> logs which read that the replication factor required is 3 (as against the
> specified *dfs.replication=2*).
>
>
>
> *DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add:
> blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added
> to neededReplications at priority level 0*
>
>
>
> *P.S : I have 1 datanode active out of 2. *
>
>
>
> I can also see from Namenode UI that the no. of under replicated blocks
> are growing.
>
>
>
> Any idea? Or this is OK.
>
>
>
> regards
>
>
>
>
>
> *From:* omprakash [mailto:ompraka...@cdac.in]
> *Sent:* 23 June 2017 11:02
> *To:* 'Ravi Prakash' <ravihad...@gmail.com>; 'Arpit Agarwal' <
> aagar...@hortonworks.com>
> *Cc:* 'user' <user@hadoop.apache.org>
> *Subject:* RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Arpit,
>
>
>
> I will enable the settings as suggested and will post the results.
>
>
>
> I am just curious about setting *Namenode RPC service  port*. As I have
> checked the *hdfs-site.xml* properties, *dfs.namenode.rpc-address* is
> already set which will be default value to RPC service port also. Does
> specifying any other port have advantage over default one?
>
>
>
> Regarding JvmPauseMonitor Error, there are 5-6 instances of this error in 
> namenode logs. Here is one of them.
>
>
>
> How do I identify the right heap size in such cases, given that I have 4 GB
> of RAM on the namenode VM?
>
>
>
> *@Ravi* Since the file size are very small thus I have only configured a
> VM with 20 GB space. The additional disk is simple SATA disk not SSD.
>
>
>
> As I can see from Namenode UI there are more than 50% of block under
> replicated. I have now 400K blocks out of which 200K are under-replicated.
>
> I will post the results again after changing the value of 
> *dfs.namenode.replication.work.multiplier.per.iteration*
>
>
>
>
>
> Thanks
>
> Om Prakash
>
>
>
> *From:* Ravi Prakash [mailto:ravihad...@gmail.com <ravihad...@gmail.com>]
> *Sent:* 22 June 2017 23:04
> *To:* Arpit Agarwal <aagar...@hortonworks.com>
> *Cc:* omprakash <ompraka...@cdac.in>; user <user@hadoop.apache.org>
>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?
>
> In addition to Arpit's reply, I'm also concerned with the number of
> under-replicated blocks you have: Under replicated blocks: 141863
>
> When there are fewer replicas for a block than there are supposed to be
> (in your case e.g. when there's 1 replica when there ought to be 2), the
> namenode will order the datanodes to create more replicas. The rate at
> which it does this is controlled by
> dfs.namenode.replication.work.multiplier.per.iteration . Given you have
> only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds.
> So, it will take quite a while to re-replicate all the blocks.
>
> Also, please know that you want files to be much bigger than 1kb. Ideally
> you'd have a couple of blocks (blocks=128Mb) for each file. You should
> append to files when they are this small.
>
> Please do let us know how things turn out.
>
> Cheers,
>
> Ravi
>
>
>
> On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aagar...@hortonworks.com>
> wrote:
>
> Hi Omprakash,
>
>
>
> Your description suggests DataNodes cannot send timely reports to the
> NameNode. You can check it by lo

Re: Can hdfs client 2.6 read file of hadoop 2.7 ?

2017-06-26 Thread Ravi Prakash
Hi Jeff!

Yes. hadoop-2.6 clients are able to read files on a hadoop-2.7 cluster. The
document I could find is
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
.

"Both Client-Server and Server-Server compatibility is preserved within a
major release"

HTH
Ravi.

On Mon, Jun 26, 2017 at 5:21 AM, Jeff Zhang  wrote:

>
> It looks like it can. But is there any document about the compatibility
> between versions ? Thanks
>
>
>


Re: Lots of warning messages and exception in namenode logs

2017-06-22 Thread Ravi Prakash
Hi Omprakash!

How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?

In addition to Arpit's reply, I'm also concerned with the number of
under-replicated blocks you have: Under replicated blocks: 141863
When there are fewer replicas for a block than there are supposed to be (in
your case e.g. when there's 1 replica when there ought to be 2), the
namenode will order the datanodes to create more replicas. The rate at
which it does this is controlled by
dfs.namenode.replication.work.multiplier.per.iteration . Given you have
only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds.
So, it will take quite a while to re-replicate all the blocks.
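A quick way to watch the under-replicated count drain as re-replication
catches up (run with HDFS superuser credentials; the grep patterns are just a
convenience):

  hdfs dfsadmin -report | grep -i "under replicated"
  hdfs fsck / | grep -i "under-replicated"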

Also, please know that you want files to be much bigger than 1 KB. Ideally
you'd have a couple of blocks (block size = 128 MB) for each file. You should
append to existing files rather than creating new ones when they are this small.

Please do let us know how things turn out.

Cheers,
Ravi

On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aagar...@hortonworks.com>
wrote:

> Hi Omprakash,
>
>
>
> Your description suggests DataNodes cannot send timely reports to the
> NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web
> UI when this situation is occurring. A few ideas:
>
>
>
>- Try increasing the NameNode RPC handler count a bit (set
>dfs.namenode.handler.count to 20 in hdfs-site.xml).
>- Enable the NameNode service RPC port. This requires downtime and
>reformatting the ZKFC znode.
>- Search for JvmPauseMonitor messages in your service logs. If you see
>any, try increasing JVM heap for that service.
>- Enable debug logging as suggested here:
>
>
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and 
> **org.apache.hadoop.net.NetworkTopology*
>
>
>
>
>
> *From: *omprakash <ompraka...@cdac.in>
> *Date: *Wednesday, June 21, 2017 at 9:23 PM
> *To: *'Ravi Prakash' <ravihad...@gmail.com>
> *Cc: *'user' <user@hadoop.apache.org>
> *Subject: *RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Ravi,
>
>
>
> Pasting below my core-site and hdfs-site  configurations. I have kept bare
> minimal configurations for my cluster.  The cluster started fine and I was
> able to put couple of 100K files on hdfs but then when I checked the logs
> there were errors/Exceptions. After restart of datanodes they work well for
> few thousand files but same problem again.  No idea what is wrong.
>
>
>
> *PS: I am pumping 1 file per second to hdfs with aprox size 1KB*
>
>
>
> I thought it may be due to space quota on datanodes but here is the output
> of *hdfs dfs -report*. Looks fine to me
>
>
>
> $ hdfs dfsadmin -report
>
>
>
> Configured Capacity: 42005069824 (39.12 GB)
>
> Present Capacity: 38085839568 (35.47 GB)
>
> DFS Remaining: 34949058560 (32.55 GB)
>
> DFS Used: 3136781008 (2.92 GB)
>
> DFS Used%: 8.24%
>
> Under replicated blocks: 141863
>
> Blocks with corrupt replicas: 0
>
> Missing blocks: 0
>
> Missing blocks (with replication factor 1): 0
>
> Pending deletion blocks: 0
>
>
>
> -
>
> Live datanodes (2):
>
>
>
> Name: 192.168.9.174:50010 (node5)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1764211024 (1.64 GB)
>
> Non DFS Used: 811509424 (773.92 MB)
>
> DFS Remaining: 17067913216 (15.90 GB)
>
> DFS Used%: 8.40%
>
> DFS Remaining%: 81.27%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 2
>
> Last contact: Wed Jun 21 14:38:17 IST 2017
>
>
>
>
>
> Name: 192.168.9.225:50010 (node4)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1372569984 (1.28 GB)
>
> Non DFS Used: 658353792 (627.86 MB)
>
> DFS Remaining: 17881145344 (16.65 GB)
>
> DFS Used%: 6.54%
>
> DFS Remaining%: 85.14%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Ca

Re: Lots of warning messages and exception in namenode logs

2017-06-21 Thread Ravi Prakash
Hi Omprakash!

What is your default replication set to? What kind of disks do your
datanodes have? Were you able to start a cluster with a simple
configuration before you started tuning it?

HDFS tries to create the default number of replicas for a block on
different datanodes. The Namenode tries to give a list of datanodes that
the client can write replicas of the block to. If the Namenode is not able
to construct a list with adequate number of datanodes, you will see the
message you are seeing. This may mean that datanodes are unhealthy (failed
disks), full (disks have no more space), being decommissioned (HDFS will
not write replicas on decommissioning datanodes) or misconfigured (I'd
suggest turning on storage policies only after a simple configuration works).

When a client that was trying to write a file was killed (e.g. if you
killed your MR job), after some time (hard limit expiring) the Namenode
will try to recover the file. In your case the namenode is also not able to
find enough datanodes for recovering the files.
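A few quick checks that usually narrow this down (standard commands, run with
HDFS superuser credentials):

  hdfs dfsadmin -report        # look for dead/decommissioning datanodes and per-node remaining space
  hdfs dfsadmin -safemode get  # rule out safe mode
  hdfs fsck / -openforwrite    # list files left open by killed clients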

HTH
Ravi





On Tue, Jun 20, 2017 at 11:50 PM, omprakash  wrote:

> Hi,
>
>
>
> I am receiving lots of  *warning messages in namenodes* logs on ACTIVE NN
> in my *HA Hadoop setup*. Below are the logs
>
>
>
> *“2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})*
>
> *2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) All required storage types are unavailable:
> unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}*
>
> *2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocate blk_1073894332_153508, replicas=192.168.9.174:50010
>  for /36962._COPYING_*
>
> *2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: /36962._COPYING_ is closed by
> DFSClient_NONMAPREDUCE_146762699_1*
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net .NetworkTopology*
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})”*
>
>
>
> I am also encountering exceptions in active namenode related to
> LeaseManager
>
>
>
> *2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder:
> DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired
> hard limit*
>
> *2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
> Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1],
> src=/user/hadoop/2106201707
> <(210)%20620-1707>/02d5adda-d90f-47cb-85d5-999a079f4d79*
>
> *2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707
> <(210)%20620-1707>/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks
> are waiting to be minimally replicated. Try again later.*
>
> *2017-06-21 12:13:16,706 ERROR
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the
> path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 in the lease
> [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates:
> 1]*
>
> *org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks
> are waiting to be minimally replicated. Try again later.*
>
> *at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)*
>
> *at

Re: Hadoop Application Report from WebUI

2017-06-09 Thread Ravi Prakash
Hi Hilmi!

I'm not sure, but that looks like the task's input split: the first number is
the byte offset in the file at which the split starts, and the second is the
split length (134217728 bytes = one 128 MB block).

Ravi

On Fri, Jun 9, 2017 at 7:43 AM, Hilmi Egemen Ciritoğlu <
hilmi.egemen.cirito...@gmail.com> wrote:

> Hi all,
>
> I can see following informations on hadoop yarn web-ui report(8088) for
> each mappers that I run.
>
> Status of mappers has shown like:
> hdfs://c7-master:9000/user/egemen/datasets/db/year1993.
> txt:1207959552+134217728
>
> What does these mean 1207959552+134217728 ? or 0+134217728. First number
> is multiplication of block size with some number but I have no idea about
> second number.
>
> Thanks a lot,
>
> Regards,
> Egemen
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>


Re: Hadoop error in shuffle in fetcher: Exceeded MAX_FAILED_UNIQUE_FETCHES

2017-06-07 Thread Ravi Prakash
Hi Seonyoung!

Please take a look at this file :
https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L208
.

This is an auxiliary service that runs inside the NodeManager and serves the
map tasks' intermediate output to the reducers.
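For reference, the auxiliary service is wired up with the first two properties
below in the NodeManager's yarn-site.xml; the third is, I believe, the knob
that replaced mapreduce.tasktracker.http.threads and is read by the
ShuffleHandler on each NodeManager (0, the default, means 2 * available
processors):

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>0</value>
  </property>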

Cheers
Ravi

On Tue, Jun 6, 2017 at 8:06 PM, Seonyoung Park  wrote:

> Hi all,
>
> We've run a hadoop cluster (Apache Hadoop 2.7.1) with 40 datanodes.
> Currently, we're using Fair Scheduler in our cluster.
> And there are no limits on the number of concurrent running jobs.
> 30 ~ 50 I/O heavy jobs has been running concurrently at dawn.
>
> Recently we got shuffle errors as follows when we had run HDFS Balancer or
> spark streaming jobs..
>
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError:
> error in shuffle in fetcher#2
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(
> Shuffle.java:134)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES;
> bailing-out.
> at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.
> checkReducerHealth(ShuffleSchedulerImpl.java:366)
> at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.
> copyFailed(ShuffleSchedulerImpl.java:288)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.
> copyFromHost(Fetcher.java:354)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(
> Fetcher.java:193)
>
>
>
> I also noticed that SocketTimeoutException had occurred in some tasks in
> the same job.
> But there is no network problem..
>
>
> Someone said that we need to increase the value of
> "mapreduce.tasktracker.http.threads" property.
> However, no codes use that property after the commit starting with hash
> value 80a05764be5c4f517.
>
>
> Here are my questions:
>
> 1. Is that property currently being used?
> 2. If so, Is it really helpful to solve our problem?
> 3. Do we need to fine tune the settings of NodeManagers and DataNodes?
> 4. Is there any better solution?
>
>
> Thanks,
> Pak
>


Re: Spark 2.0.1 & 2.1.1 fails on Hadoop-3.0.0-alhpa2

2017-05-10 Thread Ravi Prakash
Hi Jasson!

You will have to build Spark again with Hadoop-3.0.0-alpha2. This was done
as part of https://issues.apache.org/jira/browse/HADOOP-12563 .
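Roughly, the rebuild looks like the command below; treat it as a sketch, since
the exact build profiles depend on your Spark version:

  ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=3.0.0-alpha2 -DskipTests clean package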

HTH
Ravi

On Tue, May 9, 2017 at 4:33 PM, Jasson Chenwei 
wrote:

> hi, all
>
> I just upgraded my Hadoop from 2.7.3 to 3.0.0-alpha2. My spark version is
> 2.0.1. It works well on Hadoop -2.7.3. However, I had this error output
> of driver log on 3.0.0:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding
> in
> [jar:file:/extend1/yarn-temp/nm-local-dir/usercache/admin/filecache/10/__spark_libs__3353962889587701453.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J:
> Found binding in
> [jar:file:/home/admin/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J:
> See http://www.slf4j.org/codes.html#multiple_bindings
>  for an
> explanation. SLF4J: Actual binding is of type
> [org.slf4j.impl.Log4jLoggerFactory] 17/05/09 09:05:57 INFO
> util.SignalUtils: Registered signal handler for TERM 17/05/09 09:05:57 INFO
> util.SignalUtils: Registered signal handler for HUP 17/05/09 09:05:57 INFO
> util.SignalUtils: Registered signal handler for INT 17/05/09 09:05:57 WARN
> util.NativeCodeLoader: Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable Exception in thread
> "main" java.io.IOException: Exception reading
> /extend1/yarn-temp/nm-local-dir/usercache/admin/appcache/application_1494291953232_0001/container_1494291953232_0001_02_01/container_tokens
>  at
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:198)
>  at
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:816)
>  at
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:760)
>  at
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:633)
>  at
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
>  at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
>  at
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:787)
>  at
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) 
> Caused
> by: java.io.IOException: Unknown version 1 in token storage. at
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:216)
>  at
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:195)
>  ...
> 7 more*
>
>
> Looks like some security reason.
>
> PS, I have tried many times and also re-generated the input data, the
> error is still there. Anyone has this errors before ?
>
>
> Wei
>
>


Re: unsubscribe

2017-05-01 Thread Ravi Prakash
Hi Jason!

Could you please send an email to user-unsubscr...@hadoop.apache.org and
general-unsubscr...@hadoop.apache.org  as mentioned here :
https://hadoop.apache.org/mailing_lists.html ?

Thanks

On Sat, Apr 29, 2017 at 11:34 AM, Jason  wrote:

> unsubscribe
>
> On Thu, Apr 27, 2017 at 1:18 PM, Bourre, Marc  on.ca> wrote:
>
>> unsubscribe
>>
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Regards,
>
> Hao Tian
>


Re: unsubscribe

2017-04-28 Thread Ravi Prakash
Hi Marc,

Could you please send an email to user-unsubscr...@hadoop.apache.org and
general-unsubscr...@hadoop.apache.org  as mentioned here :
https://hadoop.apache.org/mailing_lists.html ?

Thanks

On Thu, Apr 27, 2017 at 5:18 AM, Bourre, Marc <
marc.bou...@ehealthontario.on.ca> wrote:

> unsubscribe
>
>
>
>
>
>
>


Re: unsubscribe

2017-04-28 Thread Ravi Prakash
Hi Krishna!

Could you please send an email to user-unsubscr...@hadoop.apache.org and
general-unsubscr...@hadoop.apache.org  as mentioned here :
https://hadoop.apache.org/mailing_lists.html ?

Thanks


On Wed, Apr 26, 2017 at 7:58 PM, Krishna <
ramakrishna.srinivas.mur...@gmail.com> wrote:

>
>
> --
> Thanks & Regards
> Ramakrishna S
>


Re: Noob question about Hadoop job that writes output to HBase

2017-04-22 Thread Ravi Prakash
Hi Evelina!

You've posted the logs for the MapReduce ApplicationMaster. From this I
can see the reducer timed out after 600 secs:
2017-04-21 00:24:07,747 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
report from attempt_1492722585320_0001_r_00_0:
AttemptID:attempt_1492722585320_0001_r_00_0 Timed out after 600 secs

To find out why the reducer timed out, you'd have to go look at the logs of
the reducer. These too are available from the RM page (where you got these
logs). Just click a few more links deeper.
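The 600-second limit is the default value of mapreduce.task.timeout (in
milliseconds). Raising it only hides the symptom, but for reference it looks
like this in mapred-site.xml (the value is just an example):

  <property>
    <name>mapreduce.task.timeout</name>
    <value>1200000</value>
  </property>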

HTH
Ravi

On Fri, Apr 21, 2017 at 1:58 AM, evelina dumitrescu <
evelina.a.dumitre...@gmail.com> wrote:

> The Hadoop version that I use is 2.7.1 and the Hbase version is 1.2.5.
> I can do any operation from the HBase shell.
>
>
> On Fri, Apr 21, 2017 at 8:01 AM, evelina dumitrescu <
> evelina.a.dumitre...@gmail.com> wrote:
>
>> Hi,
>>
>> I am new to Hadoop and Hbase.
>> I was trying to make a small proof-of-concept Hadoop map reduce job that
>> reads the data from HDFS and stores the output in Hbase.
>> I did the setup as presented in this tutorial [1].
>> Here is the pseudocode from the map reduce code [2].
>> The problem is that I am unable to contact Hbase from the Hadoop job and
>> the job gets stuck.
>> Here are the logs from syslog [3], stderr [4] and console [5].
>> How should I correctly setup HbaseConfiguration ?
>> I couldn't find any example online that worked and it's hard for a
>> beginner to debug the issue.
>> Any help would be appreciated.
>>
>> Thank you,
>> Evelina
>>
>> [1]
>> http://www.bogotobogo.com/Hadoop/BigData_hadoop_HBase_Pseudo
>> _Distributed.php
>> [2]
>> https://pastebin.com/hUDAMMes
>> [3]
>> https://pastebin.com/XxmWAUTf
>> [4]
>> https://pastebin.com/fYUYw4Cv
>> [5]
>> https://pastebin.com/YJ1hERDe
>>
>
>


Re: Running a script/executable stored in HDFS from a mapper

2017-04-21 Thread Ravi Prakash
Perhaps you want to look at Hadoop Streaming?
https://hadoop.apache.org/docs/r2.7.1/hadoop-streaming/HadoopStreaming.html
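For example (script name and paths are placeholders), a streaming job that
ships a shell script with the job and runs it as the mapper, with no reducers:

  hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files myscript.sh \
    -input /data/input \
    -output /data/output \
    -mapper myscript.sh \
    -numReduceTasks 0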

On Fri, Apr 21, 2017 at 12:30 AM, Philippe Kernévez 
wrote:

> Hi Evelina,
>
> Files in HDFS are not executable.
> You first need to copy it on a local tmp disk then run it (or may be load
> it in the mapper depending on your case).
>
> Regards,
> Philippe
>
> On Fri, Apr 21, 2017 at 7:29 AM, evelina dumitrescu <
> evelina.a.dumitre...@gmail.com> wrote:
>
>> Hi,
>>
>> Is it possible to run a script/executable stored in HDFS from a mapper ?
>>
>> Thank you,
>> Evelina
>>
>>
>
>
> --
> Philippe Kernévez
>
>
>
> Directeur technique (Suisse),
> pkerne...@octo.com
> +41 79 888 33 32
>
> Retrouvez OCTO sur OCTO Talk : http://blog.octo.com
> OCTO Technology http://www.octo.ch
>


Re: About the name of "dfs.namenode.checkpoint.dir" and "dfs.namenode.checkpoint.edits.dir"

2017-03-24 Thread Ravi Prakash
Hi Huxiaodong!

Thanks for your email. "dfs.namenode.checkpoint.dir" is used in a lower
level abstraction (called FSImage) :
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L1374
. Incidentally, to enable code reuse and abstraction, this lower-level
abstraction is used in both the Namenode and the SecondaryNamenode (and in
lots of other places actually).

Also, the Namenode and SecondaryNamenode ideally run on different machines.

HTH
Ravi





On Thu, Mar 23, 2017 at 11:23 PM,  wrote:

>
> Hello,
>
>   I think "dfs.namenode.checkpoint.dir" and 
> "dfs.namenode.checkpoint.edits.dir"
> is used for secondaryNameNode.
>
>   So I think "dfs.secondary.namenode.checkpoint.dir" and  "dfs.
> secondary.namenode.checkpoint.edits.dir" is better than
> "dfs.namenode.checkpoint.dir" and "dfs.namenode.checkpoint.edits.dir", do
> you think so?
>
>
>   thank you.
>
>   Looking forward to your early reply.
>
>
>
>
>
> 胡晓东 huxiaodong
>
>
> 网管及服务系统部 Network Management & Service System Dept
>
>
>
> ZTE Plaza Phase II, 68 Zijinghua Road, Nanjing
> MP: +86-15950565866
>
> E: hu.xiaod...@zte.com.cn
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>


Re: Hadoop AWS module (Spark) is inventing a secret-ket each time

2017-03-08 Thread Ravi Prakash
Sorry to hear about your travails.

I think you might be better off asking the spark community:
http://spark.apache.org/community.html

On Wed, Mar 8, 2017 at 3:22 AM, Jonhy Stack  wrote:

> Hi,
>
> I'm trying to read a s3 bucket from Spark and up until today Spark always
> complain that the request return 403
>
> hadoopConf = spark_context._jsc.hadoopConfiguration()
> hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
> hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
> hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AF
> ileSystem")
> logs = spark_context.textFile("s3a://mybucket/logs/*)
>
> Spark was saying  Invalid Access key [ACCESSKEY]
>
> However with the same ACCESSKEY and SECRETKEY this was working with aws-cli
>
> aws s3 ls mybucket/logs/
>
> and in python boto3 this was working
>
> resource = boto3.resource("s3", region_name="us-east-1")
> resource.Object("mybucket", "logs/text.py") \
> .put(Body=open("text.py", "rb"),ContentType="text/x-py")
>
> so my credentials ARE valid and the problem is definitely something with
> Spark...
>
> Today I decided to turn on the "DEBUG" log for the entire spark and to my
> suprise... Spark is NOT using the [SECRETKEY] I have provided but
> instead... add a random one???
>
> 17/03/08 10:40:04 DEBUG request: Sending Request: HEAD
> https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS
> ACCESSKEY:**[RANDON-SECRET-KEY]**, User-Agent: aws-sdk-java/1.7.4
> Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65,
> Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type:
> application/x-www-form-urlencoded; charset=utf-8, )
>
> This is why it still return 403! Spark is not using the key I provide with
> fs.s3a.secret.key but instead invent a random one EACH time (everytime I
> submit the job the random secret key is different)
>
> For the record I'm running this locally on my machine (OSX) with this
> command
>
> spark-submit --packages com.amazonaws:aws-java-sdk-pom
> :1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
>
> Could some one enlighten me on this?
>


Re: last exception: java.io.IOException: Call to e26-node.fqdn.com/10.12.1.209:60020 failed on local exception

2017-03-07 Thread Ravi Prakash
You should probably email Hbase mailing lists rather than Hadoop :
https://hbase.apache.org/mail-lists.html

On Thu, Mar 2, 2017 at 10:02 AM, Motty Cruz  wrote:

> Hello, in the past two weeks, I see the following error on HBase Thrift
> servers, we have total of about 10 Thrift servers and randomly get the
> following errors:
>
>
>
> 2017-02-28 10:45:56,541 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> #4087940, table=MBData, attempt=11/35 failed=2ops, last exception: ja
>
> va.io.IOException: Call to e26-node.fqdn.com/10.12.1.209:60020 failed on
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutExcepti
>
> on: Call id=36009061, waitTime=180001, operationTimeout=18 expired. on
> e26-node.fqdn.com,60020,1487800633128, tracking started null,
>
> retrying after=10060ms, replay=2ops
>
> 2017-02-28 10:45:57,207 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> #4084675, waiting for some tasks to finish. Expected max=0, tasksInPro
>
> gress=22
>
>
>
> Restarting the Thrift server resolves the issue. Any ideas where should I
> be looking for?
>
>
>
> Thanks,
> Motty
>


Re: Journal nodes , QJM requirement

2017-02-28 Thread Ravi Prakash
Thanks for the question Amit and your response Surendra!

I think Amit has raised a good question. I can only guess towards the
"need" for *journaling* while using a QJM. I'm fairly certain that if you
look through all the comments in
https://issues.apache.org/jira/browse/HDFS-3077 and its subtasks, you are
bound to find the reasoning there. (Or maybe we never thought about it ;-)
and it's worth pursuing.)

Journaling was necessary in the past when there was a single Namenode
because we wanted to be sure to persist any fsedits (changes to the file
system metadata) before actually making those changes in memory. That way,
if the Namenode crashed, we would load up fsimage from disk, and apply the
journalled edits to this state.

Along comes the QJM where the likelihood of all QJM nodes failing is
reduced (but still non-zero). Furthermore, I'm not sure (and perhaps
someone more knowledgeable about the QJM can answer) whether an individual
JournalNode in a quorum accepts a transaction only after persisting it to disk
or after merely applying it to its journal in memory. If it's the latter, keeping a
journal around is still valuable. Perhaps that's the reason?

Or perhaps it was just the software engineering aspect of it. To have a
special case of not journaling when a Quorum is available probably would
have required large scale changes to very brittle and important code, and
the designers chose to not increase the maintenance burden, and work with
the abstraction of the journal?

Good question though. Thanks for bringing it up and making us think about
it.
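For anyone following along, the quorum journal is what sits behind
dfs.namenode.shared.edits.dir in an HA setup; a typical wiring looks roughly
like this (hostnames, paths and the journal ID are placeholders):

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hdfs/journal</value>
  </property>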

Cheers
Ravi

On Mon, Feb 27, 2017 at 11:16 PM, surendra lilhore <
surendra.lilh...@huawei.com> wrote:

> Hi Amit,
>
>
>
> 1. Shared storage is used instead of direct write to standby, to allow
> cluster to be functional, even when the standby is not available. Shared
> storage is distributed, it will be functional even if one of the node
> (standby) fails. So it supports uninterrupted functionality for the user.
>
>
>
> 2. HDFS used shared storage or journal node to avoiding the “split-brain”
> syndrome, where multiple namenodes think they’re in charge of the cluster.
> JournalNodes node will allow only one active namenode to write the edits
> logs.
>
> For more info you can check the HDFS document https://hadoop.apache.org/
> docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.
> html
>
>
>
> Regards
>
> Surendra
>
>
>
>
>
> *From:* Amit Kabra [mailto:amitkabrai...@gmail.com
> ]
> *Sent:* 27 February 2017 10:29
> *To:* user@hadoop.apache.org
> *Subject:* Journal nodes , QJM requirement
>
>
>
> Hi Hadoop Users,
>
>
>
> I have one question, didn't get information on internet.
>
>
>
> Why hadoop needs journaling system. In order to sync Active / Standby NN,
> instead of using Journal node or any shared system, can't it do
> master-slave or multi master replication where for any write master will
> write to other master/slave as well and only once replication is done at
> other sites will commit / accept the write ?
>
>
>
> One reason I could think is journal node writes data from NN in append
> only mode which *might* make it faster as compared to writing to slave /
> another master for replication but I am not sure.
>
>
>
> Any pointers ?
>
>
>
> Thanks,
>
> Amit Kabra.
>


Re: WordCount MapReduce error

2017-02-23 Thread Ravi Prakash
Hi Vasil!

Thanks a lot for replying with your solution. Hopefully someone else will
find it useful. I know that the pi example (amongst others) is in
hadoop-mapreduce-examples-2.7.3.jar. I'm sorry, I do not know of a
Matrix-vector multiplication example bundled in the Apache Hadoop source. I'm sure
lots of people on github may have tried that though.
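With a stock 2.7.3 layout (the path may differ for your own build), the
bundled examples can be listed and run straight from that jar:

  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 10 100

Running the jar with no arguments prints the list of example programs; the
second command estimates pi with 10 maps and 100 samples per map.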

Glad it worked for you finally! :-)
Regards
Ravi

On Thu, Feb 23, 2017 at 5:41 AM, Васил Григоров <vask...@abv.bg> wrote:

> Dear Ravi,
>
> Even though I was unable to understand most of what you suggested me to
> try due to my lack of experience in the field, one of your suggestions did
> guide me in the right direction and I was able to solve my error. I decided
> to share it as you mentioned that you're adding this conversation to user
> mailing list for other people to see in case they run into a similiar
> problem.
>
> It turns out that my Windows username being consisted of 2 words: "Vasil
> Grigorov" has messed up the paths for the application somewhere due to the
> space inbetween the words. I thought I had fixed it by setting the
> HADOOP_IDENT_STRING variable to equal "Vasil Grigorov" from the default
> %USERNAME%, but that only disregarded my actual username. Since there is no
> way of changing my Windows username, I decided to make another account
> called "Vadoop" and tested running the code there. And to my surprise, the
> WordCount code ran with no issue, completing both the Map and Reduce tasks
> to 100% and giving me the correct output in the output directory. It's a
> bit annoying that I had to go through all this trouble just because the
> hadoop application hasn't been modified to escape space characters in
> people's username but yet again, I don't know how hard that would be to do.
> Anyway, I really appreciate the help and I hope this would help someone
> else in the future.
>
> Additionally, I'm about to test out some more examples provided in the
> hadoop documentation just to get more familiar with how it works. I have
> heard about these famous examples of *Matrix-vector multiplication* and 
> *Estimate
> the value of pi *but I have been unable to find them myself online. Do
> you know if the documentation provides those examples and if so, could you
> please reference them to me? Thank you in advance!
>
> Best regards,
> Vasil Grigorov
>
>
>
> > Оригинално писмо 
> >От: Ravi Prakash ravihad...@gmail.com
> >Относно: Re: WordCount MapReduce error
> >До: Васил Григоров <vask...@abv.bg>, user <user@hadoop.apache.org>
> >Изпратено на: 23.02.2017 02:22
>
> Hi Vasil!
>
> I'm taking the liberty of adding back user mailing list in the hope that
> someone in the future may chance on this conversation and find it useful.
>
> Could you please try by setting HADOOP_IDENT_STRING="Vasil" , although I
> do see https://issues.apache.org/jira/browse/HADOOP-10978 and I'm not
> sure it was fixed in 2.7.3.
>
> Could you please inspect the OS process that is launched for the Map Task?
> What user does it run as? In Linux, we have the strace utility that would
> let me see all the system calls that a process makes. Is there something
> similar in Windows?
> If you can ensure only 1 Map Task, you could try setting
> "mapred.child.java.opts" to  "-Xdebug -Xrunjdwp:transport=dt_socket,
> server=y,suspend=y,address=1047", then connecting with a remote debugger
> like eclipse / jdb and stepping through to see where the failure happens.
>
> That is interesting. I am guessing the MapTask is trying to write
> intermediate results to "mapreduce.cluster.local.dir" which defaults to
> "${hadoop.tmp.dir}/mapred/local" . hadoop.tmp.dir in turn defaults to
> "/tmp/hadoop-${ user.name}"
>
> Could you please try setting mapreduce.cluster.local.dir (and maybe even
> hadoop.tmp.dir) to preferably some location without space? Once that works,
> you could try narrowing down the problem.
>
> HTH
> Ravi
>
>
> On Wed, Feb 22, 2017 at 4:02 PM, Васил Григоров <vask...@abv.bg> wrote:
>
> Hello Ravi, thank you for the fast reply.
>
> 1. I did have a problem with my username having a space, however I solved
> it by changing the  *set HADOOP_IDENT_STRING=%USERNAME% *to * set
> HADOOP_IDENT_STRING="Vasil Grigorov" *in the last line of hadoop-env.cmd.
> I can't change my windows username however, so if you know another file
> where I should specify it?
> 2. I do have a D:\tmp directory and about 500GB free space on that drive
> so space shouldn't be the issue.
> 3. The application has all the required permissions.
>
> Additionally, something I've tested is that if I set the nu

Re: WordCount MapReduce error

2017-02-22 Thread Ravi Prakash
Hi Vasil!

I'm taking the liberty of adding back user mailing list in the hope that
someone in the future may chance on this conversation and find it useful.

Could you please try by setting HADOOP_IDENT_STRING="Vasil" , although I do
see https://issues.apache.org/jira/browse/HADOOP-10978 and I'm not sure it
was fixed in 2.7.3.

Could you please inspect the OS process that is launched for the Map Task?
What user does it run as? In Linux, we have the strace utility that would
let me see all the system calls that a process makes. Is there something
similar in Windows?
If you can ensure only 1 Map Task, you could try setting
"mapred.child.java.opts" to  "-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1047", then
connecting with a remote debugger like eclipse / jdb and stepping through
to see where the failure happens.

That is interesting. I am guessing the MapTask is trying to write
intermediate results to "mapreduce.cluster.local.dir" which defaults to
"${hadoop.tmp.dir}/mapred/local" . hadoop.tmp.dir in turn defaults to
"/tmp/hadoop-${user.name}"

Could you please try setting mapreduce.cluster.local.dir (and maybe even
hadoop.tmp.dir) to preferably some location without space? Once that works,
you could try narrowing down the problem.
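As a sketch of what that could look like (D:/hadoop-tmp is just an example of
a path without spaces), hadoop.tmp.dir belongs in core-site.xml and
mapreduce.cluster.local.dir in mapred-site.xml:

  <!-- core-site.xml -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>D:/hadoop-tmp</value>
  </property>

  <!-- mapred-site.xml -->
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>D:/hadoop-tmp/mapred/local</value>
  </property>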

HTH
Ravi


On Wed, Feb 22, 2017 at 4:02 PM, Васил Григоров <vask...@abv.bg> wrote:

> Hello Ravi, thank you for the fast reply.
>
> 1. I did have a problem with my username having a space, however I solved
> it by changing the *set HADOOP_IDENT_STRING=%USERNAME% *to* set
> HADOOP_IDENT_STRING="Vasil Grigorov" *in the last line of hadoop-env.cmd.
> I can't change my windows username however, so if you know another file
> where I should specify it?
> 2. I do have a D:\tmp directory and about 500GB free space on that drive
> so space shouldn't be the issue.
> 3. The application has all the required permissions.
>
> Additionally, something I've tested is that if I set the number of reduce
> tasks in the WordCount.java file to 0 (job.setNumReduceTask = 0) then I get
> the success files for the Map task in my output directory. So the Map tasks
> work fine but the Reduce is messing up. Is it possible that my build is
> somewhat incorrect even though it said everything was successfully built?
>
> Thanks again, I really appreciate the help!
>
>
>
> > Оригинално писмо 
> >От: Ravi Prakash ravihad...@gmail.com
> >Относно: Re: WordCount MapReduce error
> >До: Васил Григоров <vask...@abv.bg>
> >Изпратено на: 22.02.2017 21:36
>
> Hi Vasil!
>
> It seems like the WordCount application is expecting to open the
> intermediate file but failing. Do you see a directory under
> D:/tmp/hadoop-Vasil Grigorov/ ? I can think of a few reasons. I'm sorry I
> am not familiar with the Filesystem on Windows 10.
> 1. Spaces in the file name are not being encoded / decoded properly. Can
> you try changing your name / username to remove the space?
> 2. There's not enough space on the D:/tmp directory?
> 3. The application does not have the right permissions to create the file.
>
> HTH
> Ravi
>
> On Wed, Feb 22, 2017 at 10:51 AM, Васил Григоров <vask...@abv.bg> wrote:
>
> Hello, I've been trying to run the WordCount example provided on the
> website on my Windows 10 machine. I have built the latest hadoop version
> (2.7.3) successfully and I want to run the code on the Local (Standalone)
> Mode. Thus, I have not specified any configurations, apart from setting the
> JAVA_HOME path in the "hadoop-env.cmd" file. When I try to run the
> WordCount file it fails to run the Reduce task but it completes the Map
> tasks. I get the following output:
>
>
> *D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount>hadoop
> jar wc.jar WordCount
> D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\input
> D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\output*
> *17/02/22 18:40:43 INFO Configuration.deprecation: session.id
> is deprecated. Instead, use dfs.metrics.session-id*
> *17/02/22 18:40:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=*
> *17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: Hadoop command-line
> option parsing not performed. Implement the Tool interface and execute your
> application with ToolRunner to remedy this.*
> *17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: No job jar file
> set.  User classes may not be found. See Job or Job#setJar(String).*
> *17/02/22 18:40:44 INFO input.FileInputFormat: Total input paths to
> process : 2*
> *17/02/22 18:40:44 INFO mapreduce.JobSubmitter: number of splits:2*
&g

Re: WordCount MapReduce error

2017-02-22 Thread Ravi Prakash
Hi Vasil!

It seems like the WordCount application is expecting to open the
intermediate file but failing. Do you see a directory under
D:/tmp/hadoop-Vasil Grigorov/ ? I can think of a few reasons. I'm sorry I
am not familiar with the Filesystem on Windows 10.
1. Spaces in the file name are not being encoded / decoded properly. Can
you try changing your name / username to remove the space?
2. There's not enough space on the D:/tmp directory?
3. The application does not have the right permissions to create the file.

HTH
Ravi

On Wed, Feb 22, 2017 at 10:51 AM, Васил Григоров  wrote:

> Hello, I've been trying to run the WordCount example provided on the
> website on my Windows 10 machine. I have built the latest hadoop version
> (2.7.3) successfully and I want to run the code on the Local (Standalone)
> Mode. Thus, I have not specified any configurations, apart from setting the
> JAVA_HOME path in the "hadoop-env.cmd" file. When I try to run the
> WordCount file it fails to run the Reduce task but it completes the Map
> tasks. I get the following output:
>
>
> *D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount>hadoop
> jar wc.jar WordCount
> D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\input
> D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\output*
> *17/02/22 18:40:43 INFO Configuration.deprecation: session.id
>  is deprecated. Instead, use dfs.metrics.session-id*
> *17/02/22 18:40:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=*
> *17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: Hadoop command-line
> option parsing not performed. Implement the Tool interface and execute your
> application with ToolRunner to remedy this.*
> *17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: No job jar file
> set.  User classes may not be found. See Job or Job#setJar(String).*
> *17/02/22 18:40:44 INFO input.FileInputFormat: Total input paths to
> process : 2*
> *17/02/22 18:40:44 INFO mapreduce.JobSubmitter: number of splits:2*
> *17/02/22 18:40:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_local334410887_0001*
> *17/02/22 18:40:45 INFO mapreduce.Job: The url to track the job:
> http://localhost:8080/ *
> *17/02/22 18:40:45 INFO mapreduce.Job: Running job:
> job_local334410887_0001*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter set in
> config null*
> *17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer
> Algorithm version is 1*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter is
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner: Waiting for map tasks*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task:
> attempt_local334410887_0001_m_00_0*
> *17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer
> Algorithm version is 1*
> *17/02/22 18:40:45 INFO util.ProcfsBasedProcessTree:
> ProcfsBasedProcessTree currently is supported only on Linux.*
> *17/02/22 18:40:45 INFO mapred.Task:  Using ResourceCalculatorProcessTree
> : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@3019d00f*
> *17/02/22 18:40:45 INFO mapred.MapTask: Processing split:
> file:/D:/Programs/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/WordCount/input/file02:0+27*
> *17/02/22 18:40:45 INFO mapred.MapTask: (EQUATOR) 0 kvi
> 26214396(104857584)*
> *17/02/22 18:40:45 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100*
> *17/02/22 18:40:45 INFO mapred.MapTask: soft limit at 83886080*
> *17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600*
> *17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396; length =
> 6553600*
> *17/02/22 18:40:45 INFO mapred.MapTask: Map output collector class =
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner:*
> *17/02/22 18:40:45 INFO mapred.MapTask: Starting flush of map output*
> *17/02/22 18:40:45 INFO mapred.MapTask: Spilling map output*
> *17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufend = 44; bufvoid
> = 104857600*
> *17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396(104857584);
> kvend = 26214384(104857536); length = 13/6553600*
> *17/02/22 18:40:45 INFO mapred.MapTask: Finished spill 0*
> *17/02/22 18:40:45 INFO mapred.Task:
> Task:attempt_local334410887_0001_m_00_0 is done. And is in the process
> of committing*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner: map*
> *17/02/22 18:40:45 INFO mapred.Task: Task
> 'attempt_local334410887_0001_m_00_0' done.*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner: Finishing task:
> attempt_local334410887_0001_m_00_0*
> *17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task:
> attempt_local334410887_0001_m_01_0*
> *17/02/22 18:40:46 INFO output.FileOutputCommitter: File Output Committer
> Algorithm version is 1*
> *17/02/22 

Re: HDFS fsck command giving health as corrupt for '/'

2017-02-16 Thread Ravi Prakash
Hi Nishant!

I'd suggest reading the HDFS user guide to begin with and becoming familiar
with the architecture.
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
.

Where are the blocks stored on the datanodes? Were they on persistent
storage on the EC2 instances or on ephemeral storage? Can you log on to the
datanodes and find the "blk_*" block files and their "blk_*.meta" checksum files?

e.g. You can identify the locations of an HDFS file using this command:
HADOOP_USER_NAME=hdfs hdfs fsck  -files -blocks
-locations
If you have Kerberos turned on, then you'd have to get the super-user
credentials and run the command as the super-user.

If there are no datanodes in the list, that means *no datanodes* have
reported the block. NOTE: On startup the Namenode doesn't know where a
block is stored. It only has a mapping from an HDFS file to the blocks. The
Datanodes are the ones that report a block to the Namenode, and only then
(on every startup) does the Namenode learn where to locate the block.

HTH
Ravi


On Wed, Feb 15, 2017 at 11:53 PM, Nishant Verma  wrote:

> Hi Philippe
>
> Yes, I did. I restarted NameNode and other daemons multiple times.
> I found that all my files had got corrupted somehow. I was able to fix the
> issue by running below command:
>
> hdfs fsck / | egrep -v '^\.+$' | grep -v replica | grep -v Replica
>
> But it deleted all the files from my cluster. Only the directory
> structures were left.
>
> My main concern is how did this issue happen and how to prevent it in
> future from happening?
>
> Regards
> Nishant
>
> Nishant
>
> sent from handheld device. please ignore typos.
>
> On Wed, Feb 15, 2017 at 3:01 PM, Philippe Kernévez 
> wrote:
>
>> Hi Nishant,
>>
>> You namenode are probably unable to comunicate with your datanode. Did
>> you restart all the HDFS services ?
>>
>> Regards,
>> Philipp
>>
>> On Tue, Feb 14, 2017 at 10:43 AM, Nishant Verma <
>> nishant.verma0...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I have open source hadoop version 2.7.3 cluster (2 Masters + 3 Slaves)
>>> installed on AWS EC2 instances. I am using the cluster to integrate it with
>>> Kafka Connect.
>>>
>>> The setup of cluster was done last month and setup of kafka connect was
>>> completed last fortnight. Since then, we were able to operate the kafka
>>> topic records on our HDFS and do various operations.
>>>
>>> Since last afternoon, I find that any kafka topic is not getting
>>> committed to the cluster. When I tried to open the older files, I started
>>> getting below error. When I copy a new file to the cluster from local, it
>>> comes and gets opened but after some time, again starts showing similar
>>> IOException:
>>>
>>> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
>>> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: java.io.IOException: No live nodes contain block BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
>>> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 499.3472970548959 msec.
>>> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
>>> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: java.io.IOException: No live nodes contain block BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
>>> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 4988.873277172643 msec.
>>> 17/02/14 07:58:00 INFO hdfs.DFSClient: No node available for BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
>>> 17/02/14 07:58:00 INFO hdfs.DFSClient: Could not obtain BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: java.io.IOException: No live nodes contain block BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...

Re: How to fix "HDFS Missing replicas"

2017-02-13 Thread Ravi Prakash
Hi Ascot!

Just out of curiosity, which version of hadoop are you using?

fsck has some other options (e.g. -blocks will print out the block report
too, -list-corruptfileblocks prints out the list of missing blocks and
files they belong to) . I suspect you may also want to specify the
-openforwrite option.
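Putting those together, the two invocations I'd start with (run with HDFS
superuser credentials):

  hdfs fsck / -list-corruptfileblocks
  hdfs fsck / -files -blocks -locations -openforwrite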

In any case, missing blocks are a pretty bad symptom. There's a high
likelihood that you've lost data. If you can't find the blocks on any of
the datanodes, you would want to delete the files on HDFS and recreate them
(however they were originally created). In my experience I've seen missing
files which were never closed. This used to happen in older versions when
an rsync via HDFS NFS / HDFS FUSE is cancelled / fails.

HTH
Ravi



On Sun, Feb 12, 2017 at 4:15 AM, Ascot Moss  wrote:

> Hi,
>
> After running 'hdfs fsck /blocks' to check the cluster, I got
> 'Missing replicas:  441 (0.24602923 %)"
>
> How to fix HDFS missing replicas?
> Regards
>
>
>
>
> (detailed output)
>
> Status: HEALTHY
>
>  Total size:3375617914739 B (Total open files size: 68183613174 B)
>
>  Total dirs:2338
>
>  Total files:   39960
>
>  Total symlinks:0 (Files currently being written: 60)
>
>  Total blocks (validated):  59493 (avg. block size 56739749 B) (Total
> open file blocks (not validated): 560)
>
>  Minimally replicated blocks:   59493 (100.0 %)
>
>  Over-replicated blocks:0 (0.0 %)
>
>  Under-replicated blocks:   111 (0.18657658 %)
>
>  Mis-replicated blocks: 0 (0.0 %)
>
>  Default replication factor:3
>
>  Average block replication: 3.0054965
>
>  Corrupt blocks:0
>
>  Missing replicas:  441 (0.24602923 %)
>
>  Number of data-nodes:  7
>
>  Number of racks:   1
>
>
>


Re: Yarn containers creating child process

2017-02-13 Thread Ravi Prakash
Hi Sandesh!

A *yarn* task is just like any other process on the operating system.
Depending on which ContainerExecutor you use, you should launch the yarn
task with appropriate limits in place. Although I have never tried it, on
Linux you could use setrlimit or
https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt . What yarn
application are you planning on using? Or creating your own?
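
As a rough, untested sketch of the cgroup-v1 pids controller (this assumes it
is mounted at /sys/fs/cgroup/pids; the group name, PID and limit are made up):

# create a cgroup and cap the number of processes/threads allowed in it
sudo mkdir /sys/fs/cgroup/pids/yarn-task
echo 64 | sudo tee /sys/fs/cgroup/pids/yarn-task/pids.max
# move the container's process (PID 12345 here) into the cgroup; any children
# it forks are counted against the same limit
echo 12345 | sudo tee /sys/fs/cgroup/pids/yarn-task/cgroup.procs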

HTH
Ravi.

On Fri, Feb 10, 2017 at 4:03 PM, Sandesh Hegde 
wrote:

> Hi,
>
> What are the features available to limit the Yarn containers from creating
> the child process?
>
> Thanks
>


Re: HDFS Shell tool

2017-02-10 Thread Ravi Prakash
Hi Vity!

Please let me reiterate that I think it's great work and I'm glad you
thought of sharing it with the community. Thanks a lot.

I can think of a few reasons for using WebHDFS, although, if these are not
important to you, it may not be worth the effort:
1. You can point to an HttpFS gateway in case you do not have network
access to the datanodes.
2. WebHDFS is a lot more likely to be compatible with different versions of
Hadoop (https://github.com/avast/hdfs-shell/blob/master/build.gradle#L80),
although the community is trying really hard to maintain compatibility
going forward for FileSystem too.
3. You may be able to eliminate linking a lot of jars that hadoop-client
would pull in.
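
For instance, a quick way to poke at WebHDFS from the command line (the host,
port and paths are illustrative; an HttpFS gateway exposes the same REST API):

# list a directory
curl -i "http://namenode.example.com:50070/webhdfs/v1/user/vity?op=LISTSTATUS"
# read a file, following the redirect to a datanode
curl -i -L "http://namenode.example.com:50070/webhdfs/v1/user/vity/some.txt?op=OPEN"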

Having said that, there may well be reasons why you don't want to use
WebHDFS.

Thanks again!
Ravi


On Fri, Feb 10, 2017 at 12:38 AM, Vitásek, Ladislav <vita...@avast.com>
wrote:

> Hello Ravi,
> I am glad you like it.
> Why should I use WebHDFS? Our cluster sysops, myself included, prefer the
> command line. :-)
>
> -Vity
>
> 2017-02-09 22:21 GMT+01:00 Ravi Prakash <ravihad...@gmail.com>:
>
>> Great job Vity!
>>
>> Thanks a lot for sharing. Have you thought about using WebHDFS?
>>
>> Thanks
>> Ravi
>>
>> On Thu, Feb 9, 2017 at 7:12 AM, Vitásek, Ladislav <vita...@avast.com>
>> wrote:
>>
>>> Hello Hadoop fans,
>>> I would like to inform you about our tool we want to share.
>>>
>>> We created a new utility - HDFS Shell to work with HDFS more faster.
>>>
>>> https://github.com/avast/hdfs-shell
>>>
>>> *Feature highlights*
>>> - HDFS DFS command initiates JVM for each command call, HDFS Shell does
>>> it only once - which means great speed enhancement when you need to work
>>> with HDFS more often
>>> - Commands can be used in a short way - eg. *hdfs dfs -ls /*, *ls /* -
>>> both will work
>>> - *HDFS path completion using TAB key*
>>> - you can easily add any other HDFS manipulation function
>>> - there is a command history persisting in history log
>>> (~/.hdfs-shell/hdfs-shell.log)
>>> - support for relative directory + commands *cd* and *pwd*
>>> - it can be also launched as a daemon (using UNIX domain sockets)
>>> - 100% Java, it's open source
>>>
>>> You suggestions are welcome.
>>>
>>> -L. Vitasek aka Vity
>>>
>>>
>>
>


Re: HDFS Shell tool

2017-02-09 Thread Ravi Prakash
Great job Vity!

Thanks a lot for sharing. Have you thought about using WebHDFS?

Thanks
Ravi

On Thu, Feb 9, 2017 at 7:12 AM, Vitásek, Ladislav  wrote:

> Hello Hadoop fans,
> I would like to inform you about our tool we want to share.
>
> We created a new utility - HDFS Shell - to work with HDFS much faster.
>
> https://github.com/avast/hdfs-shell
>
> *Feature highlights*
> - The HDFS DFS command initiates a JVM for each command call; HDFS Shell does
> it only once - which means a great speed enhancement when you need to work
> with HDFS more often
> - Commands can be used in a short way - eg. *hdfs dfs -ls /*, *ls /* -
> both will work
> - *HDFS path completion using TAB key*
> - you can easily add any other HDFS manipulation function
> - there is a command history persisting in history log
> (~/.hdfs-shell/hdfs-shell.log)
> - support for relative directory + commands *cd* and *pwd*
> - it can be also launched as a daemon (using UNIX domain sockets)
> - 100% Java, it's open source
>
> Your suggestions are welcome.
>
> -L. Vitasek aka Vity
>
>


Re: Confusion between dfs.replication and dfs.namenode.replication.min options in hdfs-site.xml

2017-02-02 Thread Ravi Prakash
Hi Andrey!

Your assumption is absolutely correct. dfs.namenode.replication.min is what
you should set to 2 in your case. You should also look at
dfs.client.block.write.replace-datanode-on-failure.policy,
dfs.client.block.write.replace-datanode-on-failure.enable and
dfs.client.block.write.replace-datanode-on-failure.best-effort.
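
As a minimal hdfs-site.xml sketch for the setup you described (the values are
just the ones from your mail, not recommendations):

<!-- a block is only considered complete once it has at least 2 replicas -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>2</value>
</property>
<!-- eventually keep 3 replicas; the namenode re-replicates asynchronously -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>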

HTH
Ravi

On Wed, Feb 1, 2017 at 1:37 PM, Andrey Elenskiy 
wrote:

> Hello,
>
> I use hadoop 2.7.3 non-HA setup with hbase 1.2.3 on top of it.
>
> I'm trying to understand these options in hdfs-site.xml:
>
> dfs.replication
> 3 Default block replication. The actual number of replications can be
> specified when the file is created. The default is used if replication is
> not specified in create time.
> dfs.namenode.replication.min
> 1 Minimal block replication.
> What I'm trying to do is to make sure that on write we always end up with
> 2 replicas minimum. In other words, a write should fail if we don't end up
> with 2 replicas of each block.
>
> As I understand, on write, hadoop creates a write pipeline of datanodes
> where each datanode writes to the next one. Here's a diagram from Cloudera:
> [image: Inline image 1]
> Is it correct to say that the dfs.namenode.replication.min option
> controls how many datanodes in the pipeline must have COMPLETEd the block
> in order to consider a write successful and then acks to the client about
> success? And dfs.replication option means that we eventually want to have
> this many replicas of each block, but it doesn't need to be done at the
> write time but could be done asynchronously later by the Namenode?
>
> So, essentially, if I want a guarantee that I have one back up of each
> block at all times, I need to set to dfs.namenode.replication.min=2. And,
> if I want to make sure that I won't go into safemode on startup too
> often, I should set dfs.replication = 3 to tolerate one replica loss.
>
>
>


Re: Some Questions about Node Manager Memory Used

2017-01-24 Thread Ravi Prakash
Hi Zhuo Chen!

Yarn has a few methods to account for memory. By default, it guarantees
your (Hive) application a certain amount of memory. It depends entirely on
the application whether it uses all of that memory or, as in your case,
leaves plenty of headroom in case it needs to expand in the future.

There's plenty of documentation from several vendors on this. I suggest a
search engine query on the lines of "hadoop Yarn memory usage"

HTH
Ravi

On Tue, Jan 24, 2017 at 1:04 AM, Zhuo Chen  wrote:

> My Hive job gets stuck when submitted to the cluster. Viewing the Resource
> Manager web UI, I found the [mem used] metric has reached approximately the
> upper limit, but when I log in to the host, the OS shows only 13GB of memory
> used (via the 'free' command), with about 46GB occupied by cache.
>
> So I wonder why there is such an inconsistency and how to understand this
> scenario? Any explanations would be appreciated.
>


Re: Why is the size of a HDFS file changed?

2017-01-09 Thread Ravi Prakash
I have not been able to reproduce this:

[raviprak@ravi ~]$ hdfs dfs -put HuckleberryFinn.txt /
[raviprak@ravi ~]$ cd /tmp
[raviprak@ravi tmp]$ hdfs dfs -get /HuckleberryFinn.txt
[raviprak@ravi tmp]$ hdfs dfs -cat /HuckleberryFinn.txt > hck
[raviprak@ravi tmp]$ md5sum hck
8dc8966178cc1bf4eb95a5b31780269c  hck
[raviprak@ravi tmp]$ md5sum HuckleberryFinn.txt
8dc8966178cc1bf4eb95a5b31780269c  HuckleberryFinn.txt
[raviprak@ravi tmp]$ hdfs dfs -put hck /
[raviprak@ravi tmp]$ hdfs dfs -checksum /HuckleberryFinn.txt
/HuckleberryFinn.txtMD5-of-0MD5-of-512CRC32C
0200c99e8741a1f3d311513df9d9e73b0bc8
[raviprak@ravi tmp]$ hdfs dfs -checksum /hck
/hckMD5-of-0MD5-of-512CRC32C
0200c99e8741a1f3d311513df9d9e73b0bc8

This is on trunk.

On Sun, Jan 8, 2017 at 6:52 PM, Mungeol Heo <mungeol@gmail.com> wrote:

> "^A" is used as delimiter in the file.
> However, I don't think this is the reason causing the problem, because
> there are files also using "^A" as delimiter but with no problem.
> BTW, the reason using "^A" as delimiter is these files are hive data.
>
> On Sat, Jan 7, 2017 at 12:17 AM, Ravi Prakash <ravihad...@gmail.com>
> wrote:
> > Is there a carriage return / new line / some other whitespace which `cat`
> > may be appending?
> >
> > On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo <mungeol@gmail.com>
> wrote:
> >>
> >> Hello,
> >>
> >> Suppose, I name the HDFS file which cause the problem as A.
> >>
> >> hdfs dfs -ls A
> >> -rw-r--r--   3 web_admin hdfs  868003931 2017-01-04 09:05 A
> >>
> >> hdfs dfs -get A AFromGet
> >> hdfs dfs -cat A > AFromCat
> >>
> >> ls -l
> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromGet
> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromCat
> >>
> >> hdfs dfs -put AFromGet
> >>
> >> diff <(hdfs dfs -cat  A) <(hdfs dfs -cat AFromGet)
> >> (no output, which means the contents of two files are same. At least,
> >> after "cat")
> >>
> >> hdfs dfs -checksum A
> >> A   MD5-of-262144MD5-of-512CRC32C
> >> 0204e667fb4f0dda78101feb2b689af8260b
> >>
> >> hdfs dfs -checksum AFromGet
> >> AFromGet   MD5-of-262144MD5-of-512CRC32C
> >> 02047284759249ff98c7395e6a4bb59343dc
> >>
> >> As I listed some results above. I wonder why is the size of the file
> >> changed.
> >> Any help will be GREAT!
> >>
> >> Thank you.
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: user-h...@hadoop.apache.org
> >>
> >
>


Re: Why is the size of a HDFS file changed?

2017-01-06 Thread Ravi Prakash
Is there a carriage return / new line / some other whitespace which `cat`
may be appending?

On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo  wrote:

> Hello,
>
> Suppose I name the HDFS file which causes the problem A.
>
> hdfs dfs -ls A
> -rw-r--r--   3 web_admin hdfs  868003931 2017-01-04 09:05 A
>
> hdfs dfs -get A AFromGet
> hdfs dfs -cat A > AFromCat
>
> ls -l
> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromGet
> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromCat
>
> hdfs dfs -put AFromGet
>
> diff <(hdfs dfs -cat  A) <(hdfs dfs -cat AFromGet)
> (no output, which means the contents of two files are same. At least,
> after "cat")
>
> hdfs dfs -checksum A
> A   MD5-of-262144MD5-of-512CRC32C
> 0204e667fb4f0dda78101feb2b689af8260b
>
> hdfs dfs -checksum AFromGet
> AFromGet   MD5-of-262144MD5-of-512CRC32C
> 02047284759249ff98c7395e6a4bb59343dc
>
> As I listed some results above. I wonder why is the size of the file
> changed.
> Any help will be GREAT!
>
> Thank you.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>


Re: Small mistake (?) in doc about HA with Journal Nodes

2016-12-05 Thread Ravi Prakash
Hi Alberto!

The assumption is that *multiple* machines could be running the Namenode
process. Only one of them would be active, while the other Namenode
processes would be in Standby mode.

The number of machines is suggested to be odd so that it's easier to form
consensus. To tolerate the failure of k machines, 2k+1 is usually the number
of JournalNodes you'd need (e.g. 3 JournalNodes survive the loss of 1, and 5
survive the loss of 2).

HTH
Ravi

On Mon, Dec 5, 2016 at 11:32 AM, Alberto Chiusole <
alberto.chiusol...@gmail.com> wrote:

> Hi all,
> I'm Alberto Chiusole, an Italian computer science student and open-source
> fan.
> I'm currently performing a small research to expose to my fellow students
> the Hadoop project, and this is my first post in this ML.
>
> I think I spotted a small mistake in the HDFS documentation regarding
> achieving HA with the Quorum Journal Manager [1], section "Hardware
> resources", paragraph "JournalNode machines". It states:
> """
> The JournalNode daemon is relatively lightweight, so these daemons may
> reasonably be collocated on machines with other Hadoop daemons, for example
> NameNodes, the JobTracker, (...)
> """
>
> Is "NameNodes" a typo and you meant "DataNode" instead? Aren't the
> JournalNodes meant to survive in case of a failure of the NameNodes? Why
> should I place a JournalNode on the same machine that contains the log I
> need to synchronize?
>
>
> Moreover, I have a quick question on the same topic: why do you suggest
> placing an odd number of machines as JournalNodes in order to increase the
> fault tolerance?
>
>
> Regards,
> Alberto Chiusole
>
>
> [1]: https://hadoop.apache.org/docs/stable/hadoop-project-dist/
> hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Hardware_resources
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>


Re: Does the JobHistoryServer register itself with ZooKeeper?

2016-11-16 Thread Ravi Prakash
Are you talking about the Mapreduce JobHistoryServer? I am not aware of it
needing Zookeeper for anything. What gave you that impression?

On Wed, Nov 16, 2016 at 11:32 AM, Benson Qiu 
wrote:

> I'm looking for a way to check for connectivity to the JobHistoryServer.
>
> One way I can think of is to create a Socket connection (in Java code) to
> the JobHistoryServer IPC port specified in mapreduce.jobhistory.address.
>
> If the JHS registers itself with ZooKeeper, is there a way for me to ping
> ZooKeeper to check the status of the JHS?
>


Re: How to mount HDFS as a local file system?

2016-11-10 Thread Ravi Prakash
Or you could use NFS:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html
In our experience, both of them still need some work for stability and
correctness.
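
If you do try the NFS gateway, the client-side mount typically looks something
like this (the gateway host and mount point are illustrative; the gateway
itself has to be set up first, per the doc above):

mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync nfsgw.example.com:/ /mnt/hdfs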

On Thu, Nov 10, 2016 at 10:00 AM,  wrote:

> Fuse is your tool:
>
> https://wiki.apache.org/hadoop/MountableHDFS
>
>
>
> --
> m: wget.n...@gmail.com
> b: https://mapredit.blogspot.com
>
>
>
> *From: *Alexandr Porunov 
> *Sent: *Thursday, November 10, 2016 6:56 PM
> *To: *user.hadoop 
> *Subject: *How to mount HDFS as a local file system?
>
>
>
> Hello,
>
>
>
> I try to understand how to mount HDFS as a local file system but without
> success. I already have a running a hadoop cluster 2.7.1 but I can access
> HDFS only with hdfs dfs tool. For example:
>
> hdfs dfs -mkdir /test
>
>
>
> Can somebody help me to figure out how to mount it?
>
>
>
> Sincerely,
>
> Alexandr
>
>
>


Re: Yarn 2.7.3 - capacity scheduler container allocation to nodes?

2016-11-09 Thread Ravi Prakash
Hi Rafal!

Have you been able to launch the job successfully first without configuring
node-labels? Do you really need node-labels? How much total memory do you
have on the cluster? Node labels are usually for specifying special
capabilities of the nodes (e.g. some nodes could have GPUs and your
application could request to be run on only the nodes which have GPUs)

HTH
Ravi

On Wed, Nov 9, 2016 at 5:37 AM, Rafał Radecki 
wrote:

> Hi All.
>
> I have a 4 node cluster on which I run yarn. I created 2 queues "long" and
> "short", first with 70% resource allocation, the second with 30%
> allocation. Both queues are configured on all available nodes by default.
>
> My memory for yarn per node is ~50GB. Initially I thought that when I run
> tasks in the "short" queue, yarn would allocate them on all nodes using 30%
> of the memory on every node. So for example if I run 20 tasks, 2GB each
> (40GB summary), in short queue:
> - ~7 first will be scheduled on node1 (14GB total, 30% out of 50GB
> available on this node for "short" queue -> 15GB)
> - next ~7 tasks will be scheduled on node2
> - ~6 remaining tasks will be scheduled on node3
> - yarn on node4 will not use any resources assigned to "short" queue.
> But this seems not to be the case. At the moment I see that all tasks are
> started on node1 and other nodes have no tasks started.
>
> I attached my yarn-site.xml and capacity-scheduler.xml.
>
> Is there a way to force yarn to use configured above thresholds (70% and
> 30%) per node and not per cluster as a whole? I would like to get a
> configuration in which on every node 70% is always available for "short"
> queue, 70% for "long" queue and in case any resources are free for a
> particular queue they are not used by other queues. Is it possible?
>
> BR,
> Rafal.
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>


Re: Capacity scheduler for yarn oin 2.7.3 - problem with job scheduling to created queue.

2016-11-08 Thread Ravi Prakash
Hi Rafal!

Have you been able to launch the job successfully first without configuring
node-labels? Do you really need node-labels? How much total memory do you
have on the cluster? Node labels are usually for specifying special
capabilities of the nodes (e.g. some nodes could have GPUs and your
application could request to be run on only the nodes which have GPUs)

HTH
Ravi

On Tue, Nov 8, 2016 at 6:19 AM, Rafał Radecki 
wrote:

> Hi All.
>
> I configured yarn to use the capacity scheduler. I have four physical nodes;
> I run the resourcemanager on the first of them and a nodemanager on all of them.
>
> My capacity-scheduler.xml and yarn-site.yml are attached.
> When I submit a job to the "long" queue I get in resourcemanager's logfile
> the content of attached rm.log and in RM qui the job is in state
> "ACCEPTED: waiting for AM container to be allocated, launched and
> register with RM"
> and has finalstatus
> "UNDEFINED"
>
> At the same time I see that in the RM GUI, in the scheduler section, I have
> four partitions (node1-4d), each of which has the two queues "long" and
> "short" available. To summarize, when I run a task (a samza task in my case),
> I only specify (https://samza.apache.org/learn/documentation/0.10/jobs/
> yarn-jobs.html)
> yarn.queue=long
> or
> yarn.queue=short
>
> Have I missed something?
>
> BR,
> Rafal.
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>


Re: Fw:Re:How to add custom field to hadoop MR task log?

2016-11-04 Thread Ravi Prakash
Hi Maria!

You have to be careful which log4j.properties file is on the classpath of
the task which was launched. Oftentimes there are multiple
log4j.properties files, perhaps on the classpath or in one of the jars on
the classpath. Are you sure the log4j.properties file you edited is the
only one loaded by the classloader?

Ravi

On Fri, Nov 4, 2016 at 5:06 AM, Maria  wrote:

> Sorry, (clerical errors).
> I just modified "log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601}
> %p %c: %m%n"
> to "log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: [ID:
> %X{ID}] %m%n",
> and used "MDC.put("ID", ID)" in the mapper class, but it does not work.
>
>
> At 2016-11-04 17:01:16, "Maria"  wrote:
> >
> >I know that, A simple way is to write " " to every 
> >LOG.info()/LOG.warn()like this:
> >
> >logger.info(ID + " start map logic");
> >BUT,every LOG info has to add "ID" is not wise.
> >Or else, can someone know how to modify the mapreduce task ConversionPattern 
> >configuration?
> >I tried to modify "RFA" Appender to this:
> >---
> >log4j.appender.RFA=org.apache.log4j.RollingFileAppender
> >log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
> >log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
> >log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
> >log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
> ># Pattern format: Date LogLevel LoggerName LogMessage
> >log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
> ># Debugging Pattern format
> >#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %l - [ID: 
> >%X{ID}]  %m%n
> >-
> >It does not work.
> >
> >At 2016-11-04 11:26:57, "Maria"  wrote:
> >>
> >>Hi, dear developers,
> >>
> >>I'm trying to reconfig $HADOOP/etc.hadoop/log4j.properties,
> >>I want to add an  to mapreduce log before LOGmessage. Like this:
> >>"ID:234521 start map logic"
> >>
> >>My steps as follow:
> >>(1)In my Mapper Class:
> >>
> >>static Logger logger = LoggerFactory.getLogger(Mapper.class);
> >>
> >>
> >>public void map(Object key, Text value, Context context) throws 
> >>IOException, InterruptedException {
> >>
> >>MDC.put("ID", "operatorID");
> >>logger.info("start map logic");
> >>
> >> StringTokenizer itr = new StringTokenizer(value.toString());
> >> while (itr.hasMoreTokens()) {
> >>   word.set(itr.nextToken());
> >>   context.write(word, one);
> >> }
> >>   }
> >> }
> >>(2)config $HADOOP/etc.hadoop/log4j.properties
> >>
> >>log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
> >>log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
> >>log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
> >>log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
> >>
> >>log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
> >>log4j.appender.TLA.layout.ConversionPattern=%l  %p %c: ID:[%X{ID}]  %m%n
> >>
> >>
> >>BUT it does not work. and because use slf4j API, so I don't know how to get 
> >>Appenders.
> >>
> >>I am desperately in need。。
> >>Any help would be highly appreciated
> >>-
> >>To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> >>For additional commands, e-mail: user-h...@hadoop.apache.org
> >
> >
> >
> >-
> >To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> >For additional commands, e-mail: user-h...@hadoop.apache.org
>
>
>
>
>


Re: why the default value of 'yarn.resourcemanager.container.liveness-monitor.interval-ms' in yarn-default.xml is so high?

2016-11-03 Thread Ravi Prakash
Hi Tanvir!

Although an application may request that node, a container won't be
scheduled on it until the nodemanager sends a heartbeat. If the application
hasn't specified a preference for that node, then whichever node heartbeats
next will be used to launch a container.

HTH
Ravi

On Thu, Nov 3, 2016 at 12:12 PM, Tanvir Rahman <tanvir9982...@gmail.com>
wrote:

> Thank you Ravi for your reply.
> I found one parameter 'yarn.resourcemanager.nm.
> liveness-monitor.interval-ms' (default value=1000ms) in yarn-default.xml
> (v2.4.1) which determines how often to check that node managers are still
> alive. So the RM checks the NM heartbeat every second, but it takes 10 min
> to decide whether the NM is dead or not (yarn.nm.liveness-monitor.
> expiry-interval-ms: how long to wait until a node manager is considered
> dead; default value = 600000 ms).
>
> What happens if the RM finds that one NM's heartbeat is missing but the 10
> minutes are not up yet (yarn.nm.liveness-monitor.expiry-interval-ms has not
> expired)? Will a new application still make container requests to that NM
> via the RM?
>
> Thanks
> Tanvir
>
>
>
>
>
> On Wed, Nov 2, 2016 at 5:41 PM, Ravi Prakash <ravihad...@gmail.com> wrote:
>
>> Hi Tanvir!
>>
>> Its hard to have some configuration that works for all cluster scenarios.
>> I suspect that value was chosen as somewhat a mirror of the time it takes
>> HDFS to realize a datanode is dead (which is also 10 mins from what I
>> remember). The RM also has to reschedule the work when that timeout
>> expires. Also there may be network glitches which could last that
>> long.. Also, the NMs are pretty stable by themselves. Failing NMs have
>> not been too common in my experience.
>>
>> HTH
>> Ravi
>>
>> On Wed, Nov 2, 2016 at 10:44 AM, Tanvir Rahman <tanvir9982...@gmail.com>
>> wrote:
>>
>>> Hello,
>>> Can anyone please tell me why the default value of '
>>> yarn.resourcemanager.container.liveness-monitor.interval-ms' in
>>> yarn-default.xml
>>> <https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml>
>>>  is
>>> so high? This parameter determines "How often to check that containers
>>> are still alive". The default value is 6 ms or 10 minutes. So if a
>>> node manager fails, the resource manager detects the dead container after
>>> 10 minutes.
>>>
>>>
>>> I am running a wordcount code in my university cluster. In the middle of
>>> run, I stopped node manager of one node (the data node is still running)
>>> and found that the completion time increases about 10 minutes because of
>>> the node manager failure.
>>>
>>> Thanks in advance
>>> Tanvir
>>>
>>>
>>>>
>>>
>>
>


Re: unsubscribe

2016-10-31 Thread Ravi Prakash
Please email user-unsubscr...@hadoop.apache.org

On Mon, Oct 31, 2016 at 2:29 AM, 风雨无阻 <232341...@qq.com> wrote:

> unsubscribe
>


Re: Bug in ORC file code? (OrcSerde)?

2016-10-19 Thread Ravi Prakash
Michael!

Although there is a little overlap in the communities, I strongly suggest
you email u...@orc.apache.org ( https://orc.apache.org/help/ ). I don't know
if you have to be subscribed to the mailing list to get replies to your email
address.

Ravi



On Wed, Oct 19, 2016 at 11:29 AM, Michael Segel 
wrote:

> Just to follow up…
>
> This appears to be a bug in the hive version of the code… fixed in the orc
> library…  NOTE: There are two different libraries.
>
> Documentation is a bit lax… but in terms of design…
>
> Its better to do the build completely in the reducer making the mapper
> code cleaner.
>
>
> > On Oct 19, 2016, at 11:00 AM, Michael Segel 
> wrote:
> >
> > Hi,
> > Since I am not on the ORC mailing list… and since the ORC java code is
> in the hive APIs… this seems like a good place to start. ;-)
> >
> >
> > So…
> >
> > Ran in to a little problem…
> >
> > One of my developers was writing a map/reduce job to read records from a
> source and after some filter, write the result set to an ORC file.
> > There’s an example of how to do this at:
> > http://hadoopcraft.blogspot.com/2014/07/generating-orc-
> files-using-mapreduce.html
> >
> > So far, so good.
> > But now here’s the problem….  Large source data, means many mappers and
> with the filter, the number of output rows is a fraction in terms of size.
> > So we want to write to a single reducer. (An identity reducer) so that
> we get only a single file.
> >
> > Here’s the snag.
> >
> > We were using the OrcSerde class to serialize the data and generate an
> Orc row which we then wrote to the file.
> >
> > Looking at the source code for OrcSerde, OrcSerde.serialize() returns a
> OrcSerdeRow.
> > see: http://grepcode.com/file/repo1.maven.org/maven2/co.
> cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
> >
> > OrcSerdeRow implements Writable and as we can see in the example code…
> for a map only example… context.write(Text, Writable) works.
> >
> > However… if we attempt to make this in to a Map/Reduce job, we run in to
> a problem during run time. the context.write() throws the following
> exception:
> > "Error: java.io.IOException: Type mismatch in value from map: expected
> org.apache.hadoop.io.Writable, received org.apache.hadoop.hive.ql.io.
> orc.OrcSerde$OrcSerdeRow”
> >
> >
> > The goal was to reduce the orc rows and then write out in the reducer.
> >
> > I’m curious as to why the context.write() fails?
> > The error is a bit cryptic since the OrcSerdeRow implements Writable… so
> the error message doesn’t make sense.
> >
> >
> > Now the quick fix is to borrow the ArrayListWritable from giraph and
> create the list of fields in to an ArrayListWritable and pass that to the
> reducer which will then use that to generate the ORC file.
> >
> > Trying to figure out why the context.write() fails… when sending to
> reducer while it works if its a mapside write.
> >
> > The documentation on the ORC site is … well… to be polite… lacking. ;-)
> >
> > I have some ideas why it doesn’t work, however I would like to confirm
> my suspicions.
> >
> > Thx
> >
> > -Mike
> >
> >
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>


Re: file permission issue

2016-10-17 Thread Ravi Prakash
Hi!

https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L1524

Just fyi, there are different kinds of distributed cache:
http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/ Here's
a good article from Vinod.

HTH
Ravi

On Mon, Oct 17, 2016 at 7:56 AM, CB  wrote:

> Hi,
>
> I'm running Hadoop 2.7.1 release.
> While I'm running a MapReduce job, I've encountered a file permission
> issue as shown below because I'm working in an environment running Linux
> where world permissions bits are disabled.
>
> 2016-10-14 15:51:45,333 INFO org.apache.hadoop.yarn.server.
> nodemanager.containermanager.localizer.ResourceLocalizationService:
> Writing credentials to the nmPrivate file /state/partition1/hadoop/nm-
> local-dir/nmPrivate/container_1476470591621_0004_02_01.tokens.
> Credentials list:
>
> 2016-10-14 15:51:45,375 WARN org.apache.hadoop.yarn.server.
> nodemanager.containermanager.localizer.ResourceLocalizationService:
> Permissions incorrectly set for dir 
> /state/partition1/hadoop/nm-local-dir/usercache,
> should be rwxr-xr-x, actual value = rwxr-x---
>
> Does anyone have any suggestions to work around the issue for a
> single-user environment, where one user runs all the services and the
> MapReduce jobs?
>
> I'm not familiar with the source code but if you suggest me where to
> modify to relax the check, it would be appreciated.
>
> Thanks,
> - Chansup
>
>


Re: hadoop cluster container memory limit

2016-10-14 Thread Ravi Prakash
Hi!

Look at yarn.nodemanager.resource.memory-mb in
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
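
For reference, that is a per-NodeManager setting in yarn-site.xml; the value
below is only illustrative (leave headroom for the OS and other daemons):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value>
</property>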

I'm not sure how 11.25Gb comes in. How did you deploy the cluster?

Ravi

On Thu, Oct 13, 2016 at 9:07 PM, agc studio 
wrote:

> Hi all,
>
> I am running a EMR cluster with 1 master node and 10 core nodes.
>
> When I go to the dashboard of the hadoop cluster, I see each container only
> has 11.25 GB of memory available, whereas the instance that I use for
> it (r3.xlarge) has 30.5 GB of memory.
>
> May I ask how this is possible and why? Also, is it possible to fully
> utilise these resources?
> I am able to change the settings to utilise the 11.25 GB of available memory,
> but I am wondering about the remainder of the 30.5GB that r3.xlarge offers.
> --
> HEAP=9216
> -Dmapred.child.java.opts=-Xmx${HEAP}m \
> -Dmapred.job.map.memory.mb=${HEAP} \
> -Dyarn.app.mapreduce.am.resource.mb=1024 \
> -Dmapred.cluster.map.memory.mb=${HEAP} \
> --
> Please see the link of the cluster screenshot. http://imgur.com/a/zFvyw
>


Re: Where does Hadoop get username and group mapping from for linux shell username and group mapping?

2016-10-14 Thread Ravi Prakash
Chen!

It gets it from whatever is configured on the Namenode.
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Group_Mapping
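
For example, a shell-based mapping (close to the default behaviour, resolving
groups on the Namenode host much like `id -Gn <user>`) can be pinned explicitly
in core-site.xml; this snippet is only illustrative:

<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<!-- org.apache.hadoop.security.LdapGroupsMapping would pull groups from AD/LDAP instead -->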

HTH
Ravi

On Thu, Oct 13, 2016 at 7:43 PM, chen dong  wrote:

> Hi,
>
> Currently I am working on a project to enhance the security for the Hadoop
> cluster. Eventually I will use Kerberos and Sentry for authentication and
> authorisation. And the username and group mapping will come from AD/LDAP
> (?), I think so.
>
> But now I am just learning and trying. I have a question that I haven't
> been able to figure out:
>
> *where does the username/group mapping information come from?*
>
> As far as I know there is no username and group database in Hadoop itself;
> the username and group name come from the client, whether from the local
> client machine or from the Kerberos realm. But it is still a little vague
> to me - can I get the implementation details here?
>
> Is this information from the machine where the HDFS client is located, or
> from the linux shell username and group on the name node?  Or does it depend
> on the context - is it even related to the data nodes? What if the data nodes
> and name nodes have different users or user-group mappings on their local
> boxes?
>
> Regards,
>
> Dong
>
>


Re: Issue in Rollback (after rolling upgrade) from hadoop 2.7.2 to 2.5.2

2016-10-13 Thread Ravi Prakash
Hi Dinesh!

This is obviously a very hazardous situation you are in (if your data is
important), so I'd suggest moving carefully. Make as many backups of as
many things as you can.

The usual mechanism that Hadoop uses when upgrading is to rename
directories of the old format and keep them around until the admin
finalizes the upgrade. Here is the relevant method :
https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L388
. You will probably have to dig into code and see what operations were
performed, where the failure occurred, and figure out how best to fix it.

In fact for your particular upgrade there were quite substantial changes
not just in the Namenode formats, but also the layouts on the datanodes
(which you may have to work on once you recover your namenode) .
https://issues.apache.org/jira/browse/HDFS-6482 . I'm guessing operations
will take a long time and may not work. Why do you need to rollback? We are
on 2.7.2 and its working fine for us.

HTH
Ravi

On Wed, Oct 12, 2016 at 11:01 PM, Dinesh Kumar Prabakaran <
dineshpv...@gmail.com> wrote:

> Hi Guys,
>
> Did rolling upgrade from hadoop 2.5.2 to hadoop 2.7.2 and *did not
> finalize* the upgrade. Now I wished to rollback to 2.5.2 version based on
> reference
> 
> .
>
> When starting name node 1 as active with *-rollingUpgrade rollback*, it
> shuts down with the following exception:
>
> *org.apache.hadoop.hdfs.server.common.IncorrectVersionException:
> Unexpected version of storage directory ..\Metadata\data\dfs\namenode.
> Reported: -63. Expecting = -57.*
>
> There is already a task regarding this but the status is *Open*.
> https://issues.apache.org/jira/browse/HDFS-9096
>
> Please let me know if there are any *workarounds* to roll back HDFS from a
> rolling upgrade without any issues.
>
> Thanks,
> Dinesh Kumar P
>
>
>
>


Re: Hadoop: precomputing data

2016-10-12 Thread Ravi Prakash
I guess one of the questions is what is your false negative rate in
Approach 1 Step 1?

Ofcourse if you are limited by resources you may have to go with Approach 1.

On Thu, Oct 6, 2016 at 6:14 AM, venito camelas 
wrote:

> I'm designing a prototype using *Hadoop* for video processing to do face
> recognition. I thought of 2 ways of doing it.
>
> *Approach 1:*
>
> I was thinking of doing something in 2 steps:
>
>1. A map that receives frames and if a face is found it gets stored
>for the next step.
>2. A map that receives the frames from step 1 (all frames containing 1
>face at least) and does face recognition.
>
> Step 1 would be run only once, while step 2 runs every time I want to
> recognize a new face.
>
>
> *Approach 2:*
>
> The other approach I thought about is to do face recognition on all the
> data every time.
>
> The first approach saves time because I don't have to process faceless
> frames every time I want to do face recognition, but it also uses more disk
> space (and it could be a lot of space).
>
>
> I'm not sure what's better. Is it a bad thing to leave those precomputed
> frames there forever?
>


Re: Newbie Ambari Question

2016-10-12 Thread Ravi Prakash
I suspect https://ambari.apache.org/mail-lists.html may be more useful.

On Thu, Oct 6, 2016 at 2:45 AM, Deepak Goel  wrote:

>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
> Sorry, is this the right forum for asking a question about "Ambari Hadoop
> Installation" from Hortonworks?
>
> Thanks
> Deepak
>--
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>


Re: HDFS Issues.

2016-10-12 Thread Ravi Prakash
There are a few conditions for the Namenode to come out of safemode.
# Number of datanodes,
# Number of blocks that have been reported.

How many blocks have the datanodes reported?
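
A couple of commands that usually help narrow this down, run against the
namenode (purely illustrative; the block-report threshold itself is governed
by dfs.namenode.safemode.threshold-pct):

# shows whether safemode is currently on
hdfs dfsadmin -safemode get
# datanode and block summary as the namenode currently sees it
hdfs dfsadmin -report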

On Tue, Oct 4, 2016 at 1:22 PM, Steve Brenneis  wrote:

> I have an HDFS cluster of three nodes. They are all running on Amazon EC2
> instances. I am using HDFS for an HBase backing store. Periodically, I will
> start the cluster and the name node stays in safe mode because it says the
> number of live datanodes has dropped to 0.
>
> The number of live datanodes 2 has reached the minimum number 0. Safe mode 
> will be turned off automatically once the thresholds have been reached.
>
> The datanode logs appear to be normal, with no errors indicated. The
> dfsadmin report says the datanodes are both normal and that the name node
> is in contact with them.
>
> Safe mode is ON
> Configured Capacity: 16637566976 (15.49 GB)
> Present Capacity: 7941234688 (7.40 GB)
> DFS Remaining: 7940620288 (7.40 GB)
> DFS Used: 614400 (600 KB)
> DFS Used%: 0.01%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> Missing blocks (with replication factor 1): 0
>
> -
> Live datanodes (2):
>
> Name: 172.31.52.176:50010 (dev2)
> Hostname: dev2
> Decommission Status : Normal
> Configured Capacity: 8318783488 (7.75 GB)
> DFS Used: 307200 (300 KB)
> Non DFS Used: 3257020416 (3.03 GB)
> DFS Remaining: 5061455872 (4.71 GB)
> DFS Used%: 0.00%
> DFS Remaining%: 60.84%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Tue Oct 04 15:47:00 EDT 2016
>
>
> Name: 172.31.63.188:50010 (dev1)
> Hostname: dev1
> Decommission Status : Normal
> Configured Capacity: 8318783488 (7.75 GB)
> DFS Used: 307200 (300 KB)
> Non DFS Used: 5439311872 (5.07 GB)
> DFS Remaining: 2879164416 (2.68 GB)
> DFS Used%: 0.00%
> DFS Remaining%: 34.61%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Tue Oct 04 15:47:00 EDT 2016
>
> If I force the name node out of safe mode, the fsck command says that the
> file system is corrupt. When this happens, the only thing I've been able to
> do to get it back is to format the HDFS file system. I have not changed the
> configuration of the cluster. This just randomly seems to occur. The system
> is in development, but this will be unacceptable in production.
> I’m using version 2.7.3. Thank you in advance for any help.
>
>


Re: HDFS Replication Issue

2016-10-12 Thread Ravi Prakash
Hi Eric!

Did you follow https://hadoop.apache.org/docs/current2/hadoop-project-
dist/hadoop-common/SingleCluster.html to set up your single node cluster?
Did you set dfs.replication in hdfs-site.xml ? The logs you posted don't
have enough information to debug the issue.
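
For a single-datanode setup, the hdfs-site.xml from that doc boils down to
something like this (sketch only):

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>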

*IF* everything has been set up correctly, your understanding is correct.
The block would be written to the single datanode. *IF* the replication was
set to >1, and when the block was written it didn't have enough replicas, a
"source" replica would be chosen to write to "target" datanodes, so that
sufficient replicas existed for your block. If there were no datanodes
available, the # of "targets" would be 0 and so HDFS wouldn't be able to
achieve the replication you requested. Your configuration would have to be
a bit messed up for HDFS to even allow you to write a file with less than
minimum replication and then try to replicate after you close.

I suggest you follow the SingleCluster.html doc assiduously.

HTH
Ravi

On Tue, Oct 4, 2016 at 11:58 AM, Eric Swenson  wrote:

> I have set up a single node cluster (initially) and am attempting to
> write a file from a client outside the cluster.  I’m using the Java
> org.apache.hadoop.fs.FileSystem interface to write the file.
>
> While the write call returns, the close call hangs for a very long time,
> eventually
> returns, but the resulting file in HDFS is 0 bytes in length. The namenode
> log
> says:
>
> 2016-10-03 22:01:41,367 INFO BlockStateChange: chooseUnderReplicatedBlocks
> selected 1 blocks at priority level 0;  Total=1 Reset bookmarks? true
> 2016-10-03 22:01:41,367 INFO BlockStateChange: BLOCK* neededReplications =
> 1, pendingReplications = 0.
> 2016-10-03 22:01:41,367 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
> Blocks chosen but could not be replicated = 1; of which 1 have no target, 0
> have no source, 0 are UC, 0 are abandoned, 0 already have enough replicas.
>
> Why is the block not written to the single datanode (same as
> namenode)? What does it mean to "have no target"? The replication
> count is 1 and I would have thought that a single copy of the file
> would be stored on the single cluster node.
>
> I decided to see what happened if I added a second node to the cluster.
> Essentially the same thing happens.  The file (in HDFS) ends up being
> zero-length, and I get similar messages from the NameNode telling me that
> there are additional neededReplications and that none of the blocks could
> be replicated because they “have no target”.
>
> If I SSH into the combined Name/Data node instance and use the “hdfs dfs
> -put” command, I have no trouble storing files.  I’m using the same user
> regardless of whether I’m using a remote fs.write operation or whether I’m
> using the “hdfs dfs -put” command while logged into the NameNode.
>
> What am I doing wrong?  — Eric
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>


Re: YARN Resource Allocation When Memory is Very Small

2016-08-30 Thread Ravi Prakash
Hi Nico!

The RM is configured with a minimum allocation. Take a look at
"yarn.scheduler.minimum-allocation-mb" . You can also read through code
here:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L1295
.
Please note that the code above is for the trunk branch, you should go look
in the branch that you are running. Thanks to great work from some folks,
this code has been evolving a lot recently.
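
For reference, the knob looks like this in yarn-site.xml (1024 is the usual
default; treat the value as illustrative). With the capacity scheduler,
requests are also rounded up to a multiple of this size:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>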

HTH
Ravi

On Tue, Aug 30, 2016 at 12:09 PM, Nico Pappagianis <
nico.pappagia...@salesforce.com> wrote:

> How does YARN decide if the remaining memory after AM allocation is enough?
>
> Hi all,
>
> Somewhat of a newb here so hopefully this question isn't too trivial.
>
> I want to know how YARN decides if the remaining memory after AM resource
> allocation is "enough". Here's a breakdown of what's going on in my
> particular case.
>
> We have a YARN queue that has ~1.7GB memory allocated to it. We have a
> parent MR job that spawns multiple child jobs. The AM container memory for
> the parent job is configured to 1GB. Once the AM container gets allocated
> its 1GB there is a remaining ~0.7GB memory available. In my case that 0.7GB
> is enough for the child jobs to succeed.
>
> My question is, what if that remaining amount wasn't 0.7GB but something
> much less, like 0.1GB, or 0.0001GB? How does YARN handle those cases - will
> it allocate *any* remaining resources to the child jobs? Will there be a
> ton of thrashing if there is only a small amount of memory available?
>
> Thanks for any help, or a pointer in the right direction!
>


Re: Yarn web UI shows more memory used than actual

2016-08-15 Thread Ravi Prakash
Hi Suresh!

YARN's accounting for memory on each node is completely different from the
Linux kernel's accounting of memory used. e.g. I could launch a MapReduce
task which in reality allocates just 100 Mb, and tell YARN to give it 8 Gb.
The kernel would show the memory requested by the task, the resident memory
(which would be ~ 100Mb) and the NodeManager page will show 8Gb used.
Please see
https://yahooeng.tumblr.com/post/147408435396/moving-the-utilization-needle-with-hadoop

HTH
Ravi

On Mon, Aug 15, 2016 at 5:58 AM, Sunil Govind 
wrote:

> Hi Suresh
>
> "This 'memory used' would be the memory used by all containers running on
> that node"
> >> "Memory Used" in Nodes page indicates how memory is used in all the
> node managers with respect to the corresponding demand made to RM. For eg,
> if application has asked for 4GB resource and if its really using only 2GB,
> then this kind of difference can be shown (one possibility). Which means
> 4GB will be displayed in Node page.
>
> As Ray has mentioned if the demand for resource is more from AM itself OR
> with highly configured JVM size for containers (through java opts), there
> can be chances that containers may take more that you intented and UI will
> display higher value.
>
> Thanks
> Sunil
>
> On Sun, Aug 14, 2016 at 6:35 AM Suresh V  wrote:
>
>> Hello Ray,
>>
>> I'm referring to the nodes of the cluster page, which shows the
>> individual nodes and the total memory available in each node and the memory
>> used in each node.
>>
>> This 'memory used' would be the memory used by all containers running on
>> that node; however, if I check free command in the node, there is
>> significant difference. I'm unable to understand this...
>>
>> Appreciate any light into this. I agree the main RM page shows the total
>> containers memory utilization across nodes., which is matching the sum of
>> memory used in each nodes as displayed in the 'nodes of the cluster' page...
>>
>> Thank you
>> Suresh.
>>
>>
>> Suresh V
>> http://www.justbirds.in
>>
>>
>> On Sat, Aug 13, 2016 at 12:44 PM, Ray Chiang  wrote:
>>
>>> The RM page will show the combined container memory usage.  If you have
>>> a significant difference between any or all of
>>>
>>> 1) actual process memory usage
>>> 2) JVM heap size
>>> 3) container maximum
>>>
>>> then you will have significant memory underutilization.
>>>
>>> -Ray
>>>
>>>
>>> On 20160813 6:31 AM, Suresh V wrote:
>>>
>>> Hello,
>>>
>>> In our cluster when a MR job is running, in the 'Nodes of the cluster'
>>> page, it shows the memory used as 84GB out of 87GB allocated to yarn
>>> nodemanagers.
>>> However when I actually do a top or free command while logged in to the
>>> node, it shows as only 23GB used and about 95GB or more free.
>>>
>>> I would imagine the memory used displayed in the Yarn web UI should
>>> match the memory used shown by top or free command on the node.
>>>
>>> Please advise if this is right thinking or am I missing something?
>>>
>>> Thank you
>>> Suresh.
>>>
>>>
>>>
>>>
>>


Re: MapReduce Job State: PREP over 8 hours, state no change

2016-08-08 Thread Ravi Prakash
That's unusual. Are you able to submit a simple sleep job? You can do this
using:

yarn jar
$HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar
sleep -m 1 -r 1

This should finish it in under a minute. Otherwise I'd suspect that your
cluster is misconfigured.

HTH
Ravi

On Fri, Aug 5, 2016 at 7:11 PM, Ascot Moss  wrote:

> Hi,
>
> I have submitted a mapreduce job and can find it in the job list; however,
> I find its STATE has been PREP for the last 8 hours. Any idea why it takes
> so long to "PREP"?
>
> regards
>
>
>
> (mapred job -list)
>
>   JobId State StartTime UserName   Queue
> Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem   AM info
>
>  job_1470075140254_0003   PREP 1470402873895
>
>
>
>


Re: Node Manager crashes with OutOfMemory error

2016-07-26 Thread Ravi Prakash
Hi Rahul!

Which version of Hadoop are you using? What non-default values of
configuration are you setting?

You can set HeapDumpOnOutOfMemoryError on the command line while starting
up your nodemanagers and see the resulting heap dump in Eclipse MAT /
jvisualvm / yourkit to see where the memory is being used. There is
likely some configuration that you may have set way beyond what you need.
We regularly run NMs with 1000Mb and it works fine.
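
A hedged example of wiring that up in yarn-env.sh (the dump path is
illustrative; make sure the disk has room for a full heap dump):

export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/nm-heap.hprof"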

HTH
Ravi

On Mon, Jul 25, 2016 at 11:05 PM, Rahul Chhiber <
rahul.chhi...@cumulus-systems.com> wrote:

> Hi All,
>
>
>
> I am running a Hadoop cluster with following configuration :-
>
>
>
> Master (Resource Manager) - 16GB RAM + 8 vCPU
>
> Slave 1 (Node manager 1) - 8GB RAM + 4 vCPU
>
> Slave 2 (Node manager 2) - 8GB RAM + 4 vCPU
>
>
>
> Memory allocated for container use per slave  i.e.
> *yarn.nodemanager.resource.memory-mb* is 6144.
>
>
>
> When I launch an application, container allocation and execution is
> successful, but after executing 1 or 2 jobs on the cluster, either one or
> both the node manager daemons crash with the following error in logs :-
>
>
>
> “java.lang.OutOfMemoryError: Java heap space
>
> at java.util.Arrays.copyOf(Arrays.java:2367)
>
> at
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
>
> at
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
>
> at
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
>
> at java.lang.StringBuffer.append(StringBuffer.java:237)
>
> at org.apache.hadoop.util.Shell$1.run(Shell.java:511)
>
> 2016-07-22 06:54:54,326 INFO org.apache.hadoop.util.ExitUtil: Halt with
> status -1 Message: HaltException”
>
>
>
> We have allocated 1 GB of heap space for each node manager daemon. On
> average there are about 3 containers running on 1 slave node. We have been
> running Hadoop clusters for a while now, but haven’t faced this issue until
> recently. *What are the memory sizing recommendations for Nodemanager ?
> As per my understanding, the memory used by containers or by the
> Application master should not have any bearing on Node manager memory
> consumption, as they all run in separate JVMs. What could be the possible
> reasons for high memory consumption for the Node Manager*?
>
>
>
> NOTE :- I tried allocating more heap memory for Node manager (2 GB), but
> issue still occurs intermittently. Containers getting killed due to excess
> memory consumption is understandable but if Node manager crashes in this
> manner it would be a serious scalability problem.
>
>
>
> Thanks,
>
> Rahul Chhiber
>
>
>


Re: Where's official Docker image for Hadoop?

2016-07-20 Thread Ravi Prakash
Would something like this be useful as a starting point?
https://github.com/apache/hadoop/tree/trunk/dev-support/docker (this is
checked into apache/trunk)

The DockerContainerExecutor was an alpha feature that didn't really get
much traction and is not what you think it is. (If configured on the
cluster, it enables users to launch yarn applications that spawn docker
containers for tasks).

On Tue, Jul 19, 2016 at 5:05 PM, Klaus Ma  wrote:

> HI Deepak,
>
> This image still needs to be configured manually, which does not meet the
> requirement. And I'd suggest the Hadoop community provide a set of Dockerfiles
> as examples, instead of a vendor doing so.
>
> And where's the Dockerfile in the source code? Here's the output for 2.7.2.
>
> Klauss-MacBook-Pro:hadoop-2.7.2-src klaus$ pwd
> /Users/klaus/Workspace/hadoop-2.7.2-src
> Klauss-MacBook-Pro:hadoop-2.7.2-src klaus$ find . | grep Dockerfile
> Klauss-MacBook-Pro:hadoop-2.7.2-src klaus$
>
> If any comments, please let me know.
>
> ——
> Da (Klaus) Ma (马达), PMP® | Software  Architect
> IBM Spectrum, STG, IBM GCG
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>
> On Jul 19, 2016, at 21:55, Deepak Vohra  wrote:
>
> Apache Hadoop develops the Hadoop software, not related technologies such
> as Docker images. But a Docker image could be developed using a Dockerfile
> that downloads and installs an Apache Hadoop distribution.
>
>
>


Re: Building a distributed system

2016-07-18 Thread Ravi Prakash
Welcome to the community Richard!

I suspect Hadoop can be more useful than just splitting and stitching back
data. Depending on your use cases, it may come in handy to manage your
machines, restart failed tasks, scheduling work when data becomes available
etc. I wouldn't necessarily count it out. I'm sorry I am not familiar with
celery, so I can't provide a direct comparison. Also, in the non-rare
chance that your input data grows, you wouldn't have to rewrite your
infrastructure code if you wrote your Hadoop code properly.

HTH
Ravi

On Mon, Jul 18, 2016 at 9:23 AM, Marcin Tustin 
wrote:

> I think you're confused as to what these things are.
>
> The fundamental question is do you want to run one job on sub parts of the
> data, then stitch their results together (in which case
> hive/map-reduce/spark will be for you), or do you essentially already have
> splitting to computer-sized chunks figured out, and you just need a work
> queue? In the latter case there are a number of alternatives. I happen to
> like python, and would recommend celery (potentially wrapped by something
> like airflow) for that case.
>
> On Mon, Jul 18, 2016 at 12:17 PM, Richard Whitehead <
> richard.whiteh...@ieee.org> wrote:
>
>> Hello,
>>
>> I wonder if the community can help me get started.
>>
>> I’m trying to design the architecture of a project and I think that using
>> some Apache Hadoop technologies may make sense, but I am completely new to
>> distributed systems and to Apache (I am a very experienced developer, but
>> my expertise is image processing on Windows!).
>>
>> The task is very simple: call 3 or 4 executables in sequence to process
>> some data.  The data is just a simple image and the processing takes tens
>> of minutes.
>>
>> We are considering a distributed architecture to increase throughput
>> (latency does not matter).  So we need a way to queue work on remote
>> computers, and a way to move the data around.  The architecture will have
>> to work n a single server, or on a couple of servers in a rack, or in the
>> cloud; 2 or 3 computers maximum.
>>
>> Being new to all this I would prefer something simple rather than
>> something super-powerful.
>>
>> I was considering Hadoop YARN and Hadoop DFS, does this make sense?  I’m
>> assuming MapReduce would be over the top, is that the case?
>>
>> Thanks in advance.
>>
>> Richard
>>
>
>
>
>


Re: New cluster help

2016-07-14 Thread Ravi Prakash
Hi Tombin!

Is this the first cluster you're ever setting up? Are you able to run an
"hdfs dfs -ls /" successfully? How about putting files into HDFS? I'd take
it one step at a time if I were you. i.e.

1. Set up a simple HDFS cluster (without SSL)
2. Turn on SSL
3. Then try to run HBase.

Is step 1 working for you?

Ravi

On Thu, Jul 14, 2016 at 12:59 PM, tombin  wrote:

> I am setting up a new hadoop cluster for the first time.  My setup
> currently looks as follows:
>
> hdfs cluster:
> 1 namenode
> 2 datanodes
>
> hbase:
> 1 hbase node
>
> zookeeper cluster:
> 3 zookeeper nodes
>
> I have enabled ssl on the hdfs cluster.  When trying to connect from hbase
> I see the following error:
>
> 2016-07-14 19:38:58,333 WARN  [Thread-73] hdfs.DFSClient: DataStreamer
> Exception
>
> java.lang.NullPointerException
>
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferEncryptor.getEncryptedStreams(DataTransferEncryptor.java:191)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1335)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1281)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
>
> 2016-07-14 19:39:04,341 INFO  [hb01:16000.activeMasterManager]
> hdfs.DFSClient: Could not complete /hbase/.tmp/hbase.version retrying...
>
>
>
> this will repeat several times and then it'll throw the following exception:
>
>
> 2016-07-14 19:39:58,772 FATAL [hb01:16000.activeMasterManager]
> master.HMaster: Unhandled exception. Starting shutdown.
>
> java.io.IOException: Unable to close file because the last block does not
> have enough number of replicas.
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>
> at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>
> at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:730)
>
> at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:705)
>
> at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:662)
>
> at
> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:462)
>
> at
> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
>
> at
> org.apache.hadoop.hbase.master.MasterFileSystem.(MasterFileSystem.java:128)
>
> at
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:652)
>
> at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:185)
>
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1750)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
> hbase shuts down at this point.
>
>
> on the datanode i side i see the following in the logs that looks like it
> may be related:
>
>
> 2016-07-14 19:38:23,132 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> hd03.domain.com:50010:DataXceiver
> error processing unknown operation  src: /10.0.0.10:34893 dst: /
> 10.0.1.10:50010
>
> java.io.EOFException
>
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
>
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:358)
>
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getEncryptedStreams(SaslDataTransferServer.java:178)
>
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:110)
>
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:193)
>
> at java.lang.Thread.run(Thread.java:745)
>
> 2016-07-14 19:39:33,575 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> hd03.domain.com:50010:DataXceiver
> error processing unknown operation  src: /10.0.0.10:34898 dst: /
> 10.0.1.10:50010
>
> java.io.EOFException
>
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
>
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:358)
>
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getEncryptedStreams(SaslDataTransferServer.java:178)
>
> Is this in relation to my SSL configuration?
> I'm confused about what's going on here.  Thank you in advance for any help.
>


Re: unsubscribe

2016-07-05 Thread Ravi Prakash
Please send an email to user-unsubscr...@hadoop.apache.org

On Wed, Jun 29, 2016 at 8:02 AM, Bob Krier  wrote:

>
>


Re: unsubscribe

2016-07-05 Thread Ravi Prakash
Please send an email to user-unsubscr...@hadoop.apache.org

On Wed, Jun 29, 2016 at 8:04 AM, Mike Rapuano 
wrote:

>
>
> --
>
>
> Michael Rapuano
>
> Dev/Ops Engineer
>
> 617-498-7800 | 617-468-1774
>
> 25 Drydock Ave
>
> Boston, MA 02210
>
> 
>
>  
> 
>


Re: Usage of data node to run on commodity hardware

2016-06-07 Thread Ravi Prakash
Hi Krishna!

I don't see why you couldn't start Hadoop in this configuration.
Performance would obviously be suspect. Maybe by configuring your network
topology script, you could even improve the performance.

Most mobiles use ARM processors. I know some cool people ran Hadoop v1 on
Raspberry Pis (also ARM), but I don't know if Hadoop's performance-optimized
native code has been run successfully on ARM. (Hadoop will use the native
binaries if they are available, and otherwise fall back on the Java
implementations.)
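One quick way to check whether the native libraries were picked up at all on a
given box (a small sketch; the exact set of libraries reported varies by
version and build):

  hadoop checknative -a    # reports whether libhadoop and the compression codec natives were found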

HTH
Ravi

On Mon, Jun 6, 2016 at 7:41 PM, Krishna <
ramakrishna.srinivas.mur...@gmail.com> wrote:

> Hi All,
>
> I am new to Hadoop and I am thinking of a requirement; I don't know whether it
> is feasible or not. I want to run Hadoop in a non-cluster environment, meaning I
> want to run it on commodity hardware. I have one desktop machine with
> higher CPU and memory configuration, and i have close to 20 laptops and all
> are connected in same network through wire or wireless connection. I want
> to use desktop machine as name node and 20 laptop as data nodes.  Will that
> be possible?
>
> Extending that, is there any requirement for a data node in terms of system
> configuration? Nowadays mobiles are also coming with good RAM and CPU; can
> we use mobiles as data nodes, provided Java is installed on the mobile?
>
> Thanks
> Ramakrishna S
>


Re: HDFS in Kubernetes

2016-06-06 Thread Ravi Prakash
Klaus!

Good luck with your attempt to run HDFS inside Kubernetes! Please keep us
posted.

For creating a new file, a DFSClient:
1. First calls addBlock on the NameNode.
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java#L842
This returns a list of "LocatedBlocks". (This is essentially the list of
datanode storages to which the client should write.)
2. The DFSClient then creates a pipeline consisting of the (usually 3)
datanodes, to which it streams the data.

In your case, when you go to the Namenode Web UI
(http://:50070/dfshealth.html#tab-datanode)
what is the Datanode's ID? You should debug the client and namenode to see
what is the list of LocatedBlocks returned by the addBlock call.
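If it turns out the namenode is handing back the host-level addresses, one
setting worth experimenting with (an assumption on my part, not something I
have verified on Kubernetes) is making clients connect to datanodes by
hostname, so that kube-dns can resolve them to the container IPs, e.g. in
hdfs-site.xml:

  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
  <property>
    <!-- assumes kube-dns can resolve the datanode hostnames from every client -->
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
  </property>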

HTH
Ravi

On Sat, Jun 4, 2016 at 7:22 AM, Klaus Ma  wrote:

> Hi team,
>
>
> I'm working to run HDFS in kubernetes; all configuration is ready:
> kube-dns, hdfs-site.xml and ssh. But when I create files in HDFS I got the
> following exception. In exception, "10.0.1.126:50010" is the host's ip &
> port instead of container; is there any configuration to ask DFSClient to
> use container's IP instead of host IP?
>
>
> 16/06/04 14:06:23 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>
> java.net.ConnectException: Connection refused
>
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
>
> at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1537)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1313)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
>
> 16/06/04 14:06:23 INFO hdfs.DFSClient: Abandoning
> BP-223491250-172.1.78.2-1465048638628:blk_1073741825_1001
>
> 16/06/04 14:06:23 INFO hdfs.DFSClient: Excluding datanode
> DatanodeInfoWithStorage[10.0.1.126:50010
> ,DS-a2c2d3db-790c-4b76-81f6-856c809b01e2,DISK]
>
> 16/06/04 14:06:23 WARN hdfs.DFSClient: DataStreamer Exception
>
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/root/QuasiMonteCarlo_1465049182505_2071986941/in/part0 could only be
> replicated to 0 nodes instead of minReplication (=1).  There are 1
> datanode(s) running and 1 node(s) are excluded in this operation.
>
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
>
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
>
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>
>
> 
> Da (Klaus), Ma (马达) | PMP | Advisory Software Engineer
> Platform OpenSource Technology, STG, IBM GCG
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>


Re: HDFS Federation

2016-06-06 Thread Ravi Prakash
Perhaps use the "viewfs://" protocol prepended to your path?
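For example, with the default mount table from the core-site.xml you quoted
below, something like this should go through the /my link (a sketch against
your config, not run on your cluster):

  hdfs dfs -ls viewfs:///                       # should list the /my and /your mount points
  hdfs dfs -mkdir viewfs:///my/test             # resolves through the /my link to hdfs://Master:9000/my/test
  hdfs dfs -mkdir hdfs://Master:9000/my/test    # equivalent, addressing that namenode directly
  # if the viewfs mkdir still fails, check that /my actually exists on hdfs://Master:9000 first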


On Sun, Jun 5, 2016 at 1:10 PM, Kun Ren  wrote:

> Hi Genius,
>
> I just configured HDFS Federation, and try to use it(2 namenodes, one is
> for /my, another is for /your). When I  run the command:
> hdfs dfs -ls /,
>
> I can get:
> -r-xr-xr-x   - hadoop hadoop  0 2016-06-05 20:05 /my
> -r-xr-xr-x   - hadoop hadoop  0 2016-06-05 20:05 /your
>
> This makes sense. However, when I run the command to create a new
> directory:
> hdfs dfs -mkdir /my/test
>
> I got error:
> mkdir: `/my/test': No such file or directory.
>
> Even when I run "hdfs dfs -ls /my", still get the no such file or
> directory error.
>
> Can someone tell me how to use the command line to do file operations
> with HDFS Federation? Thanks a lot for your help.
>
> I attached My core-site.xml and hdfs-site.xml:
>
> Core-site.xml:
>
> <configuration>
>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>viewfs:///</value>
>   </property>
>
>   <property>
>     <name>fs.viewfs.mounttable.default.link./my</name>
>     <value>hdfs://Master:9000/my</value>
>   </property>
>
>   <property>
>     <name>fs.viewfs.mounttable.default.link./your</name>
>     <value>hdfs://Slave1:9000/your</value>
>   </property>
>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>file:/home/hadoop/hadoop_build/tmp</value>
>   </property>
>
> </configuration>
>
>
> Hdfs-site.xml:
>
> <configuration>
>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/home/hadoop/hadoop_build/tmp/dfs/name</value>
>   </property>
>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/home/hadoop/hadoop_build/tmp/dfs/data</value>
>   </property>
>
>   <property>
>     <name>dfs.federation.nameservices</name>
>     <value>mycluster,yourcluster</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster</name>
>     <value>Master:9000</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.rpc-address.yourcluster</name>
>     <value>Slave1:9000</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.http-address.mycluster</name>
>     <value>Master:50070</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.http-address.yourcluster</name>
>     <value>Slave1:50070</value>
>   </property>
>
> </configuration>
>


Re: No edits files in dfs.namenode.edits.dir

2016-05-19 Thread Ravi Prakash
No! You are probably writing the edits file somewhere still. An `lsof` on
the namenode process may be more revealing. Obviously this depends on
configuration, but unless you have some really crazy settings, I'm pretty
sure the edits would be persisted to disk.
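Something along these lines might show where they are actually going (a rough
sketch, assuming jps and lsof are available on the namenode host):

  NN_PID=$(jps | awk '/NameNode/ && !/Secondary/ {print $1}')
  lsof -p "$NN_PID" | grep -i edits               # which edits files does the NN really have open?
  hdfs getconf -confKey dfs.namenode.edits.dir    # what the configuration on this host resolves to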


On Wed, May 18, 2016 at 2:47 AM, sky88088  wrote:

> Hi,
> I found that there is no edits file in my cluster's dfs.namenode.edits.dir.
>
>
> It is a 20-nodes cluster without secondary namenode, and it runs well for
> a long while (more than one year).
>
> However, I found that there is no edits file in the dfs.namenode.edits.dir and
> the fsimage file doesn't update.
>
> Does it mean that all the metadata of the name node only stays in memory?
>
> Is there a way to fix it? Is there any configuration to control the
> persistence?
>
> Thanks!
>


Re: Regarding WholeInputFileFormat Java Heap Size error

2016-05-12 Thread Ravi Prakash
Shubh! You can perhaps introduce an artificial delay in your map task and
then take a Java heap dump of the MapTask JVM to analyze where the memory
is going. It's hard to speculate otherwise.
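A rough sketch of that, assuming the JDK tools are on the path of the node
running the map task (the pid lookup is just an example):

  jps -m | grep YarnChild                    # MapReduce task JVMs show up as YarnChild
  jmap -dump:live,format=b,file=/tmp/maptask.hprof <pid-from-above>
  # then open the .hprof in MAT or jvisualvm to see what is actually holding the heap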

On Wed, May 11, 2016 at 10:15 PM, Shubh hadoopExp 
wrote:

>
>
>
> Hi All,
>
> While reading input from directory recursively consisting of files of size
> 30Mb, using WholeFileInputFormat and WholeFileRecordReader, I am running
> into JavaHeapSize error for even a very small file of 30MB. By default the
> *mapred.child.java.opts* is set to -*Xmx200m* and should be sufficient
> to handle at least the 30MB files present in the directory.
>
> The input is a normal random words in file. Each Map is given a single
> file of size 30MB and I am reading value as the content of the whole file.
> And running normal word count.
>
> If I increase the *mapred.child.java.opts* size to a higher value the
> application runs successfully. But it would be great if anyone can suggest
> why *mapred.child.java.opts*, which defaults to 200MB per task, is not
> sufficient for a 30MB file, as this means Hadoop MapReduce is consuming a lot
> of heap and out of 200MB it doesn't even have 30MB left to process the task.
> Also, is there any other way to read a large whole file as input to a single
> Map, meaning every Map gets a whole file to process?
>
> -Shubh
>
>
>


Re: Upgrading production hadoop system from 2.5.1 to 2.7.2

2016-03-24 Thread Ravi Prakash
Hi Chathuri!

You're welcome! We did not have an HBase instance to upgrade. It depends on
how many blocks your datanodes are storing (== how big your disks are * how
many disks you have * how full your disks are). What are those numbers for
you? We experienced anywhere from 1-3 hours for the upgrade.
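For reference, the rolling-upgrade flow from the page you linked boils down to
a handful of dfsadmin calls (a sketch of the documented steps, not re-tested
against 2.5.1 here):

  hdfs dfsadmin -rollingUpgrade prepare    # create the rollback image before touching anything
  hdfs dfsadmin -rollingUpgrade query      # repeat until it reports the rollback image is ready
  # ...upgrade the software, restart the namenode(s) with "-rollingUpgrade started", then the datanodes...
  hdfs dfsadmin -rollingUpgrade finalize   # only once you are satisfied; rollback is no longer possible after this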

HTH
Ravi

On Thu, Mar 24, 2016 at 1:16 AM, Chathuri Wimalasena <kamalas...@gmail.com>
wrote:

> Hi Ravi,
>
> Thank you for all the information, Our application is indexing twitter
> data to HBase and then do some data analytics on top of that. That's why
> HDFS data is very important to us. We cannot tolerate any data loss with
> the update. Do you remember how long it took for you to upgrade it from
> 2.4.1 to 2.7.1 ?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 23, 2016 at 7:09 PM, Ravi Prakash <ravihad...@gmail.com>
> wrote:
>
>> Hi Chathuri!
>>
>> Technically there is a rollback option during upgrade. I don't know how
>> well it has been tested, but the idea is that old metadata is not deleted
>> until the cluster administrator says $ hdfs dfsadmin -finalizeUpgrade . I'm
>> fairly confident that the HDFS upgrade will work smoothly. We have upgraded
>> quite a few Hadoop-2.4.1 clusters to Hadoop-2.7.1 successfully (never
>> having to roll back). Its your applications that work on top of HDFS and
>> YARN that I'd be concerned about.
>>
>> HTH
>> Ravi
>>
>> On Wed, Mar 23, 2016 at 2:22 PM, Chathuri Wimalasena <
>> kamalas...@gmail.com> wrote:
>>
>>> Thanks for information Ravi. Is there a way that I can back up data
>>> before the  update ? I was thinking about this approach..
>>>
>>> Copy the current hadoop directories to a new set of directories.
>>> Point hadoop to this new set
>>> Start the migration with the backup set
>>>
>>> Please let me know if people have done this upgrade successfully. I
>>> believe many things can go wrong in a lengthy upgrade like this. The data
>>> in the cluster is very important.
>>> Thanks,
>>> Chathuri
>>>
>>> On Wed, Mar 23, 2016 at 4:37 PM, Ravi Prakash <ravihad...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chathuri!
>>>>
>>>>- When we upgrade, does it change the namenode data structures and
>>>>data nodes? I assume it only changes the name node...
>>>>
>>>> It changes the NN as well as DN layout. As a matter of fact, this
>>>> upgrade will take a long time on Datanodes as well because of
>>>> https://issues.apache.org/jira/browse/HDFS-6482
>>>>
>>>>- What are the risks with this upgrade ?
>>>>
>>>> What Hadoop applications do you run on top of your cluster? The hope is
>>>> that everything continues working smoothly for the most part, but
>>>> inevitably some backward incompatible changes creep in.
>>>>
>>>>- Is there a place where I can review the changes made to file
>>>>system from 2.5.1 to 2.7.2?
>>>>
>>>> The release notes. http://hadoop.apache.org/releases.html .You'd have
>>>> to accumulate all the changes in the versions.
>>>>
>>>> Practically, I'd try to run my application on your upgraded test
>>>> cluster.
>>>>
>>>> HTH
>>>>
>>>> Ravi
>>>>
>>>> On Wed, Mar 23, 2016 at 12:17 PM, Chathuri Wimalasena <
>>>> kamalas...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a hadoop production deployment with 1 name node and 10 data
>>>>> nodes which has more than 20TB of data in HDFS. We are currently using
>>>>> Hadoop 2.5.1 and we want to update it to latest Hadoop version, 2.7.2.
>>>>>
>>>>> I followed the following link (
>>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html)
>>>>> and updated a single node system running in pseudo distributed mode and it
>>>>> went without any issues. But this system did not have that much data as 
>>>>> the
>>>>> production system.
>>>>>
>>>>> Since this is a production system, I'm reluctant to do this update. I
>>>>> would like to see what other people have done in these cases and their
>>>>> experiences... Here are few questions I have..
>>>>>
>>>>>- When we upgrade, does it change the namenode data structures and
>>>>>data nodes? I assume it only changes the name node...
>>>>>- What are the risks with this upgrade ?
>>>>>- Is there a place where I can review the changes made to file
>>>>>system from 2.5.1 to 2.7.2?
>>>>>
>>>>> I would really appreciate if you can share your experiences.
>>>>>
>>>>> Thanks in advance,
>>>>> Chathuri
>>>>>
>>>>
>>>>
>>>
>>
>


Re: INotify stability

2015-09-16 Thread Ravi Prakash
Hi Mohammad!

Thanks for reporting the issue. Could you please take a heap dump of the NN and
analyze it to see where the memory is being spent?

Thanks
Ravi



 On Tuesday, September 15, 2015 11:53 AM, Mohammad Islam 
 wrote:
   

Hi,

We were using the INotify feature in one of our internal services. Looks like it
creates a lot of memory pressure on the NN. Memory usage goes very high and
remains the same, causing expensive GC.

Did anyone use this feature in any service? Is there any con to setup? We are
using the latest CDH.

Regards,
Mohammad



  


Re: hdfs: weird lease expiration issue

2015-08-21 Thread Ravi Prakash
Hi Bogdan!

This is because the second application attempt appears to HDFS as a new client.
Are you sure the second client experienced write errors because *its* lease was
removed?

Yongjun has a great writeup:
http://blog.cloudera.com/blog/2015/02/understanding-hdfs-recovery-processes-part-1/
(Thanks Yongjun). To quote:

"The lease manager maintains a soft limit (1 minute) and hard limit (1 hour)
for the expiration time (these limits are currently non-configurable), and all
leases maintained by the lease manager abide by the same soft and hard limits.
Before the soft limit expires, the client holding the lease of a file has
exclusive write access to the file. If the soft limit expires and the client
has not renewed the lease or closed the file (the lease of a file is released
when the file is closed), another client can forcibly take over the lease. If
the hard limit expires and the client has not renewed the lease, HDFS assumes
that the client has quit and will automatically close the file on behalf of the
client, thereby recovering the lease."

HTH
Ravi
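PS: if a stale lease like this ever blocks a restarted writer, newer releases
(2.7+, if I remember right) let you force lease recovery from the shell instead
of waiting out the hard limit (the path below is a placeholder):

  hdfs debug recoverLease -path /path/to/my/file -retries 3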

 


 On Friday, August 21, 2015 10:05 AM, Bogdan Raducanu lrd...@gmail.com 
wrote:
   

 I have an application that continuously appends to an HDFS file and keeps it
open a long time. At some point the application crashed and left the file
open. It was then restarted and it resumed normal operation, completing some
writes (appending to the file). But an hour after the crash it experienced
write errors because its lease was removed. Digging in the NN log I found this
weird behavior.
Events timeline:
1. 15:25: application crashes
2. 15:28: application restarted, writing doesn't start immediately
3. 15:37 first write
4. some more writes
5. new block needed: 2015-08-11 15:52:59,223 INFO 
org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /path/to/my/file. ... 
blk_1079708083_9361735{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]...|RBW], 
ReplicaUnderConstruction[[DISK]...|RBW], 
ReplicaUnderConstruction[[DISK]...|RBW]]}
6. some more writes; application uses hsync so we can see the writes in the nn 
log: 2015-08-11 15:52:59,234 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
fsync: /path/to/my/file for DFSClient_NONMAPREDUCE_-1953764790_1
7. 60 minutes after crash: 2015-08-11 16:25:18,397 INFO 
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
DFSClient_NONMAPREDUCE_830713991_1, pendingcreates: 1] has expired hard limit
8. 2015-08-11 16:25:18,398 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
Holder: DFSClient_NONMAPREDUCE_830713991_1, pendingcreates: 1], 
src=/path/to/my/file
9. 2015-08-11 16:25:18,398 INFO BlockStateChange: BLOCK* 
blk_1079708083_9361735{blockUCState=UNDER_RECOVERY, primaryNodeIndex=0, 
replicas=[ReplicaUnderConstruction[[DISK]...|RBW], 
ReplicaUnderConstruction[[DISK]...|RBW], 
ReplicaUnderConstruction[[DISK]...|RBW]]} recovery started, 
primary=ReplicaUnderConstruction[[DISK]...|RBW]
10. 2015-08-11 16:25:18,398 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
NameSystem.internalReleaseLease: File /path/to/my/file has not been closed. 
Lease recovery is in progress. RecoveryId = 9361840 for block 
blk_1079708083_9361735{blockUCState=UNDER_RECOVERY, primaryNodeIndex=0, 
replicas=...
So, somehow the crashed client's lease remained, and an hour after the crash it
was removed. This happened even though during this hour another client obtained
a lease and appended to the file. Also, there is no "startFile: recover lease"
log message when the new client opens the file. It is like the old lease is not
seen until the 1-hour hard limit expires. Any idea how this could happen? This
is on a distribution based on 2.6.0, with HA.

  

Re: Unable to pass complete tests on 2.7.1

2015-08-17 Thread Ravi Prakash
Hi Tucker!

Sadly, failing unit tests are usual for Hadoop builds. You can use -DskipTests
to build without running the unit tests, or -fn (fail-never) to continue despite
failures.

The hadoop-maven-plugins module helps us manage generated source code (e.g.
protobuf files generate more Java files which need to be compiled).
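Concretely, something like (standard invocations, nothing specific to your
environment):

  mvn clean install -DskipTests                  # build everything without running the unit tests
  mvn test -fn                                   # or run the tests but keep going past failing modules
  mvn package -Pdist,native -DskipTests -Dtar    # the usual tarball build described in BUILDING.txt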
HTH
Ravi
 


 On Monday, August 17, 2015 5:26 PM, Tucker Berckmann 
tucker.berckm...@scaleflux.com wrote:
   

 Hello,

I am trying to build the Apache Hadoop 2.7.1 release from source on a 
clean Ubuntu 14.04 system, but the unit tests are failing (see command 
line 11 below).

Any help would be appreciated.

Also, I do not understand why I had to install the plugins (see command 
line 8) in order to get a successful compilation. Note that the 
compilation fails in (6) but passes in (10).

Thanks and regards,

Tucker

Command Line

(1) hadoop@hadoop-testnode:~$ lsb_release -a

No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:    14.04
Codename:    trusty

(2) hadoop@hadoop-testnode:~$ bash install_part1.sh

( lots of output)

(3) hadoop@hadoop-testnode:~$ java -version

java version 1.7.0_80
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

(4) hadoop@hadoop-testnode:~$ bash install_part2.sh

( lots of output )

(5) hadoop@hadoop-testnode:~$ cd hadoop-2.7.1-src/

(6) hadoop@hadoop-testnode:~/hadoop-2.7.1-src$ mvn compile

( lots of output )

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main  SUCCESS [1.960s]
[INFO] Apache Hadoop Project POM . SUCCESS [1.276s]
[INFO] Apache Hadoop Annotations . SUCCESS [3.446s]
[INFO] Apache Hadoop Project Dist POM  SUCCESS [0.050s]
[INFO] Apache Hadoop Assemblies .. SUCCESS [0.054s]
[INFO] Apache Hadoop Maven Plugins ... SUCCESS 
[4:41.806s]
[INFO] Apache Hadoop MiniKDC . SUCCESS 
[1:57.117s]
[INFO] Apache Hadoop Auth  SUCCESS 
[2:14.411s]
[INFO] Apache Hadoop Auth Examples ... SUCCESS [0.245s]
[INFO] Apache Hadoop Common .. FAILURE [0.003s]

( some more output )

[ERROR] Failed to parse plugin descriptor for 
org.apache.hadoop:hadoop-maven-plugins:2.7.1 
(/home/hadoop/hadoop-2.7.1-src/hadoop-maven-plugins/target/classes): No 
plugin descriptor found at META-INF/maven/plugin.xml - [Help 1]

(7) hadoop@hadoop-testnode:~/hadoop-2.7.1-src$ cd hadoop-maven-plugins/

(8) hadoop@hadoop-testnode:~/hadoop-2.7.1-src/hadoop-maven-plugins$ mvn 
install

( lots of output )

(9) hadoop@hadoop-testnode:~/hadoop-2.7.1-src/hadoop-maven-plugins$ cd ..

(10) hadoop@hadoop-testnode:~/hadoop-2.7.1-src$ mvn compile

( lots of output, completes successfully )

(11) hadoop@hadoop-testnode:~/hadoop-2.7.1-src$ mvn test

( lots of output )

Tests in error:
  TestHttpServer.cleanup:151 NullPointer
TestHttpServerWebapps.testValidServerResource:41-HttpServerFunctionalTest.createServer:156
 
» FileNotFound
  TestSSLHttpServer.setup:75 » FileNotFound webapps/test not found in 
CLASSPATH
  TestSSLHttpServer.cleanup:96 NullPointer
  TestHttpCookieFlag.setUp:99 » FileNotFound webapps/test not found in 
CLASSPATH
  TestHttpCookieFlag.cleanup:147 NullPointer
  TestJMXJsonServlet.cleanup:46 NullPointer

Tests run: 2877, Failures: 15, Errors: 7, Skipped: 186

[INFO] 

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main  SUCCESS [0.200s]
[INFO] Apache Hadoop Project POM . SUCCESS [0.693s]
[INFO] Apache Hadoop Annotations . SUCCESS [0.390s]
[INFO] Apache Hadoop Project Dist POM  SUCCESS [0.074s]
[INFO] Apache Hadoop Assemblies .. SUCCESS [0.059s]
[INFO] Apache Hadoop Maven Plugins ... SUCCESS [0.766s]
[INFO] Apache Hadoop MiniKDC . SUCCESS [12.461s]
[INFO] Apache Hadoop Auth  SUCCESS 
[4:13.099s]
[INFO] Apache Hadoop Auth Examples ... SUCCESS [0.059s]
[INFO] Apache Hadoop Common .. FAILURE 
[15:33.867s]



install_part1.sh

rm -f hadoop-2.7.1-src.tar.gz
wget 
http://mirror.cogentco.com/pub/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1-src.tar.gz
tar xf hadoop-2.7.1-src.tar.gz
sudo apt-get purge openjdk*
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer

install_part2.sh

sudo apt-get -y install maven
sudo apt-get -y install build-essential autoconf automake libtool cmake 
zlib1g-dev pkg-config libssl-dev
sudo apt-get -y install libprotobuf-dev protobuf-compiler
sudo apt-get install 

Re: Documentation inconsistency about append write in HDFS

2015-08-03 Thread Ravi Prakash
Thanks Thanh! Yes! Could you please post a patch?
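For the record, append has been exposed through the 2.x shell for a while now,
e.g. (file names made up):

  hdfs dfs -appendToFile localpart.txt /user/thanh/data.txt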
 


 On Sunday, August 2, 2015 8:50 PM, Thanh Hong Dai hdth...@tma.com.vn 
wrote:
   

 In the latest version of the documentation
(http://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Simple_Coherency_Model
and also the documentation for version 2.x), it is mentioned that "A file once
created, written, and closed need not be changed." and "There is a plan to
support appending-writes to files in the future."

However, as far as I know, HDFS has supported append write since 0.21, based on
this JIRA (https://issues.apache.org/jira/browse/HDFS-265) and the old version
of the documentation in 2012
(https://web.archive.org/web/20121221171824/http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html#Appending-Writes+and+File+Syncs).

Various posts on the Internet also suggest that append write has been available
in HDFS, and will always be available in the Hadoop version 2 branch.

Can we update the documentation to reflect the most recent change? (Or will
append write be deprecated, or is it not ready for production use?)

  

Re: Web based file manager for HDFS?

2015-07-27 Thread Ravi Prakash
Hi Caesar!

I'm going to try to get that functionality in the next 2 months, as part of
HDFS-7588 ("Improve the HDFS Web UI browser to allow chowning / chmoding,
creating dirs and uploading files"):
https://issues.apache.org/jira/browse/HDFS-7588

Ravi
 


 On Wednesday, July 22, 2015 4:16 PM, Tatsuo Kawasaki tat...@cloudera.com 
wrote:
   

 Hi Caesar,

Let's try Hue if you can use WebHDFS or HttpFS: http://gethue.com
Hue has a web based file manager and a Hive/Impala query editor, etc.

Thanks,
-- Tatsuo

On 2015/07/23 3:22, Caesar Samsi cmsa...@hotmail.com wrote:


Hello,

I'm looking for a web based file manager, simple enough to upload and download
files (text and binary). I would appreciate it if you have pointers to one.

I'm running Hadoop HDFS (i.e. non CDH or Hortonworks, which I understand have
it).

Thank you, Caesar.


  

Re: YARN and LinuxContainerExecutor in simple security mode

2015-07-06 Thread Ravi Prakash
Hi Tomasz!
I believe that's true. 

Ravi 


 On Tuesday, June 30, 2015 4:56 AM, Tomasz Fruboes 
tomasz.frub...@fuw.edu.pl wrote:
   

 Dear Ravi,

  thanks for the answer. I went through the discussion in the ticket you
mention and did some experimentation. My understanding is the following
- as long as I don't explicitly allow for this using

  hadoop.proxyuser.username.groups
  hadoop.proxyuser.username.hosts

user processes spawned by YARN on worker nodes will always run with the uid
of that user. Is that right?

  Thanks,
  Tomasz




W dniu 29.06.2015 o 21:43, Ravi Prakash pisze:
 Hi Tomasz!

 It is tricky to set up, but there are no implications to security if you
 configure it correctly. Please read the discussion on [YARN-2424] LCE
 should support non-cgroups, non-secure mode - ASF JIRA
 https://issues.apache.org/jira/browse/YARN-2424

 HTH
 Ravi
     
     
     
     
 [YARN-2424] LCE should support non-cgroups, non-secure mode - ASF JIRA
 https://issues.apache.org/jira/browse/YARN-2424
 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.
 View on issues.apache.org https://issues.apache.org/jira/browse/YARN-2424
     
 Preview by Yahoo






 On Thursday, June 25, 2015 2:30 AM, Tomasz Fruboes
 tomasz.frub...@fuw.edu.pl wrote:


 Dear Experts,

    I'm running a small YARN cluster configured to use simple security,
 LinuxContainerExecutor and


 yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=false

    in order to get correct uid when executing jobs. This is needed to
 access files from network exported filesystem.

    I was wondering - does this pose any security risk (since
 nonsecure-mode.limit-users is set to true by default in the simple security
 mode)? I.e. is there a known way for a user to get the uid of a different user
 with such a configuration?

    Cheers,
      Tomasz






  

Re: YARN and LinuxContainerExecutor in simple security mode

2015-06-29 Thread Ravi Prakash
Hi Tomasz!

It is tricky to set up, but there are no implications for security if you
configure it correctly. Please read the discussion on YARN-2424 ("LCE should
support non-cgroups, non-secure mode"):
https://issues.apache.org/jira/browse/YARN-2424

HTH
Ravi
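PS: for the archives, the relevant yarn-site.xml knobs end up looking roughly
like this (a sketch of the non-secure LCE setup discussed in YARN-2424, not a
complete recipe - container-executor.cfg and the setuid binary permissions
still have to be set up as documented):

  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>   <!-- example group; must match container-executor.cfg -->
  </property>
  <property>
    <!-- run containers as the submitting user even without Kerberos -->
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
    <value>false</value>
  </property>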






 On Thursday, June 25, 2015 2:30 AM, Tomasz Fruboes 
tomasz.frub...@fuw.edu.pl wrote:
   

 Dear Experts,

  I'm running a small YARN cluster configured to use simple security, 
LinuxContainerExecutor and

  yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=false

  in order to get correct uid when executing jobs. This is needed to 
access files from network exported filesystem.

  I was wondering - does this pose any security risk (since
nonsecure-mode.limit-users is set to true by default in the simple security
mode)? I.e. is there a known way for a user to get the uid of a different user
with such a configuration?

  Cheers,
    Tomasz



  

Re: Web Address appears to be ignored

2015-05-19 Thread Ravi Prakash
Ewan!

This sounds like a bug. Please open a JIRA.

Thanks
Ravi
 


 On Tuesday, May 19, 2015 8:09 AM, Ewan Higgs ewan.hi...@ugent.be wrote:
   

 Hi all,
I am setting up a Hadoop cluster where the nodes have FQDNs inside 
the cluster, but the DNS where these names are registered is behind some 
login nodes. So any user who tries to access the web interface needs to 
use the IPs instead.

I set the 'yarn.nodemanager.webapp.address' and 
'yarn.resourcemanager.webapp.address' to the appropriate IP:port. I 
don't give it the FQDN in this config field.

When I access the web app, everything works inside each web app. However,
when I cross from the Resource Manager to the Node Manager web app, the href
URL uses the FQDN that I don't want. Obviously this is a dead link to the user
and can only be fixed if they copy and paste the appropriate IP address for
the node (not a pleasant user experience).

Is there a way to convince the web app to not use the FQDN or is this a 
potential bug? Or maybe this will end up as WONTFIX - open up your DNS.

Yours,
Ewan


   
