from:"Suresh Srinivas"

Re: HDFS Append Problem

2015-03-05 Thread Suresh Srinivas

Please take this up CDH mailing list.

From: Molnár Bálint molnarcsi...@gmail.com
Sent: Thursday, March 05, 2015 4:53 AM
To: user@hadoop.apache.org
Subject: HDFS Append Problem

Hi Everyone!

I 'm experiencing an annoying problem.

My Scenario is:

I want to store lots of small files (1-2MB max) in map files. These files will 
come periodically during the days, so I cannot use the factory writer because 
it will create a lot of small MapFiles. (I want to store these files in the 
HDFS immediately.)

I' m trying to create a code to append Map files. I use the
org.apache.hadoop.fs.FileSystem append() method which calls the 
org.apache.hadoop.hdfs.DistributedFileSystem append() method to do the job.

My code works well, because the stock MapFile Reader can retrieve the files. My 
problem appears in the upload phase. When I try to upload a set (1GB) of small 
files, the free space of the HDFS decreases fast. The program only uploads 
400MB but according to the Cloudera Manager it is more than 5GB.
The interesting part is that, when I terminate the upload, and wait 1-2 
minutes, the HDFS goes back to normal size (500MB), and none of my files are 
lost. If I don't terminate the upload, the HDFS goes out of free space and the 
program gets errors.
I'm using cloudera quickvm 5.3 for testing, and the hdfs replication number is 
1.

Any ideas how to solve this issue?

Thanks

Re: Error while executing command on CDH5

2015-03-04 Thread Suresh Srinivas

Can you please use CDH mailing listd for this question?

From: SP sajid...@gmail.com
Sent: Wednesday, March 04, 2015 11:00 AM
To: user@hadoop.apache.org
Subject: Error while executing command on CDH5

Hello All,

Why am I getting this error every time I execute a command. It was working fine 
with CDH4 version. When I upgraded to CDH5 version this message started showing 
up.

does any one have resolution for this error

sudo -u hdfs hadoop fs -ls /
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
Found 1 items
drwxrwxrwt   - hdfs hadoop  0 2015-03-04 10:30 /tmp

Thanks
SP

Re: DFS Used V/S Non DFS Used

2014-10-10 Thread Suresh Srinivas

Here is the information from -
https://issues.apache.org/jira/browse/HADOOP-4430?focusedCommentId=12640259page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12640259
Here are the definition of data reported on the Web UI:
Configured Capacity: Disk space corresponding to all the data directories -
Reserved space as defined by dfs.datanode.du.reserved
DFS Used: Space used by DFS
Non DFS Used: 0 if the temporary files do not exceed reserved space.
Otherwise this is the size by which temporary files exceed the reserved
space and encroach into the DFS configured space.
DFS Remaining: (Configured Capacity - DFS Used - Non DFS Used)
DFS Used %: (DFS Used / Configured Capacity) * 100
DFS Remaining % = (DFS Remaining / Configured Capacity) * 100

On Fri, Oct 10, 2014 at 2:21 PM, Manoj Samel manojsamelt...@gmail.com
wrote:

Hi,

Not clear how this computation is done

For sake of discussion Say the machine with data node has two disks /disk1
and /disk2. And each of these disk has a directory for data node and a
directory for non-datanode usage.

/disk1/datanode
/disk1/non-datanode
/disk2/datanode
/disk2/non-datanode

The dfs.datanode.data.dir says /disk1/datanode,/disk2/datanode.

With this, what does the DFS and NonDFS indicates? Does it indicates
SUM(/disk*/datanode) SUM(/disk*/non-datanode) etc. resp. ?

Thanks,

--
http://hortonworks.com/download/

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

Re: Significance of PID files

2014-07-07 Thread Suresh Srinivas

When a daemon process is started, the process ID of the process is captured
in a pid file. It is used for following purposes:
- During a daemon startup, the existence of pid file is used to determine
that the process is already running.
- When a daemon is stooped, hadoop scripts sends kill TERM signal  to the
process ID captured in pid file for graceful shutdown. After a timeout, if
the process still exists, kill -9 is sent for forced shutdown.

For more details, see the relevant code in
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh




On Fri, Jul 4, 2014 at 10:00 AM, Vijaya Narayana Reddy Bhoomi Reddy 
vijay.bhoomire...@gmail.com wrote:

 Hi,

 Can anyone please explain the significance of the pid files in Hadoop i.e.
 purpose and usage etc?

 Thanks  Regards
 Vijay




-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: hadoop 2.2.0 HA: standby namenode generate a long list of loading edits

2014-06-11 Thread Suresh Srinivas

Henry,

I suspect this is what is happening. On active namenode, oncethe  existing
set of editlogs during startup are loaded, it becomes active and from then
it has no need to load any more edits. It only generates edits. On the
other hand, standby namenode not only loads the edits during startup, it
also continuously loads the edits being generated by the active. Hence the
difference.

Regards,
Suresh


On Wed, Jun 11, 2014 at 7:49 PM, Henry Hung ythu...@winbond.com wrote:

  Hi All,



 I’m using QJM with 2 namenodes, in the active namenode, the main page’s
 loading edits panel only show 10 records, but in standby namenode, the
 loading edits panel show a lot more records, never count it, but I think it
 has  100 records.

 Is this a problem?



 Here I provide some of the data:




 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1080storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1361storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1830storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000140storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10001638storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002099storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002359storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000332storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000421storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005210storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005529storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000577storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005831storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005951storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006089storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006154storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006291storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006482storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec





 Best regards,

 Henry

 --
 The privileged confidential information contained in this email is
 intended for use only by the addressees as indicated by the original sender
 of this email. If you are not the addressee indicated in this email or are
 not responsible for delivery of the email to such a person, please kindly
 reply to the sender indicating this fact and delete all copies of it from
 your computer and network server immediately. Your cooperation is highly
 appreciated. It is advised that any unauthorized use of confidential
 information of Winbond is strictly prohibited; and any information in this
 email irrelevant to the official business of Winbond shall be deemed as
 neither given nor endorsed by Winbond.

Re: hadoop 2.2.0 HA: standby namenode generate a long list of loading edits

2014-06-11 Thread Suresh Srinivas

On Wed, Jun 11, 2014 at 8:27 PM, Henry Hung ythu...@winbond.com wrote:

  @Suresh,



 Q1: But is this kind of behavior can cause some problem in fail over
 event? I’m afraid that standby namenode will took a long time to be active.


Can you please explain how you arrived at this?



 Q2: Is there a way to purge the loading edit records? Should I do restart
 on standby namenode?




Other than showing a long list of loaded edits, there is nothing to be
concerned here. I agree that this is confusing and we could change this
where we print only last set of loaded edits instead of entire list.


  Best regards,

 Henry



 *From:* Suresh Srinivas [mailto:sur...@hortonworks.com]
 *Sent:* Thursday, June 12, 2014 11:23 AM
 *To:* hdfs-u...@hadoop.apache.org
 *Subject:* Re: hadoop 2.2.0 HA: standby namenode generate a long list of
 loading edits



 Henry,



 I suspect this is what is happening. On active namenode, oncethe  existing
 set of editlogs during startup are loaded, it becomes active and from then
 it has no need to load any more edits. It only generates edits. On the
 other hand, standby namenode not only loads the edits during startup, it
 also continuously loads the edits being generated by the active. Hence the
 difference.



 Regards,

 Suresh



 On Wed, Jun 11, 2014 at 7:49 PM, Henry Hung ythu...@winbond.com wrote:

  Hi All,



 I’m using QJM with 2 namenodes, in the active namenode, the main page’s
 loading edits panel only show 10 records, but in standby namenode, the
 loading edits panel show a lot more records, never count it, but I think it
 has  100 records.

 Is this a problem?



 Here I provide some of the data:




 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1080storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1361storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1830storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000140storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10001638storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002099storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002359storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000332storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000421storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005210storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005529storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000577storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005831storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005951storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006089storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006154storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006291storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e
 (0/0)

 100.00%

 0sec


 http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006482storageInfo

Re: how can i monitor Decommission progress?

2014-06-05 Thread Suresh Srinivas

The namenode webui provides that information. Click on the main webui the link 
associated with decommissioned nodes. 

Sent from phone

 On Jun 5, 2014, at 10:36 AM, Raj K Singh rajkrrsi...@gmail.com wrote:
 
 use
 
 $hadoop dfsadmin -report
 
 
 Raj K Singh
 http://in.linkedin.com/in/rajkrrsingh
 http://www.rajkrrsingh.blogspot.com
 Mobile  Tel: +91 (0)9899821370
 
 
 On Sat, May 31, 2014 at 11:26 AM, ch huang justlo...@gmail.com wrote:
 hi,maillist:
   i decommission three node out of my cluster,but question is 
 how can i see the decommission progress?,i just can see admin state from web 
 ui
 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: listing a 530k files directory

2014-05-30 Thread Suresh Srinivas

Listing such a directory should not be a big problem. Can you cut and paste the 
command output. 

Which release are you using?

Sent from phone

 On May 30, 2014, at 5:49 AM, Guido Serra z...@fsfe.org wrote:
 
 already tried, didn't work (24cores at 100% and a-lot-memory, stilll ... GC 
 overhead limit exceed)
 
 thanks anyhow
 
 On 05/30/2014 02:43 PM, bharath vissapragada wrote:
 Hi Guido,
 
 You can set client side heap in HADOOP_OPTS variable before running the ls 
 command.
 
 export HADOOP_OPTS=-Xmx3g; hadoop fs -ls /
 
 - Bharath
 
 
 On Fri, May 30, 2014 at 5:22 PM, Guido Serra z...@fsfe.org wrote:
 Hi,
 do you have an idea on how to look at the content of a 530k-files HDFS 
 folder?
 (yes, I know it is a bad idea to have such setup, but that’s the status and 
 I’d like to debug it)
 and the only tool that doesn’t go out of memory is hdfs dfs -count folder/“
 
 -ls goes out of memory, -count with the folder/* goes out of memory …
 I’d like at least at the first 10 file names, see the size, maybe open one
 
 thanks,
 G.
 
 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: any optimize suggestion for high concurrent write into hdfs?

2014-02-20 Thread Suresh Srinivas

Another alternative is to write block sized chunks into multiple hdfs files 
concurrently followed by concat to all those into a single file. 

Sent from phone

 On Feb 20, 2014, at 8:15 PM, Chen Wang chen.apache.s...@gmail.com wrote:
 
 Ch,
 you may consider using flume as it already has a flume sink that can sink to 
 hdfs. What I did is to set up a flume listening on an Avro sink, and then 
 sink to hdfs. Then in my application, i just send my data to avro socket.
 Chen
 
 
 On Thu, Feb 20, 2014 at 5:07 PM, ch huang justlo...@gmail.com wrote:
 hi,maillist:
   is there any optimize for large of write into hdfs in same time ? 
 thanks
 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: HDFS Federation address performance issue

2014-01-28 Thread Suresh Srinivas

Response inline...

On Tue, Jan 28, 2014 at 10:04 AM, Anfernee Xu anfernee...@gmail.com wrote:

Hi,

Based on
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits,
the overall performance can be improved by federation, but I'm not sure
federation address my usercase, could someone elaborate it?

My usercase is I have one single NM and several DN, and I have bunch of
concurrent MR jobs which will create new files(plan files and
sub-directory) under the same parent directory, the questions are:

1) Will these concurrent writes(new file, plan files and sub-directory
under the same parent directory) run in sequential because WRITE-once
control govened by single NM?

Namenode commits multiple requests in a batch. In Namenode it self, the
lock for write operations make them sequential. But this is a short
duration lock and hence will make from the multiple clients perspective,
the creation of files as simultaneous.

If you are talking about a single client, with a single thread, then it
would be sequential.

Hope that makes sense.

I need this answer to estimate the necessity of moving to HDFS federation.

Thanks

--
--Anfernee

--
http://hortonworks.com/download/

Re: compatibility between new client and old server

2013-12-18 Thread Suresh Srinivas

2.x is a new major release. 1.x and 2.x are not compatible.

In 1.x, the RPC wire protocol used java serialization. In 2.x, the RPC wire
protocol uses protobuf. A client must be compiled against 2.x and should
use appropriate jars from 2.x to work with 2.x.


On Wed, Dec 18, 2013 at 10:45 AM, Ken Been ken.b...@twosigma.com wrote:

  I am trying to make a 2.2.0 Java client work with a 1.1.2 server.  The
 error I am currently getting is below.  I’d like to know if my problem is
 because I have configured something wrong or because the versions are
 simply not compatible for what I want to do.  Thanks in advance for any
 help.



 Ken



 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)

 at org.apache.hadoop.ipc.Client.call(Client.java:1351)

 at org.apache.hadoop.ipc.Client.call(Client.java:1300)

 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)

 at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:601)

 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)

 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)

 at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)

 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)

 at
 org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)

 at
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)

 at
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)

 at
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)

at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)

 at my code...

 Caused by: java.io.EOFException

 at java.io.DataInputStream.readInt(DataInputStream.java:392)

 at
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)

 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)




-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: HDP 2.0 GA?

2013-11-05 Thread Suresh Srinivas

Please send the questions related to a vendor specific distro to vendor
mailing list. In this case - http://hortonworks.com/community/forums/.


On Tue, Nov 5, 2013 at 10:49 AM, Jim Falgout jim.falg...@actian.com wrote:

  HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2.


  --
 *From:* John Lilley john.lil...@redpoint.net
 *Sent:* Tuesday, November 05, 2013 12:34 PM
 *To:* user@hadoop.apache.org
 *Subject:* HDP 2.0 GA?


 I noticed that HDP 2.0 is available for download here:

 http://hortonworks.com/products/hdp-2/?b=1#install

 Is this the final “GA” version that tracks Apache Hadoop 2.2?

 Sorry I am just a little confused by the different numbering schemes.

 *Thanks*

 *John*






-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: HDFS / Federated HDFS - Doubts

2013-10-16 Thread Suresh Srinivas

On Wed, Oct 16, 2013 at 9:22 AM, Steve Edison sediso...@gmail.com wrote:

 I have couple of questions about HDFS federation:

 Can I state different block store directories for each namespace on a
 datanode ?


No. The main idea of federation was not to physically partition the storage
across namespace, but to use all the available storage across the
namespaces, to ensure better utilzation.


 Can I have some datanodes dedicated to a particular namespace only ?


As I said earlier, all the datanodes are shared across namespaces. If you
want to dedicate datanodes to a particular namespace, you might as well
create it as two separate clusters with different set of datanodes and a
separate namespace.



 This seems quite interesting. Way to go !


 On Tue, Oct 1, 2013 at 9:52 PM, Krishna Kumaar Natarajan natar...@umn.edu
  wrote:

 Hi All,

 While trying to understand federated HDFS in detail I had few doubts and
 listing them down for your help.

1. In case of *HDFS(without HDFS federation)*, the metadata or the
data about the blocks belonging to the files in HDFS is maintained in the
main memory of the name node or it is stored on permanent storage of the
namenode and is brought in the main memory on demand basis ? [Krishna]
Based on my understanding, I assume the entire metadata is in main memory
which is an issue by itself. Please correct me if my understanding is 
 wrong.
2. In case of* federated HDFS*, the metadata or the data about the
blocks belonging to files in a particular namespace is maintained in the
main memory of the namenode or it is stored on the permanent storage of 
 the
namenode and is brought in the main memory on demand basis ?
3. Are the metadata information stored in separate cluster
nodes(block management layer separation) as discussed in Appendix B of 
 this
document ?

 https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
4. I would like to know if the following proposals are already
implemented in federated HDFS. (

 http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability
 slide-17)
- Separation of namespace and block management layers (same as qn.3)
   - Partial namespace in memory for further scalability
   - Move partial namespace from one namenode to another

 Thanks,
 Krishna





-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: HDFS federation Configuration

2013-09-23 Thread Suresh Srinivas


 I'm not able to follow the page completely.
 Can you pls help me to get some clear step by step or little bit more
 details in the configuration side?


Have you setup a non-federated cluster before. If you have, the page should
be easy to follow. If you have not setup a non-federated cluster before, I
suggest doing so, before looking at this document.

I think the document already has step by step instructions.

 I


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: HDFS federation Configuration

2013-09-19 Thread Suresh Srinivas

Have you looked at -
http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-project-dist/hadoop-hdfs/Federation.html

Let me know if the document is not clear or needs improvements.

Regards,
Suresh



On Thu, Sep 19, 2013 at 12:01 PM, Manickam P manicka...@outlook.com wrote:

  Guys,

 I need some tutorials to configure fedration. Can you pls suggest me some?




 Thanks,
 Manickam P




-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Name node High Availability in Cloudera 4.1.1

2013-09-19 Thread Suresh Srinivas

Please do not cross-post these emails to hdfs-user. The relevant email list
is only cdh-user.


On Thu, Sep 19, 2013 at 1:44 AM, Pavan Kumar Polineni 
smartsunny...@gmail.com wrote:

 Hi all,

 *Name Node High Availability  Job tracker high availability* is there in
 Cloudera 4.1.1 ?

 If not, Then what are the properties need to change in Cloudera 4.1.1 to
 make the cluster as High availability.

 please help on this.. Thanks in Advance

 --
  Pavan Kumar Polineni




-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-13 Thread Suresh Srinivas

Shahab,

I agree with your arguments. Really well put. Only things I would add is -
we do not want sales/marketing folks getting involved in these kinds of
threads and pollute it with sales pitches, unsubstantiated claims, and make
it a forum for marketing pitch. This can also have community repercussions
as you have rightly pointed out.

Wearing my own hadoop PMC hat, we do put Apache release regularly. Bigtop
also provides excellent stack packaging as well. In this forum my wish is
to see discussions around that than vendor related. There are already many
outside forums for this.

Regards,
Suresh


On Fri, Sep 13, 2013 at 10:48 AM, Shahab Yunus shahab.yu...@gmail.comwrote:

 I think, in my opinion, it is a wrong idea because:

 1- Many of the participants here are employees for these very companies
 that are under discussion. This puts these respective employees in very
 difficult position. It is very hard to come with a correct response.
 Comments can be misconstrued easily.
 2- Also, when we talk about vendor distributions of the software, it is
 not longer purely about open source. Now companies with the related
 corporate legal baggage also gets in the mix.
 3- The discussion would be on not only positive things about each vendor
 but in fact negatives. The latter type of  discussion which can get
 unpleasant very easily.
 4- Somebody mentioned that, this is a very lightly moderated platform and
 thus this discussion should be allowed. I think this is one of the reasons
 that it should not be because, people can say things casually, without much
 thought, or without taking care of the context or the possible
 interpretations and get in trouble.
 5- The risk here is not only that serious repercussions can occur (which
 very well can) but the greater risk is that it can cause misunderstanding
 between individuals, industries and companies.
 6-People here lot of time reply quickly just to resolve or help the
 'technical' issue. Now they will have to take care how they frame the
 response. Re: 4

 I know some will feel that I have created a highly exaggerated scenario
 above, but what I am trying to say is that, it is a slippery slope. If we
 allow this then this can go anywhere.

 By the way, I do not work for any of these vendors.

 More importantly, I am not saying that this discussion should not be had,
 I am just saying that this is a wrong forum.

 Just my 2 cents (or,...this was rather a dollar.)

 Regards,
 Shahab


 On Fri, Sep 13, 2013 at 1:50 AM, Chris Mattmann mattm...@apache.orgwrote:

 Errr, what's wrong with discussing these types of issues on list?

 Nothing public here, and as long as it's kept to facts, this should
 not be a problem and Apache is a fine place to have such discussions.

 My 2c.





 -Original Message-
 From: Xuri Nagarin secs...@gmail.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Thursday, September 12, 2013 4:39 PM
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: Re: Cloudera Vs Hortonworks Vs MapR

 I understand it can be contentious issue especially given that a lot of
 contributors to this list work for one or the other vendor or have some
 stake in any kind of evaluation. But, I see no reason why users should
 not be able to compare notes
  and share experiences. Over time, genuine pain points or issues or
 claims will bubble up and should only help the community. Sure, there
 will be a few flame wars but this already isn't a very tightly moderated
 list.
 
 
 
 
 
 
 
 On Thu, Sep 12, 2013 at 11:14 AM, Aaron Eng
 a...@maprtech.com wrote:
 
 Raj,
 
 
 As others noted, this is not a great place for this discussion.  I'd
 suggest contacting the vendors you are interested in as I'm sure we'd all
 be happy to provide you more details.
 
 
 I don't know about the others, but for MapR, just send an email to
 sa...@mapr.com mailto:sa...@mapr.com and I'm sure someone will get
 back
 to you with more information.
 
 
 Best Regards,
 Aaron Eng
 
 
 
 On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj hadoop...@yahoo.com
 wrote:
 
 
 Hi,
 
 We are trying to evaluate different implementations of Hadoop for our big
 data enterprise project.
 
 Can the forum members advise on what are the advantages and disadvantages
 of each implementation i.e. Cloudera Vs Hortonworks Vs MapR.
 
 Thanks in advance.
 
 Regards,
 Raj
 
 
 
 
 
 
 
 
 
 
 






-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-12 Thread Suresh Srinivas

Raj,

You can also use Apache Hadoop releases. Bigtop does fine job as well
putting together consumable Hadoop stack.

As regards to vendor solutions, this is not the right forum. There are
other forums for this. Please refrain from this type of discussions on
Apache forum.

Regards,
Suresh


On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj hadoop...@yahoo.com wrote:

 Hi,

 We are trying to evaluate different implementations of Hadoop for our big
 data enterprise project.

 Can the forum members advise on what are the advantages and disadvantages
 of each implementation i.e. Cloudera Vs Hortonworks Vs MapR.

 Thanks in advance.

 Regards,
 Raj




-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Symbolic Link in Hadoop 1.0.4

2013-09-05 Thread Suresh Srinivas

FileContext APIs and symlink functionality is not available in 1.0. It is
only available in 0.23 and 2.x release.


On Thu, Sep 5, 2013 at 8:06 AM, Gobilliard, Olivier 
olivier.gobilli...@cartesian.com wrote:

  Hi,



 I am using Hadoop 1.0.4 and need to create a symbolic link in HDSF.

 This feature has been added in Hadoop 0.21.0 (
 https://issues.apache.org/jira/browse/HDFS-245) in the new FileContext
 API (
 http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileContext.html
 ).

 However, I cannot find the FileContext API in the 1.0.4 release (
 http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/). I cannot find
 it in any of the 1.X releases actually.



 Has this functionality been moved to another Class?



 Many thanks,

 Olivier


 __
 This email and any attachments are confidential. If you have received this
 email in error please notify the sender immediately
 by replying to this email and then delete from your computer without
 copying or distributing in any other way.

 Cartesian Limited - Registered in England and Wales with number 3230513
 Registered office: Descartes House, 8 Gate Street, London, WC2A 3HP
 www.cartesian.com




-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Documentation for Hadoop's RPC mechanism

2013-08-20 Thread Suresh Srinivas

Create a Jira and post it into hadoop documentation. I can help you with the 
review and commit. 

Sent from phone

On Aug 20, 2013, at 10:40 AM, Elazar Leibovich elaz...@gmail.com wrote:

 Hi,
 
 I've written some documentation for Hadoop's RPC mechanism internals:
 
 http://hadoop.quora.com/Hadoop-RPC-mechanism
 
 I'll be very happy if the community can review it. You should be able to edit 
 it directly, or just send your comments to the list.
 
 Except, I'm looking for a good place to put it. Where does it fit? Would it 
 fit Hadoop's Wiki? Hadoop's Source?
 
 Thanks

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Suresh Srinivas

Folks, can you please take this thread to CDH related mailing list?


On Tue, Aug 13, 2013 at 3:07 PM, Brad Cox bradj...@gmail.com wrote:

 That link got my hopes up. But Cloudera Manager  (what I'm running; on
 CDH4) does not offer an Export Client Config option. What am I missing?

 On Aug 13, 2013, at 4:04 PM, Shahab Yunus shahab.yu...@gmail.com wrote:

 You should not use LocalJobRunner. Make sure that the mapred.job.tracker
 property does not point to 'local' an instead to your job-tracker host and
 port.

 *But before that* as Sandy said, your client machine (from where you will
 be kicking of your jobs and apps) should be using config files which will
 have your cluster's configuration. This is the alternative that you should
 follow if you don't want to bundle the configs for your cluster in the
 application itself (either in java code or separate copies of relevant
 properties set of config files.) This was something which I was suggesting
 early on to just to get you started using your cluster instead of local
 mode.

 By the way have you seen the following link? It gives you step by step
 information about how to generate config files from your cluster specific
 to your cluster and then how to place them and use the from any machine you
 want to designate as your client. Running your jobs form one of the
 datanodes without proper config would not work.

 https://ccp.cloudera.com/display/FREE373/Generating+Client+Configuration

 Regards,
 Shahab


 On Tue, Aug 13, 2013 at 1:07 PM, Pavan Sudheendra pavan0...@gmail.com
 wrote:

 Yes Sandy, I'm referring to LocalJobRunner. I'm actually running the
 job on one datanode..

 What changes should i make so that my application would take advantage
 of the cluster as a whole?

 On Tue, Aug 13, 2013 at 10:33 PM,  sandy.r...@cloudera.com wrote:

 Nothing in your pom.xml should affect the configurations your job runs

 with.


 Are you running your job from a node on the cluster? When you say

 localhost configurations, do you mean it's using the LocalJobRunner?


 -sandy

 (iphnoe tpying)

 On Aug 13, 2013, at 9:07 AM, Pavan Sudheendra pavan0...@gmail.com

 wrote:


 When i actually run the job on the multi node cluster, logs shows it
 uses localhost configurations which i don't want..

 I just have a pom.xml which lists all the dependencies like standard
 hadoop, standard hbase, standard zookeeper etc., Should i remove these
 dependencies?

 I want the cluster settings to apply in my map-reduce application..
 So, this is where i'm stuck at..

 On Tue, Aug 13, 2013 at 9:30 PM, Pavan Sudheendra pavan0...@gmail.com

 wrote:

 Hi Shabab and Sandy,
 The thing is we have a 6 node cloudera cluster running.. For
 development purposes, i was building a map-reduce application on a
 single node apache distribution hadoop with maven..

 To be frank, i don't know how to deploy this application on a multi
 node cloudera cluster. I am fairly well versed with Multi Node Apache
 Hadoop Distribution.. So, how can i go forward?

 Thanks for all the help :)

 On Tue, Aug 13, 2013 at 9:22 PM,  sandy.r...@cloudera.com wrote:

 Hi Pavan,

 Configuration properties generally aren't included in the jar itself

 unless you explicitly set them in your java code. Rather they're picked up
 from the mapred-site.xml file located in the Hadoop configuration directory
 on the host you're running your job from.


 Is there an issue you're coming up against when trying to run your

 job on a cluster?


 -Sandy

 (iphnoe tpying)

 On Aug 13, 2013, at 4:19 AM, Pavan Sudheendra pavan0...@gmail.com

 wrote:


 Hi,
 I'm currently using maven to build the jars necessary for my
 map-reduce program to run and it works for a single node cluster..

 For a multi node cluster, how do i specify my map-reduce program to
 ingest the cluster settings instead of localhost settings?
 I don't know how to specify this using maven to build my jar.

 I'm using the cdh distribution by the way..
 --
 Regards-
 Pavan




 --
 Regards-
 Pavan




 --
 Regards-
 Pavan




 --
 Regards-
 Pavan


 Dr. Brad J. CoxCell: 703-594-1883 Blog: http://bradjcox.blogspot.com
 http://virtualschool.edu







-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re:

2013-07-12 Thread Suresh Srinivas

Please use CDH mailing list. This is apache hadoop mailing list. 

Sent from phone

On Jul 12, 2013, at 7:51 PM, Anit Alexander anitama...@gmail.com wrote:

 Hello,
 
 I am encountering a problem in cdh4 environment. 
 I can successfully run the map reduce job in the hadoop cluster. But when i 
 migrated the same map reduce to my cdh4 environment it creates an error 
 stating that it cannot read the next block(each block is 64 mb). Why is that 
 so?
 
 Hadoop environment: hadoop 1.0.3
 java version 1.6
 
 chd4 environment: CDH4.2.0
 java version 1.6
 
 Regards,
 Anit Alexander

Re: Cloudera links and Document

2013-07-11 Thread Suresh Srinivas

Sathish, this mailing list for Apache Hadoop related questions. Please post
questions related to other distributions to appropriate vendor's mailing
list.



On Thu, Jul 11, 2013 at 6:28 AM, Sathish Kumar sa848...@gmail.com wrote:

 Hi All,

 Can anyone help me the link or document that explain the below.

 How Cloudera Manager works and handle the clusters (Agent and Master
 Server)?
 How the Cloudera Manager Process Flow works?
 Where can I locate Cloudera configuration files and explanation in brief?


 Regards
 Sathish




-- 
http://hortonworks.com/download/

Re: data loss after cluster wide power loss

2013-07-03 Thread Suresh Srinivas

On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com
 wrote:
  Dave,
 
  Thanks for the detailed email. Sorry I did not read all the details you
 had
  sent earlier completely (on my phone). As you said, this is not related
 to
  data loss related to HBase log and hsync. I think you are right; the
 rename
  operation itself might not have hit the disk. I think we should either
  ensure metadata operation is synced on the datanode or handle it being
  reported as blockBeingWritten. Let me spend sometime to debug this issue.

 In theory, ext3 is journaled, so all metadata operations should be
 durable in the case of a power outage.  It is only data operations
 that should be possible to lose.  It is the same for ext4.  (Assuming
 you are not using nonstandard mount options.)


ext3 journal may not hit the disk right. From what I read, if you do not
specifically
call sync, even the metadata operations do not hit disk.

See - https://www.kernel.org/doc/Documentation/filesystems/ext3.txt

commit=nrsec(*) Ext3 can be told to sync all its data and metadata
every 'nrsec' seconds. The default value is 5 seconds.
This means that if you lose your power, you will lose
as much as the latest 5 seconds of work (your
filesystem will not be damaged though, thanks to the
journaling).  This default value (or any low value)
will hurt performance, but it's good for data-safety.
Setting it to 0 will have the same effect as leaving
it at the default (5 seconds).
Setting it to very large values will improve

performance.

Re: HDFS file section rewrite

2013-07-02 Thread Suresh Srinivas

HDFS only supports regular writes and append. Random write is not
supported. I do not know of any feature/jira that is underway to support
this feature.


On Tue, Jul 2, 2013 at 9:01 AM, John Lilley john.lil...@redpoint.netwrote:

  I’m sure this has been asked a zillion times, so please just point me to
 the JIRA comments: is there a feature underway to allow for re-writing of
 HDFS file sections?

 Thanks

 John

 ** **




-- 
http://hortonworks.com/download/

Re: data loss after cluster wide power loss

2013-07-01 Thread Suresh Srinivas

Dave,

Thanks for the detailed email. Sorry I did not read all the details you had
sent earlier completely (on my phone). As you said, this is not related to
data loss related to HBase log and hsync. I think you are right; the rename
operation itself might not have hit the disk. I think we should either
ensure metadata operation is synced on the datanode or handle it being
reported as blockBeingWritten. Let me spend sometime to debug this issue.

One surprising thing is, all the replicas were reported as
blockBeingWritten.

Regards,
Suresh


On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham lat...@davelink.net wrote:

 (Removing hbase list and adding hdfs-dev list as this is pretty internal
 stuff).

 Reading through the code a bit:

 FSDataOutputStream.close calls
 DFSOutputStream.close calls
 DFSOutputStream.closeInternal
  - sets currentPacket.lastPacketInBlock = true
  - then calls
 DFSOutputStream.flushInternal
  - enqueues current packet
  - waits for ack

 BlockReceiver.run
  - if (lastPacketInBlock  !receiver.finalized) calls
 FSDataset.finalizeBlock calls
 FSDataset.finalizeBlockInternal calls
 FSVolume.addBlock calls
 FSDir.addBlock calls
 FSDir.addBlock
  - renames block from blocksBeingWritten tmp dir to current dest dir

 This looks to me as I would expect a synchronous chain from a DFS client
 to moving the file from blocksBeingWritten to the current dir so that once
 the file is closed that it the block files would be in the proper directory
 - even if the contents of the file are still in the OS buffer rather than
 synced to disk.  It's only after this moving of blocks that
 NameNode.complete file is called.  There are several conditions and loops
 in there that I'm not certain this chain is fully reliable in all cases
 without a greater understanding of the code.

 Could it be the case that the rename operation itself is not synced and
 that ext3 lost the fact that the block files were moved?
 Or is there a bug in the close file logic that for some reason the block
 files are not always moved into place when a file is closed?

 Thanks for your patience,
 Dave


 On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham lat...@davelink.net wrote:

 Thanks for the response, Suresh.

 I'm not sure that I understand the details properly.  From my reading of
 HDFS-744 the hsync API would allow a client to make sure that at any point
 in time it's writes so far hit the disk.  For example, for HBase it could
 apply a fsync after adding some edits to its WAL to ensure those edits are
 fully durable for a file which is still open.

 However, in this case the dfs file was closed and even renamed.  Is it
 the case that even after a dfs file is closed and renamed that the data
 blocks would still not be synced and would still be stored by the datanode
 in blocksBeingWritten rather than in current?  If that is case, would
 it be better for the NameNode not to reject replicas that are in
 blocksBeingWritten, especially if it doesn't have any other replicas
 available?

 Dave


 On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas 
 sur...@hortonworks.comwrote:

 Yes this is a known issue.

 The HDFS part of this was addressed in
 https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is
 not
 available in 1.x  release. I think HBase does not use this API yet.


 On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham lat...@davelink.net wrote:

  We're running HBase over HDFS 1.0.2 on about 1000 nodes.  On Saturday
 the
  data center we were in had a total power failure and the cluster went
 down
  hard.  When we brought it back up, HDFS reported 4 files as CORRUPT.
  We
  recovered the data in question from our secondary datacenter, but I'm
  trying to understand what happened and whether this is a bug in HDFS
 that
  should be fixed.
 
  From what I can tell the file was created and closed by the dfs client
  (hbase).  Then HBase renamed it into a new directory and deleted some
 other
  files containing the same data.  Then the cluster lost power.  After
 the
  cluster was restarted, the datanodes reported into the namenode but the
  blocks for this file appeared as blocks being written - the namenode
  rejected them and the datanodes deleted the blocks.  At this point
 there
  were no replicas for the blocks and the files were marked CORRUPT.  The
  underlying file systems are ext3.  Some questions that I would love get
  answers for if anyone with deeper understanding of HDFS can chime in:
 
   - Is this a known scenario where data loss is expected?  (I found
  HDFS-1539 but that seems different)
   - When are blocks moved from blocksBeingWritten to current?  Does that
  happen before a file close operation is acknowledged to a hdfs client?
   - Could it be that the DataNodes actually moved the blocks to current
 but
  after the restart ext3 rewound state somehow (forgive my ignorance of
  underlying file system behavior)?
   - Is there any other explanation for how this can happen?
 
  Here is a sequence of selected

Re: Please explain FSNamesystemState TotalLoad

2013-06-07 Thread Suresh Srinivas

On Fri, Jun 7, 2013 at 9:10 AM, Nick Niemeyer nnieme...@riotgames.comwrote:

  Regarding TotalLoad, what would be normal operating tolerances per node
 for this metric?  When should one become concerned?  Thanks again to
 everyone participating in this community.  :)


Why do you want to be concered :) I have not seen many issues related to
high TotalLoad.

This is mainly useful in terms of understanding how many concurrent
jobs/file accesses are happening and how busy datanodes are. When you are
debugging issues where cluster slow down due to overload, or correlating a
run of big jobs, this is useful. Knowing what it represent, you would find
many other uses as well.


   From: Suresh Srinivas sur...@hortonworks.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Thursday, June 6, 2013 4:14 PM
 To: hdfs-u...@hadoop.apache.org user@hadoop.apache.org
 Subject: Re: Please explain FSNamesystemState TotalLoad

   It is the total number of transceivers (readers and writers) reported
 by all the datanodes. Datanode reports this count in periodic heartbeat to
 the namenode.


 On Thu, Jun 6, 2013 at 1:48 PM, Nick Niemeyer nnieme...@riotgames.comwrote:

   Can someone please explain what TotalLoad represents below?  Thanks
 for your response in advance!

  Version: hadoop-0.20-namenode-0.20.2+923.197-1

  Example pulled from the output of via the name node:
   # curl -i http://localhost:50070/jmx

  {
 name : hadoop:service=NameNode,name=FSNamesystemState,
 modelerType : org.apache.hadoop.hdfs.server.namenode.FSNamesystem,
 CapacityTotal : #,
 CapacityUsed : #,
 CapacityRemaining : #,
* TotalLoad : #,*
 BlocksTotal : #,
 FilesTotal : #,
 PendingReplicationBlocks : 0,
 UnderReplicatedBlocks : 0,
 ScheduledReplicationBlocks : 0,
 FSState : Operational
   }


  Thanks,
 Nick




  --
 http://hortonworks.com/download/




-- 
http://hortonworks.com/download/

Re: Please explain FSNamesystemState TotalLoad

2013-06-06 Thread Suresh Srinivas

It is the total number of transceivers (readers and writers) reported by
all the datanodes. Datanode reports this count in periodic heartbeat to the
namenode.


On Thu, Jun 6, 2013 at 1:48 PM, Nick Niemeyer nnieme...@riotgames.comwrote:

   Can someone please explain what TotalLoad represents below?  Thanks for
 your response in advance!

  Version: hadoop-0.20-namenode-0.20.2+923.197-1

  Example pulled from the output of via the name node:
   # curl -i http://localhost:50070/jmx

  {
 name : hadoop:service=NameNode,name=FSNamesystemState,
 modelerType : org.apache.hadoop.hdfs.server.namenode.FSNamesystem,
 CapacityTotal : #,
 CapacityUsed : #,
 CapacityRemaining : #,
* TotalLoad : #,*
 BlocksTotal : #,
 FilesTotal : #,
 PendingReplicationBlocks : 0,
 UnderReplicatedBlocks : 0,
 ScheduledReplicationBlocks : 0,
 FSState : Operational
   }


  Thanks,
 Nick




-- 
http://hortonworks.com/download/

Re: How to test the performance of NN?

2013-06-05 Thread Suresh Srinivas

What do you mean by it is not telling me any thing about performance? Also
I do not understand the part, only about potential failures.. Can you add
more details.

nnbench is the best microbenchmark for nn performance test.


On Wed, Jun 5, 2013 at 3:17 PM, Mark Kerzner mark.kerz...@shmsoft.comwrote:

 Hi,

 I am trying to create a more efficient namenode, and for that I need to
 the standard distribution, and then compare it to my version.

 Which benchmark should I run? I am doing nnbench, but it is not telling me
 anything about performance, only about potential failures.

 Thank you.
 Sincerely,
 Mark




-- 
http://hortonworks.com/download/

Re: cloudera4.2 source code ant

2013-05-17 Thread Suresh Srinivas

Folks, this is Apache Hadoop mailing list. For vendor distro related questions, 
please use the appropriate vendor mailing list. 

Sent from a mobile device

On May 17, 2013, at 2:06 AM, Kun Ling lkun.e...@gmail.com wrote:

 Hi dylan,
 
  I have not build CDH source code using ant, However I have met a similar 
 dependencies resolve filed problem.
 
 Acccording to my experience,   this is much like a package network 
 download issue. 
 
 You may try to remove the .ivy2  and .m2   directories in your home 
 directory, and  run ant clean; ant to try again.
 
 
Hope it is helpful to you.
 
 
 yours,
 Kun Ling 
 
 
 On Fri, May 17, 2013 at 4:42 PM, dylan dwld0...@gmail.com wrote:
 hello, 
 
  there is a problem i can't resolved, i want to remote connect the 
 hadoop ( cloudera cdh4.2.0 ) via eclipse plugin.There’s have no 
 hadoop-eclipse-pluge.jar ,so i download the  hadoop of cdh4.2.0  tarbal and 
 when i complie, the error is below:
 
  
 
 ivy-resolve-common:
 
 [ivy:resolve] :: resolving dependencies :: 
 org.apache.hadoop#eclipse-plugin;working@master
 
 [ivy:resolve]confs: [common]
 
 [ivy:resolve]found commons-logging#commons-logging;1.1.1 in maven2
 
 [ivy:resolve] :: resolution report :: resolve 5475ms :: artifacts dl 2ms
 
-
 
|  |modules||   artifacts   |
 
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
 
-
 
|  common  |   2   |   0   |   0   |   0   ||   1   |   0   |
 
-
 
 [ivy:resolve] 
 
 [ivy:resolve] :: problems summary ::
 
 [ivy:resolve]  WARNINGS
 
 [ivy:resolve]   ::
 
 [ivy:resolve]   ::  UNRESOLVED DEPENDENCIES ::
 
 [ivy:resolve]   ::
 
 [ivy:resolve]   :: log4j#log4j;1.2.16: several problems occurred 
 while resolving dependency: log4j#log4j;1.2.16 {common=[master]}:
 
 [ivy:resolve]reactor-repo: unable to get resource for 
 log4j#log4j;1.2.16: res=${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.pom: 
 java.net.MalformedURLException: no protocol: 
 ${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.pom
 
 [ivy:resolve]reactor-repo: unable to get resource for 
 log4j#log4j;1.2.16: res=${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.jar: 
 java.net.MalformedURLException: no protocol: 
 ${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.jar
 
 [ivy:resolve]   ::
 
 [ivy:resolve] 
 
 [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
 
  
 
 BUILD FAILED
 
 /home/paramiao/hadoop-2.0.0-mr1-cdh4.2.0/src/contrib/build-contrib.xml:440: 
 impossible to resolve dependencies:
 
resolve failed - see output for details
 
  
 
 so could someone tell me where i am wrong and how could make it success? 
 
  
 
 best regards!
 
 
 
 
 -- 
 http://www.lingcc.com

Re: CDH4 installation along with MRv1 from tarball

2013-03-20 Thread Suresh Srinivas

Can you guys please take this thread to CDH mailing list?

Sent from phone

On Mar 20, 2013, at 2:48 PM, rohit sarewar rohitsare...@gmail.com wrote:

 Hi Jens
 
 These are not complete version of Hadoop.
 1) hadoop-0.20-mapreduce-0.20.2+1341 (has only MRv1)
 2) hadoop-2.0.0+922 (has HDFS+ Yarn)
 
 I request you to read the comments in this link 
 https://issues.cloudera.org/browse/DISTRO-447
 
 
 
 
 
 On Tue, Mar 19, 2013 at 1:17 PM, Jens Scheidtmann 
 jens.scheidtm...@gmail.com wrote:
 Rohit,
 
 What are you trying to achieve with two different complete versions of 
 hadoop?
 
 Thanks,
 
 Jens
 
 
 
 2013/3/18 rohit sarewar rohitsare...@gmail.com
 Need some guidance on CDH4 installation from tarballs
 
 I have downloaded two files from  
 https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs 
 
 1) hadoop-0.20-mapreduce-0.20.2+1341 (has only MRv1)
 2) hadoop-2.0.0+922 (has HDFS+ Yarn)

Re: Regarding: Merging two hadoop clusters

2013-03-14 Thread Suresh Srinivas

 I have two different hadoop clusters in production. One cluster is used as
 backing for HBase and the other for other things. Both hadoop clusters are
 using the same version 1.0 and I want to merge them and make them one. I
 know, one possible solution is to copy the data across, but the data is
 really huge on these clusters and it will hard for me to compromise with
 huge downtime.
 Is there any optimal way to merge two hadoop clusters.


This is not a supported feature. Hence this activity would require
understanding low level Hadoop details, quite a bit of hacking and is not
straightforward. Copying data from the clusters is the simplest solution.

Re: Hadoop cluster hangs on big hive job

2013-03-11 Thread Suresh Srinivas

I have seen one such problem related to big hive jobs that open a lot of
files. See HDFS-4496 for more details. Snippet from the description:
The following issue was observed in a cluster that was running a Hive job
and was writing to 100,000 temporary files (each task is writing to 1000s
of files). When this job is killed, a large number of files are left open
for write. Eventually when the lease for open files expires, lease recovery
is started for all these files in a very short duration of time. This
causes a large number of commitBlockSynchronization where logSync is
performed with the FSNamesystem lock held. This overloads the namenode
resulting in slowdown.

Could this be the cause? Can you see namenode log to see if you have lease
recovery activity? If not, can you send some information about what is
happening in the namenode logs at the time of this slowdown?



On Mon, Mar 11, 2013 at 1:32 PM, Daning Wang dan...@netseer.com wrote:

 [hive@mr3-033 ~]$ hadoop version
 Hadoop 1.0.4
 Subversion
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
 1393290
 Compiled by hortonfo on Wed Oct  3 05:13:58 UTC 2012


 On Sun, Mar 10, 2013 at 8:16 AM, Suresh Srinivas 
 sur...@hortonworks.comwrote:

 What is the version of hadoop?

 Sent from phone

 On Mar 7, 2013, at 11:53 AM, Daning Wang dan...@netseer.com wrote:

 We have hive query processing zipped csv files. the query was scanning
 for 10 days(partitioned by date). data for each day around 130G. The
 problem is not consistent since if you run it again, it might go through.
 but the problem has never happened on the smaller jobs(like processing only
 one days data).

 We don't have space issue.

 I have attached log file when problem happening. it is stuck like
 following(just search 19706 of 49964)

 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_19_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_39_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_32_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_00_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_24_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_08_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 

 Thanks,

 Daning


 On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård 
 haavard.kongsga...@gmail.com wrote:

 hadoop logs?
 On 6. mars 2013 21:04, Daning Wang dan...@netseer.com wrote:

 We have 5 nodes cluster(Hadoop 1.0.4), It hung a couple of times while
 running big jobs. Basically all the nodes are dead, from that
 trasktracker's log looks it went into some kinds of loop forever.

 All the log entries like this when problem happened.

 Any idea how to debug the issue?

 Thanks in advance.


 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_12_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_28_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_36_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_16_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_19_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_39_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_32_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_00_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_24_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_08_0 0.131468% reduce  copy (19706 of
 49964 at 0.00 MB/s) 
 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201302270947_0010_r_39_0

Re: Hadoop cluster hangs on big hive job

2013-03-10 Thread Suresh Srinivas

What is the version of hadoop?

Sent from phone

On Mar 7, 2013, at 11:53 AM, Daning Wang dan...@netseer.com wrote:

 We have hive query processing zipped csv files. the query was scanning for 10 
 days(partitioned by date). data for each day around 130G. The problem is not 
 consistent since if you run it again, it might go through. but the problem 
 has never happened on the smaller jobs(like processing only one days data).
 
 We don't have space issue.
 
 I have attached log file when problem happening. it is stuck like 
 following(just search 19706 of 49964)
 
 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_19_0 0.131468% reduce  copy (19706 of 49964 
 at 0.00 MB/s) 
 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_39_0 0.131468% reduce  copy (19706 of 49964 
 at 0.00 MB/s) 
 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_32_0 0.131468% reduce  copy (19706 of 49964 
 at 0.00 MB/s) 
 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_00_0 0.131468% reduce  copy (19706 of 49964 
 at 0.00 MB/s) 
 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_24_0 0.131468% reduce  copy (19706 of 49964 
 at 0.00 MB/s) 
 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_08_0 0.131468% reduce  copy (19706 of 49964 
 at 0.00 MB/s) 
 
 Thanks,
 
 Daning
 
 
 On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård 
 haavard.kongsga...@gmail.com wrote:
 hadoop logs?
 
 On 6. mars 2013 21:04, Daning Wang dan...@netseer.com wrote:
 We have 5 nodes cluster(Hadoop 1.0.4), It hung a couple of times while 
 running big jobs. Basically all the nodes are dead, from that 
 trasktracker's log looks it went into some kinds of loop forever.
 
 All the log entries like this when problem happened.
 
 Any idea how to debug the issue?
 
 Thanks in advance.
 
 
 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_12_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_28_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_36_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_16_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_19_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_39_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_32_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_00_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_24_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_08_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_39_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_04_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_43_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_12_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_28_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_24_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_36_0 0.131468% reduce  copy (19706 of 
 49964 at 0.00 MB/s)  
 2013-03-05 15:13:27,159 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201302270947_0010_r_16_0 0.131468% reduce  copy (19706 of

Re: [jira] [Commented] (HDFS-4533) start-dfs.sh ignored additional parameters besides -upgrade

2013-03-08 Thread Suresh Srinivas

Please followup on Jenkins failures. Looks like the patch is generated at
the wrong directory.


On Thu, Feb 28, 2013 at 1:34 AM, Azuryy Yu azury...@gmail.com wrote:

 Who can review this JIRA(https://issues.apache.org/jira/browse/HDFS-4533),
 which is very simple.


 -- Forwarded message --
 From: Hadoop QA (JIRA) j...@apache.org
 Date: Wed, Feb 27, 2013 at 4:53 PM
 Subject: [jira] [Commented] (HDFS-4533) start-dfs.sh ignored additional
 parameters besides -upgrade
 To: azury...@gmail.com



 [
 https://issues.apache.org/jira/browse/HDFS-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588130#comment-13588130]

 Hadoop QA commented on HDFS-4533:
 -

 {color:red}-1 overall{color}.  Here are the results of testing the latest
 attachment
   http://issues.apache.org/jira/secure/attachment/12571164/HDFS-4533.patch
   against trunk revision .

 {color:red}-1 patch{color}.  The patch command could not apply the
 patch.

 Console output:
 https://builds.apache.org/job/PreCommit-HDFS-Build/4008//console

 This message is automatically generated.

  start-dfs.sh ignored additional parameters besides -upgrade
  ---
 
  Key: HDFS-4533
  URL: https://issues.apache.org/jira/browse/HDFS-4533
  Project: Hadoop HDFS
   Issue Type: Bug
   Components: datanode, namenode
 Affects Versions: 2.0.3-alpha
 Reporter: Fengdong Yu
   Labels: patch
  Fix For: 2.0.4-beta
 
  Attachments: HDFS-4533.patch
 
 
  start-dfs.sh only takes -upgrade option and ignored others.
  So If run the following command, it will ignore the clusterId option.
  start-dfs.sh -upgrade -clusterId 1234

 --
 This message is automatically generated by JIRA.
 If you think it was sent incorrectly, please contact your JIRA
 administrators
 For more information on JIRA, see: http://www.atlassian.com/software/jira




-- 
http://hortonworks.com/download/

Re: How to setup Cloudera Hadoop to run everything on a localhost?

2013-03-05 Thread Suresh Srinivas

Can you please take this Cloudera mailing list?


On Tue, Mar 5, 2013 at 10:33 AM, anton ashanin anton.asha...@gmail.comwrote:

 I am trying to run all Hadoop servers on a single Ubuntu localhost. All
 ports are open and my /etc/hosts file is

 127.0.0.1   frigate frigate.domain.locallocalhost
 # The following lines are desirable for IPv6 capable hosts
 ::1 ip6-localhost ip6-loopback
 fe00::0 ip6-localnet
 ff00::0 ip6-mcastprefix
 ff02::1 ip6-allnodes
 ff02::2 ip6-allrouters

 When trying to install cluster Cloudera manager fails with the following
 messages:

 Installation failed. Failed to receive heartbeat from agent.

 I run my Ubuntu-12.04 host from home connected by WiFi/dialup modem to my
 provider. What configuration is missing?

 Thanks!




-- 
http://hortonworks.com/download/

Re: QJM HA and ClusterID

2013-02-26 Thread Suresh Srinivas

looks start-dfs.sh has a bug. It only takes -upgrade option and ignores
clusterId.

Consider running the command (which is what start-dfs.sh calls):
bin/hdfs start namenode -upgrade -clusterId your cluster ID

Please file a bug, if you can, for start-dfs.sh bug which ignores
additional parameters.


On Tue, Feb 26, 2013 at 4:50 PM, Azuryy Yu azury...@gmail.com wrote:

 Anybody here? Thanks!


 On Tue, Feb 26, 2013 at 9:57 AM, Azuryy Yu azury...@gmail.com wrote:

 Hi all,
 I've stay on this question several days. I want upgrade my cluster from
 hadoop-1.0.3 to hadoop-2.0.3-alpha, I've configured QJM successfully.

 How to customize clusterID by myself. It generated a random clusterID now.

 It doesn't work when I run:

 start-dfs.sh -upgrade -clusterId 12345-test

 Thanks!





-- 
http://hortonworks.com/download/

Re: Hive Metastore DB Issue ( Cloudera CDH4.1.2 MRv1 with hive-0.9.0-cdh4.1.2)

2013-02-07 Thread Suresh Srinivas

Please only use CDH mailing list and do not copy this to hdfs-user.


On Thu, Feb 7, 2013 at 7:20 AM, samir das mohapatra samir.help...@gmail.com
 wrote:

 Any Suggestion...


 On Thu, Feb 7, 2013 at 4:17 PM, samir das mohapatra 
 samir.help...@gmail.com wrote:

 Hi All,
   I could not see the hive meta  store DB under Mysql  database Under
 mysql user hadoop.

 Example:

 $  mysql –u root -p
  $ Add hadoop user (CREATE USER ‘hadoop'@'localhost' IDENTIFIED BY ‘
 hadoop';)
  $GRANT ALL ON *.* TO ‘hadoop'@‘% IDENTIFIED BY ‘hadoop’
  $ Example (GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'localhost'
 IDENTIFIED BY 'hadoop' WITH GRANT OPTION;)

 Bellow  configuration i am follwing
 

 property
 namejavax.jdo.option.ConnectionURL/name

 valuejdbc:mysql://localhost:3306/hadoop?createDatabaseIfNotExist=true/value
 /property
 property
 namejavax.jdo.option.ConnectionDriverName/name
 valuecom.mysql.jdbc.Driver/value
 /property
 property
   namejavax.jdo.option.ConnectionUserName/name
   valuehadoop/value
 /property
 property
namejavax.jdo.option.ConnectionPassword/name
valuehadoop/value

 /property


  Note: Previously i was using cdh3 it was perfectly creating under mysql
 metastore DB but when i changed cdh3 to cdh4.1.2 with hive as above subject
 line , It is not creating.


 Any suggestiong..

 Regrads,
 samir.





-- 
http://hortonworks.com/download/

Re: Application of Cloudera Hadoop for Dataset analysis

2013-02-05 Thread Suresh Srinivas

Please take this thread to CDH mailing list.


On Tue, Feb 5, 2013 at 2:43 AM, Sharath Chandra Guntuku 
sharathchandr...@gmail.com wrote:

 Hi,

 I am Sharath Chandra, an undergraduate student at BITS-Pilani, India. I
 would like to get the following clarifications regarding cloudera hadoop
 distribution. I am using a CDH4 Demo VM for now.

 1. After I upload the files into the file browser, if I have to link
 two-three datasets using a key in those files, what should I do? Do I have
 to run a query over them?

 2. My objective is that I have some data collected over a few years and
 now, I would like to link all of them, as in a database using keys and then
 run queries over them to find out particular patterns. Later I would like
 to implement some Machine learning algorithms on them for predictive
 analysis. Will this be possible on the demo VM?

 I am totally new to this. Can I get some help on this? I would be very
 grateful for the same.


 --
 Thanks and Regards,
 *Sharath Chandra Guntuku*
 Undergraduate Student (Final Year)
 *Computer Science Department*
 *Email*: f2009...@hyderabad.bits-pilani.ac.in

 *BITS-Pilani*, Hyderabad Campus
 Jawahar Nagar, Shameerpet, RR Dist,
 Hyderabad - 500078, Andhra Pradesh




-- 
http://hortonworks.com/download/

Re: Advice on post mortem of data loss (v 1.0.3)

2013-02-05 Thread Suresh Srinivas

Sorry to hear you are having issues. Few questions and comments inline.

On Fri, Feb 1, 2013 at 8:40 AM, Peter Sheridan
psheri...@millennialmedia.com wrote:

Yesterday, I bounced my DFS cluster. We realized that ulimit –u was,
in extreme cases, preventing the name node from creating threads. This had
only started occurring within the last day or so. When I brought the name
node back up, it had essentially been rolled back by one week, and I lost
all changes which had been made since then.

There are a few other factors to consider.

1. I had 3 locations for dfs.name.dir — one local and two NFS. (I
originally thought this was 2 local and one NFS when I set it up.) On
1/24, the day which we effectively rolled back to, the second NFS mount
started showing as FAILED on dfshealth.jsp. We had seen this before
without issue, so I didn't consider it critical.

What do you mean by rolled back to?
I understand this so far has you have three dirs: l1, nfs1 and nfs2. (l for
local disk and nfs for NFS). nfs2 was shown as failed.

1. When I brought the name node back up, because of discovering the
above, I had changed dfs.name.dir to 2 local drives and one NFS, excluding
the one which had failed.

When you brought the namenode backup, with the changed configuration you
have l1, l2 and nfs1. Given you have not seen any failures, l1 and nfs1
have the latest edits so far. Correct? How did you add l2? Can you describe
this procedure in detail?

Reviewing the name node log from the day with the NFS outage, I see:

When you say NFS outage here, this is the failure corresponding to nfs2
from above. Is that correct?

2013-01-24 16:33:11,794 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to sync edit
log.
java.io.IOException: Input/output error
at sun.nio.ch.FileChannelImpl.force0(Native Method)
at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.flushAndSync(FSEditLog.java:215)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:89)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1015)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1666)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
2013-01-24 16:33:11,794 WARN org.apache.hadoop.hdfs.server.common.Storage:
Removing storage dir /rdisks/xx

Unfortunately, since I wasn't expecting anything terrible to happen, I
didn't look too closely at the file system while the name node was down.
When I brought it up, the time stamp on the previous checkpoint directory
in the dfs.name.dir was right around the above error message. The current
directory basically had an fsimage and an empty edits log with the current
time stamps.

Which storage directory are you talking about here?

So: what happened? Should this failure have led to my data loss? I
would have thought the local directory would be fine in this scenario. Did
I have any other options for data recovery?

I am not sure how you concluded that you lost a week's data and the
namenode rolled back by one week? Please share the namenode logs
corresponding to the restart.

This is how it should have worked.
- When nfs2 was removed, on both l1 and nfs1 a timestamp is recorded,
corresponding to removal of a storage directory.
- If there is any checkpointing that happened, it would have also
incremented the timestamp.
- When the namenode starts up, it chooses l1 and nfs1 because the recorded
timestamp is the latest on these directories and loads fsimage and edits
from those directories. Namenode also performs checkpoint and writes new
consolidated image on l1, l2 and nfs1 and creates empty editlog on l1, l2
and nfs1.

If you provide more details on how l2 was added, we may be able to
understand what happened.

Regards,
Suresh

--
http://hortonworks.com/download/

Re: ClientProtocol Version mismatch. (client = 69, server = 1)

2013-01-29 Thread Suresh Srinivas

Please take this up in CDH mailing list. Most likely you are using client
that is not from 2.0 release of Hadoop.


On Tue, Jan 29, 2013 at 12:33 PM, Kim Chew kchew...@gmail.com wrote:

 I am using CDH4 (2.0.0-mr1-cdh4.1.2) vm running on my mbp.

 I was trying to invoke a remote method in the ClientProtocol via RPC,
 however I am getting this exception.

 2013-01-29 11:20:45,810 ERROR
 org.apache.hadoop.security.UserGroupInformation:
 PriviledgedActionException as:training (auth:SIMPLE)
 cause:org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
 org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch.
 (client = 69, server = 1)
 2013-01-29 11:20:45,810 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 6 on 8020, call
 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from
 192.168.140.1:50597: error: org.apache.hadoop.ipc.RPC$VersionMismatch:
 Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version
 mismatch. (client = 69, server = 1)
 org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
 org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch.
 (client = 69, server = 1)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.getProtocolImpl(ProtobufRpcEngine.java:400)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

 I could understand if the Server's ClientProtocol has version number
 60 or something else, but how could it has a version number of 1?

 Thanks.

 Kim




-- 
http://hortonworks.com/download/

Re: Using distcp with Hadoop HA

2013-01-29 Thread Suresh Srinivas

Currently, as you have pointed out, client side configuration based
failover is used in HA setup. The configuration must define namenode
addresses  for the nameservices of both the clusters. Are the datanodes
belonging to the two clusters running on the same set of nodes? Can you
share the configuration you are using, to diagnose the problem?

- I am trying to do a distcp from cluster A to cluster B. Since no
 operations are supported on the standby namenode, I need to specify either
 the active namenode while using distcp or use the failover proxy provider
 (dfs.client.failover.proxy.provider.clusterA) where I can specify the two
 namenodes for cluster B and the failover code inside HDFS will figure it
 out.



 - If I use the failover proxy provider, some of my datanodes on cluster A
 would connect to the namenode on cluster B and vice versa. I am assuming
 that is because I have configured both nameservices in my hdfs-site.xml for
 distcp to work.. I have configured dfs.nameservice.id to be the right one
 but the datanodes do not seem to respect that.

 What is the best way to use distcp with Hadoop HA configuration without
 having the datanodes to connect to the remote namenode? Thanks

 Regards,
 Dhaval




-- 
http://hortonworks.com/download/

Re: Cohesion of Hadoop team?

2013-01-18 Thread Suresh Srinivas

On Fri, Jan 18, 2013 at 6:48 AM, Glen Mazza gma...@talend.com wrote:

  Hi, looking at the derivation of the 0.23.x  2.0.x branches on one hand,
 and the 1.x branches on the other, as described here:

 http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCD0CAB8B.1098F%25evans%40yahoo-inc.com%3E

 One gets the impression the Hadoop committers are split into two teams,
 with one team working on 0.23.x/2.0.2 and another team working on 1.x,
 running the risk of increasingly diverging products eventually competing
 with each other.  Is that the case?


I am not sure how you came to this conclusion. The way I see it is, all the
folks are working on trunk. Subset of this work from trunk is pushed to
older releases such as 1.x or 0.23.x. In Apache Hadoop, features always go
to trunk first before going to any older releases 1.x or 0.23.x. That means
trunk is a superset of all the features.

Is there expected to be a Hadoop 3.0 where the results of the two lines of
 development will merge or is it increasingly likely the subteams will
 continue their separate routes?


2.0.3-alpha, which is the latest release based off of trunk, that is in
final stage of completion should have all the features that all the other
releases have. Let me know if there are any exceptions to this that you
know of.



 Thanks,
 Glen

 --
 Glen Mazza
 Talend Community Coders - coders.talend.com
 blog: www.jroller.com/gmazza




-- 
http://hortonworks.com/download/

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas

You did free up lot of old generation with reducing young generation,
right? The extra 5G of RAM for the old generation should have helped.

Based on my calculation, for the current number of objects you have, you
need roughly:
12G of total heap with young generation size of 1G. This assumes the
average file name size is 32 bytes.

In later releases (= 0.20.204), several memory optimization and startup
optimizations have been done. It should help you as well.



On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 So it turns out the issue was just the size of the filesystem.
 2012-12-27 16:37:22,390 WARN
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done.
 New Image Size: 4,354,340,042

 Basically if the NN image size hits ~ 5,000,000,000 you get f'ed. So you
 need about 3x ram as your FSImage size. If you do not have enough you die a
 slow death.

 On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas sur...@hortonworks.com
 wrote:

  Do not have access to my computer. Based on reading the previous email, I
  do not see any thing suspicious on the list of objects in the histo live
  dump.
 
  I would like to hear from you about if it continued to grow. One instance
  of this I had seen in the past was related to weak reference related to
  socket objects.  I do not see that happening here though.
 
  Sent from phone
 
  On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   Tried this..
  
   NameNode is still Ruining my Xmas on its slow death march to OOM.
  
   http://imagebin.org/240453
  
  
   On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
  sur...@hortonworks.comwrote:
  
   -XX:NewSize=1G -XX:MaxNewSize=1G
 




-- 
http://hortonworks.com/download/

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas

I do not follow what you mean here.

 Even when I forced a GC it cleared 0% memory.
Is this with new younggen setting? Because earlier, based on the
calculation I posted, you need ~11G in old generation. With 6G as the
default younggen size, you actually had just enough memory to fit the
namespace in oldgen. Hence you might not have seen Full GC freeing up
enough memory.

Have you tried Full GC with 1G youngen size have you tried this? I supsect
you would see lot more memory freeing up.

 One would think that since the entire NameNode image is stored in memory
that the heap would not need to grow beyond that
Namenode image that you see during checkpointing is the size of file
written after serializing file system namespace in memory. This is not what
is directly stored in namenode memory. Namenode stores data structures that
corresponds to file system directory tree and block locations. Out of this
only file system directory is serialized and written to fsimage. Blocks
locations are not.




On Thu, Dec 27, 2012 at 2:22 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 I am not sure GC had a factor. Even when I forced a GC it cleared 0%
 memory. One would think that since the entire NameNode image is stored in
 memory that the heap would not need to grow beyond that, but that sure does
 not seem to be the case. a 5GB image starts off using 10GB of memory and
 after burn in it seems to use about 15GB memory.

 So really we say the name node data has to fit in memory but what we
 really mean is the name node data must fit in memory 3x

 On Thu, Dec 27, 2012 at 5:08 PM, Suresh Srinivas sur...@hortonworks.com
 wrote:

  You did free up lot of old generation with reducing young generation,
  right? The extra 5G of RAM for the old generation should have helped.
 
  Based on my calculation, for the current number of objects you have, you
  need roughly:
  12G of total heap with young generation size of 1G. This assumes the
  average file name size is 32 bytes.
 
  In later releases (= 0.20.204), several memory optimization and startup
  optimizations have been done. It should help you as well.
 
 
 
  On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   So it turns out the issue was just the size of the filesystem.
   2012-12-27 16:37:22,390 WARN
   org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint
  done.
   New Image Size: 4,354,340,042
  
   Basically if the NN image size hits ~ 5,000,000,000 you get f'ed. So
 you
   need about 3x ram as your FSImage size. If you do not have enough you
  die a
   slow death.
  
   On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas 
 sur...@hortonworks.com
   wrote:
  
Do not have access to my computer. Based on reading the previous
  email, I
do not see any thing suspicious on the list of objects in the histo
  live
dump.
   
I would like to hear from you about if it continued to grow. One
  instance
of this I had seen in the past was related to weak reference related
 to
socket objects.  I do not see that happening here though.
   
Sent from phone
   
On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com
 
wrote:
   
 Tried this..

 NameNode is still Ruining my Xmas on its slow death march to OOM.

 http://imagebin.org/240453


 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
sur...@hortonworks.comwrote:

 -XX:NewSize=1G -XX:MaxNewSize=1G
   
  
 
 
 
  --
  http://hortonworks.com/download/
 




-- 
http://hortonworks.com/download/

Re: NN Memory Jumps every 1 1/2 hours

2012-12-27 Thread Suresh Srinivas


 I tried your suggested setting and forced GC from Jconsole and once it
 crept up nothing was freeing up.


That is very surprising. If possible, take a live dump when namenode starts
up (when memory used is low) and when namenode memory consumption has gone
up considerably, closer to the heap limit.

BTW, are you running with that configuration - with younggen size set to
smaller size?



 So just food for thought:

 You said average file name size is 32 bytes. Well most of my data sits in

 /user/hive/warehouse/
 Then I have a tables with partitions.

 Does it make sense to just move this to /u/h/w?


In the directory structure in the namenode memory, there is one inode for
user, hive and warehouse. So it would save only couple of bytes. However on
fsimage in older releases, /user/hive/warehouse is repeated for every file.
This in the later release has been optimized. But these optimizations
affect only the fsimage and not the memory consumed on the namenode.


 Will I be saving 400,000,000 bytes of memory if I do?
 On Thu, Dec 27, 2012 at 5:41 PM, Suresh Srinivas sur...@hortonworks.com
 wrote:

  I do not follow what you mean here.
 
   Even when I forced a GC it cleared 0% memory.
  Is this with new younggen setting? Because earlier, based on the
  calculation I posted, you need ~11G in old generation. With 6G as the
  default younggen size, you actually had just enough memory to fit the
  namespace in oldgen. Hence you might not have seen Full GC freeing up
  enough memory.
 
  Have you tried Full GC with 1G youngen size have you tried this? I
 supsect
  you would see lot more memory freeing up.
 
   One would think that since the entire NameNode image is stored in
 memory
  that the heap would not need to grow beyond that
  Namenode image that you see during checkpointing is the size of file
  written after serializing file system namespace in memory. This is not
 what
  is directly stored in namenode memory. Namenode stores data structures
 that
  corresponds to file system directory tree and block locations. Out of
 this
  only file system directory is serialized and written to fsimage. Blocks
  locations are not.
 
 
 
 
  On Thu, Dec 27, 2012 at 2:22 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   I am not sure GC had a factor. Even when I forced a GC it cleared 0%
   memory. One would think that since the entire NameNode image is stored
 in
   memory that the heap would not need to grow beyond that, but that sure
  does
   not seem to be the case. a 5GB image starts off using 10GB of memory
 and
   after burn in it seems to use about 15GB memory.
  
   So really we say the name node data has to fit in memory but what we
   really mean is the name node data must fit in memory 3x
  
   On Thu, Dec 27, 2012 at 5:08 PM, Suresh Srinivas 
 sur...@hortonworks.com
   wrote:
  
You did free up lot of old generation with reducing young generation,
right? The extra 5G of RAM for the old generation should have helped.
   
Based on my calculation, for the current number of objects you have,
  you
need roughly:
12G of total heap with young generation size of 1G. This assumes the
average file name size is 32 bytes.
   
In later releases (= 0.20.204), several memory optimization and
  startup
optimizations have been done. It should help you as well.
   
   
   
On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo 
  edlinuxg...@gmail.com
wrote:
   
 So it turns out the issue was just the size of the filesystem.
 2012-12-27 16:37:22,390 WARN
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
 Checkpoint
done.
 New Image Size: 4,354,340,042

 Basically if the NN image size hits ~ 5,000,000,000 you get f'ed.
 So
   you
 need about 3x ram as your FSImage size. If you do not have enough
 you
die a
 slow death.

 On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas 
   sur...@hortonworks.com
 wrote:

  Do not have access to my computer. Based on reading the previous
email, I
  do not see any thing suspicious on the list of objects in the
 histo
live
  dump.
 
  I would like to hear from you about if it continued to grow. One
instance
  of this I had seen in the past was related to weak reference
  related
   to
  socket objects.  I do not see that happening here though.
 
  Sent from phone
 
  On Dec 23, 2012, at 10:34 AM, Edward Capriolo 
  edlinuxg...@gmail.com
   
  wrote:
 
   Tried this..
  
   NameNode is still Ruining my Xmas on its slow death march to
 OOM.
  
   http://imagebin.org/240453
  
  
   On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
  sur...@hortonworks.comwrote:
  
   -XX:NewSize=1G -XX:MaxNewSize=1G
 

   
   
   
--
http://hortonworks.com/download/
   
  
 
 
 
  --
  http://hortonworks.com/download/
 




-- 
http://hortonworks.com/download/

Re: NN Memory Jumps every 1 1/2 hours

2012-12-23 Thread Suresh Srinivas

Do not have access to my computer. Based on reading the previous email, I do 
not see any thing suspicious on the list of objects in the histo live dump.

I would like to hear from you about if it continued to grow. One instance of 
this I had seen in the past was related to weak reference related to socket 
objects.  I do not see that happening here though. 

Sent from phone

On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Tried this..
 
 NameNode is still Ruining my Xmas on its slow death march to OOM.
 
 http://imagebin.org/240453
 
 
 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas 
 sur...@hortonworks.comwrote:
 
 -XX:NewSize=1G -XX:MaxNewSize=1G

Re: NN Memory Jumps every 1 1/2 hours

2012-12-22 Thread Suresh Srinivas

This looks to me is because of larger default young generation size in
newer java releases - see
http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html#heap_size.
I can see looking at your GC logs, around 6G space being used for young
generation (though I do not see logs related to minor collection). That
means for the same given number of objects, you have smaller old generation
space and hence old generation collection can no longer perform well.

It is unfortunate that such changes are made in java and that causes
previously working applications to fail. My suggestion is to not depend on
default young generation sizes any more. At large JVM sizes, the defaults
chosen by the JDK no longer works well. So I suggest protecting yourself
from such changes by explicitly specifying young generation size. Given my
experience of tuning GC at Yahoo clusters, at the number of objects you
have and total heap size you are allocating, I suggest setting the young
generation to 1G.

You can do that by adding

-XX:NewSize=1G -XX:MaxNewSize=1G

Let me know how it goes.

On Sat, Dec 22, 2012 at 5:59 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 6333.934: [Full GC 10391746K-9722532K(17194656K), 63.9812940 secs]




-- 
http://hortonworks.com/download/

Re: Is there an additional overhead when storing data in HDFS?

2012-11-20 Thread Suresh Srinivas

HDFS uses 4GB for the file + checksum data.

Default is for every 512 bytes of data, 4 bytes of checksum are stored. In
this case additional 32MB data.

On Tue, Nov 20, 2012 at 11:00 PM, WangRamon ramon_w...@hotmail.com wrote:

 Hi All

 I'm wondering if there is an additional overhead when storing some data
 into HDFS? For example, I have a 2GB file, the replicate factor of HDSF is
 2, when the file is uploaded to HDFS, should HDFS use 4GB to store it or
 more then 4GB to store it? If it takes more than 4GB space, why?

 Thanks
 Ramon




-- 
http://hortonworks.com/download/

Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-16 Thread Suresh Srinivas

Vinay, if the Hadoop docs are not clear in this regard, can you please
create a jira to add these details?

On Fri, Nov 16, 2012 at 12:31 AM, Vinayakumar B vinayakuma...@huawei.comwrote:

 Hi,

 ** **

 If you are moving from NonHA (single master) to HA, then follow the below
 steps.

 **1.   **Configure the another namenode’s configuration in the
 running namenode and all datanode’s configurations. And configure logical
 *fs.defaultFS*

 **2.   **Configure the shared storage related configuration.

 **3.   **Stop the running NameNode and all datanodes.

 **4.   **Execute ‘hdfs namenode –initializeSharedEdits’ from the
 existing namenode installation, to transfer the edits to shared storage.**
 **

 **5.   **Now format zkfc using ‘hdfs zkfc –formatZK’ and start zkfc
 using ‘hadoop-daemon.sh start zkfc’

 **6.   **Now restart the namenode from existing installation. If all
 configurations are fine, then NameNode should start successfully as
 STANDBY, then zkfc will make it to ACTIVE.

 ** **

 **7.   **Now install the NameNode in another machine (master2) with
 same configuration, except ‘dfs.ha.namenode.id’.

 **8.   **Now instead of format, you need to copy the name dir
 contents from another namenode (master1) to master2’s name dir. For this
 you are having 2 options.

 **a.   **Execute ‘hdfs namenode -bootStrapStandby’  from the master2
 installation.

 **b.  **Using ‘scp’ copy entire contents of name dir from master1 to
 master2’s name dir.

 **9.   **Now start the zkfc for second namenode ( No need to do zkfc
 format now). Also start the namenode (master2)

 ** **

 Regards,

 Vinay-

 *From:* Uma Maheswara Rao G [mailto:mahesw...@huawei.com]
 *Sent:* Friday, November 16, 2012 1:26 PM
 *To:* user@hadoop.apache.org
 *Subject:* RE: High Availability - second namenode (master2) issue:
 Incompatible namespaceIDs

 ** **

 If you format namenode, you need to cleanup storage directories of
 DataNode as well if that is having some data already. DN also will have
 namespace ID saved and compared with NN namespaceID. if you format NN, then
 namespaceID will be changed and DN may have still older namespaceID. So,
 just cleaning the data in DN would be fine.

  

 Regards,

 Uma
 --

 *From:* hadoop hive [hadooph...@gmail.com]
 *Sent:* Friday, November 16, 2012 1:15 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: High Availability - second namenode (master2) issue:
 Incompatible namespaceIDs

 Seems like you havn't format your cluster (if its 1st time made). 

 On Fri, Nov 16, 2012 at 9:58 AM, a...@hsk.hk a...@hsk.hk wrote:

 Hi, 

 ** **

 Please help!

 ** **

 I have installed a Hadoop Cluster with a single master (master1) and have
 HBase running on the HDFS.  Now I am setting up the second master
  (master2) in order to form HA.  When I used JPS to check the cluster, I
 found :

 ** **

 2782 Jps

 2126 NameNode

 2720 SecondaryNameNode

 i.e. The datanode on this server could not be started

 ** **

 In the log file, found: 

 2012-11-16 10:28:44,851 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
 Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
 = 1356148070; datanode namespaceID = 1151604993

 ** **

 ** **

 ** **

 One of the possible solutions to fix this issue is to:  stop the cluster,
 reformat the NameNode, restart the cluster.

 QUESTION: As I already have HBASE running on the cluster, if I reformat
 the NameNode, do I need to reinstall the entire HBASE? I don't mind to have
 all data lost as I don't have many data in HBASE and HDFS, however I don't
 want to re-install HBASE again.

 ** **

 ** **

 On the other hand, I have tried another solution: stop the DataNode, edit
 the namespaceID in current/VERSION (i.e. set namespaceID=1151604993),
 restart the datanode, it doesn't work:

 Warning: $HADOOP_HOME is deprecated.

 starting master2, logging to
 /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out*
 ***

 Exception in thread main java.lang.NoClassDefFoundError: master2

 Caused by: java.lang.ClassNotFoundException: master2

 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)

 at java.security.AccessController.doPrivileged(Native Method)

 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)

 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

 Could not find the main class: master2.  Program will exit.

 QUESTION: Any other solutions?

 ** **

 ** **

 ** **

 Thanks

 ** **

 ** **

 ** **

   

 ** **

 ** **

 ** **




-- 
http://hortonworks.com/download/

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Suresh Srinivas

- A datanode is typically kept free with up to 5 free blocks (HDFS block
size) of space.
- Disk space is used by mapreduce jobs to store temporary shuffle spills
also. This is what dfs.datanode.du.reserved is used to configure. The
configuration is available in hdfs-site.xml. If you have not configured it
then reserved space is 0. Not only mapreduce, other files also might take
up the disk space.

When these errors are thrown, please send the namenode web UI information.
It has storage related information in the cluster summary. That will help
debug.


On Tue, Sep 4, 2012 at 9:41 AM, Keith Wiley kwi...@keithwiley.com wrote:

 I've been running up against the good old fashioned replicated to 0
 nodes gremlin quite a bit recently.  My system (a set of processes
 interacting with hadoop, and of course hadoop itself) runs for a while (a
 day or so) and then I get plagued with these errors.  This is a very simple
 system, a single node running pseudo-distributed.  Obviously, the
 replication factor is implicitly 1 and the datanode is the same machine as
 the namenode.  None of the typical culprits seem to explain the situation
 and I'm not sure what to do.  I'm also not sure how I'm getting around it
 so far.  I fiddle desperately for a few hours and things start running
 again, but that's not really a solution...I've tried stopping and
 restarting hdfs, but that doesn't seem to improve things.

 So, to go through the common suspects one by one, as quoted on
 http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo:

 • No DataNode instances being up and running. Action: look at the servers,
 see if the processes are running.

 I can interact with hdfs through the command line (doing directory
 listings for example).  Furthermore, I can see that the relevant java
 processes are all running (NameNode, SecondaryNameNode, DataNode,
 JobTracker, TaskTracker).

 • The DataNode instances cannot talk to the server, through networking or
 Hadoop configuration problems. Action: look at the logs of one of the
 DataNodes.

 Obviously irrelevant in a single-node scenario.  Anyway, like I said, I
 can perform basic hdfs listings, I just can't upload new data.

 • Your DataNode instances have no hard disk space in their configured data
 directories. Action: look at the dfs.data.dir list in the node
 configurations, verify that at least one of the directories exists, and is
 writeable by the user running the Hadoop processes. Then look at the logs.

 There's plenty of space, at least 50GB.

 • Your DataNode instances have run out of space. Look at the disk capacity
 via the Namenode web pages. Delete old files. Compress under-used files.
 Buy more disks for existing servers (if there is room), upgrade the
 existing servers to bigger drives, or add some more servers.

 Nope, 50GBs free, I'm only uploading a few KB at a time, maybe a few MB.

 • The reserved space for a DN (as set in dfs.datanode.du.reserved is
 greater than the remaining free space, so the DN thinks it has no free space

 I grepped all the files in the conf directory and couldn't find this
 parameter so I don't really know anything about it.  At any rate, it seems
 rather esoteric, I doubt it is related to my problem.  Any thoughts on this?

 • You may also get this message due to permissions, eg if JT can not
 create jobtracker.info on startup.

 Meh, like I said, the system basicaslly works...and then stops working.
  The only explanation that would really make sense in that context is
 running out of space...which isn't happening. If this were a permission
 error, or a configuration error, or anything weird like that, then the
 whole system would never get up and running in the first place.

 Why would a properly running hadoop system start exhibiting this error
 without running out of disk space?  THAT's the real question on the table
 here.

 Any ideas?


 
 Keith Wiley kwi...@keithwiley.com keithwiley.com
 music.keithwiley.com

 Yet mark his perfect self-contentment, and hence learn his lesson, that
 to be
 self-contented is to be vile and ignorant, and that to aspire is better
 than to
 be blindly and impotently happy.
--  Edwin A. Abbott, Flatland

 




-- 
http://hortonworks.com/download/

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Suresh Srinivas

Keith,

Assuming that you were seeing the problem when you captured the namenode
webUI info, it is not related to what I suspect. This might be a good
question for CDH forums given this is not an Apache release.

Regards,
Suresh

On Tue, Sep 4, 2012 at 10:20 AM, Keith Wiley kwi...@keithwiley.com wrote:

 On Sep 4, 2012, at 10:05 , Suresh Srinivas wrote:

  When these errors are thrown, please send the namenode web UI
 information. It has storage related information in the cluster summary.
 That will help debug.

 Sure thing.  Thanks.  Here's what I currently see.  It looks like the
 problem isn't the datanode, but rather the namenode.  Would you agree with
 that assessment?

 NameNode 'localhost:9000'

 Started: Tue Sep 04 10:06:52 PDT 2012
 Version: 0.20.2-cdh3u3, 03b655719d13929bd68bb2c2f9cee615b389cea9
 Compiled:Thu Jan 26 11:55:16 PST 2012 by root from Unknown
 Upgrades:There are no upgrades in progress.

 Browse the filesystem
 Namenode Logs
 Cluster Summary

 Safe mode is ON. Resources are low on NN. Safe mode must be turned off
 manually.
 1639 files and directories, 585 blocks = 2224 total. Heap Size is 39.55 MB
 / 888.94 MB (4%)
 Configured Capacity  :   49.21 GB
 DFS Used :   9.9 MB
 Non DFS Used :   2.68 GB
 DFS Remaining:   46.53 GB
 DFS Used%:   0.02 %
 DFS Remaining%   :   94.54 %
 Live Nodes   :   1
 Dead Nodes   :   0
 Decommissioning Nodes:   0
 Number of Under-Replicated Blocks:   5

 NameNode Storage:

 Storage Directory   TypeState
 /var/lib/hadoop-0.20/cache/hadoop/dfs/name  IMAGE_AND_EDITS Active

 Cloudera's Distribution including Apache Hadoop, 2012.


 
 Keith Wiley kwi...@keithwiley.com keithwiley.com
 music.keithwiley.com

 And what if we picked the wrong religion?  Every week, we're just making
 God
 madder and madder!
--  Homer Simpson

 




-- 
http://hortonworks.com/download/

Re: Hadoop WebUI

2012-08-01 Thread Suresh Srinivas

Clement,

To get the details related to how to contribute - see
http://wiki.apache.org/hadoop/HowToContribute.

UI is simple because it serves the purpose. More sophisticated UI for
management and monitoring is being done in Ambari, see -
http://incubator.apache.org/ambari/.

The core hadoop UIs could be better. Please create a jira with your
proposal and a brief design document. Create separate jira for HDFS and
MapReduce (depending on where you want to to the work).

Regards,
Suresh

On Wed, Aug 1, 2012 at 10:27 AM, Clement Jebakumar jeba.r...@gmail.comwrote:


 hi,

 I have observed for very longtime, that hadoop ui is simple.(ofcourse it
 has information which are required). but still

 Is there any reason for it? I thought of working on the UI as it is
 required for my cloud setup.. If i work on this, i can give the patch of my
 contributution to hadoop. how i can do my contrib to hadoop?

 Now currenly i am doing my updates in trunk.. is that ok to do with trunk?
 Give your views?

 *Clement Jebakumar,*
 111/27 Keelamutharamman Kovil Street,
 Tenkasi, 627 811
 http://www.declum.com/clement.html




-- 
http://hortonworks.com/download/

Re: Namenode and Jobtracker dont start

2012-07-18 Thread Suresh Srinivas

Can you share information on the java version that you are using.
- Is it as obvious as some previous processes still running and new
processes cannot bind to the port?
- Another pointer -
http://stackoverflow.com/questions/8360913/weird-java-net-socketexception-permission-denied-connect-error-when-running-groo

On Wed, Jul 18, 2012 at 7:29 AM, Björn-Elmar Macek
ma...@cs.uni-kassel.dewrote:

 Hi,

 i have lately been running into problems since i started running hadoop on
 a cluster:

 The setup is the following:
 1 Computer is NameNode and Jobtracker
 1 Computer is SecondaryNameNode
 2 Computers are TaskTracker and DataNode

 I ran into problems with running the wordcount example: NameNode and
 Jobtracker do not start properly both having connection problems of some
 kind.
 And this is although ssh is configured that way, that no prompt happens
 when i connect from any node in the cluster to any other.

 Is there any reason why this happens?

 The logs look like the following:
 \ JOBTRACKER**__
 2012-07-18 16:08:05,808 INFO org.apache.hadoop.mapred.**JobTracker:
 STARTUP_MSG:
 /
 STARTUP_MSG: Starting JobTracker
 STARTUP_MSG:   host = 
 its-cs100.its.uni-kassel.de/**141.51.205.10http://its-cs100.its.uni-kassel.de/141.51.205.10
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.0.2
 STARTUP_MSG:   build = https://svn.apache.org/repos/**
 asf/hadoop/common/branches/**branch-1.0.2https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2-r
  1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
 /
 2012-07-18 16:08:06,479 INFO org.apache.hadoop.metrics2.**impl.MetricsConfig:
 loaded properties from hadoop-metrics2.properties
 2012-07-18 16:08:06,534 INFO 
 org.apache.hadoop.metrics2.**impl.MetricsSourceAdapter:
 MBean for source MetricsSystem,sub=Stats registered.
 2012-07-18 16:08:06,554 INFO 
 org.apache.hadoop.metrics2.**impl.MetricsSystemImpl:
 Scheduled snapshot period at 10 second(s).
 2012-07-18 16:08:06,554 INFO 
 org.apache.hadoop.metrics2.**impl.MetricsSystemImpl:
 JobTracker metrics system started
 2012-07-18 16:08:07,157 INFO 
 org.apache.hadoop.metrics2.**impl.MetricsSourceAdapter:
 MBean for source QueueMetrics,q=default registered.
 2012-07-18 16:08:10,395 INFO 
 org.apache.hadoop.metrics2.**impl.MetricsSourceAdapter:
 MBean for source ugi registered.
 2012-07-18 16:08:10,417 INFO org.apache.hadoop.security.**
 token.delegation.**AbstractDelegationTokenSecretM**anager: Updating the
 current master key for generating delegation tokens
 2012-07-18 16:08:10,436 INFO org.apache.hadoop.mapred.**JobTracker:
 Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
 limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
 2012-07-18 16:08:10,438 INFO org.apache.hadoop.util.**HostsFileReader:
 Refreshing hosts (include/exclude) list
 2012-07-18 16:08:10,440 INFO org.apache.hadoop.security.**
 token.delegation.**AbstractDelegationTokenSecretM**anager: Starting
 expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
 2012-07-18 16:08:10,465 INFO org.apache.hadoop.security.**
 token.delegation.**AbstractDelegationTokenSecretM**anager: Updating the
 current master key for generating delegation tokens
 2012-07-18 16:08:10,510 INFO org.apache.hadoop.mapred.**JobTracker:
 Starting jobtracker with owner as bmacek
 2012-07-18 16:08:10,620 WARN org.apache.hadoop.mapred.**JobTracker: Error
 starting tracker: java.net.SocketException: Permission denied
 at sun.nio.ch.Net.bind(Native Method)
 at sun.nio.ch.**ServerSocketChannelImpl.bind(**
 ServerSocketChannelImpl.java:**119)
 at sun.nio.ch.**ServerSocketAdaptor.bind(**
 ServerSocketAdaptor.java:59)
 at org.apache.hadoop.ipc.Server.**bind(Server.java:225)
 at org.apache.hadoop.ipc.Server$**Listener.init(Server.java:**301)
 at org.apache.hadoop.ipc.Server.**init(Server.java:1483)
 at org.apache.hadoop.ipc.RPC$**Server.init(RPC.java:545)
 at org.apache.hadoop.ipc.RPC.**getServer(RPC.java:506)
 at org.apache.hadoop.mapred.**JobTracker.init(JobTracker.**
 java:2306)
 at org.apache.hadoop.mapred.**JobTracker.init(JobTracker.**
 java:2192)
 at org.apache.hadoop.mapred.**JobTracker.init(JobTracker.**
 java:2186)
 at org.apache.hadoop.mapred.**JobTracker.startTracker(**
 JobTracker.java:300)
 at org.apache.hadoop.mapred.**JobTracker.startTracker(**
 JobTracker.java:291)
 at org.apache.hadoop.mapred.**JobTracker.main(JobTracker.**java:4978)

 2012-07-18 16:08:13,861 WARN 
 org.apache.hadoop.metrics2.**impl.MetricsSystemImpl:
 Source name QueueMetrics,q=default already exists!
 2012-07-18 16:08:13,885 WARN 
 org.apache.hadoop.metrics2.**impl.MetricsSystemImpl:
 Source name ugi already exists!
 2012-07-18 16:08:13,885 INFO org.apache.hadoop.security.**

Re: can HADOOP-6546: BloomMapFile can return false negatives get backported to branch-1?

2012-05-08 Thread Suresh Srinivas

This change in merged into branch-1 and will be available in release 1.1.

On Mon, May 7, 2012 at 6:40 PM, Jim Donofrio donofrio...@gmail.com wrote:

 Can someone backport HADOOP-6546: BloomMapFile can return false negatives
 to branch-1 for the next 1+ release?

 Without this fix BloomMapFile is somewhat useless because having no false
 negatives is a core feature of BloomFilters. I am surprised that both
 hadoop 1.0.2 and cdh3u3 do not have this fix from over 2 years ago.

Re: can HADOOP-6546: BloomMapFile can return false negatives get backported to branch-1?

2012-05-07 Thread Suresh Srinivas

I have marked it for 1.1. I will follow up on promoting the path.

Regards,
Suresh

On May 7, 2012, at 6:40 PM, Jim Donofrio donofrio...@gmail.com wrote:

 Can someone backport HADOOP-6546: BloomMapFile can return false negatives to 
 branch-1 for the next 1+ release?
 
 Without this fix BloomMapFile is somewhat useless because having no false 
 negatives is a core feature of BloomFilters. I am surprised that both hadoop 
 1.0.2 and cdh3u3 do not have this fix from over 2 years ago.

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Suresh Srinivas

This probably is a more relevant question in CDH mailing lists. That said,
what Edward is suggesting seems reasonable. Reduce replication factor,
decommission some of the nodes and create a new cluster with those nodes
and do distcp.

Could you share with us the reasons you want to migrate from Apache 205?

Regards,
Suresh

On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo edlinuxg...@gmail.comwrote:

 Honestly that is a hassle, going from 205 to cdh3u3 is probably more
 or a cross-grade then an upgrade or downgrade. I would just stick it
 out. But yes like Michael said two clusters on the same gear and
 distcp. If you are using RF=3 you could also lower your replication to
 rf=2 'hadoop dfs -setrepl 2' to clear headroom as you are moving
 stuff.


 On Thu, May 3, 2012 at 7:25 AM, Michel Segel michael_se...@hotmail.com
 wrote:
  Ok... When you get your new hardware...
 
  Set up one server as your new NN, JT, SN.
  Set up the others as a DN.
  (Cloudera CDH3u3)
 
  On your existing cluster...
  Remove your old log files, temp files on HDFS anything you don't need.
  This should give you some more space.
  Start copying some of the directories/files to the new cluster.
  As you gain space, decommission a node, rebalance, add node to new
 cluster...
 
  It's a slow process.
 
  Should I remind you to make sure you up you bandwidth setting, and to
 clean up the hdfs directories when you repurpose the nodes?
 
  Does this make sense?
 
  Sent from a remote device. Please excuse any typos...
 
  Mike Segel
 
  On May 3, 2012, at 5:46 AM, Austin Chungath austi...@gmail.com wrote:
 
  Yeah I know :-)
  and this is not a production cluster ;-) and yes there is more hardware
  coming :-)
 
  On Thu, May 3, 2012 at 4:10 PM, Michel Segel michael_se...@hotmail.com
 wrote:
 
  Well, you've kind of painted yourself in to a corner...
  Not sure why you didn't get a response from the Cloudera lists, but
 it's a
  generic question...
 
  8 out of 10 TB. Are you talking effective storage or actual disks?
  And please tell me you've already ordered more hardware.. Right?
 
  And please tell me this isn't your production cluster...
 
  (Strong hint to Strata and Cloudea... You really want to accept my
  upcoming proposal talk... ;-)
 
 
  Sent from a remote device. Please excuse any typos...
 
  Mike Segel
 
  On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com
 wrote:
 
  Yes. This was first posted on the cloudera mailing list. There were no
  responses.
 
  But this is not related to cloudera as such.
 
  cdh3 is based on apache hadoop 0.20 as the base. My data is in apache
  hadoop 0.20.205
 
  There is an upgrade namenode option when we are migrating to a higher
  version say from 0.20 to 0.20.205
  but here I am downgrading from 0.20.205 to 0.20 (cdh3)
  Is this possible?
 
 
  On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi 
 prash1...@gmail.com
  wrote:
 
  Seems like a matter of upgrade. I am not a Cloudera user so would not
  know
  much, but you might find some help moving this to Cloudera mailing
 list.
 
  On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com
  wrote:
 
  There is only one cluster. I am not copying between clusters.
 
  Say I have a cluster running apache 0.20.205 with 10 TB storage
  capacity
  and has about 8 TB of data.
  Now how can I migrate the same cluster to use cdh3 and use that
 same 8
  TB
  of data.
 
  I can't copy 8 TB of data using distcp because I have only 2 TB of
 free
  space
 
 
  On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar 
 nitinpawar...@gmail.com
  wrote:
 
  you can actually look at the distcp
 
  http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
 
  but this means that you have two different set of clusters
 available
  to
  do
  the migration
 
  On Thu, May 3, 2012 at 12:51 PM, Austin Chungath 
 austi...@gmail.com
  wrote:
 
  Thanks for the suggestions,
  My concerns are that I can't actually copyToLocal from the dfs
  because
  the
  data is huge.
 
  Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do
 a
  namenode upgrade. I don't have to copy data out of dfs.
 
  But here I am having Apache hadoop 0.20.205 and I want to use CDH3
  now,
  which is based on 0.20
  Now it is actually a downgrade as 0.20.205's namenode info has to
 be
  used
  by 0.20's namenode.
 
  Any idea how I can achieve what I am trying to do?
 
  Thanks.
 
  On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar 
  nitinpawar...@gmail.com
  wrote:
 
  i can think of following options
 
  1) write a simple get and put code which gets the data from DFS
 and
  loads
  it in dfs
  2) see if the distcp  between both versions are compatible
  3) this is what I had done (and my data was hardly few hundred
 GB)
  ..
  did a
  dfs -copyToLocal and then in the new grid did a copyFromLocal
 
  On Thu, May 3, 2012 at 11:41 AM, Austin Chungath 
  austi...@gmail.com
 
  wrote:
 
  Hi,
  I am migrating from Apache hadoop 0.20.205 to CDH3u3.
  I don't want to lose the data

Re: hadoop permission guideline

2012-03-22 Thread Suresh Srinivas

Can you please take this discussion CDH mailing list?

On Mar 22, 2012, at 7:51 AM, Michael Wang michael.w...@meredith.com wrote:

 I have installed Cloudera hadoop (CDH). I used its Cloudera Manager to 
 install all needed packages. When it was installed, the root is used.  I 
 found the installation created some users, such as hdfs, hive, 
 mapred,hue,hbase...
 After the installation, should we change some permission or ownership of some 
 directories/files? For example, to use HIVE. It works fine with root user, 
 since the metatore directory belongs to root. But in order to let other user 
 use HIVE, I have to change metastore ownership to a specific non-root user, 
 then it works. Is it the best practice?
 Another example is the start-all.sh, stop-all.sh they all belong to root. 
 Should I change them to other user? I guess there are more cases...
 
 Thanks,
 
 
 
 This electronic message, including any attachments, may contain proprietary, 
 confidential or privileged information for the sole use of the intended 
 recipient(s). You are hereby notified that any unauthorized disclosure, 
 copying, distribution, or use of this message is prohibited. If you have 
 received this message in error, please immediately notify the sender by reply 
 e-mail and delete it.

Re: Issue when starting services on CDH3

2012-03-15 Thread Suresh Srinivas

Guys, can you please take this up in CDH related mailing lists.

On Thu, Mar 15, 2012 at 10:01 AM, Manu S manupk...@gmail.com wrote:

 Because for large clusters we have to run namenode in a single node,
 datanode in another nodes
 So we can start namenode and jobtracker in master node and datanode n
 tasktracker in slave nodes

 For getting more clarity You can check the service status after starting

 Verify these:
 dfs.name.dir hdfs:hadoop drwx--
 dfs.data.dir hdfs:hadoop drwx--

 mapred.local.dir mapred:hadoop drwxr-xr-x

 Please follow each steps in this link
 https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster
  On Mar 15, 2012 9:52 PM, Manish Bhoge manishbh...@rocketmail.com
 wrote:

  Ys, I understand the order and I formatted namenode before starting
  services. As I suspect there may be ownership and an access issue. Not
 able
  to nail down issue exactly. I also have question why there are 2 routes
 to
  start services. When we have start-all.sh script then why need to go to
  init.d to start services??
 
 
  Thank you,
  Manish
  Sent from my BlackBerry, pls excuse typo
 
  -Original Message-
  From: Manu S manupk...@gmail.com
  Date: Thu, 15 Mar 2012 21:43:26
  To: common-user@hadoop.apache.org; manishbh...@rocketmail.com
  Reply-To: common-user@hadoop.apache.org
  Subject: Re: Issue when starting services on CDH3
 
  Did you check the service status?
  Is it like dead, but pid exist?
 
  Did you check the ownership and permissions for the
  dfs.name.dir,dfs.data.dir,mapped.local.dir etc ?
 
  The order for starting daemons are like this:
  1 namenode
  2 datanode
  3 jobtracker
  4 tasktracker
 
  Did you format the namenode before starting?
  On Mar 15, 2012 9:31 PM, Manu S manupk...@gmail.com wrote:
 
   Dear manish
   Which daemons are not starting?
  
   On Mar 15, 2012 9:21 PM, Manish Bhoge manishbh...@rocketmail.com
   wrote:
   
I have CDH3 installed in standalone mode. I have install all hadoop
   components. Now when I start services (namenode,secondary namenode,job
   tracker,task tracker) I can start gracefully from /usr/lib/hadoop/
   ./bin/start-all.sh. But when start the same servises from
   /etc/init.d/hadoop-0.20-* then I unable to start. Why? Now I want to
  start
   Hue also which is in init.d that also I couldn't start. Here I suspect
   authentication issue. Because all the services in init.d are under root
   user and root group. Please suggest I am stuck here. I tried hive and
 it
   seems it running fine.
Thanks
Manish.
Sent from my BlackBerry, pls excuse typo

Re: Questions about HDFS’s placement policy

2012-03-14 Thread Suresh Srinivas

See my comments inline:

On Wed, Mar 14, 2012 at 9:24 AM, Giovanni Marzulli 
giovanni.marzu...@ba.infn.it wrote:

  Hello,

 I'm trying HDFS on a small test cluster and I need to clarify some doubts
 about hadoop behaviour.

 Some details of my cluster:
 Hadoop version: 0.20.2
 I have two racks (rack1, rack2). Three datanodes for every rack.
 Replication factor is set to 3.

 HDFS’s placement policy is to put one replica on one node in the local
 rack, another on a node in a different (remote) rack, and the last on a
 different node in the same remote rack.
 Instead, I noticed that sometimes, a few blocks of files are stored as
 follows: two replicas in the local rack and a replica in a different rack. Are
 there exceptions that cause different behaviour than default placement
 policy?


Your description of replica placement is correct. However a node chosen
based on this placement may not be a good target, due to the traffic on the
node, remaining space etc. See BlockPlacementPolicyDefault#isGoodTarget().
Given the small cluster size, you may be seeing different behavior based on
load of individual nodes, racks etc.

Likewise, at times some blocks are read from nodes in the remote rack
 instead of nodes in the local rack. Why does it happen?


This is surprising. Not sure if the topology is correctly configired.


 Another thing: if I have two datacenters and two racks for each of them
 (so a hierarchical network topology), where two remote replicas arestored? 
 Does Hadoop consider the hierarchy and stores one replica in the
 local datacenter and two replicas in the other datacenter? Or the two
 replicas are stored in a totally random rack?

 Hadoop clusters are not spread across datacenters.

Regards,
Suresh

Re: What is the NEW api?

2012-03-11 Thread Suresh Srinivas

 there are many people talking about the NEW API
This might be related to releases 0.21 or later, where append and related 
functionality is re-implemented. 

1.0 comes from 0.20.205 and has same API as 0.20-append.

Sent from phone

On Mar 11, 2012, at 6:27 PM, WangRamon ramon_w...@hotmail.com wrote:

 
 
 
 
 Hi all I've been with Hadoop-0.20-append for a few time and I plan to upgrade 
 to 1.0.0 release, but i find there are many people taking about the NEW API, 
 so I'm lost, can anyone please tell me what is the new API? Is the OLD one 
 available in the 1.0.0 release? Thanks CheersRamon

Re: Backupnode in 1.0.0?

2012-02-23 Thread Suresh Srinivas

On Thu, Feb 23, 2012 at 12:41 AM, Jeremy Hansen jer...@skidrow.la wrote:

 Thanks.  Could you clarify what BackupNode does?

 -jeremy


Namenode currently keeps the entire file system namespace in memory. It
logs the write operations (create, delete file etc.) into a journal file
called editlog. This journal needs to be merged with the file system image
periodically to avoid journal file growing to a large size. This is called
checkpointing. Checkpoint also reduces the startup time, since the namenode
need not load large editlog file.

Prior to release 0.21, another node called SecondaryNamenode was used  for
checkpointing. It periodically gets the file system image and edit, load it
into memory and write checkpoint image. This image is then then shipped to
the Namenode.

In 0.21, BackupNode was introduced. Unlike SecondaryNamenode, it gets edits
streamed from the Namenode. It periodically writes the checkpoint image and
ships it back to Namenode. The goal was for this to become Standby node,
towards Namenode HA. Konstantin and few others are pursuing this.

I have not seen any deployments of BackupNode in production. I would love
to hear if any one has deployed it in production and how stable it is.

Regards,
Suresh

Re: Backupnode in 1.0.0?

2012-02-22 Thread Suresh Srinivas

Joey,

Can you please answer the question from in the context of Apache releases.
Not sure if CDH4b1 needs to be mentioned in the context of this mailing
list.

Regards,
Suresh

On Wed, Feb 22, 2012 at 5:24 PM, Joey Echeverria j...@cloudera.com wrote:

 Check out this branch for the 0.22 version of Bigtop:

 https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/

 However, I don't think BackupNode is what you want. It sounds like you
 want HA which is coming in (hopefully) 0.23.2 and is also available
 today in CDH4b1.

 -Joey

 On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote:
  By the way, I don't see anything 0.22 based in the bigtop repos.
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:
 
  I guess I thought that backupnode would provide some level of namenode
 redundancy.  Perhaps I don't fully understand.
 
  I'll check out Bigtop.  I looked at it a while ago and forgot about it.
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:
 
  Check out the Apache Bigtop project. I believe they have 0.22 RPMs.
 
  Out of curiosity, why are you interested in BackupNode?
 
  -Joey
 
  Sent from my iPhone
 
  On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
  Any possibility of getting spec files to create packages for 0.22?
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
  BackupNode is major functionality with change in required in RPC
 protocols,
  configuration etc. Hence it will not be available in bug fix release
 1.0.1.
 
  It is also unlikely to be not available on minor releases in 1.x
 release
  streams.
 
  Regards,
  Suresh
 
  On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la
 wrote:
 
 
  It looks as if backupnode isn't supported in 1.0.0?  Any chances
 it's in
  1.0.1?
 
  Thanks
  -jeremy
 
 
 



 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434

Re: Setting up Federated HDFS

2012-02-07 Thread Suresh Srinivas

On Tue, Feb 7, 2012 at 4:51 PM, Chandrasekar chandruseka...@gmail.comwrote:

   In which file should i specify all this information about
 nameservices and the list of namenodes?


hdfs-site.xml is the appropriate place, since it is hdfs-specific
configuration.

  If there are multiple namenodes, then which one should i specify in
 core-site.xml as fs.defaultFS?


core-site.xml is the right place for fs.defaultFS.

Given you have multiple namespaces from in federation setup, fs.defaultFS
should point to ViewFileSystem for a unified view of the namespaces to the
clients. There is an open bug HDFS-2558 to track this. I will get to this
as soon as I can.

Regards,
Suresh

Re: HDFS Federation Exception

2012-01-11 Thread Suresh Srinivas

Thanks for figuring that. Could you create an HDFS Jira for this issue?

On Wednesday, January 11, 2012, Praveen Sripati praveensrip...@gmail.com
wrote:
 Hi,

 The documentation (1) suggested to set the `dfs.namenode.rpc-address.ns1`
property to `hdfs://nn-host1:rpc-port` in the example. Changing the value
to `nn-host1:rpc-port` (removing hdfs://) solved the problem. The document
needs to be updated.

 (1) -
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html

 Praveen

 On Wed, Jan 11, 2012 at 3:40 PM, Praveen Sripati praveensrip...@gmail.com
wrote:

 Hi,

 Got the latest code to see if any bugs were fixed and did try federation
with the same configuration, but was getting similar exception.

 2012-01-11 15:25:35,321 ERROR namenode.NameNode (NameNode.java:main(803))
- Exception in namenode join
 java.io.IOException: Failed on local exception: java.net.SocketException:
Unresolved address; Host Details : local host is: hdfs; destination host
is: (unknown):0;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:895)
 at org.apache.hadoop.ipc.Server.bind(Server.java:231)
 at org.apache.hadoop.ipc.Server$Listener.init(Server.java:313)
 at org.apache.hadoop.ipc.Server.init(Server.java:1600)
 at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:576)
 at
org.apache.hadoop.ipc.WritableRpcEngine$Server.init(WritableRpcEngine.java:322)
 at
org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:282)
 at
org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:46)
 at org.apache.hadoop.ipc.RPC.getServer(RPC.java:550)
 at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.init(NameNodeRpcServer.java:145)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:356)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:334)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:458)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:450)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)
 Caused by: java.net.SocketException: Unresolved address
 at sun.nio.ch.Net.translateToSocketException(Net.java:58)
 at sun.nio.ch.Net.translateException(Net.java:84)
 at sun.nio.ch.Net.translateException(Net.java:90)
 at
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:61)
 at org.apache.hadoop.ipc.Server.bind(Server.java:229)
 ... 14 more
 Caused by: java.nio.channels.UnresolvedAddressException
 at sun.nio.ch.Net.checkAddress(Net.java:30)
 at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:122)
 at
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
 ... 15 more

 Regards,
 Praveen

 On Wed, Jan 11, 2012 at 12:24 PM, Praveen Sripati 
praveensrip...@gmail.com wrote:

 Hi,

 I am trying to setup a HDFS federation and getting the below error. Also,
pasted the core-site.xml and hdfs-site.xml at the bottom of the mail. Did I
miss something in the configuration files?

 2012-01-11 12:12:15,759 ERROR namenode.NameNode (NameNode.java:main(803))
- Exception in namenode join
 java.lang.IllegalArgumentException: Can't parse port ''
 at
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:198)
 at
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:174)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:205)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:266)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:317)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:329)
 at org.apache.hadoop.hdfs.server.namenode.N

Re: datanode failing to start

2012-01-09 Thread Suresh Srinivas

Can you please send your notes on what info is out of date or better still
create a jira so that it can be addressed.

On Fri, Jan 6, 2012 at 3:11 PM, Dave Kelsey da...@gamehouse.com wrote:

 gave up and installed version 1.
 it installed correctly and worked, thought the instructions for setup and
 the location of scripts and configs are now out of date.

 D

 On 1/5/2012 10:25 AM, Dave Kelsey wrote:


 java version 1.6.0_29
 hadoop: 0.20.203.0

 I'm attempting to setup the pseudo-distributed config on a mac 10.6.8.
 I followed the steps from the QuickStart (http://wiki.apache.org./**
 hadoop/QuickStart http://wiki.apache.org./hadoop/QuickStart) and
 succeeded with Stage 1: Standalone Operation.
 I followed the steps for Stage 2: Pseudo-distributed Configuration.
 I set the JAVA_HOME variable in conf/hadoop-env.sh and I changed
 tools.jar to the location of classes.jar (a mac version of tools.jar)
 I've modified the three .xml files as described in the QuickStart.
 ssh'ing to localhost has been configured and works with passwordless
 authentication.
 I formatted the namenode with bin/hadoop namenode -format as the
 instructions say

 This is what I see when I run bin/start-all.sh

 root# bin/start-all.sh
 starting namenode, logging to /Users/admin/hadoop/hadoop-0.**
 20.203.0/bin/../logs/hadoop-**root-namenode-Hoot-2.local.out
 localhost: starting datanode, logging to /Users/admin/hadoop/hadoop-0.**
 20.203.0/bin/../logs/hadoop-**root-datanode-Hoot-2.local.out
 localhost: Exception in thread main java.lang.**NoClassDefFoundError:
 server
 localhost: Caused by: java.lang.**ClassNotFoundException: server
 localhost: at java.net.URLClassLoader$1.run(**
 URLClassLoader.java:202)
 localhost: at java.security.**AccessController.doPrivileged(**Native
 Method)
 localhost: at java.net.URLClassLoader.**
 findClass(URLClassLoader.java:**190)
 localhost: at java.lang.ClassLoader.**loadClass(ClassLoader.java:**
 306)
 localhost: at sun.misc.Launcher$**AppClassLoader.loadClass(**
 Launcher.java:301)
 localhost: at java.lang.ClassLoader.**loadClass(ClassLoader.java:**
 247)
 localhost: starting secondarynamenode, logging to
 /Users/admin/hadoop/hadoop-0.**20.203.0/bin/../logs/hadoop-**
 root-secondarynamenode-Hoot-2.**local.out
 starting jobtracker, logging to /Users/admin/hadoop/hadoop-0.**
 20.203.0/bin/../logs/hadoop-**root-jobtracker-Hoot-2.local.**out
 localhost: starting tasktracker, logging to /Users/admin/hadoop/hadoop-0.
 **20.203.0/bin/../logs/hadoop-**root-tasktracker-Hoot-2.local.**out

 There are 4 processes running:
 ps -fax | grep hadoop | grep -v grep | wc -l
  4

 They are:
 SecondaryNameNode
 TaskTracker
 NameNode
 JobTracker


 I've searched to see if anyone else has encountered this and not found
 anything

 d

 p.s. I've also posted this to core-u...@hadoop.apache.org which I've yet
 to find how to subscribe to.

Re: HDFS load balancing for non-local reads

2012-01-05 Thread Suresh Srinivas

Currently it sorts the block locations as:
# local node
# local rack node
# random order of remote nodes

See DatanodeManager#sortLocatedBlock(...) and
NetworkTopology#pseudoSortByDistance(...).

You can play around with other policies by plugging in different
NetworkTopology.

On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay rbc...@ncsu.edu wrote:

 Hi-

 ** **

 How does the NameNode handle load balancing of non-local reads with
 multiple block locations when locality is equal?

 ** **

 IE, if the client is equidistant (same rack) from 2 DataNodes hosting the
 same block, does the NameNode consider current client count or any other
 load indicators when deciding which DataNode will satisfy the read
 request?  Or, is the client provided a list of all split locations and is
 allowed to make this choice themselves?

 ** **

 Thanks!

 ** **

 -Ben

 ** **

Re: HDFS Backup nodes

2011-12-13 Thread Suresh Srinivas

Srivas,

As you may know already, NFS is just being used in the first prototype for
HA.

Two options for editlog store are:
1. Using BookKeeper. Work has already completed on trunk towards this. This
will replace need for NFS to  store the editlogs and is highly available.
This solution will also be used for HA.
2. We have a short term goal also to enable editlogs going to HDFS itself.
The work is in progress.

Regards,
Suresh



 -- Forwarded message --
 From: M. C. Srivas mcsri...@gmail.com
 Date: Sun, Dec 11, 2011 at 10:47 PM
 Subject: Re: HDFS Backup nodes
 To: common-user@hadoop.apache.org


 You are out of luck if you don't want to use NFS, and yet want redundancy
 for the NN.  Even the new NN HA work being done by the community will
 require NFS ... and the NFS itself needs to be HA.

 But if you use a Netapp, then the likelihood of the Netapp crashing is
 lower than the likelihood of a garbage-collection-of-death happening in the
 NN.

 [ disclaimer:  I don't work for Netapp, I work for MapR ]


 On Wed, Dec 7, 2011 at 4:30 PM, randy randy...@comcast.net wrote:

  Thanks Joey. We've had enough problems with nfs (mainly under very high
  load) that we thought it might be riskier to use it for the NN.
 
  randy
 
 
  On 12/07/2011 06:46 PM, Joey Echeverria wrote:
 
  Hey Rand,
 
  It will mark that storage directory as failed and ignore it from then
  on. In order to do this correctly, you need a couple of options
  enabled on the NFS mount to make sure that it doesn't retry
  infinitely. I usually run with the tcp,soft,intr,timeo=10,**retrans=10
  options set.
 
  -Joey
 
  On Wed, Dec 7, 2011 at 12:37 PM,randy...@comcast.net  wrote:
 
  What happens then if the nfs server fails or isn't reachable? Does hdfs
  lock up? Does it gracefully ignore the nfs copy?
 
  Thanks,
  randy
 
  - Original Message -
  From: Joey Echeverriaj...@cloudera.com
  To: common-user@hadoop.apache.org
  Sent: Wednesday, December 7, 2011 6:07:58 AM
  Subject: Re: HDFS Backup nodes
 
  You should also configure the Namenode to use an NFS mount for one of
  it's storage directories. That will give the most up-to-date back of
  the metadata in case of total node failure.
 
  -Joey
 
  On Wed, Dec 7, 2011 at 3:17 AM, praveenesh kumarpraveen...@gmail.com
   wrote:
 
  This means still we are relying on Secondary NameNode idealogy for
  Namenode's backup.
  Can OS-mirroring of Namenode is a good alternative keep it alive all
 the
  time ?
 
  Thanks,
  Praveenesh
 
  On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G
  mahesw...@huawei.comwrote:
 
   AFAIK backup node introduced in 0.21 version onwards.
  __**__
  From: praveenesh kumar [praveen...@gmail.com]
  Sent: Wednesday, December 07, 2011 12:40 PM
  To: common-user@hadoop.apache.org
  Subject: HDFS Backup nodes
 
  Does hadoop 0.20.205 supports configuring HDFS backup nodes ?
 
  Thanks,
  Praveenesh
 
 
 
 
  --
  Joseph Echeverria
  Cloudera, Inc.
  443.305.9434

Re: Difference between DFS Used and Non-DFS Used

2011-07-08 Thread Suresh Srinivas

non DFS storage is not required, it is provided as information only to shown
how the storage is being used.

The available storage on the disks is used for both DFS and non DFS
(mapreduce shuffle output and any other files that could be on the disks).

See if you have unnecessary files or shuffle output that is lingering on
these disks, that is contributing to 250GB. Delete the unneeded files and
you should be able to reclaim some of the 250GB.

On Fri, Jul 8, 2011 at 4:24 AM, Sagar Shukla
sagar_shu...@persistent.co.inwrote:

 Thanks Harsh. My first question still remains unanswered - Why does it
 require non-DFS storage?. If it is cache data then it should get flushed
 from the system after certain interval of time. And if it is useful data
 then it should have been part of used DFS data.

 I have a setup in which DFS used is use approx. 10 MB whereas non-DFS used
 is around 250 GB which is quite ridiculous.

 Thanks,
 Sagar

 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: Friday, July 08, 2011 4:42 PM
 To: common-user@hadoop.apache.org
 Subject: Re: Difference between DFS Used and Non-DFS Used

 It is just for information's sake (cause it can be computed with the
 data collected). The space is accounted just to let you know that
 there's something being stored on the DataNodes apart from just the
 HDFS data, in case you are running out of space.

 On Fri, Jul 8, 2011 at 10:18 AM, Sagar Shukla
 sagar_shu...@persistent.co.in wrote:
  Hi Harsh,
  Thanks for your reply.
 
  But why does it require non-DFS storage ? And why that space is accounted
 differently from regular DFS storage ?
 
  Ideally, it should have been part of same storage.
 
  Thanks,
  Sagar
 
  -Original Message-
  From: Harsh J [mailto:ha...@cloudera.com]
  Sent: Thursday, July 07, 2011 6:04 PM
  To: common-user@hadoop.apache.org
  Subject: Re: Difference between DFS Used and Non-DFS Used
 
  DFS used is a count of all the space used by the dfs.data.dirs. The
  non-dfs used space is whatever space is occupied beyond that (which
  the DN does not account for).
 
  On Thu, Jul 7, 2011 at 3:29 PM, Sagar Shukla
  sagar_shu...@persistent.co.in wrote:
  Hi,
What is the difference between DFS Used and Non-DFS used ?
 
  Thanks,
  Sagar
 
  DISCLAIMER
  ==
  This e-mail may contain privileged and confidential information which is
 the property of Persistent Systems Ltd. It is intended only for the use of
 the individual or entity to which it is addressed. If you are not the
 intended recipient, you are not authorized to read, retain, copy, print,
 distribute or use this message. If you have received this communication in
 error, please notify the sender and delete all copies of this message.
 Persistent Systems Ltd. does not accept any liability for virus infected
 mails.
 
 
 
 
 
  --
  Harsh J
 
  DISCLAIMER
  ==
  This e-mail may contain privileged and confidential information which is
 the property of Persistent Systems Ltd. It is intended only for the use of
 the individual or entity to which it is addressed. If you are not the
 intended recipient, you are not authorized to read, retain, copy, print,
 distribute or use this message. If you have received this communication in
 error, please notify the sender and delete all copies of this message.
 Persistent Systems Ltd. does not accept any liability for virus infected
 mails.
 
 



 --
 Harsh J

 DISCLAIMER
 ==
 This e-mail may contain privileged and confidential information which is
 the property of Persistent Systems Ltd. It is intended only for the use of
 the individual or entity to which it is addressed. If you are not the
 intended recipient, you are not authorized to read, retain, copy, print,
 distribute or use this message. If you have received this communication in
 error, please notify the sender and delete all copies of this message.
 Persistent Systems Ltd. does not accept any liability for virus infected
 mails.




-- 
Regards,
Suresh

Re: Rapid growth in Non DFS Used disk space

2011-05-14 Thread suresh srinivas

dfs.data.dir/current is used by datanodes to store blocks. This directory
should only have files starting with blk-*

Things to check:
- Are there other files that are not blk related?
- Did you manually copy the content of one storage dir to another? (some
folks did this when they added new disks)


On Fri, May 13, 2011 at 1:41 PM, Kester, Scott skes...@weather.com wrote:

 We have a job that cleans up the mapred.local directory, so that¹s not it.
  I have done some further looking at data usage on the datanodes and 99%
 of the space used is under the dfs.data.dir/current directory.  What would
 be under 'current' that wasn't part of HDFS?

 On 5/13/11 3:12 PM, Allen Wittenauer a...@apache.org wrote:

 
 On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
 
 
  2) Any ideas on what is driving the growth in Non DFS Used space?   I
  looked for things like growing log files on the datanodes but didn't
 find
  anything.
 
 
  Logs are one possible culprit. Another is to look for old files that
 might
  be orphaned in your mapred.local.dir - there have been bugs in the past
  where we've leaked files. If you shut down the TaskTrackers, you can
 safely
  delete everything from within mapred.local.dirs.
 
Part of our S.O.P. during Hadoop bounces is to wipe mapred.local
 out.
 The TT doesn't properly clean up after itself.




-- 
Regards,
Suresh

Re: CDH and Hadoop

2011-03-24 Thread suresh srinivas

On Thu, Mar 24, 2011 at 7:04 PM, Rita rmorgan...@gmail.com wrote:

 Oh! Thats for the heads up on that...

 I guess I will go with the cloudera source then


 On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch dar...@darose.net
 wrote:

  They do, but IIRC, they recently announced that they're going to be
  discontinuing it.
 
  DR


Yahoo! discontinued the distribution in favor of making Apache Hadoop the
most stable and the go to place for Hadoop releases. So all the advantages
of using Yahoo distribution, you get in Apache Hadoop release.

Please see the details of announcement here:

http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/

Re: hadoop fs -du hbase table size

2011-03-15 Thread suresh srinivas

When you brought down the DN, the blocks in it were replicated to the
remaining DNs. When the DN was added back, the blocks in it were over
replicated, resulting in deletion of the extra replica.

On Mon, Mar 14, 2011 at 7:34 AM, Alex Baranau alex.barano...@gmail.comwrote:

 Hello,

 As far as I understand, since hadoop fs -du command uses Linux' du
 internally this mean that the number of replicas (at the moment of command
 run) affect the result. Is that correct?

 I have the following case.
 I have a small (1 master + 5 slaves each with DN, TT  RS) test HBase
 cluster with replication set to 2. The tables data size is monitoried with
 the help of hadoop fs -du command. There's a table which is constantly
 written to: data is only added in it.
 At some point I decided to reconfigure one of the slaves and shut it down.
 After reconfiguration (HBase already marked it as dead one) I brought it up
 again. Things went smoothly. However on the table size graph (I drew from
 data fetched with hadoop fs -du command) I noticed a little spike up on
 data size and then it went down to the normal/expected values. Can it be so
 that at some point of the taking out/reconfiguring/adding back node
 procedure at some point blocks were over-replicated? I'd expect them to be
 under-replicated for some time (as DN is down) and I'd expect to see the
 inverted spike: small decrease in data amount and then back to expected
 rate (after all blocks got replicated again). Any ideas?

 Thank you,

 Alex Baranau
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase




-- 
Regards,
Suresh

Re: copy a file from hdfs to local file system with java

2011-02-25 Thread suresh srinivas

For an example how it is done, look at FsShell#copyToLocal() and its
internal implementation. It uses FileUtil#copy() method to do this copying.

On Fri, Feb 25, 2011 at 5:08 AM, Alessandro Binhara binh...@gmail.comwrote:

  How to copy a file from a HDS to local file system with a JAVA API ?

 where i can find a documentation and example about it?


 thanks




-- 
Regards,
Suresh

Re: corrupt blocks after restart

2011-02-19 Thread suresh srinivas

The problem is that replicas for 3609 blocks are not reported to namenode.

Do you have datanodes in exclude file? What is the number of registered
nodes before start compared to what it is now? Removing all the datanodes
from exclude file (if there are any) and restarting the cluster should fix
the issue.

On Fri, Feb 18, 2011 at 5:43 PM, Chris Tarnas c...@email.com wrote:

 I've hit a data curroption problem in a system we were rapidly loading up,
 and I could really use some pointers on where to look for the root of the
 problem as well as any possible solutions. I'm running the cdh3b3 build of
 Hadoop 0.20.2. I experienced some issues with a client (Hbase regionserver)
 getting an IOException talking with the namenode. I thought the namenode
 might have been resourced starved (maybe not enough RAM). I first ran a fsck
 and the filesystem was healthy and then shutdown hadoop (stop-all.sh) to
 update the hadoop-env.sh to allocate more memory to the namenode, then
 started up hadoop again (start-all.sh).

 After starting up the server I ran another fsck and now the filesystem is
 corrupt and about 1/3 or less of the size it should be. All of the datanodes
 are online, but it is as if they are all incomplete.

 I've tried using the previous checkpoint from the secondary namenode to no
 avail. This is the fsck summary

 blocks of total size 442716 B.Status: CORRUPT
  Total size:416302602463 B
  Total dirs:7571
  Total files:   7525
  Total blocks (validated):  8516 (avg. block size 48884758 B)
  
  CORRUPT FILES:3343
  MISSING BLOCKS:   3609
  MISSING SIZE: 169401218659 B
  CORRUPT BLOCKS:   3609
  
  Minimally replicated blocks:   4907 (57.62095 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   4740 (55.659935 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 0.7557539
  Corrupt blocks:3609
  Missing replicas:  8299 (128.94655 %)
  Number of data-nodes:  10
  Number of racks:   1

 The namenode had quite a few WARNS like this one (The list of excluded
 nodes is all of the nodes in the system!)

 2011-02-18 17:06:40,506 WARN
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
 enough replicas, still in need of 1(excluded: 10.56.24.15:50010,
 10.56.24.19:50010, 10.56.24.16:50010, 10.56.24.20:50010, 10.56.24.14:50010,
 10.56.24.17:50010, 10.56.24.13:50010, 10.56.24.18:50010, 10.56.24.11:50010,
 10.56.24.12:50010)



 I grepped for errors and warns on all 10 of the datanode logs and only
 found that over the last day two nodes had a total of 8 warns and 1 error:

 node 5:

 2011-02-18 03:44:56,642 WARN
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: First Verification
 failed for blk_-8223286903671115311_101182. Exception : java.io.IOException:
 Input/output error
 2011-02-18 03:45:04,440 WARN
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Second Verification
 failed for blk_-8223286903671115311_101182. Exception : java.io.IOException:
 Input/output error
 2011-02-18 06:53:17,081 WARN
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: First Verification
 failed for blk_8689822798201808529_99687. Exception : java.io.IOException:
 Input/output error
 2011-02-18 06:53:25,105 WARN
 org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Second Verification
 failed for blk_8689822798201808529_99687. Exception : java.io.IOException:
 Input/output error
 2011-02-18 12:09:09,613 WARN
 org.apache.hadoop.hdfs.server.datanode.DataNode:  Could not read or failed
 to veirfy checksum for data at offset 25624576 for block
 blk_-8776727553170755183_302602 got : java.io.IOException: Input/output
 error
 2011-02-18 12:17:03,874 WARN
 org.apache.hadoop.hdfs.server.datanode.DataNode:  Could not read or failed
 to veirfy checksum for data at offset 2555904 for block
 blk_-1372864350494009223_328898 got : java.io.IOException: Input/output
 error
 2011-02-18 13:15:40,637 WARN
 org.apache.hadoop.hdfs.server.datanode.DataNode:  Could not read or failed
 to veirfy checksum for data at offset 458752 for block
 blk_5554094539319851344_322246 got : java.io.IOException: Input/output error
 2011-02-18 13:12:13,587 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
 10.56.24.15:50010,
 storageID=DS-1424058120-10.56.24.15-50010-1297226452840, infoPort=50075,
 ipcPort=50020):DataXceiver

 Node 9:

 2011-02-18 12:02:58,879 WARN
 org.apache.hadoop.hdfs.server.datanode.DataNode:  Could not read or failed
 to veirfy checksum for data at offset 16711680 for block
 blk_-5196887735268731000_300861 got : java.io.IOException: Input/output
 error

 Many thanks for any help or where I should look.
 -chris





-- 
Regards,
Suresh

Re: Data Nodes do not start

2011-02-09 Thread suresh srinivas

On Tue, Feb 8, 2011 at 11:05 PM, rahul patodi patodira...@gmail.com wrote:

 I think you should copy the namespaceID of your master which is in
 name/current/VERSION file to all the slaves


This is a sure recipe for disaster. The VERSION file is a file system meta
data file not to be messed around with. At worst, this can cause loss of
entire file system data! Rahul please update your blog to reflect this.

Some background on namespace ID:
A namespace ID is created on the namenode when it is formatted. This is
propagated to datanodes when they register the first time with namenode.
From then on, this ID is burnt into the datanodes.

A mismatch in namespace ID of datanode and namenode means:
# Datanode is pointing to a wrong namenode, perhaps in a different cluster
(config of datanode points to wrong namenode address).
# Namenode was running with a storage directory previously. It was changed
to some other storage directory with a different file system image.


Why does editing namespace ID is a bad idea?
Given that either namenode has loaded wrong namespace or datanode is
pointing to wrong namenode, messing around with namespaceID either on
namenode/datanode, results in datanode being able to register with the
namenode. When datanode sends block report, the blocks on the datanode do
not belong to the namespace loaded by the namenode, resulting in deletion of
all the blocks on the datanode.

Please find out if any of these problem exist in your setup and fix it.

75 matches

Mail list logo