Re: HDFS Append Problem
Please take this up CDH mailing list. From: Molnár Bálint molnarcsi...@gmail.com Sent: Thursday, March 05, 2015 4:53 AM To: user@hadoop.apache.org Subject: HDFS Append Problem Hi Everyone! I 'm experiencing an annoying problem. My Scenario is: I want to store lots of small files (1-2MB max) in map files. These files will come periodically during the days, so I cannot use the factory writer because it will create a lot of small MapFiles. (I want to store these files in the HDFS immediately.) I' m trying to create a code to append Map files. I use the org.apache.hadoop.fs.FileSystem append() method which calls the org.apache.hadoop.hdfs.DistributedFileSystem append() method to do the job. My code works well, because the stock MapFile Reader can retrieve the files. My problem appears in the upload phase. When I try to upload a set (1GB) of small files, the free space of the HDFS decreases fast. The program only uploads 400MB but according to the Cloudera Manager it is more than 5GB. The interesting part is that, when I terminate the upload, and wait 1-2 minutes, the HDFS goes back to normal size (500MB), and none of my files are lost. If I don't terminate the upload, the HDFS goes out of free space and the program gets errors. I'm using cloudera quickvm 5.3 for testing, and the hdfs replication number is 1. Any ideas how to solve this issue? Thanks
Re: Error while executing command on CDH5
Can you please use CDH mailing listd for this question? From: SP sajid...@gmail.com Sent: Wednesday, March 04, 2015 11:00 AM To: user@hadoop.apache.org Subject: Error while executing command on CDH5 Hello All, Why am I getting this error every time I execute a command. It was working fine with CDH4 version. When I upgraded to CDH5 version this message started showing up. does any one have resolution for this error sudo -u hdfs hadoop fs -ls / SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Found 1 items drwxrwxrwt - hdfs hadoop 0 2015-03-04 10:30 /tmp Thanks SP
Re: DFS Used V/S Non DFS Used
Here is the information from - https://issues.apache.org/jira/browse/HADOOP-4430?focusedCommentId=12640259page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12640259 Here are the definition of data reported on the Web UI: Configured Capacity: Disk space corresponding to all the data directories - Reserved space as defined by dfs.datanode.du.reserved DFS Used: Space used by DFS Non DFS Used: 0 if the temporary files do not exceed reserved space. Otherwise this is the size by which temporary files exceed the reserved space and encroach into the DFS configured space. DFS Remaining: (Configured Capacity - DFS Used - Non DFS Used) DFS Used %: (DFS Used / Configured Capacity) * 100 DFS Remaining % = (DFS Remaining / Configured Capacity) * 100 On Fri, Oct 10, 2014 at 2:21 PM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, Not clear how this computation is done For sake of discussion Say the machine with data node has two disks /disk1 and /disk2. And each of these disk has a directory for data node and a directory for non-datanode usage. /disk1/datanode /disk1/non-datanode /disk2/datanode /disk2/non-datanode The dfs.datanode.data.dir says /disk1/datanode,/disk2/datanode. With this, what does the DFS and NonDFS indicates? Does it indicates SUM(/disk*/datanode) SUM(/disk*/non-datanode) etc. resp. ? Thanks, -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Significance of PID files
When a daemon process is started, the process ID of the process is captured in a pid file. It is used for following purposes: - During a daemon startup, the existence of pid file is used to determine that the process is already running. - When a daemon is stooped, hadoop scripts sends kill TERM signal to the process ID captured in pid file for graceful shutdown. After a timeout, if the process still exists, kill -9 is sent for forced shutdown. For more details, see the relevant code in http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh On Fri, Jul 4, 2014 at 10:00 AM, Vijaya Narayana Reddy Bhoomi Reddy vijay.bhoomire...@gmail.com wrote: Hi, Can anyone please explain the significance of the pid files in Hadoop i.e. purpose and usage etc? Thanks Regards Vijay -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: hadoop 2.2.0 HA: standby namenode generate a long list of loading edits
Henry, I suspect this is what is happening. On active namenode, oncethe existing set of editlogs during startup are loaded, it becomes active and from then it has no need to load any more edits. It only generates edits. On the other hand, standby namenode not only loads the edits during startup, it also continuously loads the edits being generated by the active. Hence the difference. Regards, Suresh On Wed, Jun 11, 2014 at 7:49 PM, Henry Hung ythu...@winbond.com wrote: Hi All, I’m using QJM with 2 namenodes, in the active namenode, the main page’s loading edits panel only show 10 records, but in standby namenode, the loading edits panel show a lot more records, never count it, but I think it has 100 records. Is this a problem? Here I provide some of the data: http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1080storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1361storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1830storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000140storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10001638storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002099storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002359storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000332storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000421storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005210storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005529storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000577storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005831storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005951storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006089storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006154storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006291storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006482storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec Best regards, Henry -- The privileged confidential information contained in this email is intended for use only by the addressees as indicated by the original sender of this email. If you are not the addressee indicated in this email or are not responsible for delivery of the email to such a person, please kindly reply to the sender indicating this fact and delete all copies of it from your computer and network server immediately. Your cooperation is highly appreciated. It is advised that any unauthorized use of confidential information of Winbond is strictly prohibited; and any information in this email irrelevant to the official business of Winbond shall be deemed as neither given nor endorsed by Winbond.
Re: hadoop 2.2.0 HA: standby namenode generate a long list of loading edits
On Wed, Jun 11, 2014 at 8:27 PM, Henry Hung ythu...@winbond.com wrote: @Suresh, Q1: But is this kind of behavior can cause some problem in fail over event? I’m afraid that standby namenode will took a long time to be active. Can you please explain how you arrived at this? Q2: Is there a way to purge the loading edit records? Should I do restart on standby namenode? Other than showing a long list of loaded edits, there is nothing to be concerned here. I agree that this is confusing and we could change this where we print only last set of loaded edits instead of entire list. Best regards, Henry *From:* Suresh Srinivas [mailto:sur...@hortonworks.com] *Sent:* Thursday, June 12, 2014 11:23 AM *To:* hdfs-u...@hadoop.apache.org *Subject:* Re: hadoop 2.2.0 HA: standby namenode generate a long list of loading edits Henry, I suspect this is what is happening. On active namenode, oncethe existing set of editlogs during startup are loaded, it becomes active and from then it has no need to load any more edits. It only generates edits. On the other hand, standby namenode not only loads the edits during startup, it also continuously loads the edits being generated by the active. Hence the difference. Regards, Suresh On Wed, Jun 11, 2014 at 7:49 PM, Henry Hung ythu...@winbond.com wrote: Hi All, I’m using QJM with 2 namenodes, in the active namenode, the main page’s loading edits panel only show 10 records, but in standby namenode, the loading edits panel show a lot more records, never count it, but I think it has 100 records. Is this a problem? Here I provide some of the data: http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1080storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1361storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1830storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000140storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10001638storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002099storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10002359storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000332storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000421storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005210storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005529storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=1000577storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005831storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10005951storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006089storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006154storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006291storageInfo=-47%3A1313059004%3A1395811267413%3ACID-9e1b67b3-8190-4652-b34a-210212a50a9e (0/0) 100.00% 0sec http://fchdgw1.ctfab.com:8480/getJournal?jid=hadoop_prodsegmentTxId=10006482storageInfo
Re: how can i monitor Decommission progress?
The namenode webui provides that information. Click on the main webui the link associated with decommissioned nodes. Sent from phone On Jun 5, 2014, at 10:36 AM, Raj K Singh rajkrrsi...@gmail.com wrote: use $hadoop dfsadmin -report Raj K Singh http://in.linkedin.com/in/rajkrrsingh http://www.rajkrrsingh.blogspot.com Mobile Tel: +91 (0)9899821370 On Sat, May 31, 2014 at 11:26 AM, ch huang justlo...@gmail.com wrote: hi,maillist: i decommission three node out of my cluster,but question is how can i see the decommission progress?,i just can see admin state from web ui -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: listing a 530k files directory
Listing such a directory should not be a big problem. Can you cut and paste the command output. Which release are you using? Sent from phone On May 30, 2014, at 5:49 AM, Guido Serra z...@fsfe.org wrote: already tried, didn't work (24cores at 100% and a-lot-memory, stilll ... GC overhead limit exceed) thanks anyhow On 05/30/2014 02:43 PM, bharath vissapragada wrote: Hi Guido, You can set client side heap in HADOOP_OPTS variable before running the ls command. export HADOOP_OPTS=-Xmx3g; hadoop fs -ls / - Bharath On Fri, May 30, 2014 at 5:22 PM, Guido Serra z...@fsfe.org wrote: Hi, do you have an idea on how to look at the content of a 530k-files HDFS folder? (yes, I know it is a bad idea to have such setup, but that’s the status and I’d like to debug it) and the only tool that doesn’t go out of memory is hdfs dfs -count folder/“ -ls goes out of memory, -count with the folder/* goes out of memory … I’d like at least at the first 10 file names, see the size, maybe open one thanks, G. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: any optimize suggestion for high concurrent write into hdfs?
Another alternative is to write block sized chunks into multiple hdfs files concurrently followed by concat to all those into a single file. Sent from phone On Feb 20, 2014, at 8:15 PM, Chen Wang chen.apache.s...@gmail.com wrote: Ch, you may consider using flume as it already has a flume sink that can sink to hdfs. What I did is to set up a flume listening on an Avro sink, and then sink to hdfs. Then in my application, i just send my data to avro socket. Chen On Thu, Feb 20, 2014 at 5:07 PM, ch huang justlo...@gmail.com wrote: hi,maillist: is there any optimize for large of write into hdfs in same time ? thanks -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: HDFS Federation address performance issue
Response inline... On Tue, Jan 28, 2014 at 10:04 AM, Anfernee Xu anfernee...@gmail.com wrote: Hi, Based on http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits, the overall performance can be improved by federation, but I'm not sure federation address my usercase, could someone elaborate it? My usercase is I have one single NM and several DN, and I have bunch of concurrent MR jobs which will create new files(plan files and sub-directory) under the same parent directory, the questions are: 1) Will these concurrent writes(new file, plan files and sub-directory under the same parent directory) run in sequential because WRITE-once control govened by single NM? Namenode commits multiple requests in a batch. In Namenode it self, the lock for write operations make them sequential. But this is a short duration lock and hence will make from the multiple clients perspective, the creation of files as simultaneous. If you are talking about a single client, with a single thread, then it would be sequential. Hope that makes sense. I need this answer to estimate the necessity of moving to HDFS federation. Thanks -- --Anfernee -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: compatibility between new client and old server
2.x is a new major release. 1.x and 2.x are not compatible. In 1.x, the RPC wire protocol used java serialization. In 2.x, the RPC wire protocol uses protobuf. A client must be compiled against 2.x and should use appropriate jars from 2.x to work with 2.x. On Wed, Dec 18, 2013 at 10:45 AM, Ken Been ken.b...@twosigma.com wrote: I am trying to make a 2.2.0 Java client work with a 1.1.2 server. The error I am currently getting is below. I’d like to know if my problem is because I have configured something wrong or because the versions are simply not compatible for what I want to do. Thanks in advance for any help. Ken at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1351) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102) at my code... Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891) -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: HDP 2.0 GA?
Please send the questions related to a vendor specific distro to vendor mailing list. In this case - http://hortonworks.com/community/forums/. On Tue, Nov 5, 2013 at 10:49 AM, Jim Falgout jim.falg...@actian.com wrote: HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2. -- *From:* John Lilley john.lil...@redpoint.net *Sent:* Tuesday, November 05, 2013 12:34 PM *To:* user@hadoop.apache.org *Subject:* HDP 2.0 GA? I noticed that HDP 2.0 is available for download here: http://hortonworks.com/products/hdp-2/?b=1#install Is this the final “GA” version that tracks Apache Hadoop 2.2? Sorry I am just a little confused by the different numbering schemes. *Thanks* *John* -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: HDFS / Federated HDFS - Doubts
On Wed, Oct 16, 2013 at 9:22 AM, Steve Edison sediso...@gmail.com wrote: I have couple of questions about HDFS federation: Can I state different block store directories for each namespace on a datanode ? No. The main idea of federation was not to physically partition the storage across namespace, but to use all the available storage across the namespaces, to ensure better utilzation. Can I have some datanodes dedicated to a particular namespace only ? As I said earlier, all the datanodes are shared across namespaces. If you want to dedicate datanodes to a particular namespace, you might as well create it as two separate clusters with different set of datanodes and a separate namespace. This seems quite interesting. Way to go ! On Tue, Oct 1, 2013 at 9:52 PM, Krishna Kumaar Natarajan natar...@umn.edu wrote: Hi All, While trying to understand federated HDFS in detail I had few doubts and listing them down for your help. 1. In case of *HDFS(without HDFS federation)*, the metadata or the data about the blocks belonging to the files in HDFS is maintained in the main memory of the name node or it is stored on permanent storage of the namenode and is brought in the main memory on demand basis ? [Krishna] Based on my understanding, I assume the entire metadata is in main memory which is an issue by itself. Please correct me if my understanding is wrong. 2. In case of* federated HDFS*, the metadata or the data about the blocks belonging to files in a particular namespace is maintained in the main memory of the namenode or it is stored on the permanent storage of the namenode and is brought in the main memory on demand basis ? 3. Are the metadata information stored in separate cluster nodes(block management layer separation) as discussed in Appendix B of this document ? https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf 4. I would like to know if the following proposals are already implemented in federated HDFS. ( http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability slide-17) - Separation of namespace and block management layers (same as qn.3) - Partial namespace in memory for further scalability - Move partial namespace from one namenode to another Thanks, Krishna -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: HDFS federation Configuration
I'm not able to follow the page completely. Can you pls help me to get some clear step by step or little bit more details in the configuration side? Have you setup a non-federated cluster before. If you have, the page should be easy to follow. If you have not setup a non-federated cluster before, I suggest doing so, before looking at this document. I think the document already has step by step instructions. I -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: HDFS federation Configuration
Have you looked at - http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-project-dist/hadoop-hdfs/Federation.html Let me know if the document is not clear or needs improvements. Regards, Suresh On Thu, Sep 19, 2013 at 12:01 PM, Manickam P manicka...@outlook.com wrote: Guys, I need some tutorials to configure fedration. Can you pls suggest me some? Thanks, Manickam P -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Name node High Availability in Cloudera 4.1.1
Please do not cross-post these emails to hdfs-user. The relevant email list is only cdh-user. On Thu, Sep 19, 2013 at 1:44 AM, Pavan Kumar Polineni smartsunny...@gmail.com wrote: Hi all, *Name Node High Availability Job tracker high availability* is there in Cloudera 4.1.1 ? If not, Then what are the properties need to change in Cloudera 4.1.1 to make the cluster as High availability. please help on this.. Thanks in Advance -- Pavan Kumar Polineni -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Cloudera Vs Hortonworks Vs MapR
Shahab, I agree with your arguments. Really well put. Only things I would add is - we do not want sales/marketing folks getting involved in these kinds of threads and pollute it with sales pitches, unsubstantiated claims, and make it a forum for marketing pitch. This can also have community repercussions as you have rightly pointed out. Wearing my own hadoop PMC hat, we do put Apache release regularly. Bigtop also provides excellent stack packaging as well. In this forum my wish is to see discussions around that than vendor related. There are already many outside forums for this. Regards, Suresh On Fri, Sep 13, 2013 at 10:48 AM, Shahab Yunus shahab.yu...@gmail.comwrote: I think, in my opinion, it is a wrong idea because: 1- Many of the participants here are employees for these very companies that are under discussion. This puts these respective employees in very difficult position. It is very hard to come with a correct response. Comments can be misconstrued easily. 2- Also, when we talk about vendor distributions of the software, it is not longer purely about open source. Now companies with the related corporate legal baggage also gets in the mix. 3- The discussion would be on not only positive things about each vendor but in fact negatives. The latter type of discussion which can get unpleasant very easily. 4- Somebody mentioned that, this is a very lightly moderated platform and thus this discussion should be allowed. I think this is one of the reasons that it should not be because, people can say things casually, without much thought, or without taking care of the context or the possible interpretations and get in trouble. 5- The risk here is not only that serious repercussions can occur (which very well can) but the greater risk is that it can cause misunderstanding between individuals, industries and companies. 6-People here lot of time reply quickly just to resolve or help the 'technical' issue. Now they will have to take care how they frame the response. Re: 4 I know some will feel that I have created a highly exaggerated scenario above, but what I am trying to say is that, it is a slippery slope. If we allow this then this can go anywhere. By the way, I do not work for any of these vendors. More importantly, I am not saying that this discussion should not be had, I am just saying that this is a wrong forum. Just my 2 cents (or,...this was rather a dollar.) Regards, Shahab On Fri, Sep 13, 2013 at 1:50 AM, Chris Mattmann mattm...@apache.orgwrote: Errr, what's wrong with discussing these types of issues on list? Nothing public here, and as long as it's kept to facts, this should not be a problem and Apache is a fine place to have such discussions. My 2c. -Original Message- From: Xuri Nagarin secs...@gmail.com Reply-To: user@hadoop.apache.org user@hadoop.apache.org Date: Thursday, September 12, 2013 4:39 PM To: user@hadoop.apache.org user@hadoop.apache.org Subject: Re: Cloudera Vs Hortonworks Vs MapR I understand it can be contentious issue especially given that a lot of contributors to this list work for one or the other vendor or have some stake in any kind of evaluation. But, I see no reason why users should not be able to compare notes and share experiences. Over time, genuine pain points or issues or claims will bubble up and should only help the community. Sure, there will be a few flame wars but this already isn't a very tightly moderated list. On Thu, Sep 12, 2013 at 11:14 AM, Aaron Eng a...@maprtech.com wrote: Raj, As others noted, this is not a great place for this discussion. I'd suggest contacting the vendors you are interested in as I'm sure we'd all be happy to provide you more details. I don't know about the others, but for MapR, just send an email to sa...@mapr.com mailto:sa...@mapr.com and I'm sure someone will get back to you with more information. Best Regards, Aaron Eng On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj hadoop...@yahoo.com wrote: Hi, We are trying to evaluate different implementations of Hadoop for our big data enterprise project. Can the forum members advise on what are the advantages and disadvantages of each implementation i.e. Cloudera Vs Hortonworks Vs MapR. Thanks in advance. Regards, Raj -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your
Re: Cloudera Vs Hortonworks Vs MapR
Raj, You can also use Apache Hadoop releases. Bigtop does fine job as well putting together consumable Hadoop stack. As regards to vendor solutions, this is not the right forum. There are other forums for this. Please refrain from this type of discussions on Apache forum. Regards, Suresh On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj hadoop...@yahoo.com wrote: Hi, We are trying to evaluate different implementations of Hadoop for our big data enterprise project. Can the forum members advise on what are the advantages and disadvantages of each implementation i.e. Cloudera Vs Hortonworks Vs MapR. Thanks in advance. Regards, Raj -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Symbolic Link in Hadoop 1.0.4
FileContext APIs and symlink functionality is not available in 1.0. It is only available in 0.23 and 2.x release. On Thu, Sep 5, 2013 at 8:06 AM, Gobilliard, Olivier olivier.gobilli...@cartesian.com wrote: Hi, I am using Hadoop 1.0.4 and need to create a symbolic link in HDSF. This feature has been added in Hadoop 0.21.0 ( https://issues.apache.org/jira/browse/HDFS-245) in the new FileContext API ( http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileContext.html ). However, I cannot find the FileContext API in the 1.0.4 release ( http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/). I cannot find it in any of the 1.X releases actually. Has this functionality been moved to another Class? Many thanks, Olivier __ This email and any attachments are confidential. If you have received this email in error please notify the sender immediately by replying to this email and then delete from your computer without copying or distributing in any other way. Cartesian Limited - Registered in England and Wales with number 3230513 Registered office: Descartes House, 8 Gate Street, London, WC2A 3HP www.cartesian.com -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Documentation for Hadoop's RPC mechanism
Create a Jira and post it into hadoop documentation. I can help you with the review and commit. Sent from phone On Aug 20, 2013, at 10:40 AM, Elazar Leibovich elaz...@gmail.com wrote: Hi, I've written some documentation for Hadoop's RPC mechanism internals: http://hadoop.quora.com/Hadoop-RPC-mechanism I'll be very happy if the community can review it. You should be able to edit it directly, or just send your comments to the list. Except, I'm looking for a good place to put it. Where does it fit? Would it fit Hadoop's Wiki? Hadoop's Source? Thanks -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Maven Cloudera Configuration problem
Folks, can you please take this thread to CDH related mailing list? On Tue, Aug 13, 2013 at 3:07 PM, Brad Cox bradj...@gmail.com wrote: That link got my hopes up. But Cloudera Manager (what I'm running; on CDH4) does not offer an Export Client Config option. What am I missing? On Aug 13, 2013, at 4:04 PM, Shahab Yunus shahab.yu...@gmail.com wrote: You should not use LocalJobRunner. Make sure that the mapred.job.tracker property does not point to 'local' an instead to your job-tracker host and port. *But before that* as Sandy said, your client machine (from where you will be kicking of your jobs and apps) should be using config files which will have your cluster's configuration. This is the alternative that you should follow if you don't want to bundle the configs for your cluster in the application itself (either in java code or separate copies of relevant properties set of config files.) This was something which I was suggesting early on to just to get you started using your cluster instead of local mode. By the way have you seen the following link? It gives you step by step information about how to generate config files from your cluster specific to your cluster and then how to place them and use the from any machine you want to designate as your client. Running your jobs form one of the datanodes without proper config would not work. https://ccp.cloudera.com/display/FREE373/Generating+Client+Configuration Regards, Shahab On Tue, Aug 13, 2013 at 1:07 PM, Pavan Sudheendra pavan0...@gmail.com wrote: Yes Sandy, I'm referring to LocalJobRunner. I'm actually running the job on one datanode.. What changes should i make so that my application would take advantage of the cluster as a whole? On Tue, Aug 13, 2013 at 10:33 PM, sandy.r...@cloudera.com wrote: Nothing in your pom.xml should affect the configurations your job runs with. Are you running your job from a node on the cluster? When you say localhost configurations, do you mean it's using the LocalJobRunner? -sandy (iphnoe tpying) On Aug 13, 2013, at 9:07 AM, Pavan Sudheendra pavan0...@gmail.com wrote: When i actually run the job on the multi node cluster, logs shows it uses localhost configurations which i don't want.. I just have a pom.xml which lists all the dependencies like standard hadoop, standard hbase, standard zookeeper etc., Should i remove these dependencies? I want the cluster settings to apply in my map-reduce application.. So, this is where i'm stuck at.. On Tue, Aug 13, 2013 at 9:30 PM, Pavan Sudheendra pavan0...@gmail.com wrote: Hi Shabab and Sandy, The thing is we have a 6 node cloudera cluster running.. For development purposes, i was building a map-reduce application on a single node apache distribution hadoop with maven.. To be frank, i don't know how to deploy this application on a multi node cloudera cluster. I am fairly well versed with Multi Node Apache Hadoop Distribution.. So, how can i go forward? Thanks for all the help :) On Tue, Aug 13, 2013 at 9:22 PM, sandy.r...@cloudera.com wrote: Hi Pavan, Configuration properties generally aren't included in the jar itself unless you explicitly set them in your java code. Rather they're picked up from the mapred-site.xml file located in the Hadoop configuration directory on the host you're running your job from. Is there an issue you're coming up against when trying to run your job on a cluster? -Sandy (iphnoe tpying) On Aug 13, 2013, at 4:19 AM, Pavan Sudheendra pavan0...@gmail.com wrote: Hi, I'm currently using maven to build the jars necessary for my map-reduce program to run and it works for a single node cluster.. For a multi node cluster, how do i specify my map-reduce program to ingest the cluster settings instead of localhost settings? I don't know how to specify this using maven to build my jar. I'm using the cdh distribution by the way.. -- Regards- Pavan -- Regards- Pavan -- Regards- Pavan -- Regards- Pavan Dr. Brad J. CoxCell: 703-594-1883 Blog: http://bradjcox.blogspot.com http://virtualschool.edu -- http://hortonworks.com/download/ -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re:
Please use CDH mailing list. This is apache hadoop mailing list. Sent from phone On Jul 12, 2013, at 7:51 PM, Anit Alexander anitama...@gmail.com wrote: Hello, I am encountering a problem in cdh4 environment. I can successfully run the map reduce job in the hadoop cluster. But when i migrated the same map reduce to my cdh4 environment it creates an error stating that it cannot read the next block(each block is 64 mb). Why is that so? Hadoop environment: hadoop 1.0.3 java version 1.6 chd4 environment: CDH4.2.0 java version 1.6 Regards, Anit Alexander
Re: Cloudera links and Document
Sathish, this mailing list for Apache Hadoop related questions. Please post questions related to other distributions to appropriate vendor's mailing list. On Thu, Jul 11, 2013 at 6:28 AM, Sathish Kumar sa848...@gmail.com wrote: Hi All, Can anyone help me the link or document that explain the below. How Cloudera Manager works and handle the clusters (Agent and Master Server)? How the Cloudera Manager Process Flow works? Where can I locate Cloudera configuration files and explanation in brief? Regards Sathish -- http://hortonworks.com/download/
Re: data loss after cluster wide power loss
On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote: On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote: Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should either ensure metadata operation is synced on the datanode or handle it being reported as blockBeingWritten. Let me spend sometime to debug this issue. In theory, ext3 is journaled, so all metadata operations should be durable in the case of a power outage. It is only data operations that should be possible to lose. It is the same for ext4. (Assuming you are not using nonstandard mount options.) ext3 journal may not hit the disk right. From what I read, if you do not specifically call sync, even the metadata operations do not hit disk. See - https://www.kernel.org/doc/Documentation/filesystems/ext3.txt commit=nrsec(*) Ext3 can be told to sync all its data and metadata every 'nrsec' seconds. The default value is 5 seconds. This means that if you lose your power, you will lose as much as the latest 5 seconds of work (your filesystem will not be damaged though, thanks to the journaling). This default value (or any low value) will hurt performance, but it's good for data-safety. Setting it to 0 will have the same effect as leaving it at the default (5 seconds). Setting it to very large values will improve performance.
Re: HDFS file section rewrite
HDFS only supports regular writes and append. Random write is not supported. I do not know of any feature/jira that is underway to support this feature. On Tue, Jul 2, 2013 at 9:01 AM, John Lilley john.lil...@redpoint.netwrote: I’m sure this has been asked a zillion times, so please just point me to the JIRA comments: is there a feature underway to allow for re-writing of HDFS file sections? Thanks John ** ** -- http://hortonworks.com/download/
Re: data loss after cluster wide power loss
Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should either ensure metadata operation is synced on the datanode or handle it being reported as blockBeingWritten. Let me spend sometime to debug this issue. One surprising thing is, all the replicas were reported as blockBeingWritten. Regards, Suresh On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham lat...@davelink.net wrote: (Removing hbase list and adding hdfs-dev list as this is pretty internal stuff). Reading through the code a bit: FSDataOutputStream.close calls DFSOutputStream.close calls DFSOutputStream.closeInternal - sets currentPacket.lastPacketInBlock = true - then calls DFSOutputStream.flushInternal - enqueues current packet - waits for ack BlockReceiver.run - if (lastPacketInBlock !receiver.finalized) calls FSDataset.finalizeBlock calls FSDataset.finalizeBlockInternal calls FSVolume.addBlock calls FSDir.addBlock calls FSDir.addBlock - renames block from blocksBeingWritten tmp dir to current dest dir This looks to me as I would expect a synchronous chain from a DFS client to moving the file from blocksBeingWritten to the current dir so that once the file is closed that it the block files would be in the proper directory - even if the contents of the file are still in the OS buffer rather than synced to disk. It's only after this moving of blocks that NameNode.complete file is called. There are several conditions and loops in there that I'm not certain this chain is fully reliable in all cases without a greater understanding of the code. Could it be the case that the rename operation itself is not synced and that ext3 lost the fact that the block files were moved? Or is there a bug in the close file logic that for some reason the block files are not always moved into place when a file is closed? Thanks for your patience, Dave On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham lat...@davelink.net wrote: Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744 the hsync API would allow a client to make sure that at any point in time it's writes so far hit the disk. For example, for HBase it could apply a fsync after adding some edits to its WAL to ensure those edits are fully durable for a file which is still open. However, in this case the dfs file was closed and even renamed. Is it the case that even after a dfs file is closed and renamed that the data blocks would still not be synced and would still be stored by the datanode in blocksBeingWritten rather than in current? If that is case, would it be better for the NameNode not to reject replicas that are in blocksBeingWritten, especially if it doesn't have any other replicas available? Dave On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas sur...@hortonworks.comwrote: Yes this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in 1.x release. I think HBase does not use this API yet. On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham lat...@davelink.net wrote: We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday the data center we were in had a total power failure and the cluster went down hard. When we brought it back up, HDFS reported 4 files as CORRUPT. We recovered the data in question from our secondary datacenter, but I'm trying to understand what happened and whether this is a bug in HDFS that should be fixed. From what I can tell the file was created and closed by the dfs client (hbase). Then HBase renamed it into a new directory and deleted some other files containing the same data. Then the cluster lost power. After the cluster was restarted, the datanodes reported into the namenode but the blocks for this file appeared as blocks being written - the namenode rejected them and the datanodes deleted the blocks. At this point there were no replicas for the blocks and the files were marked CORRUPT. The underlying file systems are ext3. Some questions that I would love get answers for if anyone with deeper understanding of HDFS can chime in: - Is this a known scenario where data loss is expected? (I found HDFS-1539 but that seems different) - When are blocks moved from blocksBeingWritten to current? Does that happen before a file close operation is acknowledged to a hdfs client? - Could it be that the DataNodes actually moved the blocks to current but after the restart ext3 rewound state somehow (forgive my ignorance of underlying file system behavior)? - Is there any other explanation for how this can happen? Here is a sequence of selected
Re: Please explain FSNamesystemState TotalLoad
On Fri, Jun 7, 2013 at 9:10 AM, Nick Niemeyer nnieme...@riotgames.comwrote: Regarding TotalLoad, what would be normal operating tolerances per node for this metric? When should one become concerned? Thanks again to everyone participating in this community. :) Why do you want to be concered :) I have not seen many issues related to high TotalLoad. This is mainly useful in terms of understanding how many concurrent jobs/file accesses are happening and how busy datanodes are. When you are debugging issues where cluster slow down due to overload, or correlating a run of big jobs, this is useful. Knowing what it represent, you would find many other uses as well. From: Suresh Srinivas sur...@hortonworks.com Reply-To: user@hadoop.apache.org user@hadoop.apache.org Date: Thursday, June 6, 2013 4:14 PM To: hdfs-u...@hadoop.apache.org user@hadoop.apache.org Subject: Re: Please explain FSNamesystemState TotalLoad It is the total number of transceivers (readers and writers) reported by all the datanodes. Datanode reports this count in periodic heartbeat to the namenode. On Thu, Jun 6, 2013 at 1:48 PM, Nick Niemeyer nnieme...@riotgames.comwrote: Can someone please explain what TotalLoad represents below? Thanks for your response in advance! Version: hadoop-0.20-namenode-0.20.2+923.197-1 Example pulled from the output of via the name node: # curl -i http://localhost:50070/jmx { name : hadoop:service=NameNode,name=FSNamesystemState, modelerType : org.apache.hadoop.hdfs.server.namenode.FSNamesystem, CapacityTotal : #, CapacityUsed : #, CapacityRemaining : #, * TotalLoad : #,* BlocksTotal : #, FilesTotal : #, PendingReplicationBlocks : 0, UnderReplicatedBlocks : 0, ScheduledReplicationBlocks : 0, FSState : Operational } Thanks, Nick -- http://hortonworks.com/download/ -- http://hortonworks.com/download/
Re: Please explain FSNamesystemState TotalLoad
It is the total number of transceivers (readers and writers) reported by all the datanodes. Datanode reports this count in periodic heartbeat to the namenode. On Thu, Jun 6, 2013 at 1:48 PM, Nick Niemeyer nnieme...@riotgames.comwrote: Can someone please explain what TotalLoad represents below? Thanks for your response in advance! Version: hadoop-0.20-namenode-0.20.2+923.197-1 Example pulled from the output of via the name node: # curl -i http://localhost:50070/jmx { name : hadoop:service=NameNode,name=FSNamesystemState, modelerType : org.apache.hadoop.hdfs.server.namenode.FSNamesystem, CapacityTotal : #, CapacityUsed : #, CapacityRemaining : #, * TotalLoad : #,* BlocksTotal : #, FilesTotal : #, PendingReplicationBlocks : 0, UnderReplicatedBlocks : 0, ScheduledReplicationBlocks : 0, FSState : Operational } Thanks, Nick -- http://hortonworks.com/download/
Re: How to test the performance of NN?
What do you mean by it is not telling me any thing about performance? Also I do not understand the part, only about potential failures.. Can you add more details. nnbench is the best microbenchmark for nn performance test. On Wed, Jun 5, 2013 at 3:17 PM, Mark Kerzner mark.kerz...@shmsoft.comwrote: Hi, I am trying to create a more efficient namenode, and for that I need to the standard distribution, and then compare it to my version. Which benchmark should I run? I am doing nnbench, but it is not telling me anything about performance, only about potential failures. Thank you. Sincerely, Mark -- http://hortonworks.com/download/
Re: cloudera4.2 source code ant
Folks, this is Apache Hadoop mailing list. For vendor distro related questions, please use the appropriate vendor mailing list. Sent from a mobile device On May 17, 2013, at 2:06 AM, Kun Ling lkun.e...@gmail.com wrote: Hi dylan, I have not build CDH source code using ant, However I have met a similar dependencies resolve filed problem. Acccording to my experience, this is much like a package network download issue. You may try to remove the .ivy2 and .m2 directories in your home directory, and run ant clean; ant to try again. Hope it is helpful to you. yours, Kun Ling On Fri, May 17, 2013 at 4:42 PM, dylan dwld0...@gmail.com wrote: hello, there is a problem i can't resolved, i want to remote connect the hadoop ( cloudera cdh4.2.0 ) via eclipse plugin.There’s have no hadoop-eclipse-pluge.jar ,so i download the hadoop of cdh4.2.0 tarbal and when i complie, the error is below: ivy-resolve-common: [ivy:resolve] :: resolving dependencies :: org.apache.hadoop#eclipse-plugin;working@master [ivy:resolve]confs: [common] [ivy:resolve]found commons-logging#commons-logging;1.1.1 in maven2 [ivy:resolve] :: resolution report :: resolve 5475ms :: artifacts dl 2ms - | |modules|| artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| - | common | 2 | 0 | 0 | 0 || 1 | 0 | - [ivy:resolve] [ivy:resolve] :: problems summary :: [ivy:resolve] WARNINGS [ivy:resolve] :: [ivy:resolve] :: UNRESOLVED DEPENDENCIES :: [ivy:resolve] :: [ivy:resolve] :: log4j#log4j;1.2.16: several problems occurred while resolving dependency: log4j#log4j;1.2.16 {common=[master]}: [ivy:resolve]reactor-repo: unable to get resource for log4j#log4j;1.2.16: res=${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.pom: java.net.MalformedURLException: no protocol: ${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.pom [ivy:resolve]reactor-repo: unable to get resource for log4j#log4j;1.2.16: res=${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.jar: java.net.MalformedURLException: no protocol: ${reactor.repo}/log4j/log4j/1.2.16/log4j-1.2.16.jar [ivy:resolve] :: [ivy:resolve] [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS BUILD FAILED /home/paramiao/hadoop-2.0.0-mr1-cdh4.2.0/src/contrib/build-contrib.xml:440: impossible to resolve dependencies: resolve failed - see output for details so could someone tell me where i am wrong and how could make it success? best regards! -- http://www.lingcc.com
Re: CDH4 installation along with MRv1 from tarball
Can you guys please take this thread to CDH mailing list? Sent from phone On Mar 20, 2013, at 2:48 PM, rohit sarewar rohitsare...@gmail.com wrote: Hi Jens These are not complete version of Hadoop. 1) hadoop-0.20-mapreduce-0.20.2+1341 (has only MRv1) 2) hadoop-2.0.0+922 (has HDFS+ Yarn) I request you to read the comments in this link https://issues.cloudera.org/browse/DISTRO-447 On Tue, Mar 19, 2013 at 1:17 PM, Jens Scheidtmann jens.scheidtm...@gmail.com wrote: Rohit, What are you trying to achieve with two different complete versions of hadoop? Thanks, Jens 2013/3/18 rohit sarewar rohitsare...@gmail.com Need some guidance on CDH4 installation from tarballs I have downloaded two files from https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs 1) hadoop-0.20-mapreduce-0.20.2+1341 (has only MRv1) 2) hadoop-2.0.0+922 (has HDFS+ Yarn)
Re: Regarding: Merging two hadoop clusters
I have two different hadoop clusters in production. One cluster is used as backing for HBase and the other for other things. Both hadoop clusters are using the same version 1.0 and I want to merge them and make them one. I know, one possible solution is to copy the data across, but the data is really huge on these clusters and it will hard for me to compromise with huge downtime. Is there any optimal way to merge two hadoop clusters. This is not a supported feature. Hence this activity would require understanding low level Hadoop details, quite a bit of hacking and is not straightforward. Copying data from the clusters is the simplest solution.
Re: Hadoop cluster hangs on big hive job
I have seen one such problem related to big hive jobs that open a lot of files. See HDFS-4496 for more details. Snippet from the description: The following issue was observed in a cluster that was running a Hive job and was writing to 100,000 temporary files (each task is writing to 1000s of files). When this job is killed, a large number of files are left open for write. Eventually when the lease for open files expires, lease recovery is started for all these files in a very short duration of time. This causes a large number of commitBlockSynchronization where logSync is performed with the FSNamesystem lock held. This overloads the namenode resulting in slowdown. Could this be the cause? Can you see namenode log to see if you have lease recovery activity? If not, can you send some information about what is happening in the namenode logs at the time of this slowdown? On Mon, Mar 11, 2013 at 1:32 PM, Daning Wang dan...@netseer.com wrote: [hive@mr3-033 ~]$ hadoop version Hadoop 1.0.4 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290 Compiled by hortonfo on Wed Oct 3 05:13:58 UTC 2012 On Sun, Mar 10, 2013 at 8:16 AM, Suresh Srinivas sur...@hortonworks.comwrote: What is the version of hadoop? Sent from phone On Mar 7, 2013, at 11:53 AM, Daning Wang dan...@netseer.com wrote: We have hive query processing zipped csv files. the query was scanning for 10 days(partitioned by date). data for each day around 130G. The problem is not consistent since if you run it again, it might go through. but the problem has never happened on the smaller jobs(like processing only one days data). We don't have space issue. I have attached log file when problem happening. it is stuck like following(just search 19706 of 49964) 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) Thanks, Daning On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård haavard.kongsga...@gmail.com wrote: hadoop logs? On 6. mars 2013 21:04, Daning Wang dan...@netseer.com wrote: We have 5 nodes cluster(Hadoop 1.0.4), It hung a couple of times while running big jobs. Basically all the nodes are dead, from that trasktracker's log looks it went into some kinds of loop forever. All the log entries like this when problem happened. Any idea how to debug the issue? Thanks in advance. 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_12_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_28_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_36_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_16_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0
Re: Hadoop cluster hangs on big hive job
What is the version of hadoop? Sent from phone On Mar 7, 2013, at 11:53 AM, Daning Wang dan...@netseer.com wrote: We have hive query processing zipped csv files. the query was scanning for 10 days(partitioned by date). data for each day around 130G. The problem is not consistent since if you run it again, it might go through. but the problem has never happened on the smaller jobs(like processing only one days data). We don't have space issue. I have attached log file when problem happening. it is stuck like following(just search 19706 of 49964) 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) Thanks, Daning On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård haavard.kongsga...@gmail.com wrote: hadoop logs? On 6. mars 2013 21:04, Daning Wang dan...@netseer.com wrote: We have 5 nodes cluster(Hadoop 1.0.4), It hung a couple of times while running big jobs. Basically all the nodes are dead, from that trasktracker's log looks it went into some kinds of loop forever. All the log entries like this when problem happened. Any idea how to debug the issue? Thanks in advance. 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_12_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_28_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_36_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_16_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_19_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_32_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_00_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_08_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_39_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_04_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_43_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_12_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_28_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_24_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_36_0 0.131468% reduce copy (19706 of 49964 at 0.00 MB/s) 2013-03-05 15:13:27,159 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_16_0 0.131468% reduce copy (19706 of
Re: [jira] [Commented] (HDFS-4533) start-dfs.sh ignored additional parameters besides -upgrade
Please followup on Jenkins failures. Looks like the patch is generated at the wrong directory. On Thu, Feb 28, 2013 at 1:34 AM, Azuryy Yu azury...@gmail.com wrote: Who can review this JIRA(https://issues.apache.org/jira/browse/HDFS-4533), which is very simple. -- Forwarded message -- From: Hadoop QA (JIRA) j...@apache.org Date: Wed, Feb 27, 2013 at 4:53 PM Subject: [jira] [Commented] (HDFS-4533) start-dfs.sh ignored additional parameters besides -upgrade To: azury...@gmail.com [ https://issues.apache.org/jira/browse/HDFS-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588130#comment-13588130] Hadoop QA commented on HDFS-4533: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571164/HDFS-4533.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4008//console This message is automatically generated. start-dfs.sh ignored additional parameters besides -upgrade --- Key: HDFS-4533 URL: https://issues.apache.org/jira/browse/HDFS-4533 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.0.3-alpha Reporter: Fengdong Yu Labels: patch Fix For: 2.0.4-beta Attachments: HDFS-4533.patch start-dfs.sh only takes -upgrade option and ignored others. So If run the following command, it will ignore the clusterId option. start-dfs.sh -upgrade -clusterId 1234 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira -- http://hortonworks.com/download/
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Can you please take this Cloudera mailing list? On Tue, Mar 5, 2013 at 10:33 AM, anton ashanin anton.asha...@gmail.comwrote: I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is 127.0.0.1 frigate frigate.domain.locallocalhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters When trying to install cluster Cloudera manager fails with the following messages: Installation failed. Failed to receive heartbeat from agent. I run my Ubuntu-12.04 host from home connected by WiFi/dialup modem to my provider. What configuration is missing? Thanks! -- http://hortonworks.com/download/
Re: QJM HA and ClusterID
looks start-dfs.sh has a bug. It only takes -upgrade option and ignores clusterId. Consider running the command (which is what start-dfs.sh calls): bin/hdfs start namenode -upgrade -clusterId your cluster ID Please file a bug, if you can, for start-dfs.sh bug which ignores additional parameters. On Tue, Feb 26, 2013 at 4:50 PM, Azuryy Yu azury...@gmail.com wrote: Anybody here? Thanks! On Tue, Feb 26, 2013 at 9:57 AM, Azuryy Yu azury...@gmail.com wrote: Hi all, I've stay on this question several days. I want upgrade my cluster from hadoop-1.0.3 to hadoop-2.0.3-alpha, I've configured QJM successfully. How to customize clusterID by myself. It generated a random clusterID now. It doesn't work when I run: start-dfs.sh -upgrade -clusterId 12345-test Thanks! -- http://hortonworks.com/download/
Re: Hive Metastore DB Issue ( Cloudera CDH4.1.2 MRv1 with hive-0.9.0-cdh4.1.2)
Please only use CDH mailing list and do not copy this to hdfs-user. On Thu, Feb 7, 2013 at 7:20 AM, samir das mohapatra samir.help...@gmail.com wrote: Any Suggestion... On Thu, Feb 7, 2013 at 4:17 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I could not see the hive meta store DB under Mysql database Under mysql user hadoop. Example: $ mysql –u root -p $ Add hadoop user (CREATE USER ‘hadoop'@'localhost' IDENTIFIED BY ‘ hadoop';) $GRANT ALL ON *.* TO ‘hadoop'@‘% IDENTIFIED BY ‘hadoop’ $ Example (GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'localhost' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;) Bellow configuration i am follwing property namejavax.jdo.option.ConnectionURL/name valuejdbc:mysql://localhost:3306/hadoop?createDatabaseIfNotExist=true/value /property property namejavax.jdo.option.ConnectionDriverName/name valuecom.mysql.jdbc.Driver/value /property property namejavax.jdo.option.ConnectionUserName/name valuehadoop/value /property property namejavax.jdo.option.ConnectionPassword/name valuehadoop/value /property Note: Previously i was using cdh3 it was perfectly creating under mysql metastore DB but when i changed cdh3 to cdh4.1.2 with hive as above subject line , It is not creating. Any suggestiong.. Regrads, samir. -- http://hortonworks.com/download/
Re: Application of Cloudera Hadoop for Dataset analysis
Please take this thread to CDH mailing list. On Tue, Feb 5, 2013 at 2:43 AM, Sharath Chandra Guntuku sharathchandr...@gmail.com wrote: Hi, I am Sharath Chandra, an undergraduate student at BITS-Pilani, India. I would like to get the following clarifications regarding cloudera hadoop distribution. I am using a CDH4 Demo VM for now. 1. After I upload the files into the file browser, if I have to link two-three datasets using a key in those files, what should I do? Do I have to run a query over them? 2. My objective is that I have some data collected over a few years and now, I would like to link all of them, as in a database using keys and then run queries over them to find out particular patterns. Later I would like to implement some Machine learning algorithms on them for predictive analysis. Will this be possible on the demo VM? I am totally new to this. Can I get some help on this? I would be very grateful for the same. -- Thanks and Regards, *Sharath Chandra Guntuku* Undergraduate Student (Final Year) *Computer Science Department* *Email*: f2009...@hyderabad.bits-pilani.ac.in *BITS-Pilani*, Hyderabad Campus Jawahar Nagar, Shameerpet, RR Dist, Hyderabad - 500078, Andhra Pradesh -- http://hortonworks.com/download/
Re: Advice on post mortem of data loss (v 1.0.3)
Sorry to hear you are having issues. Few questions and comments inline. On Fri, Feb 1, 2013 at 8:40 AM, Peter Sheridan psheri...@millennialmedia.com wrote: Yesterday, I bounced my DFS cluster. We realized that ulimit –u was, in extreme cases, preventing the name node from creating threads. This had only started occurring within the last day or so. When I brought the name node back up, it had essentially been rolled back by one week, and I lost all changes which had been made since then. There are a few other factors to consider. 1. I had 3 locations for dfs.name.dir — one local and two NFS. (I originally thought this was 2 local and one NFS when I set it up.) On 1/24, the day which we effectively rolled back to, the second NFS mount started showing as FAILED on dfshealth.jsp. We had seen this before without issue, so I didn't consider it critical. What do you mean by rolled back to? I understand this so far has you have three dirs: l1, nfs1 and nfs2. (l for local disk and nfs for NFS). nfs2 was shown as failed. 1. When I brought the name node back up, because of discovering the above, I had changed dfs.name.dir to 2 local drives and one NFS, excluding the one which had failed. When you brought the namenode backup, with the changed configuration you have l1, l2 and nfs1. Given you have not seen any failures, l1 and nfs1 have the latest edits so far. Correct? How did you add l2? Can you describe this procedure in detail? Reviewing the name node log from the day with the NFS outage, I see: When you say NFS outage here, this is the failure corresponding to nfs2 from above. Is that correct? 2013-01-24 16:33:11,794 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to sync edit log. java.io.IOException: Input/output error at sun.nio.ch.FileChannelImpl.force0(Native Method) at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348) at org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.flushAndSync(FSEditLog.java:215) at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:89) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1015) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1666) at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718) at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) 2013-01-24 16:33:11,794 WARN org.apache.hadoop.hdfs.server.common.Storage: Removing storage dir /rdisks/xx Unfortunately, since I wasn't expecting anything terrible to happen, I didn't look too closely at the file system while the name node was down. When I brought it up, the time stamp on the previous checkpoint directory in the dfs.name.dir was right around the above error message. The current directory basically had an fsimage and an empty edits log with the current time stamps. Which storage directory are you talking about here? So: what happened? Should this failure have led to my data loss? I would have thought the local directory would be fine in this scenario. Did I have any other options for data recovery? I am not sure how you concluded that you lost a week's data and the namenode rolled back by one week? Please share the namenode logs corresponding to the restart. This is how it should have worked. - When nfs2 was removed, on both l1 and nfs1 a timestamp is recorded, corresponding to removal of a storage directory. - If there is any checkpointing that happened, it would have also incremented the timestamp. - When the namenode starts up, it chooses l1 and nfs1 because the recorded timestamp is the latest on these directories and loads fsimage and edits from those directories. Namenode also performs checkpoint and writes new consolidated image on l1, l2 and nfs1 and creates empty editlog on l1, l2 and nfs1. If you provide more details on how l2 was added, we may be able to understand what happened. Regards, Suresh -- http://hortonworks.com/download/
Re: ClientProtocol Version mismatch. (client = 69, server = 1)
Please take this up in CDH mailing list. Most likely you are using client that is not from 2.0 release of Hadoop. On Tue, Jan 29, 2013 at 12:33 PM, Kim Chew kchew...@gmail.com wrote: I am using CDH4 (2.0.0-mr1-cdh4.1.2) vm running on my mbp. I was trying to invoke a remote method in the ClientProtocol via RPC, however I am getting this exception. 2013-01-29 11:20:45,810 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 69, server = 1) 2013-01-29 11:20:45,810 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 192.168.140.1:50597: error: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 69, server = 1) org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 69, server = 1) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.getProtocolImpl(ProtobufRpcEngine.java:400) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) I could understand if the Server's ClientProtocol has version number 60 or something else, but how could it has a version number of 1? Thanks. Kim -- http://hortonworks.com/download/
Re: Using distcp with Hadoop HA
Currently, as you have pointed out, client side configuration based failover is used in HA setup. The configuration must define namenode addresses for the nameservices of both the clusters. Are the datanodes belonging to the two clusters running on the same set of nodes? Can you share the configuration you are using, to diagnose the problem? - I am trying to do a distcp from cluster A to cluster B. Since no operations are supported on the standby namenode, I need to specify either the active namenode while using distcp or use the failover proxy provider (dfs.client.failover.proxy.provider.clusterA) where I can specify the two namenodes for cluster B and the failover code inside HDFS will figure it out. - If I use the failover proxy provider, some of my datanodes on cluster A would connect to the namenode on cluster B and vice versa. I am assuming that is because I have configured both nameservices in my hdfs-site.xml for distcp to work.. I have configured dfs.nameservice.id to be the right one but the datanodes do not seem to respect that. What is the best way to use distcp with Hadoop HA configuration without having the datanodes to connect to the remote namenode? Thanks Regards, Dhaval -- http://hortonworks.com/download/
Re: Cohesion of Hadoop team?
On Fri, Jan 18, 2013 at 6:48 AM, Glen Mazza gma...@talend.com wrote: Hi, looking at the derivation of the 0.23.x 2.0.x branches on one hand, and the 1.x branches on the other, as described here: http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCD0CAB8B.1098F%25evans%40yahoo-inc.com%3E One gets the impression the Hadoop committers are split into two teams, with one team working on 0.23.x/2.0.2 and another team working on 1.x, running the risk of increasingly diverging products eventually competing with each other. Is that the case? I am not sure how you came to this conclusion. The way I see it is, all the folks are working on trunk. Subset of this work from trunk is pushed to older releases such as 1.x or 0.23.x. In Apache Hadoop, features always go to trunk first before going to any older releases 1.x or 0.23.x. That means trunk is a superset of all the features. Is there expected to be a Hadoop 3.0 where the results of the two lines of development will merge or is it increasingly likely the subteams will continue their separate routes? 2.0.3-alpha, which is the latest release based off of trunk, that is in final stage of completion should have all the features that all the other releases have. Let me know if there are any exceptions to this that you know of. Thanks, Glen -- Glen Mazza Talend Community Coders - coders.talend.com blog: www.jroller.com/gmazza -- http://hortonworks.com/download/
Re: NN Memory Jumps every 1 1/2 hours
You did free up lot of old generation with reducing young generation, right? The extra 5G of RAM for the old generation should have helped. Based on my calculation, for the current number of objects you have, you need roughly: 12G of total heap with young generation size of 1G. This assumes the average file name size is 32 bytes. In later releases (= 0.20.204), several memory optimization and startup optimizations have been done. It should help you as well. On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.comwrote: So it turns out the issue was just the size of the filesystem. 2012-12-27 16:37:22,390 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4,354,340,042 Basically if the NN image size hits ~ 5,000,000,000 you get f'ed. So you need about 3x ram as your FSImage size. If you do not have enough you die a slow death. On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas sur...@hortonworks.com wrote: Do not have access to my computer. Based on reading the previous email, I do not see any thing suspicious on the list of objects in the histo live dump. I would like to hear from you about if it continued to grow. One instance of this I had seen in the past was related to weak reference related to socket objects. I do not see that happening here though. Sent from phone On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Tried this.. NameNode is still Ruining my Xmas on its slow death march to OOM. http://imagebin.org/240453 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas sur...@hortonworks.comwrote: -XX:NewSize=1G -XX:MaxNewSize=1G -- http://hortonworks.com/download/
Re: NN Memory Jumps every 1 1/2 hours
I do not follow what you mean here. Even when I forced a GC it cleared 0% memory. Is this with new younggen setting? Because earlier, based on the calculation I posted, you need ~11G in old generation. With 6G as the default younggen size, you actually had just enough memory to fit the namespace in oldgen. Hence you might not have seen Full GC freeing up enough memory. Have you tried Full GC with 1G youngen size have you tried this? I supsect you would see lot more memory freeing up. One would think that since the entire NameNode image is stored in memory that the heap would not need to grow beyond that Namenode image that you see during checkpointing is the size of file written after serializing file system namespace in memory. This is not what is directly stored in namenode memory. Namenode stores data structures that corresponds to file system directory tree and block locations. Out of this only file system directory is serialized and written to fsimage. Blocks locations are not. On Thu, Dec 27, 2012 at 2:22 PM, Edward Capriolo edlinuxg...@gmail.comwrote: I am not sure GC had a factor. Even when I forced a GC it cleared 0% memory. One would think that since the entire NameNode image is stored in memory that the heap would not need to grow beyond that, but that sure does not seem to be the case. a 5GB image starts off using 10GB of memory and after burn in it seems to use about 15GB memory. So really we say the name node data has to fit in memory but what we really mean is the name node data must fit in memory 3x On Thu, Dec 27, 2012 at 5:08 PM, Suresh Srinivas sur...@hortonworks.com wrote: You did free up lot of old generation with reducing young generation, right? The extra 5G of RAM for the old generation should have helped. Based on my calculation, for the current number of objects you have, you need roughly: 12G of total heap with young generation size of 1G. This assumes the average file name size is 32 bytes. In later releases (= 0.20.204), several memory optimization and startup optimizations have been done. It should help you as well. On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.com wrote: So it turns out the issue was just the size of the filesystem. 2012-12-27 16:37:22,390 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4,354,340,042 Basically if the NN image size hits ~ 5,000,000,000 you get f'ed. So you need about 3x ram as your FSImage size. If you do not have enough you die a slow death. On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas sur...@hortonworks.com wrote: Do not have access to my computer. Based on reading the previous email, I do not see any thing suspicious on the list of objects in the histo live dump. I would like to hear from you about if it continued to grow. One instance of this I had seen in the past was related to weak reference related to socket objects. I do not see that happening here though. Sent from phone On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Tried this.. NameNode is still Ruining my Xmas on its slow death march to OOM. http://imagebin.org/240453 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas sur...@hortonworks.comwrote: -XX:NewSize=1G -XX:MaxNewSize=1G -- http://hortonworks.com/download/ -- http://hortonworks.com/download/
Re: NN Memory Jumps every 1 1/2 hours
I tried your suggested setting and forced GC from Jconsole and once it crept up nothing was freeing up. That is very surprising. If possible, take a live dump when namenode starts up (when memory used is low) and when namenode memory consumption has gone up considerably, closer to the heap limit. BTW, are you running with that configuration - with younggen size set to smaller size? So just food for thought: You said average file name size is 32 bytes. Well most of my data sits in /user/hive/warehouse/ Then I have a tables with partitions. Does it make sense to just move this to /u/h/w? In the directory structure in the namenode memory, there is one inode for user, hive and warehouse. So it would save only couple of bytes. However on fsimage in older releases, /user/hive/warehouse is repeated for every file. This in the later release has been optimized. But these optimizations affect only the fsimage and not the memory consumed on the namenode. Will I be saving 400,000,000 bytes of memory if I do? On Thu, Dec 27, 2012 at 5:41 PM, Suresh Srinivas sur...@hortonworks.com wrote: I do not follow what you mean here. Even when I forced a GC it cleared 0% memory. Is this with new younggen setting? Because earlier, based on the calculation I posted, you need ~11G in old generation. With 6G as the default younggen size, you actually had just enough memory to fit the namespace in oldgen. Hence you might not have seen Full GC freeing up enough memory. Have you tried Full GC with 1G youngen size have you tried this? I supsect you would see lot more memory freeing up. One would think that since the entire NameNode image is stored in memory that the heap would not need to grow beyond that Namenode image that you see during checkpointing is the size of file written after serializing file system namespace in memory. This is not what is directly stored in namenode memory. Namenode stores data structures that corresponds to file system directory tree and block locations. Out of this only file system directory is serialized and written to fsimage. Blocks locations are not. On Thu, Dec 27, 2012 at 2:22 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I am not sure GC had a factor. Even when I forced a GC it cleared 0% memory. One would think that since the entire NameNode image is stored in memory that the heap would not need to grow beyond that, but that sure does not seem to be the case. a 5GB image starts off using 10GB of memory and after burn in it seems to use about 15GB memory. So really we say the name node data has to fit in memory but what we really mean is the name node data must fit in memory 3x On Thu, Dec 27, 2012 at 5:08 PM, Suresh Srinivas sur...@hortonworks.com wrote: You did free up lot of old generation with reducing young generation, right? The extra 5G of RAM for the old generation should have helped. Based on my calculation, for the current number of objects you have, you need roughly: 12G of total heap with young generation size of 1G. This assumes the average file name size is 32 bytes. In later releases (= 0.20.204), several memory optimization and startup optimizations have been done. It should help you as well. On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.com wrote: So it turns out the issue was just the size of the filesystem. 2012-12-27 16:37:22,390 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4,354,340,042 Basically if the NN image size hits ~ 5,000,000,000 you get f'ed. So you need about 3x ram as your FSImage size. If you do not have enough you die a slow death. On Sun, Dec 23, 2012 at 9:40 PM, Suresh Srinivas sur...@hortonworks.com wrote: Do not have access to my computer. Based on reading the previous email, I do not see any thing suspicious on the list of objects in the histo live dump. I would like to hear from you about if it continued to grow. One instance of this I had seen in the past was related to weak reference related to socket objects. I do not see that happening here though. Sent from phone On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Tried this.. NameNode is still Ruining my Xmas on its slow death march to OOM. http://imagebin.org/240453 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas sur...@hortonworks.comwrote: -XX:NewSize=1G -XX:MaxNewSize=1G -- http://hortonworks.com/download/ -- http://hortonworks.com/download/ -- http://hortonworks.com/download/
Re: NN Memory Jumps every 1 1/2 hours
Do not have access to my computer. Based on reading the previous email, I do not see any thing suspicious on the list of objects in the histo live dump. I would like to hear from you about if it continued to grow. One instance of this I had seen in the past was related to weak reference related to socket objects. I do not see that happening here though. Sent from phone On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Tried this.. NameNode is still Ruining my Xmas on its slow death march to OOM. http://imagebin.org/240453 On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas sur...@hortonworks.comwrote: -XX:NewSize=1G -XX:MaxNewSize=1G
Re: NN Memory Jumps every 1 1/2 hours
This looks to me is because of larger default young generation size in newer java releases - see http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html#heap_size. I can see looking at your GC logs, around 6G space being used for young generation (though I do not see logs related to minor collection). That means for the same given number of objects, you have smaller old generation space and hence old generation collection can no longer perform well. It is unfortunate that such changes are made in java and that causes previously working applications to fail. My suggestion is to not depend on default young generation sizes any more. At large JVM sizes, the defaults chosen by the JDK no longer works well. So I suggest protecting yourself from such changes by explicitly specifying young generation size. Given my experience of tuning GC at Yahoo clusters, at the number of objects you have and total heap size you are allocating, I suggest setting the young generation to 1G. You can do that by adding -XX:NewSize=1G -XX:MaxNewSize=1G Let me know how it goes. On Sat, Dec 22, 2012 at 5:59 PM, Edward Capriolo edlinuxg...@gmail.comwrote: 6333.934: [Full GC 10391746K-9722532K(17194656K), 63.9812940 secs] -- http://hortonworks.com/download/
Re: Is there an additional overhead when storing data in HDFS?
HDFS uses 4GB for the file + checksum data. Default is for every 512 bytes of data, 4 bytes of checksum are stored. In this case additional 32MB data. On Tue, Nov 20, 2012 at 11:00 PM, WangRamon ramon_w...@hotmail.com wrote: Hi All I'm wondering if there is an additional overhead when storing some data into HDFS? For example, I have a 2GB file, the replicate factor of HDSF is 2, when the file is uploaded to HDFS, should HDFS use 4GB to store it or more then 4GB to store it? If it takes more than 4GB space, why? Thanks Ramon -- http://hortonworks.com/download/
Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs
Vinay, if the Hadoop docs are not clear in this regard, can you please create a jira to add these details? On Fri, Nov 16, 2012 at 12:31 AM, Vinayakumar B vinayakuma...@huawei.comwrote: Hi, ** ** If you are moving from NonHA (single master) to HA, then follow the below steps. **1. **Configure the another namenode’s configuration in the running namenode and all datanode’s configurations. And configure logical *fs.defaultFS* **2. **Configure the shared storage related configuration. **3. **Stop the running NameNode and all datanodes. **4. **Execute ‘hdfs namenode –initializeSharedEdits’ from the existing namenode installation, to transfer the edits to shared storage.** ** **5. **Now format zkfc using ‘hdfs zkfc –formatZK’ and start zkfc using ‘hadoop-daemon.sh start zkfc’ **6. **Now restart the namenode from existing installation. If all configurations are fine, then NameNode should start successfully as STANDBY, then zkfc will make it to ACTIVE. ** ** **7. **Now install the NameNode in another machine (master2) with same configuration, except ‘dfs.ha.namenode.id’. **8. **Now instead of format, you need to copy the name dir contents from another namenode (master1) to master2’s name dir. For this you are having 2 options. **a. **Execute ‘hdfs namenode -bootStrapStandby’ from the master2 installation. **b. **Using ‘scp’ copy entire contents of name dir from master1 to master2’s name dir. **9. **Now start the zkfc for second namenode ( No need to do zkfc format now). Also start the namenode (master2) ** ** Regards, Vinay- *From:* Uma Maheswara Rao G [mailto:mahesw...@huawei.com] *Sent:* Friday, November 16, 2012 1:26 PM *To:* user@hadoop.apache.org *Subject:* RE: High Availability - second namenode (master2) issue: Incompatible namespaceIDs ** ** If you format namenode, you need to cleanup storage directories of DataNode as well if that is having some data already. DN also will have namespace ID saved and compared with NN namespaceID. if you format NN, then namespaceID will be changed and DN may have still older namespaceID. So, just cleaning the data in DN would be fine. Regards, Uma -- *From:* hadoop hive [hadooph...@gmail.com] *Sent:* Friday, November 16, 2012 1:15 PM *To:* user@hadoop.apache.org *Subject:* Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs Seems like you havn't format your cluster (if its 1st time made). On Fri, Nov 16, 2012 at 9:58 AM, a...@hsk.hk a...@hsk.hk wrote: Hi, ** ** Please help! ** ** I have installed a Hadoop Cluster with a single master (master1) and have HBase running on the HDFS. Now I am setting up the second master (master2) in order to form HA. When I used JPS to check the cluster, I found : ** ** 2782 Jps 2126 NameNode 2720 SecondaryNameNode i.e. The datanode on this server could not be started ** ** In the log file, found: 2012-11-16 10:28:44,851 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 1356148070; datanode namespaceID = 1151604993 ** ** ** ** ** ** One of the possible solutions to fix this issue is to: stop the cluster, reformat the NameNode, restart the cluster. QUESTION: As I already have HBASE running on the cluster, if I reformat the NameNode, do I need to reinstall the entire HBASE? I don't mind to have all data lost as I don't have many data in HBASE and HDFS, however I don't want to re-install HBASE again. ** ** ** ** On the other hand, I have tried another solution: stop the DataNode, edit the namespaceID in current/VERSION (i.e. set namespaceID=1151604993), restart the datanode, it doesn't work: Warning: $HADOOP_HOME is deprecated. starting master2, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out* *** Exception in thread main java.lang.NoClassDefFoundError: master2 Caused by: java.lang.ClassNotFoundException: master2 at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) Could not find the main class: master2. Program will exit. QUESTION: Any other solutions? ** ** ** ** ** ** Thanks ** ** ** ** ** ** ** ** ** ** ** ** -- http://hortonworks.com/download/
Re: could only be replicated to 0 nodes, instead of 1
- A datanode is typically kept free with up to 5 free blocks (HDFS block size) of space. - Disk space is used by mapreduce jobs to store temporary shuffle spills also. This is what dfs.datanode.du.reserved is used to configure. The configuration is available in hdfs-site.xml. If you have not configured it then reserved space is 0. Not only mapreduce, other files also might take up the disk space. When these errors are thrown, please send the namenode web UI information. It has storage related information in the cluster summary. That will help debug. On Tue, Sep 4, 2012 at 9:41 AM, Keith Wiley kwi...@keithwiley.com wrote: I've been running up against the good old fashioned replicated to 0 nodes gremlin quite a bit recently. My system (a set of processes interacting with hadoop, and of course hadoop itself) runs for a while (a day or so) and then I get plagued with these errors. This is a very simple system, a single node running pseudo-distributed. Obviously, the replication factor is implicitly 1 and the datanode is the same machine as the namenode. None of the typical culprits seem to explain the situation and I'm not sure what to do. I'm also not sure how I'm getting around it so far. I fiddle desperately for a few hours and things start running again, but that's not really a solution...I've tried stopping and restarting hdfs, but that doesn't seem to improve things. So, to go through the common suspects one by one, as quoted on http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo: • No DataNode instances being up and running. Action: look at the servers, see if the processes are running. I can interact with hdfs through the command line (doing directory listings for example). Furthermore, I can see that the relevant java processes are all running (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker). • The DataNode instances cannot talk to the server, through networking or Hadoop configuration problems. Action: look at the logs of one of the DataNodes. Obviously irrelevant in a single-node scenario. Anyway, like I said, I can perform basic hdfs listings, I just can't upload new data. • Your DataNode instances have no hard disk space in their configured data directories. Action: look at the dfs.data.dir list in the node configurations, verify that at least one of the directories exists, and is writeable by the user running the Hadoop processes. Then look at the logs. There's plenty of space, at least 50GB. • Your DataNode instances have run out of space. Look at the disk capacity via the Namenode web pages. Delete old files. Compress under-used files. Buy more disks for existing servers (if there is room), upgrade the existing servers to bigger drives, or add some more servers. Nope, 50GBs free, I'm only uploading a few KB at a time, maybe a few MB. • The reserved space for a DN (as set in dfs.datanode.du.reserved is greater than the remaining free space, so the DN thinks it has no free space I grepped all the files in the conf directory and couldn't find this parameter so I don't really know anything about it. At any rate, it seems rather esoteric, I doubt it is related to my problem. Any thoughts on this? • You may also get this message due to permissions, eg if JT can not create jobtracker.info on startup. Meh, like I said, the system basicaslly works...and then stops working. The only explanation that would really make sense in that context is running out of space...which isn't happening. If this were a permission error, or a configuration error, or anything weird like that, then the whole system would never get up and running in the first place. Why would a properly running hadoop system start exhibiting this error without running out of disk space? THAT's the real question on the table here. Any ideas? Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy. -- Edwin A. Abbott, Flatland -- http://hortonworks.com/download/
Re: could only be replicated to 0 nodes, instead of 1
Keith, Assuming that you were seeing the problem when you captured the namenode webUI info, it is not related to what I suspect. This might be a good question for CDH forums given this is not an Apache release. Regards, Suresh On Tue, Sep 4, 2012 at 10:20 AM, Keith Wiley kwi...@keithwiley.com wrote: On Sep 4, 2012, at 10:05 , Suresh Srinivas wrote: When these errors are thrown, please send the namenode web UI information. It has storage related information in the cluster summary. That will help debug. Sure thing. Thanks. Here's what I currently see. It looks like the problem isn't the datanode, but rather the namenode. Would you agree with that assessment? NameNode 'localhost:9000' Started: Tue Sep 04 10:06:52 PDT 2012 Version: 0.20.2-cdh3u3, 03b655719d13929bd68bb2c2f9cee615b389cea9 Compiled:Thu Jan 26 11:55:16 PST 2012 by root from Unknown Upgrades:There are no upgrades in progress. Browse the filesystem Namenode Logs Cluster Summary Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually. 1639 files and directories, 585 blocks = 2224 total. Heap Size is 39.55 MB / 888.94 MB (4%) Configured Capacity : 49.21 GB DFS Used : 9.9 MB Non DFS Used : 2.68 GB DFS Remaining: 46.53 GB DFS Used%: 0.02 % DFS Remaining% : 94.54 % Live Nodes : 1 Dead Nodes : 0 Decommissioning Nodes: 0 Number of Under-Replicated Blocks: 5 NameNode Storage: Storage Directory TypeState /var/lib/hadoop-0.20/cache/hadoop/dfs/name IMAGE_AND_EDITS Active Cloudera's Distribution including Apache Hadoop, 2012. Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com And what if we picked the wrong religion? Every week, we're just making God madder and madder! -- Homer Simpson -- http://hortonworks.com/download/
Re: Hadoop WebUI
Clement, To get the details related to how to contribute - see http://wiki.apache.org/hadoop/HowToContribute. UI is simple because it serves the purpose. More sophisticated UI for management and monitoring is being done in Ambari, see - http://incubator.apache.org/ambari/. The core hadoop UIs could be better. Please create a jira with your proposal and a brief design document. Create separate jira for HDFS and MapReduce (depending on where you want to to the work). Regards, Suresh On Wed, Aug 1, 2012 at 10:27 AM, Clement Jebakumar jeba.r...@gmail.comwrote: hi, I have observed for very longtime, that hadoop ui is simple.(ofcourse it has information which are required). but still Is there any reason for it? I thought of working on the UI as it is required for my cloud setup.. If i work on this, i can give the patch of my contributution to hadoop. how i can do my contrib to hadoop? Now currenly i am doing my updates in trunk.. is that ok to do with trunk? Give your views? *Clement Jebakumar,* 111/27 Keelamutharamman Kovil Street, Tenkasi, 627 811 http://www.declum.com/clement.html -- http://hortonworks.com/download/
Re: Namenode and Jobtracker dont start
Can you share information on the java version that you are using. - Is it as obvious as some previous processes still running and new processes cannot bind to the port? - Another pointer - http://stackoverflow.com/questions/8360913/weird-java-net-socketexception-permission-denied-connect-error-when-running-groo On Wed, Jul 18, 2012 at 7:29 AM, Björn-Elmar Macek ma...@cs.uni-kassel.dewrote: Hi, i have lately been running into problems since i started running hadoop on a cluster: The setup is the following: 1 Computer is NameNode and Jobtracker 1 Computer is SecondaryNameNode 2 Computers are TaskTracker and DataNode I ran into problems with running the wordcount example: NameNode and Jobtracker do not start properly both having connection problems of some kind. And this is although ssh is configured that way, that no prompt happens when i connect from any node in the cluster to any other. Is there any reason why this happens? The logs look like the following: \ JOBTRACKER**__ 2012-07-18 16:08:05,808 INFO org.apache.hadoop.mapred.**JobTracker: STARTUP_MSG: / STARTUP_MSG: Starting JobTracker STARTUP_MSG: host = its-cs100.its.uni-kassel.de/**141.51.205.10http://its-cs100.its.uni-kassel.de/141.51.205.10 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.0.2 STARTUP_MSG: build = https://svn.apache.org/repos/** asf/hadoop/common/branches/**branch-1.0.2https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2-r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012 / 2012-07-18 16:08:06,479 INFO org.apache.hadoop.metrics2.**impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2012-07-18 16:08:06,534 INFO org.apache.hadoop.metrics2.**impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 2012-07-18 16:08:06,554 INFO org.apache.hadoop.metrics2.**impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2012-07-18 16:08:06,554 INFO org.apache.hadoop.metrics2.**impl.MetricsSystemImpl: JobTracker metrics system started 2012-07-18 16:08:07,157 INFO org.apache.hadoop.metrics2.**impl.MetricsSourceAdapter: MBean for source QueueMetrics,q=default registered. 2012-07-18 16:08:10,395 INFO org.apache.hadoop.metrics2.**impl.MetricsSourceAdapter: MBean for source ugi registered. 2012-07-18 16:08:10,417 INFO org.apache.hadoop.security.** token.delegation.**AbstractDelegationTokenSecretM**anager: Updating the current master key for generating delegation tokens 2012-07-18 16:08:10,436 INFO org.apache.hadoop.mapred.**JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1) 2012-07-18 16:08:10,438 INFO org.apache.hadoop.util.**HostsFileReader: Refreshing hosts (include/exclude) list 2012-07-18 16:08:10,440 INFO org.apache.hadoop.security.** token.delegation.**AbstractDelegationTokenSecretM**anager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) 2012-07-18 16:08:10,465 INFO org.apache.hadoop.security.** token.delegation.**AbstractDelegationTokenSecretM**anager: Updating the current master key for generating delegation tokens 2012-07-18 16:08:10,510 INFO org.apache.hadoop.mapred.**JobTracker: Starting jobtracker with owner as bmacek 2012-07-18 16:08:10,620 WARN org.apache.hadoop.mapred.**JobTracker: Error starting tracker: java.net.SocketException: Permission denied at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.**ServerSocketChannelImpl.bind(** ServerSocketChannelImpl.java:**119) at sun.nio.ch.**ServerSocketAdaptor.bind(** ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.**bind(Server.java:225) at org.apache.hadoop.ipc.Server$**Listener.init(Server.java:**301) at org.apache.hadoop.ipc.Server.**init(Server.java:1483) at org.apache.hadoop.ipc.RPC$**Server.init(RPC.java:545) at org.apache.hadoop.ipc.RPC.**getServer(RPC.java:506) at org.apache.hadoop.mapred.**JobTracker.init(JobTracker.** java:2306) at org.apache.hadoop.mapred.**JobTracker.init(JobTracker.** java:2192) at org.apache.hadoop.mapred.**JobTracker.init(JobTracker.** java:2186) at org.apache.hadoop.mapred.**JobTracker.startTracker(** JobTracker.java:300) at org.apache.hadoop.mapred.**JobTracker.startTracker(** JobTracker.java:291) at org.apache.hadoop.mapred.**JobTracker.main(JobTracker.**java:4978) 2012-07-18 16:08:13,861 WARN org.apache.hadoop.metrics2.**impl.MetricsSystemImpl: Source name QueueMetrics,q=default already exists! 2012-07-18 16:08:13,885 WARN org.apache.hadoop.metrics2.**impl.MetricsSystemImpl: Source name ugi already exists! 2012-07-18 16:08:13,885 INFO org.apache.hadoop.security.**
Re: can HADOOP-6546: BloomMapFile can return false negatives get backported to branch-1?
This change in merged into branch-1 and will be available in release 1.1. On Mon, May 7, 2012 at 6:40 PM, Jim Donofrio donofrio...@gmail.com wrote: Can someone backport HADOOP-6546: BloomMapFile can return false negatives to branch-1 for the next 1+ release? Without this fix BloomMapFile is somewhat useless because having no false negatives is a core feature of BloomFilters. I am surprised that both hadoop 1.0.2 and cdh3u3 do not have this fix from over 2 years ago.
Re: can HADOOP-6546: BloomMapFile can return false negatives get backported to branch-1?
I have marked it for 1.1. I will follow up on promoting the path. Regards, Suresh On May 7, 2012, at 6:40 PM, Jim Donofrio donofrio...@gmail.com wrote: Can someone backport HADOOP-6546: BloomMapFile can return false negatives to branch-1 for the next 1+ release? Without this fix BloomMapFile is somewhat useless because having no false negatives is a core feature of BloomFilters. I am surprised that both hadoop 1.0.2 and cdh3u3 do not have this fix from over 2 years ago.
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
This probably is a more relevant question in CDH mailing lists. That said, what Edward is suggesting seems reasonable. Reduce replication factor, decommission some of the nodes and create a new cluster with those nodes and do distcp. Could you share with us the reasons you want to migrate from Apache 205? Regards, Suresh On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo edlinuxg...@gmail.comwrote: Honestly that is a hassle, going from 205 to cdh3u3 is probably more or a cross-grade then an upgrade or downgrade. I would just stick it out. But yes like Michael said two clusters on the same gear and distcp. If you are using RF=3 you could also lower your replication to rf=2 'hadoop dfs -setrepl 2' to clear headroom as you are moving stuff. On Thu, May 3, 2012 at 7:25 AM, Michel Segel michael_se...@hotmail.com wrote: Ok... When you get your new hardware... Set up one server as your new NN, JT, SN. Set up the others as a DN. (Cloudera CDH3u3) On your existing cluster... Remove your old log files, temp files on HDFS anything you don't need. This should give you some more space. Start copying some of the directories/files to the new cluster. As you gain space, decommission a node, rebalance, add node to new cluster... It's a slow process. Should I remind you to make sure you up you bandwidth setting, and to clean up the hdfs directories when you repurpose the nodes? Does this make sense? Sent from a remote device. Please excuse any typos... Mike Segel On May 3, 2012, at 5:46 AM, Austin Chungath austi...@gmail.com wrote: Yeah I know :-) and this is not a production cluster ;-) and yes there is more hardware coming :-) On Thu, May 3, 2012 at 4:10 PM, Michel Segel michael_se...@hotmail.com wrote: Well, you've kind of painted yourself in to a corner... Not sure why you didn't get a response from the Cloudera lists, but it's a generic question... 8 out of 10 TB. Are you talking effective storage or actual disks? And please tell me you've already ordered more hardware.. Right? And please tell me this isn't your production cluster... (Strong hint to Strata and Cloudea... You really want to accept my upcoming proposal talk... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com wrote: Yes. This was first posted on the cloudera mailing list. There were no responses. But this is not related to cloudera as such. cdh3 is based on apache hadoop 0.20 as the base. My data is in apache hadoop 0.20.205 There is an upgrade namenode option when we are migrating to a higher version say from 0.20 to 0.20.205 but here I am downgrading from 0.20.205 to 0.20 (cdh3) Is this possible? On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi prash1...@gmail.com wrote: Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help moving this to Cloudera mailing list. On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote: There is only one cluster. I am not copying between clusters. Say I have a cluster running apache 0.20.205 with 10 TB storage capacity and has about 8 TB of data. Now how can I migrate the same cluster to use cdh3 and use that same 8 TB of data. I can't copy 8 TB of data using distcp because I have only 2 TB of free space On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar nitinpawar...@gmail.com wrote: you can actually look at the distcp http://hadoop.apache.org/common/docs/r0.20.0/distcp.html but this means that you have two different set of clusters available to do the migration On Thu, May 3, 2012 at 12:51 PM, Austin Chungath austi...@gmail.com wrote: Thanks for the suggestions, My concerns are that I can't actually copyToLocal from the dfs because the data is huge. Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do a namenode upgrade. I don't have to copy data out of dfs. But here I am having Apache hadoop 0.20.205 and I want to use CDH3 now, which is based on 0.20 Now it is actually a downgrade as 0.20.205's namenode info has to be used by 0.20's namenode. Any idea how I can achieve what I am trying to do? Thanks. On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar nitinpawar...@gmail.com wrote: i can think of following options 1) write a simple get and put code which gets the data from DFS and loads it in dfs 2) see if the distcp between both versions are compatible 3) this is what I had done (and my data was hardly few hundred GB) .. did a dfs -copyToLocal and then in the new grid did a copyFromLocal On Thu, May 3, 2012 at 11:41 AM, Austin Chungath austi...@gmail.com wrote: Hi, I am migrating from Apache hadoop 0.20.205 to CDH3u3. I don't want to lose the data
Re: hadoop permission guideline
Can you please take this discussion CDH mailing list? On Mar 22, 2012, at 7:51 AM, Michael Wang michael.w...@meredith.com wrote: I have installed Cloudera hadoop (CDH). I used its Cloudera Manager to install all needed packages. When it was installed, the root is used. I found the installation created some users, such as hdfs, hive, mapred,hue,hbase... After the installation, should we change some permission or ownership of some directories/files? For example, to use HIVE. It works fine with root user, since the metatore directory belongs to root. But in order to let other user use HIVE, I have to change metastore ownership to a specific non-root user, then it works. Is it the best practice? Another example is the start-all.sh, stop-all.sh they all belong to root. Should I change them to other user? I guess there are more cases... Thanks, This electronic message, including any attachments, may contain proprietary, confidential or privileged information for the sole use of the intended recipient(s). You are hereby notified that any unauthorized disclosure, copying, distribution, or use of this message is prohibited. If you have received this message in error, please immediately notify the sender by reply e-mail and delete it.
Re: Issue when starting services on CDH3
Guys, can you please take this up in CDH related mailing lists. On Thu, Mar 15, 2012 at 10:01 AM, Manu S manupk...@gmail.com wrote: Because for large clusters we have to run namenode in a single node, datanode in another nodes So we can start namenode and jobtracker in master node and datanode n tasktracker in slave nodes For getting more clarity You can check the service status after starting Verify these: dfs.name.dir hdfs:hadoop drwx-- dfs.data.dir hdfs:hadoop drwx-- mapred.local.dir mapred:hadoop drwxr-xr-x Please follow each steps in this link https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster On Mar 15, 2012 9:52 PM, Manish Bhoge manishbh...@rocketmail.com wrote: Ys, I understand the order and I formatted namenode before starting services. As I suspect there may be ownership and an access issue. Not able to nail down issue exactly. I also have question why there are 2 routes to start services. When we have start-all.sh script then why need to go to init.d to start services?? Thank you, Manish Sent from my BlackBerry, pls excuse typo -Original Message- From: Manu S manupk...@gmail.com Date: Thu, 15 Mar 2012 21:43:26 To: common-user@hadoop.apache.org; manishbh...@rocketmail.com Reply-To: common-user@hadoop.apache.org Subject: Re: Issue when starting services on CDH3 Did you check the service status? Is it like dead, but pid exist? Did you check the ownership and permissions for the dfs.name.dir,dfs.data.dir,mapped.local.dir etc ? The order for starting daemons are like this: 1 namenode 2 datanode 3 jobtracker 4 tasktracker Did you format the namenode before starting? On Mar 15, 2012 9:31 PM, Manu S manupk...@gmail.com wrote: Dear manish Which daemons are not starting? On Mar 15, 2012 9:21 PM, Manish Bhoge manishbh...@rocketmail.com wrote: I have CDH3 installed in standalone mode. I have install all hadoop components. Now when I start services (namenode,secondary namenode,job tracker,task tracker) I can start gracefully from /usr/lib/hadoop/ ./bin/start-all.sh. But when start the same servises from /etc/init.d/hadoop-0.20-* then I unable to start. Why? Now I want to start Hue also which is in init.d that also I couldn't start. Here I suspect authentication issue. Because all the services in init.d are under root user and root group. Please suggest I am stuck here. I tried hive and it seems it running fine. Thanks Manish. Sent from my BlackBerry, pls excuse typo
Re: Questions about HDFS’s placement policy
See my comments inline: On Wed, Mar 14, 2012 at 9:24 AM, Giovanni Marzulli giovanni.marzu...@ba.infn.it wrote: Hello, I'm trying HDFS on a small test cluster and I need to clarify some doubts about hadoop behaviour. Some details of my cluster: Hadoop version: 0.20.2 I have two racks (rack1, rack2). Three datanodes for every rack. Replication factor is set to 3. HDFS’s placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. Instead, I noticed that sometimes, a few blocks of files are stored as follows: two replicas in the local rack and a replica in a different rack. Are there exceptions that cause different behaviour than default placement policy? Your description of replica placement is correct. However a node chosen based on this placement may not be a good target, due to the traffic on the node, remaining space etc. See BlockPlacementPolicyDefault#isGoodTarget(). Given the small cluster size, you may be seeing different behavior based on load of individual nodes, racks etc. Likewise, at times some blocks are read from nodes in the remote rack instead of nodes in the local rack. Why does it happen? This is surprising. Not sure if the topology is correctly configired. Another thing: if I have two datacenters and two racks for each of them (so a hierarchical network topology), where two remote replicas arestored? Does Hadoop consider the hierarchy and stores one replica in the local datacenter and two replicas in the other datacenter? Or the two replicas are stored in a totally random rack? Hadoop clusters are not spread across datacenters. Regards, Suresh
Re: What is the NEW api?
there are many people talking about the NEW API This might be related to releases 0.21 or later, where append and related functionality is re-implemented. 1.0 comes from 0.20.205 and has same API as 0.20-append. Sent from phone On Mar 11, 2012, at 6:27 PM, WangRamon ramon_w...@hotmail.com wrote: Hi all I've been with Hadoop-0.20-append for a few time and I plan to upgrade to 1.0.0 release, but i find there are many people taking about the NEW API, so I'm lost, can anyone please tell me what is the new API? Is the OLD one available in the 1.0.0 release? Thanks CheersRamon
Re: Backupnode in 1.0.0?
On Thu, Feb 23, 2012 at 12:41 AM, Jeremy Hansen jer...@skidrow.la wrote: Thanks. Could you clarify what BackupNode does? -jeremy Namenode currently keeps the entire file system namespace in memory. It logs the write operations (create, delete file etc.) into a journal file called editlog. This journal needs to be merged with the file system image periodically to avoid journal file growing to a large size. This is called checkpointing. Checkpoint also reduces the startup time, since the namenode need not load large editlog file. Prior to release 0.21, another node called SecondaryNamenode was used for checkpointing. It periodically gets the file system image and edit, load it into memory and write checkpoint image. This image is then then shipped to the Namenode. In 0.21, BackupNode was introduced. Unlike SecondaryNamenode, it gets edits streamed from the Namenode. It periodically writes the checkpoint image and ships it back to Namenode. The goal was for this to become Standby node, towards Namenode HA. Konstantin and few others are pursuing this. I have not seen any deployments of BackupNode in production. I would love to hear if any one has deployed it in production and how stable it is. Regards, Suresh
Re: Backupnode in 1.0.0?
Joey, Can you please answer the question from in the context of Apache releases. Not sure if CDH4b1 needs to be mentioned in the context of this mailing list. Regards, Suresh On Wed, Feb 22, 2012 at 5:24 PM, Joey Echeverria j...@cloudera.com wrote: Check out this branch for the 0.22 version of Bigtop: https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/ However, I don't think BackupNode is what you want. It sounds like you want HA which is coming in (hopefully) 0.23.2 and is also available today in CDH4b1. -Joey On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote: By the way, I don't see anything 0.22 based in the bigtop repos. Thanks -jeremy On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote: I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it. Thanks -jeremy On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote: Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode? -Joey Sent from my iPhone On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote: Any possibility of getting spec files to create packages for 0.22? Thanks -jeremy On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote: BackupNode is major functionality with change in required in RPC protocols, configuration etc. Hence it will not be available in bug fix release 1.0.1. It is also unlikely to be not available on minor releases in 1.x release streams. Regards, Suresh On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote: It looks as if backupnode isn't supported in 1.0.0? Any chances it's in 1.0.1? Thanks -jeremy -- Joseph Echeverria Cloudera, Inc. 443.305.9434
Re: Setting up Federated HDFS
On Tue, Feb 7, 2012 at 4:51 PM, Chandrasekar chandruseka...@gmail.comwrote: In which file should i specify all this information about nameservices and the list of namenodes? hdfs-site.xml is the appropriate place, since it is hdfs-specific configuration. If there are multiple namenodes, then which one should i specify in core-site.xml as fs.defaultFS? core-site.xml is the right place for fs.defaultFS. Given you have multiple namespaces from in federation setup, fs.defaultFS should point to ViewFileSystem for a unified view of the namespaces to the clients. There is an open bug HDFS-2558 to track this. I will get to this as soon as I can. Regards, Suresh
Re: HDFS Federation Exception
Thanks for figuring that. Could you create an HDFS Jira for this issue? On Wednesday, January 11, 2012, Praveen Sripati praveensrip...@gmail.com wrote: Hi, The documentation (1) suggested to set the `dfs.namenode.rpc-address.ns1` property to `hdfs://nn-host1:rpc-port` in the example. Changing the value to `nn-host1:rpc-port` (removing hdfs://) solved the problem. The document needs to be updated. (1) - http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html Praveen On Wed, Jan 11, 2012 at 3:40 PM, Praveen Sripati praveensrip...@gmail.com wrote: Hi, Got the latest code to see if any bugs were fixed and did try federation with the same configuration, but was getting similar exception. 2012-01-11 15:25:35,321 ERROR namenode.NameNode (NameNode.java:main(803)) - Exception in namenode join java.io.IOException: Failed on local exception: java.net.SocketException: Unresolved address; Host Details : local host is: hdfs; destination host is: (unknown):0; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:895) at org.apache.hadoop.ipc.Server.bind(Server.java:231) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:313) at org.apache.hadoop.ipc.Server.init(Server.java:1600) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:576) at org.apache.hadoop.ipc.WritableRpcEngine$Server.init(WritableRpcEngine.java:322) at org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:282) at org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:46) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:550) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.init(NameNodeRpcServer.java:145) at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:356) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:334) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:458) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:450) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799) Caused by: java.net.SocketException: Unresolved address at sun.nio.ch.Net.translateToSocketException(Net.java:58) at sun.nio.ch.Net.translateException(Net.java:84) at sun.nio.ch.Net.translateException(Net.java:90) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:61) at org.apache.hadoop.ipc.Server.bind(Server.java:229) ... 14 more Caused by: java.nio.channels.UnresolvedAddressException at sun.nio.ch.Net.checkAddress(Net.java:30) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:122) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) ... 15 more Regards, Praveen On Wed, Jan 11, 2012 at 12:24 PM, Praveen Sripati praveensrip...@gmail.com wrote: Hi, I am trying to setup a HDFS federation and getting the below error. Also, pasted the core-site.xml and hdfs-site.xml at the bottom of the mail. Did I miss something in the configuration files? 2012-01-11 12:12:15,759 ERROR namenode.NameNode (NameNode.java:main(803)) - Exception in namenode join java.lang.IllegalArgumentException: Can't parse port '' at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:198) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:174) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:205) at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:266) at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:317) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:329) at org.apache.hadoop.hdfs.server.namenode.N
Re: datanode failing to start
Can you please send your notes on what info is out of date or better still create a jira so that it can be addressed. On Fri, Jan 6, 2012 at 3:11 PM, Dave Kelsey da...@gamehouse.com wrote: gave up and installed version 1. it installed correctly and worked, thought the instructions for setup and the location of scripts and configs are now out of date. D On 1/5/2012 10:25 AM, Dave Kelsey wrote: java version 1.6.0_29 hadoop: 0.20.203.0 I'm attempting to setup the pseudo-distributed config on a mac 10.6.8. I followed the steps from the QuickStart (http://wiki.apache.org./** hadoop/QuickStart http://wiki.apache.org./hadoop/QuickStart) and succeeded with Stage 1: Standalone Operation. I followed the steps for Stage 2: Pseudo-distributed Configuration. I set the JAVA_HOME variable in conf/hadoop-env.sh and I changed tools.jar to the location of classes.jar (a mac version of tools.jar) I've modified the three .xml files as described in the QuickStart. ssh'ing to localhost has been configured and works with passwordless authentication. I formatted the namenode with bin/hadoop namenode -format as the instructions say This is what I see when I run bin/start-all.sh root# bin/start-all.sh starting namenode, logging to /Users/admin/hadoop/hadoop-0.** 20.203.0/bin/../logs/hadoop-**root-namenode-Hoot-2.local.out localhost: starting datanode, logging to /Users/admin/hadoop/hadoop-0.** 20.203.0/bin/../logs/hadoop-**root-datanode-Hoot-2.local.out localhost: Exception in thread main java.lang.**NoClassDefFoundError: server localhost: Caused by: java.lang.**ClassNotFoundException: server localhost: at java.net.URLClassLoader$1.run(** URLClassLoader.java:202) localhost: at java.security.**AccessController.doPrivileged(**Native Method) localhost: at java.net.URLClassLoader.** findClass(URLClassLoader.java:**190) localhost: at java.lang.ClassLoader.**loadClass(ClassLoader.java:** 306) localhost: at sun.misc.Launcher$**AppClassLoader.loadClass(** Launcher.java:301) localhost: at java.lang.ClassLoader.**loadClass(ClassLoader.java:** 247) localhost: starting secondarynamenode, logging to /Users/admin/hadoop/hadoop-0.**20.203.0/bin/../logs/hadoop-** root-secondarynamenode-Hoot-2.**local.out starting jobtracker, logging to /Users/admin/hadoop/hadoop-0.** 20.203.0/bin/../logs/hadoop-**root-jobtracker-Hoot-2.local.**out localhost: starting tasktracker, logging to /Users/admin/hadoop/hadoop-0. **20.203.0/bin/../logs/hadoop-**root-tasktracker-Hoot-2.local.**out There are 4 processes running: ps -fax | grep hadoop | grep -v grep | wc -l 4 They are: SecondaryNameNode TaskTracker NameNode JobTracker I've searched to see if anyone else has encountered this and not found anything d p.s. I've also posted this to core-u...@hadoop.apache.org which I've yet to find how to subscribe to.
Re: HDFS load balancing for non-local reads
Currently it sorts the block locations as: # local node # local rack node # random order of remote nodes See DatanodeManager#sortLocatedBlock(...) and NetworkTopology#pseudoSortByDistance(...). You can play around with other policies by plugging in different NetworkTopology. On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay rbc...@ncsu.edu wrote: Hi- ** ** How does the NameNode handle load balancing of non-local reads with multiple block locations when locality is equal? ** ** IE, if the client is equidistant (same rack) from 2 DataNodes hosting the same block, does the NameNode consider current client count or any other load indicators when deciding which DataNode will satisfy the read request? Or, is the client provided a list of all split locations and is allowed to make this choice themselves? ** ** Thanks! ** ** -Ben ** **
Re: HDFS Backup nodes
Srivas, As you may know already, NFS is just being used in the first prototype for HA. Two options for editlog store are: 1. Using BookKeeper. Work has already completed on trunk towards this. This will replace need for NFS to store the editlogs and is highly available. This solution will also be used for HA. 2. We have a short term goal also to enable editlogs going to HDFS itself. The work is in progress. Regards, Suresh -- Forwarded message -- From: M. C. Srivas mcsri...@gmail.com Date: Sun, Dec 11, 2011 at 10:47 PM Subject: Re: HDFS Backup nodes To: common-user@hadoop.apache.org You are out of luck if you don't want to use NFS, and yet want redundancy for the NN. Even the new NN HA work being done by the community will require NFS ... and the NFS itself needs to be HA. But if you use a Netapp, then the likelihood of the Netapp crashing is lower than the likelihood of a garbage-collection-of-death happening in the NN. [ disclaimer: I don't work for Netapp, I work for MapR ] On Wed, Dec 7, 2011 at 4:30 PM, randy randy...@comcast.net wrote: Thanks Joey. We've had enough problems with nfs (mainly under very high load) that we thought it might be riskier to use it for the NN. randy On 12/07/2011 06:46 PM, Joey Echeverria wrote: Hey Rand, It will mark that storage directory as failed and ignore it from then on. In order to do this correctly, you need a couple of options enabled on the NFS mount to make sure that it doesn't retry infinitely. I usually run with the tcp,soft,intr,timeo=10,**retrans=10 options set. -Joey On Wed, Dec 7, 2011 at 12:37 PM,randy...@comcast.net wrote: What happens then if the nfs server fails or isn't reachable? Does hdfs lock up? Does it gracefully ignore the nfs copy? Thanks, randy - Original Message - From: Joey Echeverriaj...@cloudera.com To: common-user@hadoop.apache.org Sent: Wednesday, December 7, 2011 6:07:58 AM Subject: Re: HDFS Backup nodes You should also configure the Namenode to use an NFS mount for one of it's storage directories. That will give the most up-to-date back of the metadata in case of total node failure. -Joey On Wed, Dec 7, 2011 at 3:17 AM, praveenesh kumarpraveen...@gmail.com wrote: This means still we are relying on Secondary NameNode idealogy for Namenode's backup. Can OS-mirroring of Namenode is a good alternative keep it alive all the time ? Thanks, Praveenesh On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G mahesw...@huawei.comwrote: AFAIK backup node introduced in 0.21 version onwards. __**__ From: praveenesh kumar [praveen...@gmail.com] Sent: Wednesday, December 07, 2011 12:40 PM To: common-user@hadoop.apache.org Subject: HDFS Backup nodes Does hadoop 0.20.205 supports configuring HDFS backup nodes ? Thanks, Praveenesh -- Joseph Echeverria Cloudera, Inc. 443.305.9434
Re: Difference between DFS Used and Non-DFS Used
non DFS storage is not required, it is provided as information only to shown how the storage is being used. The available storage on the disks is used for both DFS and non DFS (mapreduce shuffle output and any other files that could be on the disks). See if you have unnecessary files or shuffle output that is lingering on these disks, that is contributing to 250GB. Delete the unneeded files and you should be able to reclaim some of the 250GB. On Fri, Jul 8, 2011 at 4:24 AM, Sagar Shukla sagar_shu...@persistent.co.inwrote: Thanks Harsh. My first question still remains unanswered - Why does it require non-DFS storage?. If it is cache data then it should get flushed from the system after certain interval of time. And if it is useful data then it should have been part of used DFS data. I have a setup in which DFS used is use approx. 10 MB whereas non-DFS used is around 250 GB which is quite ridiculous. Thanks, Sagar -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Friday, July 08, 2011 4:42 PM To: common-user@hadoop.apache.org Subject: Re: Difference between DFS Used and Non-DFS Used It is just for information's sake (cause it can be computed with the data collected). The space is accounted just to let you know that there's something being stored on the DataNodes apart from just the HDFS data, in case you are running out of space. On Fri, Jul 8, 2011 at 10:18 AM, Sagar Shukla sagar_shu...@persistent.co.in wrote: Hi Harsh, Thanks for your reply. But why does it require non-DFS storage ? And why that space is accounted differently from regular DFS storage ? Ideally, it should have been part of same storage. Thanks, Sagar -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Thursday, July 07, 2011 6:04 PM To: common-user@hadoop.apache.org Subject: Re: Difference between DFS Used and Non-DFS Used DFS used is a count of all the space used by the dfs.data.dirs. The non-dfs used space is whatever space is occupied beyond that (which the DN does not account for). On Thu, Jul 7, 2011 at 3:29 PM, Sagar Shukla sagar_shu...@persistent.co.in wrote: Hi, What is the difference between DFS Used and Non-DFS used ? Thanks, Sagar DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails. -- Harsh J DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails. -- Harsh J DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails. -- Regards, Suresh
Re: Rapid growth in Non DFS Used disk space
dfs.data.dir/current is used by datanodes to store blocks. This directory should only have files starting with blk-* Things to check: - Are there other files that are not blk related? - Did you manually copy the content of one storage dir to another? (some folks did this when they added new disks) On Fri, May 13, 2011 at 1:41 PM, Kester, Scott skes...@weather.com wrote: We have a job that cleans up the mapred.local directory, so that¹s not it. I have done some further looking at data usage on the datanodes and 99% of the space used is under the dfs.data.dir/current directory. What would be under 'current' that wasn't part of HDFS? On 5/13/11 3:12 PM, Allen Wittenauer a...@apache.org wrote: On May 13, 2011, at 10:48 AM, Todd Lipcon wrote: 2) Any ideas on what is driving the growth in Non DFS Used space? I looked for things like growing log files on the datanodes but didn't find anything. Logs are one possible culprit. Another is to look for old files that might be orphaned in your mapred.local.dir - there have been bugs in the past where we've leaked files. If you shut down the TaskTrackers, you can safely delete everything from within mapred.local.dirs. Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out. The TT doesn't properly clean up after itself. -- Regards, Suresh
Re: CDH and Hadoop
On Thu, Mar 24, 2011 at 7:04 PM, Rita rmorgan...@gmail.com wrote: Oh! Thats for the heads up on that... I guess I will go with the cloudera source then On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch dar...@darose.net wrote: They do, but IIRC, they recently announced that they're going to be discontinuing it. DR Yahoo! discontinued the distribution in favor of making Apache Hadoop the most stable and the go to place for Hadoop releases. So all the advantages of using Yahoo distribution, you get in Apache Hadoop release. Please see the details of announcement here: http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/
Re: hadoop fs -du hbase table size
When you brought down the DN, the blocks in it were replicated to the remaining DNs. When the DN was added back, the blocks in it were over replicated, resulting in deletion of the extra replica. On Mon, Mar 14, 2011 at 7:34 AM, Alex Baranau alex.barano...@gmail.comwrote: Hello, As far as I understand, since hadoop fs -du command uses Linux' du internally this mean that the number of replicas (at the moment of command run) affect the result. Is that correct? I have the following case. I have a small (1 master + 5 slaves each with DN, TT RS) test HBase cluster with replication set to 2. The tables data size is monitoried with the help of hadoop fs -du command. There's a table which is constantly written to: data is only added in it. At some point I decided to reconfigure one of the slaves and shut it down. After reconfiguration (HBase already marked it as dead one) I brought it up again. Things went smoothly. However on the table size graph (I drew from data fetched with hadoop fs -du command) I noticed a little spike up on data size and then it went down to the normal/expected values. Can it be so that at some point of the taking out/reconfiguring/adding back node procedure at some point blocks were over-replicated? I'd expect them to be under-replicated for some time (as DN is down) and I'd expect to see the inverted spike: small decrease in data amount and then back to expected rate (after all blocks got replicated again). Any ideas? Thank you, Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase -- Regards, Suresh
Re: copy a file from hdfs to local file system with java
For an example how it is done, look at FsShell#copyToLocal() and its internal implementation. It uses FileUtil#copy() method to do this copying. On Fri, Feb 25, 2011 at 5:08 AM, Alessandro Binhara binh...@gmail.comwrote: How to copy a file from a HDS to local file system with a JAVA API ? where i can find a documentation and example about it? thanks -- Regards, Suresh
Re: corrupt blocks after restart
The problem is that replicas for 3609 blocks are not reported to namenode. Do you have datanodes in exclude file? What is the number of registered nodes before start compared to what it is now? Removing all the datanodes from exclude file (if there are any) and restarting the cluster should fix the issue. On Fri, Feb 18, 2011 at 5:43 PM, Chris Tarnas c...@email.com wrote: I've hit a data curroption problem in a system we were rapidly loading up, and I could really use some pointers on where to look for the root of the problem as well as any possible solutions. I'm running the cdh3b3 build of Hadoop 0.20.2. I experienced some issues with a client (Hbase regionserver) getting an IOException talking with the namenode. I thought the namenode might have been resourced starved (maybe not enough RAM). I first ran a fsck and the filesystem was healthy and then shutdown hadoop (stop-all.sh) to update the hadoop-env.sh to allocate more memory to the namenode, then started up hadoop again (start-all.sh). After starting up the server I ran another fsck and now the filesystem is corrupt and about 1/3 or less of the size it should be. All of the datanodes are online, but it is as if they are all incomplete. I've tried using the previous checkpoint from the secondary namenode to no avail. This is the fsck summary blocks of total size 442716 B.Status: CORRUPT Total size:416302602463 B Total dirs:7571 Total files: 7525 Total blocks (validated): 8516 (avg. block size 48884758 B) CORRUPT FILES:3343 MISSING BLOCKS: 3609 MISSING SIZE: 169401218659 B CORRUPT BLOCKS: 3609 Minimally replicated blocks: 4907 (57.62095 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 4740 (55.659935 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:3 Average block replication: 0.7557539 Corrupt blocks:3609 Missing replicas: 8299 (128.94655 %) Number of data-nodes: 10 Number of racks: 1 The namenode had quite a few WARNS like this one (The list of excluded nodes is all of the nodes in the system!) 2011-02-18 17:06:40,506 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1(excluded: 10.56.24.15:50010, 10.56.24.19:50010, 10.56.24.16:50010, 10.56.24.20:50010, 10.56.24.14:50010, 10.56.24.17:50010, 10.56.24.13:50010, 10.56.24.18:50010, 10.56.24.11:50010, 10.56.24.12:50010) I grepped for errors and warns on all 10 of the datanode logs and only found that over the last day two nodes had a total of 8 warns and 1 error: node 5: 2011-02-18 03:44:56,642 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: First Verification failed for blk_-8223286903671115311_101182. Exception : java.io.IOException: Input/output error 2011-02-18 03:45:04,440 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Second Verification failed for blk_-8223286903671115311_101182. Exception : java.io.IOException: Input/output error 2011-02-18 06:53:17,081 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: First Verification failed for blk_8689822798201808529_99687. Exception : java.io.IOException: Input/output error 2011-02-18 06:53:25,105 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Second Verification failed for blk_8689822798201808529_99687. Exception : java.io.IOException: Input/output error 2011-02-18 12:09:09,613 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Could not read or failed to veirfy checksum for data at offset 25624576 for block blk_-8776727553170755183_302602 got : java.io.IOException: Input/output error 2011-02-18 12:17:03,874 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Could not read or failed to veirfy checksum for data at offset 2555904 for block blk_-1372864350494009223_328898 got : java.io.IOException: Input/output error 2011-02-18 13:15:40,637 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Could not read or failed to veirfy checksum for data at offset 458752 for block blk_5554094539319851344_322246 got : java.io.IOException: Input/output error 2011-02-18 13:12:13,587 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.56.24.15:50010, storageID=DS-1424058120-10.56.24.15-50010-1297226452840, infoPort=50075, ipcPort=50020):DataXceiver Node 9: 2011-02-18 12:02:58,879 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Could not read or failed to veirfy checksum for data at offset 16711680 for block blk_-5196887735268731000_300861 got : java.io.IOException: Input/output error Many thanks for any help or where I should look. -chris -- Regards, Suresh
Re: Data Nodes do not start
On Tue, Feb 8, 2011 at 11:05 PM, rahul patodi patodira...@gmail.com wrote: I think you should copy the namespaceID of your master which is in name/current/VERSION file to all the slaves This is a sure recipe for disaster. The VERSION file is a file system meta data file not to be messed around with. At worst, this can cause loss of entire file system data! Rahul please update your blog to reflect this. Some background on namespace ID: A namespace ID is created on the namenode when it is formatted. This is propagated to datanodes when they register the first time with namenode. From then on, this ID is burnt into the datanodes. A mismatch in namespace ID of datanode and namenode means: # Datanode is pointing to a wrong namenode, perhaps in a different cluster (config of datanode points to wrong namenode address). # Namenode was running with a storage directory previously. It was changed to some other storage directory with a different file system image. Why does editing namespace ID is a bad idea? Given that either namenode has loaded wrong namespace or datanode is pointing to wrong namenode, messing around with namespaceID either on namenode/datanode, results in datanode being able to register with the namenode. When datanode sends block report, the blocks on the datanode do not belong to the namespace loaded by the namenode, resulting in deletion of all the blocks on the datanode. Please find out if any of these problem exist in your setup and fix it.