Re: Re: HDFS block physical location
Hi JS,

You may also be interested in the following JIRA, which proposes an API for block-disk mapping information: https://issues.apache.org/jira/browse/HDFS-3672
There has been some discussion about potential use cases on this JIRA. If you can describe your use case for this information on the JIRA, we'd really appreciate the input.

-Todd

On Wed, Jul 25, 2012 at 3:20 PM, Chen He airb...@gmail.com wrote:

For the block-to-filename mapping, see my previous answer. For the block-to-hard-disk mapping, you may need to traverse all the directories used for HDFS; your OS has the information about which hard drive is mounted on which directory. With these two types of information, you can write a small Perl or Python script to get what you want. Or take a look at NameNode.java and see where and how it saves the table of block information. Please correct me if there is any mistake.

Chen

On Wed, Jul 25, 2012 at 6:10 PM, 20seco...@web.de wrote:

Thanks, but that just gives me the hostnames, or am I overlooking something? I actually need the filename/hard disk on the node.

JS

Sent: Wednesday, 25 July 2012 at 23:33
From: Chen He airb...@gmail.com
To: common-user@hadoop.apache.org
Subject: Re: HDFS block physical location

nohup hadoop fsck / -files -blocks -locations
cat nohup.out | grep [your block name]

Hope this helps.

On Wed, Jul 25, 2012 at 5:17 PM, 20seco...@web.de wrote:

Hi, just a short question. Is there any way to figure out the physical storage location of a given block? I don't mean just a list of hostnames (which I know how to obtain), but the actual file where it is being stored. We use several hard disks for HDFS data on each node, and I would need to know which block ends up on which hard disk. Thanks! JS

--
Todd Lipcon
Software Engineer, Cloudera
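The fsck-plus-OS approach Chen describes can be sketched in a few lines of Python. This is a hypothetical helper, not an HDFS API: it assumes you already know the block name (e.g. from `hadoop fsck / -files -blocks -locations`), that the DataNode data directories (the `dfs.data.dir` values, e.g. /data/1/hdfs) are readable, and that the mount table has been parsed from /proc/mounts into (device, mountpoint) pairs.

```python
import os

def find_block_file(block_id, data_dirs):
    """Walk each dfs.data.dir looking for the file that backs a block
    (DataNodes store each block as a plain file named blk_<id>)."""
    for d in data_dirs:
        for root, _dirs, files in os.walk(d):
            if block_id in files:
                return d, os.path.join(root, block_id)
    return None

def device_for(path, mounts):
    """mounts is a list of (device, mountpoint) pairs, e.g. parsed from
    /proc/mounts; return the device whose mountpoint is the longest
    prefix of path -- that is the hard disk holding the block file."""
    best = None
    for dev, mountpoint in mounts:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) > len(best[1]):
                best = (dev, mountpoint)
    return best[0] if best else None
```

Run on a DataNode, `device_for(find_block_file("blk_<id>", data_dirs)[1], mounts)` maps a block to the disk it lives on, which is exactly the pairing JS is after.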
Re: High load on datanode startup
On Fri, May 11, 2012 at 2:29 AM, Darrell Taylor darrell.tay...@gmail.com wrote:

What I saw on the machine was thousands of recursive processes in ps of the form 'bash /usr/bin/hbase classpath...'. Stopping everything didn't clean the processes up, so I had to kill them manually with some grep/xargs foo. Once this was all cleaned up and the hadoop-env.sh file removed, the nodes seem to be happy again.

Ah -- maybe the issue is that hbase classpath is now trying to include the Hadoop dependencies using hadoop classpath. But hadoop classpath was recursing right back because of that setting in hadoop-env.sh. Basically you made a fork bomb - that explains the shape of the graph in Ganglia perfectly.

-Todd

Darrell.

Raj

From: Darrell Taylor darrell.tay...@gmail.com
To: common-user@hadoop.apache.org
Cc: Raj Vishwanathan rajv...@yahoo.com
Sent: Thursday, May 10, 2012 3:57 AM
Subject: Re: High load on datanode startup

On Thu, May 10, 2012 at 9:33 AM, Todd Lipcon t...@cloudera.com wrote:

That's real weird... If you can reproduce this after a reboot, I'd recommend letting the DN run for a minute, and then capturing a jstack <pid of dn> as well as the output of top -H -p <pid of dn> -b -n 5, and sending it to the list.

What I did after the reboot this morning was to move my dn, nn, and mapred directories out of the way, create a new one, format it, and restart the node; it's now happy. I'll try moving the directories back later and do the jstack as you suggest.

What JVM/JDK are you using? What OS version?
root@pl446:/# dpkg --get-selections | grep java
java-common                  install
libjaxp1.3-java              install
libjaxp1.3-java-gcj          install
libmysql-java                install
libxerces2-java              install
libxerces2-java-gcj          install
sun-java6-bin                install
sun-java6-javadb             install
sun-java6-jdk                install
sun-java6-jre                install
root@pl446:/# java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
root@pl446:/# cat /etc/issue
Debian GNU/Linux 6.0 \n \l

-Todd

On Wed, May 9, 2012 at 11:57 PM, Darrell Taylor darrell.tay...@gmail.com wrote:

On Wed, May 9, 2012 at 10:52 PM, Raj Vishwanathan rajv...@yahoo.com wrote:

The picture is either too small or too pixelated for my eyes :-)

There should be a zoom option in the top right of the page that allows you to view it full size.

Can you log in to the box and send the output of top? If the system is unresponsive, it has to be something more than an unbalanced hdfs cluster, methinks.

Sorry, I'm unable to log in to the box, it's completely unresponsive.

Raj

From: Darrell Taylor darrell.tay...@gmail.com
To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com
Sent: Wednesday, May 9, 2012 2:40 PM
Subject: Re: High load on datanode startup

On Wed, May 9, 2012 at 10:23 PM, Raj Vishwanathan rajv...@yahoo.com wrote:

When you say 'load', what do you mean? CPU load or something else?

I mean in the unix sense of load average, i.e. top would show a load of (currently) 376.
Looking at the Ganglia stats for the box, it's not CPU load as such; the graphs show actual CPU usage at 30%, but the number of running processes is simply growing in a linear manner - screenshot of the Ganglia page here: https://picasaweb.google.com/lh/photo/Q0uFSzyLiriDuDnvyRUikXVR0iWwMibMfH0upnTwi28?feat=directlink

Raj

From: Darrell Taylor darrell.tay...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wednesday, May 9, 2012 9:52 AM
Subject: High load on datanode startup

Hi,

I wonder if someone could give some pointers with a problem I'm having?

I have a 7 machine cluster set up for testing, and we have been pouring data into it for a week without issue. We have learnt several things along the way and solved all the problems up to now by searching online, but now I'm stuck. One of the data nodes decided to have a load of 70+ this morning; stopping the datanode and tasktracker brought it back to normal, but every time I start the datanode again the load shoots through the roof, and all I get in the logs is:

STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = pl464/10.20.16.64
STARTUP_MSG:   args
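Todd's fork-bomb diagnosis above is worth spelling out: hadoop-env.sh set HADOOP_CLASSPATH by invoking `hbase classpath`, while the hbase script builds its own classpath by invoking `hadoop classpath`, so each invocation spawns the other forever. A minimal in-process simulation, with Python recursion standing in for process spawning (the function names are illustrative, not real APIs):

```python
def hadoop_classpath(depth=0):
    # hadoop-env.sh contained something like:
    #   export HADOOP_CLASSPATH="$(hbase classpath)"
    # so computing hadoop's classpath shells out to hbase...
    return hbase_classpath(depth + 1)

def hbase_classpath(depth=0):
    # ...and bin/hbase pulls in Hadoop's jars via "$(hadoop classpath)",
    # which re-reads hadoop-env.sh and starts the cycle again.
    return hadoop_classpath(depth + 1)

# In the shell, each step of this mutual call forks a new bash process
# (a fork bomb, hence the linear growth in process count on Ganglia);
# in Python it simply exhausts the interpreter's recursion limit.
```

Calling `hadoop_classpath()` raises RecursionError; on the real cluster the equivalent symptom was thousands of `bash /usr/bin/hbase classpath` processes.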
Re: High load on datanode startup
That's real weird... If you can reproduce this after a reboot, I'd recommend letting the DN run for a minute, and then capturing a jstack <pid of dn> as well as the output of top -H -p <pid of dn> -b -n 5, and sending it to the list.

What JVM/JDK are you using? What OS version?

-Todd

On Wed, May 9, 2012 at 11:57 PM, Darrell Taylor darrell.tay...@gmail.com wrote:

On Wed, May 9, 2012 at 10:52 PM, Raj Vishwanathan rajv...@yahoo.com wrote:

The picture is either too small or too pixelated for my eyes :-)

There should be a zoom option in the top right of the page that allows you to view it full size.

Can you log in to the box and send the output of top? If the system is unresponsive, it has to be something more than an unbalanced hdfs cluster, methinks.

Sorry, I'm unable to log in to the box, it's completely unresponsive.

Raj

From: Darrell Taylor darrell.tay...@gmail.com
To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com
Sent: Wednesday, May 9, 2012 2:40 PM
Subject: Re: High load on datanode startup

On Wed, May 9, 2012 at 10:23 PM, Raj Vishwanathan rajv...@yahoo.com wrote:

When you say 'load', what do you mean? CPU load or something else?

I mean in the unix sense of load average, i.e. top would show a load of (currently) 376.

Looking at the Ganglia stats for the box, it's not CPU load as such; the graphs show actual CPU usage at 30%, but the number of running processes is simply growing in a linear manner - screenshot of the Ganglia page here: https://picasaweb.google.com/lh/photo/Q0uFSzyLiriDuDnvyRUikXVR0iWwMibMfH0upnTwi28?feat=directlink

Raj

From: Darrell Taylor darrell.tay...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wednesday, May 9, 2012 9:52 AM
Subject: High load on datanode startup

Hi,

I wonder if someone could give some pointers with a problem I'm having?
I have a 7 machine cluster set up for testing, and we have been pouring data into it for a week without issue. We have learnt several things along the way and solved all the problems up to now by searching online, but now I'm stuck. One of the data nodes decided to have a load of 70+ this morning; stopping the datanode and tasktracker brought it back to normal, but every time I start the datanode again the load shoots through the roof, and all I get in the logs is:

STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = pl464/10.20.16.64
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u3
STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze -/
2012-05-09 16:12:05,925 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2012-05-09 16:12:06,139 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.

Nothing else. The load seems to max out only 1 of the CPUs, but the machine becomes *very* unresponsive. Anybody got any pointers of things I can try?

Thanks
Darrell.

--
Todd Lipcon
Software Engineer, Cloudera
Re: High Availability Framework for HDFS Namenode in 2.0.0
Hi Shi,

The 20% regression was prior to implementing a few optimizations on the branch. Here's the later comment:
https://issues.apache.org/jira/browse/HDFS-1623?focusedCommentId=13218813&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218813

Also, the 20% measurement was on a particular stress test designed to target exactly the metadata code path where we figured there might be a performance impact. So on typical clusters with typical workloads, you shouldn't find any measurable difference.

-Todd

On Thu, May 3, 2012 at 8:38 AM, Shi Yu sh...@uchicago.edu wrote:

Hi Harsh J,

It seems that the 20% performance loss is not that bad; at least some smart people are still working to improve it. I will keep an eye on this interesting trend.

Shi

--
Todd Lipcon
Software Engineer, Cloudera
Re: cloudera vs apache hadoop migration stories ?
Hi Jay,

It probably makes sense to move this to the cdh-user list if you have any Cloudera-specific questions. But I just wanted to clarify: CDH doesn't make any API changes that aren't already upstream. So, in some places, CDH may be ahead of whatever Apache release you are comparing against, but it is always made up of patches from the Apache trunk. In the specific case of MultipleInputs, we did backport the new API implementation from Apache Hadoop 0.21+.

If you find something in CDH that you would like backported to upstream Apache Hadoop 1.0.x, please feel free to file a JIRA and assign it to me - I'm happy to look into it for you.

Thanks
Todd

On Wed, Apr 4, 2012 at 10:15 AM, Jay Vyas jayunit...@gmail.com wrote:

It seems like Cloudera and standard apache-hadoop are really not cross-compatible. Things like MultipleInputs and other stuff we are finding don't work the same. Any good (recent) war stories on the migration between the two? It's interesting to me that Cloudera and Amazon are that difficult to swap in/out in the cloud.

--
Todd Lipcon
Software Engineer, Cloudera
Re: activity on IRC .
Hey Jay,

That's the only one I know of. Not a lot of idle chatter, but when people have questions, discussions do start up. Much more active during PST working hours, of course :)

-Todd

On Wed, Mar 28, 2012 at 8:05 AM, Jay Vyas jayunit...@gmail.com wrote:

Hi guys: I notice the IRC activity is a little low. Just wondering if there's a better chat channel for hadoop than the official one (#hadoop on freenode)? In any case... I'm on there :) come say hi.

--
Jay Vyas
MMSB/UCHC

--
Todd Lipcon
Software Engineer, Cloudera
Re: Addendum to Hypertable vs. HBase Performance Test (w/ mslab enabled)
Hey Doug,

Want to also run a comparison test with inter-cluster replication turned on? How about kerberos-based security on secure HDFS? How about ACLs or other table permissions, even without strong authentication? Can you run a test comparing performance running on top of Hadoop 0.23? How about running other ecosystem products like Solbase, Havrobase, and Lily, or commercial products like Digital Reasoning's Synthesys, etc?

For those unfamiliar, the answer to all of the above is that those comparisons can't be run, because Hypertable is years behind HBase in terms of features, adoption, etc. They've found a set of benchmarks they win at, but bulk loading either database through the put API is the wrong way to go about it anyway. Anyone loading 5T of data like this would use the bulk load APIs, which are one to two orders of magnitude more efficient. Just ask the Yahoo crawl cache team, who have ~1PB stored in HBase, or Facebook, or eBay, or many others who store hundreds to thousands of TBs in HBase today.

Thanks,
-Todd

On Mon, Feb 13, 2012 at 9:07 AM, Doug Judd d...@hypertable.com wrote:

In our original test, we mistakenly ran the HBase test with the hbase.hregion.memstore.mslab.enabled property set to false. We re-ran the test with the hbase.hregion.memstore.mslab.enabled property set to true and have reported the results in the following addendum:

Addendum to Hypertable vs. HBase Performance Test
http://www.hypertable.com/why_hypertable/hypertable_vs_hbase_2/addendum/

Synopsis: It slowed performance on the 10KB and 1KB tests, and it still failed the 100 byte and 10 byte tests with *Concurrent mode failure*.

- Doug

--
Todd Lipcon
Software Engineer, Cloudera
Re: effect on data after topology change
Hi Ravi,

You'll probably need to up the replication level of the affected files and then drop it back down to the desired level. Current versions of HDFS do not automatically repair rack policy violations if they're introduced in this manner.

-Todd

On Mon, Jan 16, 2012 at 3:53 PM, rk vishu talk2had...@gmail.com wrote:

Hello All,

If I change the rackid for some nodes and restart the namenode, will data be rearranged accordingly? Do I need to run the rebalancer? Any information on this would be appreciated.

Thanks and Regards
Ravi

--
Todd Lipcon
Software Engineer, Cloudera
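The bump-and-drop procedure Todd suggests can be scripted around the standard `hadoop fs -setrep` shell command. A small sketch (the `rewrite_replication` helper and its defaults are illustrative, not part of Hadoop; pick the bumped level to suit your cluster):

```python
import subprocess

def rewrite_replication(path, desired, bumped=None, runner=subprocess.check_call):
    """Raise the replication factor of `path` above the desired level,
    wait for the extra replicas to be placed (-w), then drop back down.
    Re-replication makes the NameNode pick targets that satisfy the rack
    placement policy, repairing blocks left mis-placed by a topology
    change."""
    bumped = bumped if bumped is not None else desired + 1
    cmds = [
        ["hadoop", "fs", "-setrep", "-w", str(bumped), path],
        ["hadoop", "fs", "-setrep", str(desired), path],
    ]
    for cmd in cmds:
        runner(cmd)  # runner is injectable so the sketch can be dry-run
    return cmds
```

`-setrep -w` blocks until the higher replication is reached, which forces the NameNode to place the extra replica somewhere that satisfies the placement policy before you drop back to the desired level.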
Re: hadoop filesystem cache
There is some work being done in this area by some folks over at UC Berkeley's AMP Lab in coordination with Facebook. I don't believe it has been published quite yet, but the title of the project is PACMan -- I expect it will be published soon.

-Todd

On Sat, Jan 14, 2012 at 5:30 PM, Rita rmorgan...@gmail.com wrote:

After reading this article, http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ , I was wondering if there was a filesystem cache for hdfs. For example, if a large file (10 gigabytes) keeps getting accessed on the cluster, instead of fetching it from the network every time, why not store the content of the file locally on the client itself? A use case on the client would look like this:

<property>
  <name>dfs.client.cachedirectory</name>
  <value>/var/cache/hdfs/</value>
</property>

<property>
  <name>dfs.client.cachesize</name>
  <description>in megabytes</description>
  <value>10</value>
</property>

Any thoughts on a feature like this?

--
"Get your facts first, then you can distort them as you please."

--
Todd Lipcon
Software Engineer, Cloudera
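Note that no dfs.client.cachedirectory property exists in HDFS; the snippet above is Rita's proposal. To make the idea concrete, here is a toy single-process sketch of what a client-side cache with a size budget might look like (all names here are invented for illustration; a real implementation would also need to invalidate entries when the remote file changes):

```python
import os
from collections import OrderedDict

class LocalFileCache:
    """Keep local copies of remote files under cache_dir, evicting
    least-recently-used entries once the size budget (cf. the proposed
    dfs.client.cachesize) is exceeded."""

    def __init__(self, cache_dir, max_bytes):
        self.cache_dir = cache_dir
        self.max_bytes = max_bytes
        self.entries = OrderedDict()  # remote path -> cached size
        self.used = 0
        os.makedirs(cache_dir, exist_ok=True)

    def _local(self, path):
        # Flatten the remote path into a single cache file name
        return os.path.join(self.cache_dir, path.strip("/").replace("/", "_"))

    def get(self, path, fetch):
        """Return the bytes of `path`, calling `fetch(path)` (a stand-in
        for an HDFS read over the network) only on a cache miss."""
        local = self._local(path)
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as recently used
            with open(local, "rb") as f:
                return f.read()
        data = fetch(path)
        with open(local, "wb") as f:
            f.write(data)
        self.entries[path] = len(data)
        self.used += len(data)
        # Evict LRU entries until we are back under budget
        while self.used > self.max_bytes and len(self.entries) > 1:
            old, size = self.entries.popitem(last=False)
            os.remove(self._local(old))
            self.used -= size
        return data
```

The second read of a hot file is then served from local disk rather than the network, which is the win Rita is describing for repeatedly accessed multi-gigabyte files.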
Re: NameNode - didn't persist the edit log
Hi Guy,

Eli has been looking into these issues, and it looks like you found a nasty bug. You can follow these JIRAs to track resolution: HDFS-2701, HDFS-2702, HDFS-2703. I think in particular HDFS-2703 is the one that bit you here.

-Todd

On Thu, Dec 15, 2011 at 2:06 AM, Guy Doulberg guy.doulb...@conduit.com wrote:

Hi Todd, you are right, I should be more specific:

1. From the namenode log:

2011-12-11 08:57:23,245 WARN org.apache.hadoop.hdfs.server.common.Storage: rollEdidLog: removing storage /srv/hadoop/hdfs/edit
2011-12-11 08:57:23,311 WARN org.apache.hadoop.hdfs.server.common.Storage: incrementCheckpointTime failed on /srv/hadoop/hdfs/name;type=IMAGE
2011-12-11 08:57:23,316 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:83)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:78)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:78)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:829)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2. Embarrassingly, no, we had only one; now we have 2, plus periodic backups :(

3. Yes

4. hadoop version
Hadoop 0.20.2-cdh3u2
Subversion file:///tmp/nightly_2011-10-13_20-02-02_3/hadoop-0.20-0.20.2+923.142-1~lucid -r 95a824e4005b2a94fe1c11f1ef9db4c672ba43cb
Compiled by root on Thu Oct 13 21:52:18 PDT 2011
From source with checksum 644e5db6c59d45bca96cec7f220dda51

Thanks, Guy

On Thu 15 Dec 2011 11:39:26 AM IST, Todd Lipcon wrote:

Hi Guy,

Several questions come to mind here:
- What was the exact WARN level message you saw?
- Did you have multiple dfs.name.dirs configured, as recommended by most setup guides?
- Did you try entering safemode and then running saveNamespace to persist the image before shutting down the NN? This would have saved your data.
- What exact version of HDFS were you running?

This is certainly not expected behavior... all of the places where an edit log fails have a check against there being 0 edit logs remaining, and should issue a FATAL level message followed by a System.exit(-1).
-Todd

On Thu, Dec 15, 2011 at 1:16 AM, Guy Doulberg guy.doulb...@conduit.com wrote:

Hi guys,

We recently had the following problem on our production cluster: the filesystem containing the editlog and fsimage had no free inodes. As a result, the namenode wasn't able to obtain an inode for the fsimage and editlog after a checkpoint had been reached, while the previous files were freed. Unfortunately, we had no monitoring on the number of inodes, so it happens that the namenode ran in this state for a few hours. We noticed this failure in its DFS-status page, but the namenode didn't enter safe-mode, so all the writes that were made couldn't be persisted to the editlog. After discovering the problem we freed inodes, and the file-system seemed to be okay again; we tried to force the namenode to persist to the editlog with no success
Re: HDFS Backup nodes
On Wed, Dec 14, 2011 at 10:00 AM, Scott Carey sc...@richrelevance.com wrote:

As of today, there is no option except to use NFS. And as you yourself mention, the first HA prototype when it comes out will require NFS.

How will it 'require' NFS? Won't any 'remote, high availability storage' work? NFS is unreliable in my experience unless: ...

A solution with a brief 'stall' in service while a SAN mount switched over, or similar with drbd, should be possible and data safe. If this is being built to truly 'require' NFS, that is no better for me than the current situation, which we manage using OS level tools for failover that will temporarily break clients but resume availability quickly thereafter. Where I would like the most help from hadoop is in making the failover transparent to clients, not in solving the reliable storage problem or failover scenarios that storage and OS vendors do. Currently our requirement is that we can have two client machines mount the storage, though only one needs to have it mounted rw at a time.

This is certainly doable with DRBD in conjunction with a clustered filesystem like GFS2. I believe Dhruba was doing some experimentation with an approach like this. It's not currently provided for, but it wouldn't be very difficult to extend the design so that the standby didn't even need read access until the failover event. It would just cause a longer failover period, since the standby would have more edits to catch up with, etc.

I don't think anyone's currently working on this, but if you wanted to contribute I can point you in the right direction. If you happen to be at the SF HUG tonight, grab me and I'll give you the rundown on what would be needed.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: HDFS Backup nodes
On Sun, Dec 11, 2011 at 10:47 PM, M. C. Srivas mcsri...@gmail.com wrote:

But if you use a Netapp, then the likelihood of the Netapp crashing is lower than the likelihood of a garbage-collection-of-death happening in the NN.

This is pure FUD. I've never seen a garbage collection of death ever in any NN with a smaller than 40GB heap, and only a small handful of times on larger heaps. So, unless you're running a 4000 node cluster, you shouldn't be concerned with this. And the existence of many 4000 node clusters running fine on HDFS indicates that a properly tuned NN does just fine.

[Disclaimer: I don't spread FUD regardless of vendor affiliation.]

-Todd

[disclaimer: I don't work for Netapp, I work for MapR]

On Wed, Dec 7, 2011 at 4:30 PM, randy randy...@comcast.net wrote:

Thanks Joey. We've had enough problems with nfs (mainly under very high load) that we thought it might be riskier to use it for the NN.

randy

On 12/07/2011 06:46 PM, Joey Echeverria wrote:

Hey Randy,

It will mark that storage directory as failed and ignore it from then on. In order to do this correctly, you need a couple of options enabled on the NFS mount to make sure that it doesn't retry infinitely. I usually run with the tcp,soft,intr,timeo=10,retrans=10 options set.

-Joey

On Wed, Dec 7, 2011 at 12:37 PM, randy...@comcast.net wrote:

What happens then if the nfs server fails or isn't reachable? Does hdfs lock up? Does it gracefully ignore the nfs copy?

Thanks,
randy

----- Original Message -----
From: Joey Echeverria j...@cloudera.com
To: common-user@hadoop.apache.org
Sent: Wednesday, December 7, 2011 6:07:58 AM
Subject: Re: HDFS Backup nodes

You should also configure the Namenode to use an NFS mount for one of its storage directories. That will give the most up-to-date backup of the metadata in case of total node failure.

-Joey

On Wed, Dec 7, 2011 at 3:17 AM, praveenesh kumar praveen...@gmail.com wrote:

This means we are still relying on the Secondary NameNode ideology for the Namenode's backup.
Is OS-mirroring of the Namenode a good alternative to keep it alive all the time?

Thanks,
Praveenesh

On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G mahesw...@huawei.com wrote:

AFAIK the backup node was introduced from the 0.21 version onwards.

From: praveenesh kumar [praveen...@gmail.com]
Sent: Wednesday, December 07, 2011 12:40 PM
To: common-user@hadoop.apache.org
Subject: HDFS Backup nodes

Does hadoop 0.20.205 support configuring HDFS backup nodes?

Thanks,
Praveenesh

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434

--
Todd Lipcon
Software Engineer, Cloudera
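For reference, Joey's recommended mount options would appear in /etc/fstab roughly like this (the server name, export path, and mount point are placeholders; the `soft` option together with bounded `timeo`/`retrans` makes the NN see an I/O error and fail the storage directory rather than hang forever when the filer goes away):

```
# /etc/fstab fragment (hypothetical paths)
filer:/export/namenode  /mnt/namenode-nfs  nfs  tcp,soft,intr,timeo=10,retrans=10  0 0
```

The mount point would then be listed as one of the dfs.name.dir entries alongside a local directory.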
Re: HDFS Backup nodes
On Tue, Dec 13, 2011 at 10:42 PM, M. C. Srivas mcsri...@gmail.com wrote:

Any simple file meta-data test will cause the NN to spiral to death with infinite GC. For example, try creating many many files. Or even simply stat a bunch of files continuously.

Sure. If I run dd if=/dev/zero of=foo, my laptop will spiral to death also. I think this is what you're referring to -- continuously write files until it is out of RAM. This is a well understood design choice of HDFS. It is not designed as general purpose storage for small files, and if you run tests against it assuming it is, you'll get bad results.

I agree there. The real FUD going on is refusing to acknowledge that there is indeed a real problem.

Yes, if you use HDFS for workloads for which it was never designed, you'll have a problem. If you stick to commonly accepted best practices, I think you'll find the same thing that hundreds of other companies have found: HDFS is stable and reliable and has no such GC-of-death problems when used as intended.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: MAX_FETCH_RETRIES_PER_MAP (TaskTracker dying?)
Hi Chris,

I'd suggest updating to a newer version of your Hadoop distro - you're hitting some bugs that were fixed last summer. In particular, you're missing the amendment patch from MAPREDUCE-2373 as well as some patches to MR which make the fetch retry behavior more aggressive.

-Todd

On Mon, Dec 5, 2011 at 12:45 PM, Chris Curtin curtin.ch...@gmail.com wrote:

Hi,

Using: *Version:* 0.20.2-cdh3u0, r81256ad0f2e4ab2bd34b04f53d25a6c23686dd14, 8 node cluster, 64 bit Centos.

We are occasionally seeing MAX_FETCH_RETRIES_PER_MAP errors on reducer jobs. When we investigate, it looks like the TaskTracker on the node being fetched from is not running. Looking at the logs, we see what looks like a self-initiated shutdown:

2011-12-05 14:10:48,632 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201112050908_0222_r_1100711673 exited with exit code 0. Number of tasks it ran: 0
2011-12-05 14:10:48,632 ERROR org.apache.hadoop.mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker.
java.lang.NullPointerException
    at org.apache.hadoop.mapred.DefaultTaskController.logShExecStatus(DefaultTaskController.java:145)
    at org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:129)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:472)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:446)
2011-12-05 14:10:48,634 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down TaskTracker at had11.atlis1/10.120.41.118
/

Then the reducers have the following:

2011-12-05 14:12:00,962 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.http.HttpClient.New(HttpClient.java:323)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1525)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1482)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1390)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)
2011-12-05 14:12:00,962 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201112050908_0169_r_05_0: Failed fetch #2 from attempt_201112050908_0169_m_02_0
2011-12-05 14:12:00,962 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_201112050908_0169_m_02_0 even after MAX_FETCH_RETRIES_PER_MAP retries... or it is a read error, reporting to the JobTracker
2011-12-05 14:12:00,962 FATAL org.apache.hadoop.mapred.ReduceTask: Shuffle failed with too many fetch failures and insufficient progress!Killing task attempt_201112050908_0169_r_05_0.
2011-12-05 14:12:00,966 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201112050908_0169_r_05_0 adding host had11.atlis1 to penalty box, next contact in 8 seconds
2011-12-05 14:12:00,966 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201112050908_0169_r_05_0: Got 1 map-outputs from previous failures

The job then fails. Several questions:

1. What is causing the TaskTracker to fail/exit? This is after running hundreds to thousands of jobs, so it's not just at start-up.
2. Why isn't hadoop detecting that the reducers need something from a dead mapper and restarting the mapper job, even if it means aborting the reducers?
3. Why isn't the DataNode being used to fetch the blocks? It is still up and running when this happens, so shouldn't it know where the files are in HDFS?

Thanks,

Chris

--
Todd Lipcon
Software Engineer, Cloudera
Re: Hbase with Hadoop
On Wed, Oct 12, 2011 at 9:31 AM, Vinod Gupta Tankala tvi...@readypulse.com wrote:

its free and open source too.. basically, their releases are ahead of public releases of hadoop/hbase - from what i understand, major bug fixes and enhancements are checked in to their branch first and then eventually make it to public release branches.

You've got it a bit backwards - except for very rare exceptions, we check our fixes into the public ASF codebase before we commit anything to CDH releases. Sometimes a fix will show up in a CDH release before an ASF release, but the changes are always done as backports from the ASF's Subversion. You can see the list of public JIRAs referenced in our changelogs here:
http://archive.cloudera.com/cdh/3/hadoop-0.20.2+923.97.CHANGES.txt

Apologies for the vendor-specific comment: I just wanted to clarify that Cloudera's aim is to contribute to the community, and CDH is not any kind of fork as suggested above. Back to work on 0.23 for me!

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: DFSClient: Could not complete file
On Fri, Oct 7, 2011 at 12:40 PM, Chris Curtin curtin.ch...@gmail.com wrote:

Hi Todd,

Thanks for the reply. Yes, I'm seeing 30,000 ms a couple of times a day, though it looks like 4000 ms is average. Also seeing 150,000+ and lots of 50,000. Is there anything I can do about this?

The bug is still open in JIRA. Currently the following workarounds may be effective:

- Schedule a cron job to run once every couple of minutes that runs:
  find /data/1/hdfs /data/2/hdfs/ ... > /dev/null
  (this will cause your inodes and dentries to get paged into cache so the block report runs quickly)
- Tune /proc/sys/vm/vfs_cache_pressure to a lower value (this will encourage Linux to keep inodes and dentries in cache)

Both have some associated costs, but at least one of our customers has found the above set of workarounds to be effective.

Currently I'm waiting on review of HDFS-2379, though if you are adventurous you could consider building your own copy of Hadoop with this patch applied. I've tested it on a cluster and am fairly confident it is safe.

Thanks
-Todd

On Fri, Oct 7, 2011 at 2:15 PM, Todd Lipcon t...@cloudera.com wrote:

Hi Chris,

You may be hitting HDFS-2379. Can you grep your DN logs for the string BlockReport and see if you see any taking more than 3000ms or so?

-Todd

On Fri, Oct 7, 2011 at 6:31 AM, Chris Curtin curtin.ch...@gmail.com wrote:

Sorry to bring this back from the dead, but we're having the issues again.

This is on a NEW cluster, using Cloudera 0.20.2-cdh3u0 (old was stock Apache 0.20.2). Nothing carried over from the old cluster except data in HDFS (copied from the old cluster). Bigger/more machines, more RAM, faster disks, etc. And it is back.

Confirmed that all the disks set up for HDFS are 'deadline'. Runs fine for a few days, then hangs again with the 'Could not complete' error in the JobTracker log until we kill the cluster.
2011-09-09 08:04:32,429 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /log/hadoop/tmp/flow_BYVMTA_family_BYVMTA_72751_8284775/_logs/history/10.120.55.2_1311201333949_job_201107201835_13900_deliv_flow_BYVMTA%2Bflow_BYVMTA*family_B%5B%284%2F5%29+...UNCED%27%2C+ retrying... Found HDFS-148 (https://issues.apache.org/jira/browse/HDFS-148) which looks like what could be happening to us. Anyone found a good workaround? Any other ideas? Also, does the HDFS system try to do 'du' on disks not assigned to it? The HDFS disks are separate from the root and OS disks. Those disks are NOT setup to be 'deadline'. Should that matter? Thanks, Chris -- Todd Lipcon Software Engineer, Cloudera
Re: DFSClient: Could not complete file
? Thanks, Chris On Sun, Mar 13, 2011 at 3:45 AM, icebergs hkm...@gmail.com wrote: You should check the bad reducers' logs carefully. There may be more information about it. 2011/3/10 Chris Curtin curtin.ch...@gmail.com Hi, The last couple of days we have been seeing tens of thousands of these errors in the logs: INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /offline/working/3/aat/_temporary/_attempt_201103100812_0024_r_03_0/4129371_172307245/part-3 retrying... When this is going on the reducer in question is always the last reducer in a job. Sometimes the reducer recovers. Sometimes hadoop kills that reducer, runs another and it succeeds. Sometimes hadoop kills the reducer and the new one also fails, so it gets killed and the cluster goes into a loop of kill/launch/kill. At first we thought it was related to the size of the data being evaluated (4+GB), but we've seen it several times today on 100 MB of data. Searching here or online doesn't show a lot about what this error means and how to fix it. We are running 0.20.2, r911707. Any suggestions? Thanks, Chris -- Todd Lipcon Software Engineer, Cloudera
Re: risks of using Hadoop
On Wed, Sep 21, 2011 at 6:52 AM, Michael Segel michael_se...@hotmail.com wrote: PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your cluster and you will lose everything. Now would you consider this a risk? Sure. But is it something you should really lose sleep over? Do you understand that there are risks and there are improbable risks? http://www.datacenterknowledge.com/archives/2007/05/07/averting-disaster-with-the-epo-button/ -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: risks of using Hadoop
To clarify, *append* is not supported and is known to be buggy. *sync* support is what HBase needs and what 0.20.205 will support. Before 205 is released, you can also find these features in CDH3 or by building your own release from SVN. -Todd On Sat, Sep 17, 2011 at 4:59 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Hi George, You can use it normally as well. Append interfaces will be exposed. For Hbase, append support is required very much. Regards, Uma - Original Message - From: George Kousiouris gkous...@mail.ntua.gr Date: Saturday, September 17, 2011 12:29 pm Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org Cc: Uma Maheswara Rao G 72686 mahesw...@huawei.com Hi, When you say that 0.20.205 will support appends, you mean for general purpose writes on the HDFS? or only Hbase? Thanks, George On 9/17/2011 7:08 AM, Uma Maheswara Rao G 72686 wrote: 6. If you plan to use Hbase, it requires append support. 20Append has the support for append. 0.20.205 release also will have append support but not yet released. Choose your correct version to avoid sudden surprises. Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org We are planning to use Hadoop in my organisation for quality of services analysis out of CDR records from mobile operators. We are thinking of having a small cluster of maybe 10 - 15 nodes and I'm preparing the proposal. My office requires that I provide some risk analysis in the proposal. thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Hello, First of all where you are planning to use Hadoop?
Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 0:41 am Subject: risks of using Hadoop To: common-user common-user@hadoop.apache.org Hello, Please can someone point out some of the risks we may incur if we decide to implement Hadoop? BR, Isaac. -- --- George Kousiouris Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece -- Todd Lipcon Software Engineer, Cloudera
Re: IO pipeline optimizations
Hi Shrinivas, There has been some work going on recently around optimizing checksums. See HDFS-2080 for example. This will help both the write and read code, though we've focused more on read. There have also been a lot of improvements around random read access - for example HDFS-941 which improves random read by more than 2x. I'm planning on writing a blog post in the next couple of weeks about some of this work. -Todd On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi jshrini...@gmail.com wrote: This blog post on the YDN website http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/ has detailed discussion on different steps involved in Hadoop IO operations and opportunities for optimizations. Could someone please comment on current state of these potential optimizations? Are some of these expected to be addressed in next gen MR release? Thanks, -Shrinivas -- Todd Lipcon Software Engineer, Cloudera
Re: Append to Existing File
On Tue, Jun 21, 2011 at 11:53 AM, Joey Echeverria j...@cloudera.com wrote: Yes. Sort-of kind-of... we support it only for the use case that HBase uses it. Mostly, we support sync() which was implemented at the same time. I know of several bugs in existing-file-append in CDH3 and 0.20-append. -Todd -Joey On Jun 21, 2011 1:47 PM, jagaran das jagaran_...@yahoo.co.in wrote: Hi All, Does CDH3 support Existing File Append ? Regards, Jagaran From: Eric Charles eric.char...@u-mangate.com To: common-user@hadoop.apache.org Sent: Tue, 21 June, 2011 3:53:33 AM Subject: Re: Append to Existing File When you say bugs pending, are you referring to HDFS-265 (which links to HDFS-1060, HADOOP-6239 and HDFS-744)? Are there other issues related to append than the one above? Tks, Eric https://issues.apache.org/jira/browse/HDFS-265 On 21/06/11 12:36, madhu phatak wrote: It's not stable. There are some bugs pending. According to one of the discussions, to date append is not ready for production. On Tue, Jun 14, 2011 at 12:19 AM, jagaran das jagaran_...@yahoo.co.in wrote: I am using hadoop-0.20.203.0 version. I have set dfs.support.append to true and am then using the append method. It is working, but I need to know how stable it is to deploy and use in production clusters ? Regards, Jagaran From: jagaran das jagaran_...@yahoo.co.in To: common-user@hadoop.apache.org Sent: Mon, 13 June, 2011 11:07:57 AM Subject: Append to Existing File Hi All, Is append to an existing file now supported in Hadoop for production clusters? If yes, please let me know which version and how Thanks Jagaran -- Eric -- Todd Lipcon Software Engineer, Cloudera
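For reference, the setting Jagaran mentions is a minimal hdfs-site.xml fragment like the one below (a sketch; per the discussion above, append on these releases is not considered production-ready, and the HBase sync() path is the only supported use case):

```xml
<!-- hdfs-site.xml: enable the append/sync code path (0.20.x era) -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```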
Re: Checkpoint vs Backup Node
Hi Sulabh, Neither of these nodes has been productionized -- so I don't think anyone would have a good answer for you about what works in production. They are only available in 0.21 and haven't had any substantial QA. One of the potential issues with the BN is that it can delay the logging of edits by the primary NN, if the BN were to hang or go offline. The CN would not have such an issue. -Todd On Tue, May 24, 2011 at 5:08 PM, sulabh choudhury sula...@gmail.com wrote: As far as my understanding goes, I feel that Backup node is much more efficient than the Checkpoint node, as it has the current (up-to-date) copy of file system too. I do not understand what would be the use case (in a production environment) in which someone would prefer Checkpoint node over Backup node, or I should ask, what do people generally prefer of the two and why ? -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop and WikiLeaks
C'mon guys -- while this is of course an interesting debate, can we please keep it off common-user? -Todd On Sun, May 22, 2011 at 3:30 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Sat, May 21, 2011 at 4:13 PM, highpointe highpoint...@gmail.com wrote: Does this copy text bother anyone else? Sure winning any award is great but does hadoop want to be associated with innovation like WikiLeaks? [Only] through the free distribution of information, the guaranteed integrity of said information and an aggressive system of checks and balances can man truly be free and hold the winning card. So... YES. Hadoop should be considered an innovation that promotes the free flow of information and a statistical whistle blower. Take off your damn aluminum hat. If it doesn't work for you, it will work against you. On May 19, 2011, at 8:54 AM, James Seigel ja...@tynt.com wrote: Does this copy text bother anyone else? Sure winning any award is great but does hadoop want to be associated with innovation like WikiLeaks? I do not know how to interpret your lame aluminum hat insult. As far as I am concerned WikiLeaks helped reveal classified US information across the internet. We can go back and forth about governments having too much secret/classified information and what the public should know, ...BUT... I believe that stealing and broadcasting secret documents is not innovation and it surely put many lives at risk. I also believe that Wikileaks is tainted by Julian Assange's actions. *Dec 1 : The International Criminal Police Organisation or INTERPOL on Wednesday said it has issued look out notice for arrest of WikiLeaks' owner Julian Assange on suspicion of rape charges on the basis of the Swedish Government's arrest warrant.* http://www.newkerala.com/news/world/fullnews-95693.html Those outside the US see wikileaks a different way than I do, but for the reasons I outlined above I would not want to be associated with them at all.
Moreover, I believe there already is an aggressive system of checks and balances in the US (it could be better of course) and we do not need innovation like wikileaks offers to stay free; like open source, the US is always changing and innovating. Wikileaks represents irresponsible use of technology that should be avoided. -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop and WikiLeaks
On Sun, May 22, 2011 at 5:10 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Correct. But it is a place to discuss changing the content of http://hadoop.apache.org which is what I am advocating. Fair enough. Is anyone -1 on rephrasing the news item to had the potential as a greater catalyst for innovation than other nominees... (ie cutting out the mention of iPad/wikileaks?) If not, I will change it tomorrow. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Problem: Unknown scheme hdfs. It should correspond to a JournalType enumeration value
Hi Eduardo, Sounds like you've configured your dfs.name.dirs to be on HDFS instead of local file paths. -Todd On Fri, May 20, 2011 at 2:20 PM, Eduardo Dario Ricci duzas...@gmail.com wrote: Hi People I'm starting in hadoop common.. and ran into a problem trying to use a cluster.. I'm following the steps of this page: http://hadoop.apache.org/common/docs/r0.21.0/cluster_setup.html I did everything, but when I format the HDFS, this error happens: I searched for something to help me, but didn't find anything. If someone could help me, I would be thankful. Re-format filesystem in /fontes/cluster/namedir ? (Y or N) Y 11/05/20 16:41:40 ERROR namenode.NameNode: java.io.IOException: Unknown scheme hdfs. It should correspond to a JournalType enumeration value at org.apache.hadoop.hdfs.server.namenode.FSImage.checkSchemeConsistency(FSImage.java:269) at org.apache.hadoop.hdfs.server.namenode.FSImage.setStorageDirectories(FSImage.java:222) at org.apache.hadoop.hdfs.server.namenode.FSImage.init(FSImage.java:178) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1240) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1348) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368) 11/05/20 16:41:40 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.217.134 / -- Eduardo Dario Ricci Cel: 14-81354813 MSN: thenigma...@hotmail.com -- Todd Lipcon Software Engineer, Cloudera
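A sketch of the fix Todd is suggesting: dfs.name.dir in hdfs-site.xml must hold plain local paths, not hdfs:// URIs. The value below reuses the /fontes/cluster/namedir path from the error output; adjust it to your layout.

```xml
<!-- hdfs-site.xml: name dirs are LOCAL paths, never hdfs://... URIs -->
<property>
  <name>dfs.name.dir</name>
  <value>/fontes/cluster/namedir</value>
</property>
```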
Re: some guidance needed
Hi Ioan, I would encourage you to look at a system like HBase for your mail backend. HDFS doesn't work well with lots of little files, and also doesn't support random update, so existing formats like Maildir wouldn't be a good fit. -Todd On Wed, May 18, 2011 at 4:02 PM, Ioan Eugen Stan stan.ieu...@gmail.com wrote: Hello everybody, I'm a GSoC student for this year and I will be working on James [1]. My project is to implement email storage over HDFS. I am quite new to Hadoop and associates and I am looking for some hints on how to get started on the right track. I have installed a single node Hadoop instance on my machine and played around with it (ran some examples) but I am interested in what you (more experienced people) think is the best way to approach my problem. I am a little puzzled about the fact that I have read hadoop is best used for large files, and emails aren't that large from what I know. Another thing that crossed my mind is that since HDFS is a file system, wouldn't it be possible to set it as a back-end for the (existing) maildir and mailbox storage formats? (I think this question is more suited to the James mailing list, but if you have some ideas please speak your mind). Also, any development resources to get me started are welcomed. [1] http://james.apache.org/mailbox/ [2] https://issues.apache.org/jira/browse/MAILBOX-44 Regards, -- Ioan Eugen Stan -- Todd Lipcon Software Engineer, Cloudera
Re: JVM reuse and log files
Hi Shrinivas, Yes, this is the behavior of the task logs when using JVM Reuse. You should notice in the log directories for the other tasks a log index file which specifies the byte offsets into the log files where the task starts and stops. When viewing logs through the web UI, it will use these index files to show you the right portion of the logs. -Todd On Wed, Mar 30, 2011 at 1:17 PM, Shrinivas Joshi jshrini...@gmail.com wrote: It seems like when JVM reuse is enabled map task log data is not getting written to their corresponding log files; log data from certain map tasks gets appended to log files corresponding to some other map task. For example, I have a case here where 8 map JVMs are running simultaneously and all syslog data from map tasks 9, 17 and 25 gets appended into the log file for map task 0. Whereas no syslog file gets generated in attempt_*m_09_0/, attempt_*m_17_0/ and attempt_*m_25_0/ folders. This job creates 32 map tasks. This behavior might also be applicable to reduce log files, however, in our case total # of reduce tasks is not more than max reduce JVMs running at the same time and hence it might not be manifesting. BTW, this is on Apache distro 0.21.0. -Shrinivas -- Todd Lipcon Software Engineer, Cloudera
Re: Could not obtain block
) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122) 2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed. 2011-03-08 19:40:41,466 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221549395563181322_4024529 received exception java.io.EOFException: while trying to read 3037288 bytes 2011-03-08 19:40:41,466 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver java.io.EOFException: while trying to read 3037288 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122) Cheers, Evert Lammerts Consultant eScience Cloud Services SARA Computing Network Services Operations, Support Development Phone: +31 20 888 4101 Email: evert.lamme...@sara.nl http://www.sara.nl -- Todd Lipcon Software Engineer, Cloudera
Re: java.io.FileNotFoundException: File /var/lib/hadoop-0.20/cache/mapred/mapred/staging/job/
Hi Job, This seems CDH-specific, so I've moved the thread over to the cdh-users mailing list (BCC common-user) Thanks -Todd On Thu, Feb 24, 2011 at 2:52 AM, Job j...@gridline.nl wrote: Hi all, This issue could very well be related to the Cloudera distribution (CDH3b4) I use, but maybe someone knows the solution: I configured a Job, something like this: Configuration conf = getConf(); // ... set configuration conf.set("mapred.jar", localJarFile.toString()); // tracker, zookeeper, hbase etc. Job job = new Job(conf); // map: job.setMapperClass(DataImportMap.class); job.setMapOutputKeyClass(LongWritable.class); job.setMapOutputValueClass(Put.class); // reduce: TableMapReduceUtil.initTableReducerJob("MyTable", DataImportReduce.class, job); FileInputFormat.addInputPath(job, new Path(inputData)); // execute: job.waitForCompletion(true); Now the server throws a strange exception; see the stacktrace below. When I take a look at the hdfs file system - through hdfs fuse - the file is there, it really is the jar that contains my mapred classes. Any clue what goes wrong here? Thanks, Job - java.io.FileNotFoundException: File /var/lib/hadoop-0.20/cache/mapred/mapred/staging/job/.staging/job_201102241026_0002/job.jar does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157) at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:61) at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1303) at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:273) at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:381) at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:371) at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:198) at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1154) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1129) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1055) at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2212) at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2176) -- Drs. Job Tiel Groenestege GridLine - Intranet en Zoeken GridLine Keizersgracht 520 1017 EK Amsterdam www: http://www.gridline.nl mail: j...@gridline.nl tel: +31 20 616 2050 fax: +31 20 616 2051 The content of this message and any accompanying attachments are addressed personally to, and therefore intended exclusively for, the addressee. They may contain data relating to a third party. A recipient who is not the addressee, and who is not authorized to receive this message on behalf of the addressee, is requested to notify the sender of its receipt immediately.
Any use of the content of this message and/or the accompanying attachments by anyone other than the addressee is unlawful towards the sender and/or the aforementioned third party. -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop issue on 64 bit ubuntu . Native Libraries.
Hi Ajay, Hadoop should ship with built artifacts for amd64 in the lib/native/Linux-amd64-64/ subdirectory of your tarball. You just need to put this directory on your java.library.path system property. -Todd You need to run ant -Dcompile.native=1 compile-native fro On Tue, Feb 22, 2011 at 9:14 PM, Ajay Anandan anan...@ualberta.ca wrote: Hi, I am using the kmeans clustering in mahout. It ran fine in my 32 bit machine. But when I try to run it in another 64 bit machine I get the following error: *org.apache.hadoop.util.NativeCodeLoader clinit WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable * I built my native libraries in hadoop. My project uses the hadoop core jar file alone. The algorithm does not run distributively over hadoop but runs only in one system. I am not using the parallelizing capacity of hadoop yet. I am getting the error when using methods in hadoop.util.NativeCodeLoader. Can somebody help me to build the native libraries? I am using ubuntu 64 bit version on a amd processor. -- Sincerely, Ajay Anandan. MSc,Computing Science, University of Alberta. -- Todd Lipcon Software Engineer, Cloudera
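A sketch of wiring that up, for example via HADOOP_OPTS in hadoop-env.sh or the launching shell (the HADOOP_HOME default below is an assumption; adjust it to your install):

```shell
# Put the prebuilt amd64 native libraries shipped in the release tarball
# on java.library.path, so NativeCodeLoader finds libhadoop at startup.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
NATIVE_DIR="$HADOOP_HOME/lib/native/Linux-amd64-64"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$NATIVE_DIR"
```

Any JVM started with these options (including a standalone program using only the hadoop core jar) should then stop printing the "Unable to load native-hadoop library" warning.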
Re: Hadoop issue on 64 bit ubuntu . Native Libraries.
On Wed, Feb 23, 2011 at 11:21 AM, Todd Lipcon t...@cloudera.com wrote: Hi Ajay, Hadoop should ship with built artifacts for amd64 in the lib/native/Linux-amd64-64/ subdirectory of your tarball. You just need to put this directory on your java.library.path system property. -Todd You need to run ant -Dcompile.native=1 compile-native fro err, disregard this postscript. Started writing how to compile it, and then realized we actually ship built artifacts :) On Tue, Feb 22, 2011 at 9:14 PM, Ajay Anandan anan...@ualberta.ca wrote: Hi, I am using the kmeans clustering in mahout. It ran fine in my 32 bit machine. But when I try to run it in another 64 bit machine I get the following error: *org.apache.hadoop.util.NativeCodeLoader clinit WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable * I built my native libraries in hadoop. My project uses the hadoop core jar file alone. The algorithm does not run distributively over hadoop but runs only in one system. I am not using the parallelizing capacity of hadoop yet. I am getting the error when using methods in hadoop.util.NativeCodeLoader. Can somebody help me to build the native libraries? I am using ubuntu 64 bit version on a amd processor. -- Sincerely, Ajay Anandan. MSc,Computing Science, University of Alberta. -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: file:/// has no authority
Hi, Double check that your configuration XML files are well-formed. You can do this easily using a validator like tidy. My guess is that one of the tags is mismatched so the configuration isn't being read. -Todd On Mon, Jan 31, 2011 at 9:19 PM, danoomistmatiste kkhambadk...@yahoo.com wrote: Hi, I have set up a Hadoop cluster as per the instructions for CDH3. When I try to start the datanode on the slave, I get this error, org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority. I have set up the right parameters in core-site.xml where master is the IP address where the namenode is running <configuration> <property> <name>fs.default.name</name> <value>hdfs://master:54310</value> </property> -- View this message in context: http://old.nabble.com/file%3Ahas-no-authority-tp30813534p30813534.html Sent from the Hadoop core-user mailing list archive at Nabble.com. -- Todd Lipcon Software Engineer, Cloudera
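One way to run the well-formedness check Todd suggests from the shell; xmllint is used here in place of tidy (an assumption, since any XML validator will do):

```shell
# Check a Hadoop config file for XML well-formedness. A mismatched tag
# makes the configuration load silently fall back to defaults, which is
# exactly how fs.default.name ends up as file:///.
check_conf() {
  xmllint --noout "$1" 2> /dev/null
}
# Usage: check_conf /etc/hadoop/conf/core-site.xml && echo "well-formed"
```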
Re: ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.EOFException
Hi Shuja, Can you paste the output of ls -lR on all of your dfs.name.dirs? (hopefully you have more than one, with one on an external machine via NFS, right?) Thanks -Todd On Fri, Jan 7, 2011 at 4:39 AM, Shuja Rehman shujamug...@gmail.com wrote: Hi, After power failure, the name node is not starting,, giving the following error. kindly let me know how to resolve it thnx 2011-01-07 04:14:49,666 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = ubuntu/192.168.1.2 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2+737 STARTUP_MSG: build = -r 98c55c28258aa6f42250569bd7fa431ac657bdbd; compiled by 'root' on Mon Oct 11 17:21:30 UTC 2010 / 2011-01-07 04:14:50,610 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null 2011-01-07 04:14:50,670 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext 2011-01-07 04:14:50,907 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs 2011-01-07 04:14:50,908 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup 2011-01-07 04:14:50,908 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false 2011-01-07 04:14:50,931 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 2011-01-07 04:14:52,378 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext 2011-01-07 04:14:52,392 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean 2011-01-07 04:14:52,651 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. 
java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at java.io.DataInputStream.readLong(DataInputStream.java:399) at org.apache.hadoop.hdfs.server.namenode.FSImage.readCheckpointTime(FSImage.java:571) at org.apache.hadoop.hdfs.server.namenode.FSImage.getFields(FSImage.java:562) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:237) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:226) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:317) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:394) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157) 2011-01-07 04:14:52,662 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at java.io.DataInputStream.readLong(DataInputStream.java:399) at org.apache.hadoop.hdfs.server.namenode.FSImage.readCheckpointTime(FSImage.java:571) at org.apache.hadoop.hdfs.server.namenode.FSImage.getFields(FSImage.java:562) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:237) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.read(Storage.java:226) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:317) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:394) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157) 2011-01-07 04:14:52,673 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: -- Regards Shuja-ur-Rehman Baig http://pk.linkedin.com/in/shujamughal -- Todd Lipcon Software
Re: LocalDirAllocator and getLocalPathForWrite
Hi Marc, LocalDirAllocator is an internal-facing API and you shouldn't be using it from user code. If you write into mapred.local.dir like this, you will end up with conflicts between different tasks running from the same node. The working directory of your MR task is already within one of the drives, and there isn't usually a good reason to write to multiple drives from within a task - you should get parallelism by running multiple tasks at the same time, not by having each task write to multiple places. Thanks -Todd On Wed, Jan 5, 2011 at 8:35 AM, Marc Sturlese marc.sturl...@gmail.com wrote: I have a doubt about how this works. The API documentation says that the class LocalDirAllocator is: An implementation of a round-robin scheme for disk allocation for creating files I am wondering, the disk allocation is done in the constructor? Let's say I have a cluster of just 1 node and 4 disks and I do inside a reducer: LocalDirAllocator localDirAlloc = new LocalDirAllocator("mapred.local.dir"); Path pathA = localDirAlloc.getLocalPathForWrite("a"); Path pathB = localDirAlloc.getLocalPathForWrite("b"); The local paths pathA and pathB will for sure be in the same local disk as it was allocated by new LocalDirAllocator("mapred.local.dir") or is getLocalPathForWrite who gets the disk and so the two paths might not be in the same disk (as I have 4 disks)? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/LocalDirAllocator-and-getLocalPathForWrite-tp2199517p2199517.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com. -- Todd Lipcon Software Engineer, Cloudera
Re: LocalDirAllocator and getLocalPathForWrite
Hi Marc, Yes, using LocalFileSystem would work fine, or you can just use the normal java.io.File APIs. -Todd On Wed, Jan 5, 2011 at 3:26 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey Todd, LocalDirAllocator is an internal-facing API and you shouldn't be using it from user code. If you write into mapred.local.dir like this, you will end up with conflicts between different tasks running from the same node I know it's a bit odd usage but the thing is that I need to create files in the local file system, work there with them and after that upload them to hdfs (I use the outputcommitter.) To avoid the conflicts you talk about, I create a folder which looks like mapred.local.dir/taskId/attemptId and I work there and apparently I am having no problems. and there isn't usually a good reason to write to multiple drives from within a task When I said I had a cluster of one node, it was just to try to clarify my doubt and explain the example. My cluster is bigger than that actually and each node has more than 1 physical disk. To have multiple tasks running at the same time is what I do. I would like each task to write just to a single local disk but don't know how to do it. The working directory of your MR task is already within one of the drives, Is there a way to get a working directory in the local disk from the reducer? Could I do something similar to: FileSystem fs = FileSystem.get(conf); LocalFileSystem localFs = fs.getLocal(conf); Path path = localFs.getWorkingDirectory(); I would appreciate if you can tell me a bit more about this. I need to deal with these files just locally and want them copied to hdfs only when I finish working with them. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/LocalDirAllocator-and-getLocalPathForWrite-tp2199517p2202221.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com. -- Todd Lipcon Software Engineer, Cloudera
Re: HDFS FS Commands Hanging System
Hi Jon, Try: HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / -Todd On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman jon2...@gmail.com wrote: Hi Michael, Thanks for your response. It doesn't seem to be an issue with safemode. Even when I try the command dfsadmin -safemode get, the system hangs. I am unable to execute any FS shell commands other than hadoop fs -help. I am wondering whether this is an issue with communication between the daemons? What should I be looking at there? Or could it be something else? When I do jps, I do see all the daemons listed. Any other thoughts? Thanks again and happy new year. -Jon On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote: Try checking your dfs status: hadoop dfsadmin -safemode get Probably says ON. hadoop dfsadmin -safemode leave Somebody else can probably say how to make this happen on every reboot. Michael D. Black Senior Scientist Advanced Analytics Directorate Northrop Grumman Information Systems From: Jon Lederman [mailto:jon2...@gmail.com] Sent: Fri 12/31/2010 11:00 AM To: common-user@hadoop.apache.org Subject: EXTERNAL:HDFS FS Commands Hanging System Hi All, I have been working on running Hadoop on a new microprocessor architecture in pseudo-distributed mode. I have been successful in getting SSH configured. I am also able to start a namenode, secondary namenode, tasktracker, jobtracker and datanode, as evidenced by the response I get from jps. However, when I attempt to interact with the file system in any way, such as the simple command hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have any suggestions on what I should look into in order to fix this problem? Thanks in advance. -Jon -- Todd Lipcon Software Engineer, Cloudera
Re: documentation of hadoop implementation
Hi Da, Chris Douglas had an excellent presentation at the Hadoop User Group last year on just this topic. Maybe you can find his slides or a recording on YDN/Google? -Todd On Wed, Dec 29, 2010 at 10:20 AM, Da Zheng zhengda1...@gmail.com wrote: Hello, Is the implementation of Hadoop documented somewhere? Especially the part where the output of mappers is partitioned, sorted and spilled to disk. I tried to understand it, but it's rather complex. Is there any document that can help me understand it? Thanks, Da -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop RPC call response post processing
On Tue, Dec 28, 2010 at 1:00 PM, Stefan Groschupf s...@101tec.com wrote: Hi Todd, Right, that is the code I'm looking into. Though Responder is a private inner class and is created via responder = new Responder(); It would be great if the Responder implementation could be configured. Do you have any idea how to overwrite the Responder? Nope, it's not currently pluggable, nor do I think there's any compelling reason to make it pluggable. It's coupled quite tightly to the implementation right now. Perhaps you can hack something in a git branch, and if it has good results on something like NNBench it could be a general contribution? -Todd On Dec 27, 2010, at 8:21 PM, Todd Lipcon wrote: Hi Stefan, Sounds interesting. Maybe you're looking for o.a.h.ipc.Server$Responder? -Todd On Mon, Dec 27, 2010 at 8:07 PM, Stefan Groschupf s...@101tec.com wrote: Hi All, I've been browsing the RPC code for quite a while now, trying to find any entry point / interceptor slot that allows me to handle an RPC call response writable after it has been sent over the wire. Does anybody have an idea how to break into the RPC code from outside? All the interesting methods are private. :( Background: Heavy use of the RPC allocates a huge number of Writable objects. We saw in multiple systems that the garbage collector can get so busy that the JVM almost freezes for seconds. Things like ZooKeeper sessions time out in those cases. My idea is to create an object pool for writables. Borrowing an object from the pool is simple since that happens in our custom code, though we need to know when the writable has been sent over the wire so it can be returned into the pool. A dirty hack would be to override the write(out) method in the writable, assuming that is the last thing done with the writable, though it turns out that this method is called in other cases too, e.g. to measure throughput. Any ideas? Thanks, Stefan -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
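The object pool Stefan describes can be sketched in plain Java without any Hadoop types (the class below is illustrative, not part of Hadoop). The open problem he raises -- knowing when it is actually safe to return an object -- remains the caller's responsibility:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal object pool of the kind Stefan proposes for reusing Writable
// instances: borrow() reuses a returned object when one is available,
// allocating only on a miss, which cuts garbage-collector pressure.
public class SimplePool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    public SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    public T borrow() {
        T obj = free.poll();
        return obj != null ? obj : factory.get();
    }

    // Must only be called once the object is truly done with -- the hard
    // part Stefan identifies, since overriding write(out) alone does not
    // reliably signal that the RPC response has left the wire.
    public void giveBack(T obj) {
        free.push(obj);
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new);
        StringBuilder a = pool.borrow();
        pool.giveBack(a);
        System.out.println(pool.borrow() == a); // true: the instance was reused
    }
}
```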
Re: dictionary.csv
Hi Michael, Just a guess - maybe when you run outside of Hadoop you're running with a much larger Java heap? You can set mapred.child.java.opts to determine the heap size of the task processes. Also double check that the same JVM is getting used. There are some functions that I've found to be significantly faster or slower in OpenJDK vs the Sun JDK. -Todd On Thu, Dec 23, 2010 at 6:28 AM, Black, Michael (IS) michael.bla...@ngc.com wrote: Using hadoop-0.20.2+737 on Redhat's distribution. I'm trying to use a dictionary.csv file from a Lucene index inside a map function, plus another comma-delimited file. It's just a simple loop of reading a line, splitting the line on commas, and adding the dictionary entry to a hash map. It's about an 8M file with 1.5M lines. I'm using an absolute path so the file read is local (and not HDFS). I've verified no HDFS reads are occurring from the job status. When I run this outside of Hadoop it executes in 6 seconds. Inside Hadoop it takes 13 seconds, and the java process is at 100% CPU the whole time... This makes absolutely no sense to me... I would've thought it should execute in the same time frame, seeing as how it's just reading a local file (I'm only running one task at the moment). I'm also reading another file in a similar fashion and see 3.4 seconds vs 0.3 seconds (longer lines that are also getting split). This one is 45 lines and 278K. It appears that perhaps the split function is running slower, since the smaller file with more columns runs 10X slower than the large file, which is only 2X slower. Anybody have any idea why file input is slower under Hadoop? -- Todd Lipcon Software Engineer, Cloudera
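One way to test the String.split suspicion raised above is to parse with indexOf instead: split() treats its argument as a regular expression, which in the JDKs of this era meant regex machinery on every call. A minimal sketch (the class name is invented):

```java
import java.util.ArrayList;
import java.util.List;

// String.split() compiles its argument as a regex; a plain indexOf scan
// parses simple comma-delimited lines without that overhead.
public class CsvSplit {
    public static List<String> splitOnComma(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0, comma;
        while ((comma = line.indexOf(',', start)) >= 0) {
            fields.add(line.substring(start, comma));
            start = comma + 1;
        }
        fields.add(line.substring(start)); // last field (or whole line if no comma)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitOnComma("word,42,freq")); // [word, 42, freq]
    }
}
```

Note one behavioral difference from split(): this sketch keeps trailing empty fields rather than dropping them.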
Re: NameNode question about lots of small files
Hi Chris, To have a reasonable understanding of used heap, you need to trigger a full GC. Otherwise, the heap number on the web UI doesn't actually tell you live heap. With the default (non-CMS) collector, the collector will not run until it is manually triggered or the heap becomes full. You can use JConsole to connect and force a GC to get a good measurement of heap used. Keep in mind also that the total heap is more than just the inodes and blocks. Other things like RPC buffers account for some usage as well. -Todd On Thu, Dec 16, 2010 at 11:25 AM, Chris Curtin curtin.ch...@gmail.com wrote: Hi, During our research into the 'small files' issues we are having, I didn't find anything to explain what I see 'after' a change. Before: all files were stored in a structure like /source/year/month/day/ where we had dozens of files in each day's directory (and 500+ sources). We were using a lot more memory than we expected in the NameNode, so we redesigned the directory structure. Here is the 'before' summary: 1823121 files and directories, 1754612 blocks = 3577733 total. Heap Size is 1.94 GB / 1.94 GB (100%). The Heap Size relative to the # of files was higher than we expected (using the 150 bytes/file rule of thumb from Cloudera), so we redesigned our approach. After: simplified into /source/year_month/ and while there are a lot of files in the directory, the memory usage dropped significantly: 1824616 files and directories, 1754612 blocks = 3579228 total. Heap Size is 1.18 GB / 1.74 GB (67%). This was a surprise, since we hadn't done the file compaction step and already saw a huge drop in memory usage. What I don't understand is why the change in memory usage? The old structure is still there (/source/year/month/day) but with no files at the tips. The reorg process only moved the files to the new structure; a separate step is going to remove the empty directories. The 'before' was after the cluster had been idle for 4+ hours, so I don't think it was a GC timing issue.
I'm looking to understand what happened so I can make sure our capacity calculations based on # of files and # of directories are correct. We're using: 0.20.2, r911707 Thanks, Chris -- Todd Lipcon Software Engineer, Cloudera
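Chris's 150-bytes-per-object rule of thumb can be checked against his own numbers. A quick sketch (the class name is invented) applying it to the 'after' figures gives roughly 0.5 GB of live inode/block data, consistent with Todd's point that the 1.18 GB shown by the UI includes uncollected garbage and other overhead:

```java
// Back-of-the-envelope NameNode heap estimate using the ~150 bytes per
// namespace object (file, directory, or block) rule of thumb cited in
// the thread. This deliberately ignores RPC buffers and other overhead,
// which Todd notes also consume heap.
public class HeapEstimate {
    static long estimateBytes(long filesAndDirs, long blocks) {
        return (filesAndDirs + blocks) * 150L;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_824_616L, 1_754_612L); // the 'after' numbers
        System.out.printf("~%.2f GB%n", bytes / (1024.0 * 1024 * 1024));
        // Roughly half a gigabyte of live inode/block data -- well under
        // the 1.18 GB the web UI reported before a full GC.
    }
}
```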
Re: Inclusion of MR-1938 in CDH3b4
Hey Roger, Thanks for the input. We're glad to see the community expressing their priorities on our JIRA. I noticed you also sent this to cdh-user, which is the more appropriate list. CDH-specific discussion should be kept off the ASF lists like common-user, which is meant for discussion about the upstream project. -Todd On Wed, Dec 15, 2010 at 10:43 AM, Roger Smith rogersmith1...@gmail.com wrote: If you would like the MR-1938 patch (see link below), "Ability for having user's classes take precedence over the system classes for tasks' classpath", to be included in the CDH3b4 release, please put in a vote on https://issues.cloudera.org/browse/DISTRO-64. The details about the fix are here: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Roger -- Todd Lipcon Software Engineer, Cloudera
Re: Running not as hadoop user
The user who started the NN has superuser privileges on HDFS. You can also configure a supergroup by setting dfs.permissions.supergroup (default supergroup) -Todd On Wed, Dec 8, 2010 at 9:34 PM, Mark Kerzner markkerz...@gmail.com wrote: Hi, hadoop user has some advantages for running Hadoop. For example, if HDFS is mounted as a local file system, then only user hadoop has write/delete permissions. Can this privilege be given to another user? In other words, is this hadoop user hard-coded, or can another be used in its stead? Thank you, Mark -- Todd Lipcon Software Engineer, Cloudera
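For reference, a minimal hdfs-site.xml fragment for the setting Todd mentions. The group name hadoopadmins is an example; note the property appears as dfs.permissions.supergroup in some versions and dfs.supergroup in others:

```xml
<!-- hdfs-site.xml: grant HDFS superuser rights to members of a Unix
     group (in addition to the user who started the NameNode). -->
<property>
  <name>dfs.permissions.supergroup</name>
  <value>hadoopadmins</value>
</property>
```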
Re: Is there a single command to start the whole cluster in CDH3 ?
Hi everyone, Since this question is CDH-specific, it's better to ask on the cdh-user mailing list: https://groups.google.com/a/cloudera.org/group/cdh-user/topics?pli=1 Thanks -Todd On Wed, Nov 24, 2010 at 1:26 AM, Hari Sreekumar hsreeku...@clickable.com wrote: Hi Raul, I am not sure about CDH, but I have created a separate hadoop user to run my ASF Hadoop version, and it works fine. Maybe you can also try creating a new hadoop user and making hadoop the owner of the Hadoop root directory. HTH, Hari On Wed, Nov 24, 2010 at 11:51 AM, rahul patodi patodira...@gmail.com wrote: hi Ricky, for installing CDH3 you can refer to this tutorial: http://cloudera-tutorial.blogspot.com/2010/11/running-cloudera-in-distributed-mode.html all the steps in this tutorial are well tested (in case of any query please leave a comment) On Wed, Nov 24, 2010 at 11:48 AM, rahul patodi patodira...@gmail.com wrote: hi Hari, when I try to start the hadoop daemons with /usr/lib/hadoop# bin/start-dfs.sh on the name node, it gives this error: "May not run daemons as root. Please specify HADOOP_NAMENODE_USER" (same for the other daemons), but when I try to start it using /etc/init.d/hadoop-0.20-namenode start it starts successfully. What's the reason behind that? On Wed, Nov 24, 2010 at 10:04 AM, Hari Sreekumar hsreeku...@clickable.com wrote: Hi Ricky, Yes, that's how it is meant to be. The machine where you run start-dfs.sh will become the namenode, and the machine which you specify in your masters file becomes the secondary namenode. Hari On Wed, Nov 24, 2010 at 2:13 AM, Ricky Ho rickyphyl...@yahoo.com wrote: Thanks for pointing me to the right command. I am using the CDH3 distribution. I figured out that no matter what I put in the masters file, it always starts the NameNode on the machine where I issue the start-all.sh command, and always starts a SecondaryNameNode on all other machines. Any clue ?
Rgds, Ricky -Original Message- From: Hari Sreekumar [mailto:hsreeku...@clickable.com] Sent: Tuesday, November 23, 2010 10:25 AM To: common-user@hadoop.apache.org Subject: Re: Is there a single command to start the whole cluster in CDH3 ? Hi Ricky, Which hadoop version are you using? I am using the hadoop-0.20.2 apache version, and I generally just run the $HADOOP_HOME/bin/start-dfs.sh and start-mapred.sh scripts on my master node. If passwordless ssh is configured, these scripts will start the required services on each node. You shouldn't have to start the services on each node individually. The secondary namenode is specified in the conf/masters file. The node where you call the start-*.sh script becomes the namenode (for start-dfs) or jobtracker (for start-mapred). The node mentioned in the masters file becomes the 2ndary namenode, and the datanodes and tasktrackers are the nodes which are mentioned in the slaves file. HTH, Hari On Tue, Nov 23, 2010 at 11:43 PM, Ricky Ho rickyphyl...@yahoo.com wrote: I set up the cluster configuration in masters, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml and copied it to all the machines. And I log in to one of the machines and use the following to start the cluster: for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done I expect this command will SSH to all the other machines (based on the masters and slaves files) to start the corresponding daemons, but obviously it is not doing that in my setup. Am I missing something in my setup? Also, where do I specify where the Secondary Name Node is run. Rgds, Ricky -- -Thanks and Regards, Rahul Patodi Associate Software Engineer, Impetus Infotech (India) Private Limited, www.impetus.com Mob:09907074413 -- Todd Lipcon Software Engineer, Cloudera
Re: 0.21 found interface but class was expected
We do have policies against breaking APIs between consecutive major versions except for very rare exceptions (e.g. UnixUserGroupInformation went away when security was added). We do *not* have any current policies that existing code can work against different major versions without a recompile in between. Switching an implementation class to an interface is a case where a simple recompile of the dependent app should be sufficient to avoid issues. For whatever reason, the JVM bytecode for invoking an interface method (invokeinterface) is different than invoking a virtual method in a class (invokevirtual). -Todd On Sat, Nov 13, 2010 at 5:28 PM, Lance Norskog goks...@gmail.com wrote: It is considered good manners :) Seriously, if you want to attract a community you have an obligation to tell them when you're going to jerk the rug out from under their feet. On Sat, Nov 13, 2010 at 3:27 PM, Konstantin Boudnik c...@apache.org wrote: It doesn't answer my question. I guess I will have to look for the answer somewhere else. On Sat, Nov 13, 2010 at 03:22PM, Steve Lewis wrote: Java libraries are VERY reluctant to change major classes in a way that breaks backward compatibility - NOTE that while the 0.18 packages are deprecated, they are separate from the 0.20 packages, allowing 0.18 code to run on 0.20 systems - this is true of virtually all Java libraries. On Sat, Nov 13, 2010 at 3:08 PM, Konstantin Boudnik c...@apache.org wrote: As much as I love ranting, I can't help but wonder if there were any promises to make 0.21+ be backward compatible with 0.20? Just curious?
On Sat, Nov 13, 2010 at 02:50PM, Steve Lewis wrote: I have a long rant at http://lordjoesoftware.blogspot.com/ on this, but the moral is that there seems to have been a deliberate decision that 0.20 code may not be compatible with 0.21 - I have NEVER seen a major library so directly abandon backward compatibility. On Fri, Nov 12, 2010 at 8:04 AM, Sebastian Schoenherr sebastian.schoenh...@student.uibk.ac.at wrote: Hi Steve, we had a similar problem. We'd compiled our code with version 0.21 but included the wrong jars in the classpath (version 0.20.2; NInputFormat.java). It seems that Hadoop changed this class to an interface; maybe you have a similar problem. Hope this helps. Sebastian Quoting Steve Lewis lordjoe2...@gmail.com: Cassandra sees this error with 0.21 of hadoop: Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected I see something similar: Error: Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext, but class was expected I find this especially puzzling since org.apache.hadoop.mapreduce.TaskInputOutputContext IS a class, not an interface. Does anyone have bright ideas??? -- Steven M. Lewis PhD 4221 105th Ave Ne Kirkland, WA 98033 206-384-1340 (cell) Institute for Systems Biology Seattle WA -- Lance Norskog goks...@gmail.com -- Todd Lipcon Software Engineer, Cloudera
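Todd's bytecode point can be made concrete with a small runnable example (all names below are invented for illustration). Compiling a caller against an interface emits an invokeinterface call site, while a class emits invokevirtual, which is why old .class files fail with IncompatibleClassChangeError when a class quietly becomes an interface underneath them:

```java
// Why a recompile is needed when a class becomes an interface: the
// caller's compiled bytecode records which call instruction to use.
interface Greeter {           // suppose this used to be a class
    String greet();
}

class EnglishGreeter implements Greeter {
    public String greet() { return "hello"; }
}

public class CallSites {
    public static void main(String[] args) {
        Greeter g = new EnglishGreeter();
        // Compiled against the interface, this call site becomes an
        // `invokeinterface` instruction (visible with `javap -c`).
        // Had Greeter been a class, javac would have emitted
        // `invokevirtual` instead -- so a caller compiled against the
        // class version throws IncompatibleClassChangeError at runtime
        // once Greeter turns into an interface, exactly the JobContext
        // symptom in this thread.
        System.out.println(g.greet()); // hello
    }
}
```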
Re: Granting Access
There's also a config dfs.supergroup - users in the supergroup act as superusers with regard to HDFS permissions. -Todd On Fri, Oct 29, 2010 at 12:10 AM, Pavan yarapa...@gmail.com wrote: IMHO, there is no straightforward way of doing this in Hadoop except that you need to install Hadoop components such as MapReduce and HDFS as different users. This is an ongoing development priority. The available access-related configuration options (before Kerberos V5) are: - dfs.permissions = true|false - dfs.web.ugi = webuser,webgroup - dfs.permissions.supergroup = supergroup - dfs.upgrade.permission = 777 - dfs.umask = 022 However, with Kerberos V5 availability there seems to be some hope wrt user authentication. In this model, HDFS uses the same permissions model as before, but a user can no longer trivially impersonate other users, and there is provisioning for ACLs to specify who can do what wrt jobs and tasks (authorization). Perhaps you can try Yahoo's distribution or Cloudera's CDH3b3 to evaluate the current status on this. ~ Pavan On Fri, Oct 29, 2010 at 11:47 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Hi all, As we all know, Hadoop considers the user who starts the hadoop cluster as the superuser. It provides that user full access to HDFS. But now I want to know how we can give R/W access to a new user, e.g. Tom, to access HDFS. Is there any command, or can we write code for it? I read the tutorial but was not able to succeed. Thanks in Advance -- Todd Lipcon Software Engineer, Cloudera
Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs
Hi Ed, Sounds like this might be a bug, either in MultipleOutputs or in LZO. Does it work properly with gzip compression? Which LZO implementation are you using? The one from google code or the more up to date one from github (either kevinweil's or mine)? Any chance you could write a unit test that shows the issue? Thanks -Todd On Thu, Oct 21, 2010 at 2:52 PM, ed hadoopn...@gmail.com wrote: Hello everyone, I am having problems using MultipleOutputs with LZO compression (could be a bug or something wrong in my own code). In my driver I set: MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class, NullWritable.class, Text.class); In my reducer I have: MultipleOutputs<NullWritable, Text> mOutput = new MultipleOutputs<NullWritable, Text>(context); public String generateFileName(Key key){ return custom_file_name; } Then in the reduce() method I have: mOutput.write(mNullWritable, mValue, generateFileName(key)); This results in LZO files that do not decompress properly (lzop -d throws the error lzop: unexpected end of file: outputFile.lzo). If I switch back to the regular context.write(mNullWritable, mValue); everything works fine. Am I forgetting a step needed when using MultipleOutputs, or is this a bug/non-feature of using LZO compression in Hadoop? Thank you! ~Ed -- Todd Lipcon Software Engineer, Cloudera
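One plausible culprit -- an assumption, not a confirmed diagnosis of Ed's problem -- is that the MultipleOutputs instance is never closed, so the compression codec never writes its stream trailer, which would match lzop's "unexpected end of file". A sketch against the new-API MultipleOutputs (this needs the Hadoop jars on the classpath and is not runnable stand-alone; Text stands in for Ed's key type, and generateFileName is his hypothetical helper):

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class LzoReducer extends Reducer<Text, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> mOutput;

    @Override
    protected void setup(Context context) {
        mOutput = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            mOutput.write(NullWritable.get(), value, generateFileName(key));
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mOutput.close(); // flushes and finalizes each compressed output stream
    }

    // Placeholder for Ed's custom file-naming logic.
    private String generateFileName(Text key) {
        return key.toString();
    }
}
```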
Re: Questions about BN and CN
On Thu, Sep 23, 2010 at 6:20 PM, Konstantin Shvachko s...@yahoo-inc.com wrote: Hi Shen, Why do we need CheckpointNode? 1. First of all it is a compatible replacement for SecondaryNameNode. 2. Checkpointing is also needed for periodically compacting edits. You can do it with the CN or the BN, but the CN is more lightweight. I assume there could be cases when streaming edits to the BN over the network can be slower than writing them to disk, so you might want to turn the BN off for performance reasons. Also, if the BN hangs, it will hang edits on the primary node as well, since synchronous RPCs are used to push edits, right Konst? Would be worth testing a kill -STOP on the BN while performing operations on the primary. 3. Also, in the current implementation the NN allows only one BN, but multiple CNs. So if the single BN dies, checkpointing will stall. You can prevent this by starting two CNs instead, or one BN and one CN. But I agree with you that the CN is just a subset of the BN by its functionality. Thanks, Konstantin On 9/22/2010 5:50 PM, ChingShen wrote: Thanks Konstantin, But my main question is: since the CN can only provide an old state of the namespace, why do we need it? I think the BN is the best solution. Shen On Thu, Sep 23, 2010 at 5:20 AM, Konstantin Shvachko s...@yahoo-inc.com wrote: The CheckpointNode creates checkpoints of the namespace, but does not keep an up-to-date state of the namespace in memory. If the primary NN fails, the CheckpointNode can only provide an old state of the namespace created during the latest checkpoint. Also, CheckpointNode is a replacement for SecondaryNameNode in earlier releases. BackupNode does checkpoints too, but in addition keeps an up-to-date state of the namespace in its memory. When the primary NN dies you can ask the BackupNode to save the namespace, which will create an up-to-date image, and then start an NN instead of the BN on the node the BN was running on using that saved image directly, or start an NN on a different node using importCheckpoint from the saved image directory.
See the guide here. http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_user_guide.html#Checkpoint+Node Thanks, --Konstantin On 9/8/2010 11:36 PM, ChingShen wrote: Hi all, I got the Backup node(BN) that includes all the checkpoint responsibilities, and it maintains an up-to-date namespace state, which is always in sync with the active NN. Q1. In which situation do we need a CN? Q2. If the NameNode machine fails, which different manual intervention between BN and CN? Thanks. Shen -- Todd Lipcon Software Engineer, Cloudera
Re: Xcievers Load
On Thu, Sep 23, 2010 at 7:05 AM, Michael Segel michael_se...@hotmail.com wrote: 4000 xcievers is a lot. I'm wondering if there's a correlation between the number of xcievers and ulimit -n. Should they be configured on a 1 to 1 ratio? 2:1 ratio of file descriptors to xceivers. 4000 xceivers is quite normal on a heavily loaded HBase cluster in my experience. The cost is the RAM of the extra threads, but there's not much you can do about it, given the current design of the datanode. -Todd -Mike Date: Thu, 23 Sep 2010 08:04:40 -0400 Subject: Xcievers Load From: marnan...@gmail.com To: u...@hbase.apache.org; common-user@hadoop.apache.org Hi, We have a job that writes many small files (using MultipleOutputFormat) and it's exceeding the 4000 xcievers that we have configured. What is the effect on the cluster of increasing this count to some higher number? Many thanks, Martin PS: HBase is also running on the cluster. -- Todd Lipcon Software Engineer, Cloudera
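Todd's 2:1 ratio translates into a simple sizing check for the datanode user's ulimit -n (a sketch; the class name is invented):

```java
// Quick sizing check for the 2:1 file-descriptor-to-xceiver ratio
// mentioned above: the open-file limit for the datanode user should
// comfortably exceed twice the configured xceiver count, plus room
// for the datanode's other open files.
public class XceiverFds {
    static long fdsNeeded(long maxXceivers) {
        return 2 * maxXceivers;
    }

    public static void main(String[] args) {
        System.out.println(fdsNeeded(4000)); // 8000 descriptors just for xceivers
    }
}
```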
Re: accounts permission on hadoop
On Tue, Aug 31, 2010 at 5:28 PM, Allen Wittenauer awittena...@linkedin.com wrote: On Aug 31, 2010, at 2:43 PM, Edward Capriolo wrote: On Tue, Aug 31, 2010 at 5:07 PM, Gang Luo lgpub...@yahoo.com.cn wrote: Hi all, I am the administrator of a hadoop cluster. I want to know how to specify the group a user belongs to. Or does hadoop just use the group/user information from the Linux system it runs on? For example, if a user 'smith' belongs to a group 'research' in the Linux system, what is his account and group on HDFS? Currently hadoop gets its user groups from the posix user/groups. ... based upon what the client sends, not what the server knows. Not anymore in trunk or the security branch - now it's mapped on the server side with a configurable resolver class. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya alexander.l...@gmail.com wrote: Does it (hadoop-lzo) only work for hadoop 0.20, not for 0.21 or 0.22? I don't know that anyone has tested it against 0.21 or trunk, but I don't see any reason it won't work just fine -- the APIs are pretty stable between 0.20 and above. -Todd On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote: On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett bdenn...@gmail.com wrote: Hi Josh, No real pain points... just trying to investigate/research the best way to create the necessary libraries and jar files to support LZO compression in Hadoop. In particular, there are the 2 repositories to build from and I am trying to find out if one should be used over the other. For instance, in your previous posting, you refer to hadoop-gpl-compression while the Twitter blog post from last year mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO is preferable but we're curious if there are any caveats/gotchas we should be aware of. Yes, definitely use the hadoop-lzo project from github -- either from my repo or from kevinweil's (the two are kept in sync). The repo on Google Code has a number of known bugs, which is why we forked it over to github last year. -Todd On Thu, Aug 5, 2010 at 15:59, Josh Patterson j...@cloudera.com wrote: Bobby, We're working hard to make compression easier; the biggest hurdle currently is the licensing issues around the LZO codec libs (GPL, which is not compatible with the ASF bsd-style license). Outside of making the changes to the mapred-site.xml file, with your setup what do you view as the biggest pain point? Josh Patterson Cloudera On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett bdennett+softw...@gmail.com wrote: We are looking to enable LZO compression of the map outputs on our Cloudera 0.20.1 cluster. It seems there are various sets of instructions available and I am curious what your thoughts are regarding which one would be best for our Hadoop distribution and OS (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo (http://github.com/kevinweil/hadoop-lzo). Some of what appear to be the better instructions/guides out there: * Josh Patterson's reply on June 25th to the Newbie to HDFS compression thread -- http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e * hadoop-gpl-compression FAQ -- http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ * Hadoop at Twitter (part 1): Splittable LZO Compression blog post -- http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/ Thanks in advance, -Bobby -- Todd Lipcon Software Engineer, Cloudera
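For completeness, the mapred-site.xml fragment this thread is ultimately about -- enabling LZO for the intermediate map output only, using the 0.20-era property names, and assuming the hadoop-lzo jar and native libraries are already installed:

```xml
<!-- mapred-site.xml (0.20-era property names): compress only the
     intermediate map output with the hadoop-lzo codec. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```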
Re: Namenode throws NPE, won't start
Hey Bradford, Your image is corrupt. Do you have an image from a second name.dir that has different md5sums? It's sometimes possible to recover from corruption like this, but it usually requires using a tweaked namenode to skip over the bad data. -Todd On Tue, Jul 6, 2010 at 11:23 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hey guys, Running .20.2 -- the namenode refuses to start. It ran fine the last few weeks. Any ideas? Full text: http://pastebin.com/QWZ6A0Ja 2010-07-06 23:20:01,524 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1076) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:975) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:962) -- Bradford Stephens, Founder, Drawn to Scale drawntoscalehq.com 727.697.7528 http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data. http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science -- Todd Lipcon Software Engineer, Cloudera
Re: Next Release of Hadoop version number and Kerberos
On Wed, Jul 7, 2010 at 8:29 AM, Ananth Sarathy ananth.t.sara...@gmail.com wrote: Ok, that was what I was thinking. Would there be a patch that could be applied to .21 or .20.2 for the Kerberos support, since Cloudera is touting their .20S version, but my preference is staying with the latest Apache distro and patching as needed? The Security/Kerberos support is a huge project that has been in progress for several months, so the implementation spans tens (if not hundreds?) of patches. Manually adding these patches to a prior Apache release would take days if not weeks of work, is my guess. The 0.20S (aka 0.20.104) branch is over on Yahoo's repository at github. Currently, the latest Cloudera CDH3 beta release does not have security integrated, but we are in the midst of this integration as I speak. Thanks -Todd On Wed, Jul 7, 2010 at 11:23 AM, Tom White t...@cloudera.com wrote: Hi Ananth, The next release of Hadoop will be 0.21.0, but it won't have Kerberos authentication in it (since it's not all in trunk yet). The 0.22.0 release later this year will have a working version of security in it. Cheers, Tom On Wed, Jul 7, 2010 at 8:09 AM, Ananth Sarathy ananth.t.sara...@gmail.com wrote: Is the next release of Hadoop going to be .21 or .22? I was just wondering, because I am hearing conflicting things about the next release having Kerberos security, but looking through some past emails, hearing that it was coming in .22. Ananth T Sarathy -- Todd Lipcon Software Engineer, Cloudera
Re: What is IPC classes for???
On Wed, Jul 7, 2010 at 12:25 PM, Ahmad Shahzad ashahz...@gmail.com wrote: But I want to know if they are used by JobTracker and TaskTracker, or by DataNode and NameNode. Or are they used by the child tasks that map-reduce spawns to talk to the TaskTracker on that machine? All of the above. See the packages org.apache.hadoop.{hdfs,mapred}.{protocol,server.protocol} Regards, Ahmad Shahzad -- Todd Lipcon Software Engineer, Cloudera
Re: Performance tuning of sort
On Thu, Jun 17, 2010 at 12:43 AM, Jeff Zhang zjf...@gmail.com wrote: Your understanding of Sort is not right. The key concept of Sort is the TotalOrderPartitioner. Actually, before the map-reduce job, the client side samples the input data to estimate its distribution. The mapper does nothing; each reducer fetches its data according to the TotalOrderPartitioner. The data in each reducer is locally sorted, and the reducers themselves are ordered (r0 < r1 < r2), so the overall result data is sorted. The sorting happens on the map side, actually, during the spill process. The mapper itself is an identity function, but the map task code does perform a sort (on a (partition, key) tuple) as originally described in this thread. Reducers just do a merge of mapper outputs. -Todd On Thu, Jun 17, 2010 at 12:13 AM, 李钰 car...@gmail.com wrote: Hi all, I'm doing some tuning of the sort benchmark of Hadoop. To be more specific, I'm running tests against the org.apache.hadoop.examples.Sort class. Looking through the source code, I think the map tasks take responsibility for sorting the input data, and the reduce tasks just merge the map outputs and write them into HDFS. But here I've got a question I couldn't understand: the time cost of the reduce phase of each reduce task, that is, writing data into HDFS, differs from task to task. Since the input data and operations of each reduce task are the same, what could cause the execution times to differ? Is there anything wrong with my understanding? Does anybody have any experience with this? Badly need your help, thanks. Best Regards, Carp -- Best Regards Jeff Zhang -- Todd Lipcon Software Engineer, Cloudera
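The mechanism Jeff and Todd are describing -- sample the input, range-partition by key, sort within each partition, and rely on the partitions themselves being ordered -- can be illustrated with a toy sketch. This is plain Python standing in for map and reduce tasks, not Hadoop's actual TotalOrderPartitioner code:

```python
# Toy sketch of total-order sorting: sample, range-partition, sort per
# partition, and concatenating the partitions yields a global sort.
import random

random.seed(42)

def make_splits(sample, num_reducers):
    """Pick num_reducers-1 split points from a sample of the input keys."""
    s = sorted(sample)
    step = len(s) // num_reducers
    return [s[(i + 1) * step] for i in range(num_reducers - 1)]

def partition(key, splits):
    """Route a key to the 'reducer' whose key range contains it."""
    for i, split_point in enumerate(splits):
        if key < split_point:
            return i
    return len(splits)

data = [random.randrange(1000) for _ in range(200)]
splits = make_splits(random.sample(data, 40), num_reducers=4)

# "Map side": bucket each key by partition; each bucket is then sorted,
# which is where the real sort happens (during the spill, as Todd notes).
buckets = [[] for _ in range(4)]
for k in data:
    buckets[partition(k, splits)].append(k)
per_reducer = [sorted(b) for b in buckets]

# r0, r1, r2, r3 concatenated in order give a globally sorted result.
result = [k for b in per_reducer for k in b]
assert result == sorted(data)
```

Hadoop's real implementation does the sampling with InputSampler and the routing with TotalOrderPartitioner; the correctness argument is the same as in this sketch.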
Re: Hadoop JobTracker Hanging
+1, jstack is crucial to solve these kinds of issues. Also, which scheduler are you using? Thanks -Todd On Thu, Jun 17, 2010 at 2:38 PM, Ted Yu yuzhih...@gmail.com wrote: Is upgrading to hadoop-0.20.2+228 possible? Use jstack to get a stack trace of the job tracker process when this happens again. Use jmap to get shared object memory maps or heap memory details. On Thu, Jun 17, 2010 at 2:00 PM, Li, Tan t...@shopping.com wrote: Folks, I need some help on the job tracker. I am running two Hadoop clusters (with 30+ nodes) on Ubuntu. One is version 0.19.1 (Apache) and the other is version 0.20.1+169.68 (Cloudera). I have the same problem with both clusters: the job tracker hangs almost once a day. Symptom: the job tracker web page cannot be loaded, the command hadoop job -list hangs, and the jobtracker.log file stops being updated. I can't find any useful information in the job tracker log file. The symptom is gone after I restart the job tracker, and the cluster runs fine for another 20+ hour period. And then the symptom comes back. I do not have serious problems with HDFS. Any ideas about the causes? Any configuration parameter that I can change to reduce the chances of the problem? Any tips for diagnosing and troubleshooting? Thanks! Tan -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop JobTracker Hanging
Li, just to narrow your search: in my experience this is usually caused by an OOME on the JT. Check the logs for OutOfMemoryError and see what you find. You may need to configure it to retain fewer jobs in memory, or increase your heap. -Todd On Thu, Jun 17, 2010 at 5:03 PM, Li, Tan t...@shopping.com wrote: Thanks for your tips, Ted. All of our QA is done on 0.20.1, and I've got a feeling it is not version related. I will run jstack and jmap once the problem happens again, and I may need your help to analyze the result. Tan -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Thursday, June 17, 2010 2:39 PM To: common-user@hadoop.apache.org Subject: Re: Hadoop JobTracker Hanging Is upgrading to hadoop-0.20.2+228 possible? Use jstack to get a stack trace of the job tracker process when this happens again. Use jmap to get shared object memory maps or heap memory details. -- Todd Lipcon Software Engineer, Cloudera
Re: JobTracker java.lang.NumberFormatException
Hi Ankit, You need to trim your configuration values so there is no extra whitespace, e.g. <value>foo</value>, not: <value> foo </value> There's a patch up for this in many of the configs, but I'm not sure if we got mapred.job.tracker. -Todd On Tue, Jun 15, 2010 at 5:55 AM, ankit sharma ankit1984.c...@gmail.com wrote: Hi All, I have a multinode cluster with 1 master (namenode + jobtracker) and 2 slaves (datanode + tasktracker). I can start the namenode and datanodes, but CAN'T start the jobtracker. The log shows a java.lang.NumberFormatException. I will be grateful if anybody can tell me what the problem is and why this Java exception is being thrown. Here is the complete log; all the file values are attached (master, slaves, core-site.xml, etc.) / 2010-06-15 17:05:12,679 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG: / STARTUP_MSG: Starting JobTracker STARTUP_MSG: host = centosxcat1/192.168.15.140 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2010-06-15 17:05:12,756 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1) 2010-06-15 17:05:12,768 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.NumberFormatException: For input string: 54311 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:481) at java.lang.Integer.parseInt(Integer.java:514) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:146) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123) at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1807) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:1579) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:183) at
org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:175) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3702) 2010-06-15 17:05:12,769 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down JobTracker at centosxcat1/192.168.15.140 /

cat conf/master
centosxcat1

cat conf/salves
aadityaxcat3 linux-466z

cat conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property> <name> dfs.name.dir </name> <value> /fsname </value> </property>
<property> <name> dfs.data.dir </name> <value> /fsdata </value> </property>
<property> <name> dfs.replication </name> <value> 2 </value> </property>
</configuration>

cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property> <name> mapred.job.tracker </name> <value> centosxcat1:54311 </value> </property>
</configuration>
<configuration>
<property> <name>fs.default.name</name> <value>hdfs://centosxcat1</value> </property>
</configuration>

cat conf/hdfs-site.xml
-- Todd Lipcon Software Engineer, Cloudera
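Todd's diagnosis can be reproduced outside Hadoop: Java's Integer.parseInt rejects surrounding whitespace, so a padded <value> element like the one above fails exactly this way when NetUtils splits host:port. A small sketch, with Python mimicking the strict Java parse (the config string is modeled on the mapred-site.xml above):

```python
# Sketch: why "<value> centosxcat1:54311 </value>" breaks JobTracker startup.
# Java's Integer.parseInt(" 54311 ") throws NumberFormatException; we mimic
# that strict behavior here, since Python's int() would tolerate the spaces.
import re

def strict_parse_int(s):
    """Mimic Integer.parseInt: optional sign plus digits, no whitespace."""
    if not re.fullmatch(r"[+-]?\d+", s):
        raise ValueError("For input string: %r" % s)
    return int(s)

conf = "<name> mapred.job.tracker </name> <value> centosxcat1:54311 </value>"
value = re.search(r"<value>(.*?)</value>", conf).group(1)
host, port = value.rsplit(":", 1)  # port comes out as "54311 ", space intact

try:
    strict_parse_int(port)
except ValueError as e:
    print(e)  # same symptom the JobTracker log shows

# Trimming the configuration value fixes it:
assert strict_parse_int(port.strip()) == 54311
```

This is why trimming the whitespace inside the <value> elements (or upgrading to a Hadoop version whose config parsing trims values) resolves the FATAL startup error.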
Re: Appending and seeking files while writing
On Mon, Jun 14, 2010 at 4:00 AM, Stas Oskin stas.os...@gmail.com wrote: Hi. Thanks for clarification. Append will be supported fully in 0.21. Any ETA for this version? Should be out soon - Tom White is working hard on the release. Note that the first release, 0.21.0, will be somewhat of a development quality release not recommended for production use. Of course, the way it will become production-worthy is by less risk-averse people trying it and finding the bugs :) Will it work both with Fuse and HDFS API? I don't know that the Fuse code has been updated to call append. My guess is that a small patch would be required. Also, append does *not* add random write. It simply adds the ability to re-open a file and add more data to the end. Just to clarify, even with append it won't be possible to: 1) Pause writing of new file, skip to any position, and update the data. 2) Open existing file, skip to any position and update the data. Correct, neither of those are allowed. This will be even with FUSE. Is this correct? Regards. -- Todd Lipcon Software Engineer, Cloudera
Re: Appending and seeking files while writing
On Mon, Jun 14, 2010 at 4:28 AM, Stas Oskin stas.os...@gmail.com wrote: By the way, what about an ability for node to read file which is being written by another node? This is allowed, though there are some remaining bugs to be ironed out here. See https://issues.apache.org/jira/browse/HDFS-1057 for example. Or the file must be written and closed completely, before it becomes available for other nodes? (AFAIK in 0.18.3 the file appeared as 0 size until it was closed). Regards. -- Todd Lipcon Software Engineer, Cloudera
Re: Appending and seeking files while writing
On Sun, Jun 13, 2010 at 12:46 AM, Vidur Goyal vi...@students.iiit.ac.in wrote: Append is supported in hadoop 0.20. Append will be supported in the 0.20-append branch, which is still in progress. It is NOT supported in vanilla 0.20. You can turn on the config option but it is dangerous and highly discouraged for real use. Append will be supported fully in 0.21. Also, append does *not* add random write. It simply adds the ability to re-open a file and add more data to the end. -Todd Hi. I think this really depends on the append functionality, any idea whether it supports such behaviour now? Regards. On Fri, Jun 11, 2010 at 10:41 AM, hadooprcoks hadoopro...@gmail.com wrote: Stas, I also believe that there should be a seek interface on the write path so that the FS API is complete. FSDataInputStream already supports seek() -- so should FSDataOutputStream. For filesystems that do not support seek on the write path, the seek can be a no-op. Could you open a JIRA to track this? I am willing to provide the patch if you do not have the time to do so. thanks hadooprocks On Thu, Jun 10, 2010 at 5:05 AM, Stas Oskin stas.os...@gmail.com wrote: Hi. Was the append functionality finally added to the 0.20.1 version? Also, is the ability to seek within a file being written and write data in another place also supported? Thanks in advance! -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -- Todd Lipcon Software Engineer, Cloudera
Re: dfs.name.dir capacity for namenode backup?
On Mon, May 17, 2010 at 5:10 PM, jiang licht licht_ji...@yahoo.com wrote: I am considering using a machine to save a redundant copy of HDFS metadata by setting dfs.name.dir in hdfs-site.xml like this (as in YDN): <property> <name>dfs.name.dir</name> <value>/home/hadoop/dfs/name,/mnt/namenode-backup</value> <final>true</final> </property> where the two folders are on different machines, so that /mnt/namenode-backup keeps a copy of the HDFS file system information and its machine can be used to replace the first machine if it fails as namenode. So, my question is: how much space will this HDFS metadata consume? I guess it is proportional to the HDFS capacity. What ratio is that, or what size will it be for a 150TB HDFS? On the order of a few GB, max (you really need double the size of your image, so it has tmp space when downloading a checkpoint or performing an upgrade). But on any disk you can buy these days you'll have plenty of space. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: dfs.name.dir capacity for namenode backup?
Yes, we recommend at least one local directory and one NFS directory for dfs.name.dir in production environments. This allows an up-to-date recovery of NN metadata if the NN should fail. In future versions, the BackupNode functionality will move us one step closer to not needing NFS for production deployments. Note that the NFS directory does not need to be anything fancy - you can simply use an NFS mount on another normal Linux box. -Todd On Tue, May 18, 2010 at 11:19 AM, Andrew Nguyen and...@ucsfcti.org wrote: Sorry to hijack, but after following this thread, I had a related question about the secondary location of dfs.name.dir. Is the approach outlined below the preferred/suggested way to do this? Is this what people mean when they say, stick it on NFS? Thanks! On May 17, 2010, at 11:14 PM, Todd Lipcon wrote: On Mon, May 17, 2010 at 5:10 PM, jiang licht licht_ji...@yahoo.com wrote: I am considering using a machine to save a redundant copy of HDFS metadata by setting dfs.name.dir in hdfs-site.xml like this (as in YDN): <property> <name>dfs.name.dir</name> <value>/home/hadoop/dfs/name,/mnt/namenode-backup</value> <final>true</final> </property> where the two folders are on different machines, so that /mnt/namenode-backup keeps a copy of the HDFS file system information and its machine can be used to replace the first machine if it fails as namenode. So, my question is: how much space will this HDFS metadata consume? I guess it is proportional to the HDFS capacity. What ratio is that, or what size will it be for a 150TB HDFS? On the order of a few GB, max (you really need double the size of your image, so it has tmp space when downloading a checkpoint or performing an upgrade). But on any disk you can buy these days you'll have plenty of space. -Todd -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: Any possible to set hdfs block size to a value smaller than 64MB?
On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hey Pierre, These are not traditional filesystem blocks - if you save a file smaller than 64MB, you don't lose 64MB of file space. Hadoop will use 32KB to store a 32KB file (ok, plus a KB of metadata or so), not 64MB. Brian On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote: Hi, I'm porting a legacy application to hadoop and it uses a bunch of small files. I'm aware that having such small files ain't a good idea but I'm not making the technical decisions and the port has to be done for yesterday... Of course such small files are a problem; loading 64MB blocks for a few lines of text is an evident loss. What will happen if I set a smaller, or even way smaller (32kB), block size? Thank you. Pierre ANCELOT. -- http://www.neko-consulting.com Ego sum quis ego servo Je suis ce que je protège I am what I protect -- Todd Lipcon Software Engineer, Cloudera
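Brian's point -- that an HDFS block is a logical unit, not preallocated disk space -- can be put in numbers. A rough sketch (toy arithmetic, ignoring the ~1KB of per-block metadata he mentions):

```python
# Sketch: a file consumes roughly its own size times the replication
# factor, not block_size * num_blocks. Blocks are not preallocated.
import math

def hdfs_usage(file_bytes, block_size=64 * 1024 * 1024, replication=3):
    """Return (number of blocks, approximate raw bytes stored on disk)."""
    num_blocks = max(1, math.ceil(file_bytes / block_size))
    raw_bytes = file_bytes * replication  # actual data footprint
    return num_blocks, raw_bytes

blocks, raw = hdfs_usage(32 * 1024)  # a 32KB file
assert blocks == 1
assert raw == 3 * 32 * 1024          # ~96KB across the cluster, not 192MB
```

The real cost of many small files is namenode memory (each file and block is an object in the NN heap) and per-file task overhead in MapReduce, not wasted datanode disk.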
Re: NameNode deadlocked (help?)
Hey Brian, Looks like it's not deadlocked, but rather just busy doing a lot of work: org.apache.hadoop.hdfs.server.namenode.fsnamesystem$heartbeatmoni...@1c778255 daemon prio=10 tid=0x2aaafc012c00 nid=0x493 runnable [0x413da000..0x413daa10] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeStoredBlock(FSNamesystem.java:3236) - locked 0x2aaab3843e40 (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeDatanode(FSNamesystem.java:2695) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.heartbeatCheck(FSNamesystem.java:2785) - locked 0x2aaab3659dd8 (a java.util.TreeMap) - locked 0x2aaab3653848 (a java.util.ArrayList) - locked 0x2aaab3843e40 (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:2312) at java.lang.Thread.run(Thread.java:619) Let me dig up the 19.1 source code and see if this looks like an infinite loop or just one that's tying it up for some number of seconds. Anything being written into the logs? -Todd On Fri, May 14, 2010 at 5:22 PM, Brian Bockelman bbock...@cse.unl.eduwrote: Hey guys, I know it's 5PM on a Friday, but we just saw one of our big cluster's namenode's deadlock. This is 0.19.1; does this ring a bell for anyone? I haven't had any time to start going through source code, but I figured I'd send out a SOS in case if this looked familiar. We had restarted this cluster a few hours ago and made the following changes: 1) Increased the number of datanode handlers from 10 to 40. 2) Increased ipc.server.listen.queue.size from 128 to 256. If nothing else, I figure a deadlocked NN might be interesting to devs... 
Brian 2010-05-14 17:11:30 Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.2-b01 mixed mode): IPC Server handler 39 on 9000 daemon prio=10 tid=0x2aaafc181400 nid=0x4cd waiting for monitor entry [0x45962000..0x45962d90] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:2231) - waiting to lock 0x2aaab3653848 (a java.util.ArrayList) at org.apache.hadoop.hdfs.server.namenode.NameNode.sendHeartbeat(NameNode.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) IPC Server handler 38 on 9000 daemon prio=10 tid=0x2aaafc17f800 nid=0x4cc waiting for monitor entry [0x45861000..0x45861d10] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStats(FSNamesystem.java:3326) - waiting to lock 0x2aaab3653848 (a java.util.ArrayList) at org.apache.hadoop.hdfs.server.namenode.NameNode.getStats(NameNode.java:505) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) IPC Server handler 37 on 9000 daemon prio=10 tid=0x2aaafc17e000 nid=0x4cb waiting for monitor entry [0x4575f000..0x45760a90] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:801) - 
waiting to lock 0x2aaab3843e40 (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:784) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:751) at org.apache.hadoop.hdfs.server.namenode.NameNode.getBlockLocations(NameNode.java:272) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
Re: HDFS-630 patch for Hadoop v0.20
Hi Raghava, Yes, that's a patch targeted at 0.20, but I'm not certain whether it applies on the vanilla 0.20 code or not. If you'd like a version of Hadoop that already has it applied and tested, I'd recommend using Cloudera's CDH2. -Todd On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote: Hello all, I am trying to install HBase and while going through the requirements (link below), it asked me to apply HDFS-630 patch. The latest 2 patches are for Hadoop 0.21. I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt . Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21. http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements Thank you. Regards, Raghava. -- Todd Lipcon Software Engineer, Cloudera
Re: HDFS-630 patch for Hadoop v0.20
On Thu, May 13, 2010 at 8:56 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote: Hello Todd, Thank you for the reply. In the cluster I use here, apache Hadoop is installed. So I have to use that. I am trying out HBase on my laptop first. Even though I install CDH2, it won't be useful because on the cluster, I have to work with apache Hadoop. Since version 0.21 is still in development, there should be a HDFS-630 patch for the current stable release of Hadoop isn't it? No, it was not considered for release in Hadoop 0.20.X because it breaks wire compatibility, and though I've done a workaround to avoid issues stemming from that, it would be unlikely to pass a backport vote. -Todd On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote: Hi Raghava, Yes, that's a patch targeted at 0.20, but I'm not certain whether it applies on the vanilla 0.20 code or not. If you'd like a version of Hadoop that already has it applied and tested, I'd recommend using Cloudera's CDH2. -Todd On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote: Hello all, I am trying to install HBase and while going through the requirements (link below), it asked me to apply HDFS-630 patch. The latest 2 patches are for Hadoop 0.21. I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt . Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21. http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements Thank you. Regards, Raghava. -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: Hadoop performance - xfs and ext4
On Tue, May 11, 2010 at 7:33 AM, stephen mulcahy stephen.mulc...@deri.org wrote: On 23/04/10 15:43, Todd Lipcon wrote: Hi Stephen, Can you try mounting ext4 with the nodelalloc option? I've seen the same improvement due to delayed allocation but been a little nervous about that option (especially in the NN, where we currently follow what the kernel people call an antipattern for image rotation). Hi Todd, Sorry for the delayed response - I had to wait for another test window before trying this out. To clarify, my namenode and secondary namenode have been using ext4 in all tests - reconfiguring the datanodes is a fast operation, the NN and 2NN less so. I figure any big performance benefit would appear on the datanodes anyway, and we can then apply it back to the NN and 2NN if testing shows any benefit to changing. So I tried running our datanodes with their ext4 filesystems mounted using noatime,nodelalloc and after 6 runs of the TeraSort, it seems it runs SLOWER with those options, by between 5-8%. The TeraGen itself seemed to run about 5% faster, but it was only a single run so I'm not sure how reliable that is. Yep, that's what I'd expect. noatime should be a small improvement, nodelalloc should be a small detriment. The thing is that delayed allocation has some strange cases that could theoretically cause data loss after a power outage, so I was interested to see if it nullified all of your performance gains or if it was just a small hit. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: This list's spam filter
Try sending plaintext email instead of rich text - the spam scoring for HTML email is overly aggressive on the Apache listservs. -Todd On Mon, May 10, 2010 at 11:14 AM, Oscar Gothberg oscar.gothb...@gmail.com wrote: Hi, I've been trying all morning to post a Hadoop question to this list but can't get through the spam filter. At a loss. Does anyone have any ideas what may trigger it? What can I do to not have it tag me? Thanks, / Oscar -- Todd Lipcon Software Engineer, Cloudera
Re: Applying HDFS-630 patch to hadoop-0.20.2 tarball release?
Hi Joseph, You'll have to apply the patch with patch -p0 < foo.patch and then recompile using ant. If you want to avoid this you can grab the CDH2 tarball here: http://archive.cloudera.com/cdh/2/ - it includes the HDFS-630 patch. Thanks -Todd On Tue, May 4, 2010 at 9:38 AM, Joseph Chiu joec...@joechiu.com wrote: I am currently testing out a rollout of HBase 0.20.3 on top of Hadoop 0.20.2. The HBase doc recommends that the HDFS-630 patch be applied. I realize this is a newbieish question, but has anyone done this to the tarball Hadoop-0.20.2 release? Since this is a specific recommendation by the HBase release, I think a walk-through would be quite useful for anyone else similarly coming up the Hadoop + HBase learning curve. (I'm afraid I've been away from the Linux / DB / Systems world for far too long, nearly a decade, and I've come back to work to a very changed landscape. But I digress...) Thanks in advance. Joseph -- Todd Lipcon Software Engineer, Cloudera
Re: DataNode not able to spawn a Task
Hi Vishal, What operating system are you on? The TT is having issues parsing the output of df. -Todd On Tue, Apr 27, 2010 at 9:03 AM, vishalsant vishal.santo...@gmail.com wrote: Hi guys, I see the exception below when I launch a job: 10/04/27 10:54:16 INFO mapred.JobClient: map 0% reduce 0% 10/04/27 10:54:22 INFO mapred.JobClient: Task Id : attempt_201004271050_0001_m_005760_0, Status : FAILED Error initializing attempt_201004271050_0001_m_005760_0: java.lang.NumberFormatException: For input string: - at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:476) at java.lang.Integer.parseInt(Integer.java:499) at org.apache.hadoop.fs.DF.parseExecResult(DF.java:125) at org.apache.hadoop.util.Shell.runCommand(Shell.java:179) at org.apache.hadoop.util.Shell.run(Shell.java:134) at org.apache.hadoop.fs.DF.getAvailable(DF.java:73) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:751) at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1665) at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97) at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1630) A few things: * I ran fsck on the namenode and no corrupted blocks were reported. * The -report from dfsadmin says the datanode is up. -- View this message in context: http://old.nabble.com/DataNode-not-able-to-spawn-a-Task-tp28378863p28378863.html Sent from the Hadoop core-user mailing list archive at Nabble.com. -- Todd Lipcon Software Engineer, Cloudera
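The stack trace shows Hadoop's DF helper choking while turning a df output field into an integer. A sketch of the failure mode, with plain Python standing in for DF.parseExecResult (the "bad" line is a hypothetical example of the kind of non-numeric df output, e.g. from an automounter pseudo-filesystem, that produces a NumberFormatException for the string "-"):

```python
# Sketch: parsing a `df -k` data line the way Hadoop's DF class does --
# positional fields, strictly parsed as integers.
def parse_df_line(line):
    """Parse one data line of `df -k`: filesystem, capacity, used, available."""
    tokens = line.split()
    filesystem = tokens[0]
    capacity, used, available = (int(t) for t in tokens[1:4])  # strict parse
    return filesystem, capacity, used, available

good = "/dev/sda1  480618344 303442160 152760184  67% /data"
assert parse_df_line(good)[1] == 480618344

bad = "automount(pid123)  -  -  -  100% /net"  # hypothetical odd df output
try:
    parse_df_line(bad)
except ValueError as e:
    print("parse failed:", e)  # analogous to NumberFormatException: "-"
```

This is why the OS matters here: df output format varies by platform, and any line whose size columns are not plain integers breaks the positional parse.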
Re: Extremely slow HDFS after upgrade
) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95) at java.lang.Thread.run(Thread.java:619) So the behavior is as if the network is extremely slow, but it seems to only affect Hadoop. Any ideas? Thanks, -Scott -- Todd Lipcon Software Engineer, Cloudera
Re: Extremely slow HDFS after upgrade
Checked link autonegotiation with ethtool? Sometimes gige will autoneg to 10mb half duplex if there's a bad cable, NIC, or switch port. -Todd On Fri, Apr 16, 2010 at 8:08 PM, Scott Carey sc...@richrelevance.com wrote: More info -- this is not a Hadoop issue. The network performance issue can be replicated with SSH only on the links where Hadoop has a problem, and only in the direction with a problem. HDFS is slow to transfer data in certain directions from certain machines. So, for example, copying from node C to D may be slow, but not in the other direction, from D to C. Likewise, although only 3 of 8 nodes have this problem, it is not universal. For example, node C might have trouble copying data to 5 of the 7 other nodes, and node G might have trouble with all 7 other nodes. No idea what it is yet, but SSH exhibits the same issue -- only on those specific point-to-point links in one specific direction. -Scott On Apr 16, 2010, at 7:10 PM, Scott Carey wrote: Ok, so here is a ... fun result. I have dfs.replication.min set to 2, so I can't just do hadoop fs -Ddfs.replication=1 put someFile someFile since that will fail. So here are two results that are fascinating: $ time hadoop fs -Ddfs.replication=3 -put test.tar test.tar real 1m53.237s user 0m1.952s sys 0m0.308s $ time hadoop fs -Ddfs.replication=2 -put test.tar test.tar real 0m1.689s user 0m1.763s sys 0m0.315s The file is 77MB and so is two blocks. The test with replication level 3 is slow about 9 out of 10 times. When it is slow it sometimes takes 28 seconds, sometimes 2 minutes. It was fast one time... The test with replication level 2 is fast in 40 out of 40 tests. This is a development cluster with 8 nodes. It looks like a replication level of 3 or more causes trouble. Looking more closely at the logs, it seems that certain datanodes (but not all) cause large delays if they are in the middle of an HDFS write chain. So, a write that goes from A -> B -> C is fast if B is a good node and C a bad node.
If it's A -> C -> B then it's slow. So, I can say that some nodes, but not all, are doing something wrong when in the middle of a write chain. If I do a replication = 2 write on one of these bad nodes, it's always slow. So the good news is I can identify the bad nodes, and decommission them. The bad news is this still doesn't make a lot of sense, and 40% of the nodes have the issue. Worse, on a couple nodes the behavior in the replication = 2 case is not consistent -- sometimes the first block is fast. So it may be dependent on not just the source, but the source-target combination in the chain. At this point, I suspect something completely broken at the network level, perhaps even routing. Why it would show up after an upgrade is yet to be determined, but the upgrade did include some config changes and OS updates. Thanks Todd! -Scott On Apr 16, 2010, at 5:34 PM, Todd Lipcon wrote: Hey Scott, This is indeed really strange... if you do a straight hadoop fs -put with dfs.replication set to 1 from one of the DNs, does it upload slow? That would cut out the network from the equation. -Todd On Fri, Apr 16, 2010 at 5:29 PM, Scott Carey sc...@richrelevance.com wrote: I have two clusters upgraded to CDH2. One is performing fine, and the other is EXTREMELY slow. Some jobs that formerly took 90 seconds, take 20 to 50 minutes. It is an HDFS issue from what I can tell. The simple DFS benchmark with one map task shows the problem clearly. I have looked at every difference I can find and am wondering where else to look to track this down. The disks on all nodes in the cluster check out -- capable of 75MB/sec minimum with a 'dd' write test. top / iostat do not show any significant CPU usage or iowait times on any machines in the cluster during the test. ifconfig does not report any dropped packets or other errors on any machine in the cluster. dmesg has nothing interesting.
The poorly performing cluster is on a slightly newer CentOS version: Poor: 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux (CentOS 5.4, recent patches) Good: 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux (CentOS 5.3, I think) The performance is always poor, not sporadically poor. It is poor with M/R tasks as well as non-M/R HDFS clients (i.e. sqoop). Poor performance cluster (no other jobs active during the test): --- $ hadoop jar /usr/lib/hadoop/hadoop-0.20.1+169.68-test.jar TestDFSIO -write -nrFiles 1 -fileSize 2000 10/04/16 12:53:13 INFO mapred.FileInputFormat: nrFiles = 1 10/04/16 12:53:13 INFO mapred.FileInputFormat: fileSize (MB) = 2000 10/04/16 12:53:13 INFO mapred.FileInputFormat: bufferSize = 100 10/04/16 12:53:14 INFO
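Scott's observation in this thread -- writes are slow whenever a suspect node has to forward data to a downstream node, but fine when it is the last hop in the pipeline -- can be restated as a toy predicate. This is only a sketch of the reasoning with hypothetical helper names, not a diagnosis of the underlying network fault:

```java
import java.util.List;
import java.util.Set;

public class PipelineModel {
    // Toy model of the observed behavior: a write pipeline is slow iff some
    // "bad" node appears anywhere except the final position, i.e. it has to
    // forward packets to a downstream node.
    public static boolean slowWrite(List<String> pipeline, Set<String> badNodes) {
        for (int i = 0; i < pipeline.size() - 1; i++) {
            if (badNodes.contains(pipeline.get(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> bad = Set.of("C");
        // A -> B -> C: C is the tail and forwards nothing, so the write is fast
        System.out.println(slowWrite(List.of("A", "B", "C"), bad)); // false
        // A -> C -> B: C must forward to B, so the write is slow
        System.out.println(slowWrite(List.of("A", "C", "B"), bad)); // true
        // replication = 2 write sourced on a bad node: always slow, as observed
        System.out.println(slowWrite(List.of("C", "B"), bad)); // true
    }
}
```

This also matches why replication level 2 usually looked fast: with only two nodes in the chain, a bad node at the tail never has to forward.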
Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
On Tue, Apr 13, 2010 at 4:13 AM, stephen mulcahy stephen.mulc...@deri.orgwrote: Todd Lipcon wrote: Most likely a kernel bug. In previous versions of Debian there was a buggy forcedeth driver, for example, that caused it to drop off the network in high load. Who knows what new bug is in 2.6.32 which is brand spanking new. Yes, it looks like it is a kernel bug alright (see thread on kernel netdev at http://marc.info/?t=12709428891r=1w=2 if interested). To be fair, I don't think these bugs are confined to Debian - I did some initial testing with Scientific Linux and also ran into problems with forcedeth. Interesting, good find. I try to avoid forcedeth now and have heard the same from ops people at various large linux deployments. Not sure why, but it's traditionally had a lot of bugs/regressions. Sure, but I figured I'd go with a distro now that can be largely left untouched for the next 2-3 years and Debian lenny felt that bit old for that. I know RHEL/CentOS would fit that requirement also, will see. I'm also interested in using DRBD in some of our nodes for redundancy, again, running with a newer distro should reduce the pain of configuring that. Finally, I figured burning in our cluster was a good opportunity to give back to the community and do some testing on their behalf. Very admirable of you :) It is good to have some people running new kernels to suss these issues out before the rest of us check out modern technology ;-) With regard to our TeraSort benchmark time of ~23 minutes - is that in the right ballpark for a cluster of 45 data nodes and a nn and 2nn? Yep, sounds about the right ballpark. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: cluster under-utilization with Hadoop Fair Scheduler
Hi Abhishek, This behavior is improved by MAPREDUCE-706 I believe (not certain that that's the JIRA, but I know it's fixed in trunk fairscheduler). These patches are included in CDH3 (currently in beta) http://archive.cloudera.com/cdh/3/ In general, though, map tasks that are so short are not going to be very efficient - even with fast assignment there is some constant overhead per task. Thanks -Todd On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma absha...@usc.edu wrote: Hi all, I have been using the Hadoop Fair Scheduler for some experiments on a 100 node cluster with 2 map slots per node (hence, a total of 200 map slots). In one of my experiments, all the map tasks finish within a heartbeat interval of 3 seconds. I noticed that the maximum number of concurrently active map slots on my cluster never exceeds 100, and hence, the cluster utilization during my experiments never exceeds 50% even when large jobs with more than 1000 maps are being executed. A look at the Fair Scheduler code (in particular, the assignTasks function) revealed the reason. As per my understanding, with the implementation in Hadoop 0.20.0, a TaskTracker is not assigned more than 1 map and 1 reduce task per heart beat. In my experiments, in every heart beat, each TT has 2 free map slots but is assigned only 1 map task, and hence, the utilization never goes beyond 50%. Of course, this (degenerate) case does not arise when map tasks take more than one heart beat interval to finish. For example, I repeated the experiments with map tasks taking close to 15 s to finish and noticed close to 100% utilization when large jobs were executing. Why does the Fair Scheduler not assign more than one map task to a TT per heart beat? Is this done to spread the load uniformly across the cluster? I looked at assignTasks function in the default Hadoop scheduler (JobQueueTaskScheduler.java), and it does assign more than 1 map task per heart beat to a TT.
It would be easy to change the Fair Scheduler to assign more than 1 map task to a TT per heart beat (I did that and achieved 100% utilization even with small map tasks). But I am wondering if doing so will violate some fairness properties. Thanks, Abhishek -- Todd Lipcon Software Engineer, Cloudera
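Abhishek's 50% figure falls straight out of a back-of-the-envelope model: if every task finishes within one heartbeat interval, each TaskTracker can never hold more concurrent tasks than the per-heartbeat assignment cap. This is a hypothetical helper sketching that arithmetic, not the scheduler's actual code:

```java
public class SlotUtilization {
    // Steady-state slot utilization when every map task finishes within one
    // heartbeat interval: each tracker frees all of its slots between
    // heartbeats and is then handed at most `tasksPerHeartbeat` new tasks.
    public static double utilization(int slotsPerTracker, int tasksPerHeartbeat) {
        int active = Math.min(slotsPerTracker, tasksPerHeartbeat);
        return (double) active / slotsPerTracker;
    }

    public static void main(String[] args) {
        // 2 map slots per node, scheduler capped at 1 map task per heartbeat
        System.out.println(utilization(2, 1)); // 0.5 -> the observed 50%
        // lifting the cap to one task per free slot
        System.out.println(utilization(2, 2)); // 1.0 -> full utilization
    }
}
```

The number of trackers cancels out, which is why the cluster-wide ceiling sits at exactly 100 of 200 slots regardless of job size.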
Re: What means PacketResponder ...terminating ?
Hi Al, It just means that the write pipeline is tearing itself down. Please see my response on the hbase list for further explanation of your particular issue. -Todd On Fri, Apr 9, 2010 at 12:15 AM, Al Lias al.l...@gmx.de wrote: While searching for a HBase Problem I came across this log messages: ... box00: /var/log/hadoop/hadoop-hadoop-datanode-box00.log.2010-04-08:2010-04-08 16:39:29,200 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_991235084167234271_101356 terminating box05: /var/log/hadoop/hadoop-hadoop-datanode-box05.log.2010-04-08:2010-04-08 16:39:29,200 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_991235084167234271_101356 terminating box13: /var/log/hadoop/hadoop-hadoop-datanode-box13.log.2010-04-08:2010-04-08 16:39:29,200 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_991235084167234271_101356 terminating As they seem to preceed some HBase Problem, I would like to understand what it means. Thx for any help, Al -- Todd Lipcon Software Engineer, Cloudera
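When correlating these (normal, INFO-level) teardown messages across several datanode logs, the responder index and block ID can be pulled out with a simple regex. A small sketch based on the log format in Al's excerpt -- the class and method names here are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PacketResponderLog {
    // Matches the DataNode INFO line quoted above; group 1 is the responder's
    // position in the write pipeline, group 2 the block ID (which may be
    // negative in real logs, hence the optional minus sign).
    static final Pattern LINE = Pattern.compile(
        "PacketResponder (\\d+) for block (blk_-?\\d+_\\d+) terminating");

    // Returns {responderIndex, blockId}, or null if the line doesn't match.
    public static String[] parse(String logLine) {
        Matcher m = LINE.matcher(logLine);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String line = "2010-04-08 16:39:29,200 INFO "
            + "org.apache.hadoop.hdfs.server.datanode.DataNode: "
            + "PacketResponder 2 for block blk_991235084167234271_101356 terminating";
        String[] parts = parse(line);
        System.out.println("responder=" + parts[0] + " block=" + parts[1]);
    }
}
```

Grouping the parsed lines by block ID is an easy way to see all three pipeline members tearing down the same block at the same timestamp, as in the excerpt above.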
Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
On Fri, Apr 9, 2010 at 8:18 AM, stephen mulcahy stephen.mulc...@deri.org wrote: Allen Wittenauer wrote: On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote: When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or 2 datanodes of the cluster enter a state whereby they are no longer responsive to network traffic. How much free memory do you have? Lots, a few GB How many tasks per node do you have? I left this at the default. What are the service times, etc, on your IO system? Can you clarify this query? Has anyone run into similar problems with their environments? I noticed that when the nodes become unresponsive, it often happens when the TeraSort is at I've always seen Linux nodes go unresponsive when they get memory starved to the point that the OOM can't function because it can't allocate enough mem. Sure, but I can login to the unresponsive nodes via the console - it's just the network that has become unresponsive. To be clear here, I don't suspect Hadoop is the root cause of the problem - I suspect either a kernel bug or some other operating system level bug. I was wondering if others had run into similar problems. Most likely a kernel bug. In previous versions of Debian there was a buggy forcedeth driver, for example, that caused it to drop off the network in high load. Who knows what new bug is in 2.6.32 which is brand spanking new. I was also wondering in general what kernel versions and distros people are using, especially for larger production clusters. The overwhelming majority of production clusters run on RHEL 5.3 or RHEL 5.4 in my experience (I'm lumping CentOS 5.3/5.4 in with RHEL here). I know one or two production clusters running Debian Lenny, but none running something as new as what you're talking about. Hadoop doesn't exercise the new features in very recent kernels, so there's no sense accepting instability - just go with something old that works! -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Errors reading lzo-compressed files from Hadoop
Doh, a couple more silly bugs in there. Don't use that version quite yet - I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for pointing out the additional problems) -Todd On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon t...@cloudera.com wrote: For Dmitriy and anyone else who has seen this error, I just committed a fix to my github repository: http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58 The problem turned out to be an assumption that InputStream.read() would return all the bytes that were asked for. This turns out to almost always be true on local filesystems, but on HDFS it's not true if the read crosses a block boundary. So, every couple of TB of lzo compressed data one might see this error. Big thanks to Alex Roetter who was able to provide a file that exhibited the bug! Thanks -Todd On Tue, Apr 6, 2010 at 10:35 AM, Todd Lipcon t...@cloudera.com wrote: Hi Alex, Unfortunately I wasn't able to reproduce, and the data Dmitriy is working with is sensitive. Do you have some data you could upload (or send me off list) that exhibits the issue? -Todd On Tue, Apr 6, 2010 at 9:50 AM, Alex Roetter aroet...@imageshack.net wrote: Todd Lipcon t...@... writes: Hey Dmitriy, This is very interesting (and worrisome in a way!) I'll try to take a look this afternoon. -Todd Hi Todd, I wanted to see if you made any progress on this front. I'm seeing a very similar error, trying to run a MR (Hadoop 0.20.1) over a bunch of LZOP compressed / indexed files (using Kevin Weil's package), and I have one map task that always fails in what looks like the same place as described in the previous post. I haven't yet done the experimentation mentioned above (isolating the input file corresponding to the failed map task, decompressing it / recompressing it, testing it out operating directly on local disk instead of HDFS, etc). 
However, since I am crashing in exactly the same place it seems likely this is related, and thought I'd check on your work in the meantime. FYI, my stack trace is below:
2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.InternalError: lzo1x_decompress_safe returned:
  at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
  at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
  at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
  at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
  at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
  at java.io.InputStream.read(InputStream.java:85)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
  at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
  at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
  at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)
Any update much appreciated, Alex -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: Errors reading lzo-compressed files from Hadoop
OK, fixed, unit tests passing again. If anyone sees any more problems let one of us know! Thanks -Todd On Thu, Apr 8, 2010 at 10:39 AM, Todd Lipcon t...@cloudera.com wrote: Doh, a couple more silly bugs in there. Don't use that version quite yet - I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for pointing out the additional problems) -Todd On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon t...@cloudera.com wrote: For Dmitriy and anyone else who has seen this error, I just committed a fix to my github repository: http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58 The problem turned out to be an assumption that InputStream.read() would return all the bytes that were asked for. This turns out to almost always be true on local filesystems, but on HDFS it's not true if the read crosses a block boundary. So, every couple of TB of lzo compressed data one might see this error. Big thanks to Alex Roetter who was able to provide a file that exhibited the bug! Thanks -Todd On Tue, Apr 6, 2010 at 10:35 AM, Todd Lipcon t...@cloudera.com wrote: Hi Alex, Unfortunately I wasn't able to reproduce, and the data Dmitriy is working with is sensitive. Do you have some data you could upload (or send me off list) that exhibits the issue? -Todd On Tue, Apr 6, 2010 at 9:50 AM, Alex Roetter aroet...@imageshack.net wrote: Todd Lipcon t...@... writes: Hey Dmitriy, This is very interesting (and worrisome in a way!) I'll try to take a look this afternoon. -Todd Hi Todd, I wanted to see if you made any progress on this front. I'm seeing a very similar error, trying to run a MR (Hadoop 0.20.1) over a bunch of LZOP compressed / indexed files (using Kevin Weil's package), and I have one map task that always fails in what looks like the same place as described in the previous post. 
I haven't yet done the experimentation mentioned above (isolating the input file corresponding to the failed map task, decompressing it / recompressing it, testing it out operating directly on local disk instead of HDFS, etc). However, since I am crashing in exactly the same place it seems likely this is related, and thought I'd check on your work in the meantime. FYI, my stack trace is below:
2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.InternalError: lzo1x_decompress_safe returned:
  at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
  at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
  at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
  at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
  at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
  at java.io.InputStream.read(InputStream.java:85)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
  at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
  at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
  at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)
Any update much appreciated, Alex -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
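The root cause Todd describes in this thread -- InputStream.read() is allowed to return fewer bytes than requested, and an HDFS stream does exactly that when a read crosses a block boundary -- is easy to demonstrate, along with the standard fix of looping until the requested range is filled. A self-contained sketch; ShortReadStream merely simulates a stream stopping short, and the names are hypothetical rather than taken from the hadoop-lzo patch:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // Simulates a stream that stops short of the requested length, the way an
    // HDFS input stream may at a block boundary.
    public static class ShortReadStream extends ByteArrayInputStream {
        public ShortReadStream(byte[] buf) { super(buf); }
        @Override public int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3)); // at most 3 bytes per call
        }
    }

    // The fix: keep calling read() until the range is filled or EOF is hit.
    public static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) break; // EOF
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        // a single read() call is allowed to return early...
        int single = new ShortReadStream(data).read(new byte[10], 0, 10);
        // ...while the loop delivers everything available
        int full = readFully(new ShortReadStream(data), new byte[10], 0, 10);
        System.out.println("single=" + single + " full=" + full); // single=3 full=10
    }
}
```

This also explains why the bug surfaced only every couple of TB: on a local filesystem read() almost always returns the full request, so the short-read path is exercised only at HDFS block boundaries.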
Re: What means log DIR* NameSystem.completeFile: failed to complete... ?
Hi Al, Usually this indicates that the file was renamed or deleted while it was still being created by the client. Unfortunately it's not the most descriptive :) -Todd On Tue, Apr 6, 2010 at 5:36 AM, Al Lias al.l...@gmx.de wrote: Hi all, this warning is written in FSFileSystem.java/completeFileInternal(). It makes the calling code in NameNode.java throw an IOException. FSFileSystem.java:
...
if (fileBlocks == null) {
    NameNode.stateChangeLog.warn("DIR* NameSystem.completeFile: "
        + "failed to complete " + src
        + " because dir.getFileBlocks() is null"
        + " and pendingFile is "
        + ((pendingFile == null) ? "null"
            : ("from " + pendingFile.getClientMachine())));
...
What is the meaning of this warning? Any idea what could have gone wrong in such a case? (This popped up through hbase, but as this code is in HDFS, I am asking this list) Thx Al -- Todd Lipcon Software Engineer, Cloudera
Re: Jetty can't start the SelectChannelConnector
Hi Edson, Your attachments did not come through - can you put them on pastebin? -Todd On Tue, Apr 6, 2010 at 3:37 PM, Edson Ramiro erlfi...@gmail.com wrote: Hi Todd, I'm getting this behavior in another cluster too; there the same thing happens, and as I don't have jstack installed in the first cluster and I'm not the admin, I'm sending the results of the second cluster. These are the results: [erl...@cohiba ~ ]$ jstack -l 22510 22510: well-known file is not secure [erl...@cohiba ~ ]$ jstack -l 3836 3836: well-known file is not secure The jstack -F result is in the thread_dump files and the jstack -m result is in java_native_frames. The files ending with nn are the namenode results and the files ending with dn are the datanode results. Thanks, Edson Ramiro On 6 April 2010 18:19, Todd Lipcon t...@cloudera.com wrote: Hi Edson, Can you please run jstack on the daemons in question and paste the output here? -Todd On Tue, Apr 6, 2010 at 12:44 PM, Edson Ramiro erlfi...@gmail.com wrote: Hi all, I configured Hadoop in a cluster and the NameNode and JobTracker are running ok, but the DataNode and TaskTracker don't start; they hang when they are about to start Jetty. I observed that Jetty can't start the _SelectChannelConnector_. Is there any Jetty configuration that should be changed? There is no log message in the NN and JT when I try to start the DN and TT. The kernel I'm using is: Linux bl05 2.6.32.10 #2 SMP Tue Apr 6 12:33:42 BRT 2010 x86_64 GNU/Linux This is the message when I start the DN. It happens with TT too.
ram...@bl05:~/hadoop-0.20.1+169.56$ ./bin/hadoop datanode
10/04/06 16:24:14 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = bl05.ctinfra.ufpr.br/192.168.1.115
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.1+169.56
STARTUP_MSG:   build = -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3; compiled by 'chad' on Tue Feb 2 13:27:17 PST 2010
************************************************************/
10/04/06 16:24:14 INFO datanode.DataNode: Registered FSDatasetStatusMBean
10/04/06 16:24:14 INFO datanode.DataNode: Opened info server at 50010
10/04/06 16:24:14 INFO datanode.DataNode: Balancing bandwith is 1048576 bytes/s
10/04/06 16:24:14 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
10/04/06 16:24:14 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
10/04/06 16:24:14 INFO http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
10/04/06 16:24:14 INFO http.HttpServer: Jetty bound to port 50075
10/04/06 16:24:14 INFO mortbay.log: jetty-6.1.14
Thanks in Advance, Edson Ramiro -- Todd Lipcon Software Engineer, Cloudera -- Todd Lipcon Software Engineer, Cloudera
Re: losing network interfaces during long running map-reduce jobs
Hi David, On Fri, Apr 2, 2010 at 6:16 PM, David Howell dehow...@gmail.com wrote: I'm encountering a completely bizarre failure mode in my Hadoop cluster. A week ago, I switched from vanilla apache Hadoop 0.20.1 to CDH 2. Ever since then, my tasktracker/datanode machines have been regularly losing their networking during long (> 1 hour) jobs. Restarting the network interface brings them back online immediately. Could you clarify what you mean by losing their networking? Can you ping the node externally? If you access the node via the console (via ILOM, etc) and run tcpdump or tshark, can you see ethernet broadcast traffic at all? Do you see anything in dmesg on the machine in question? Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: Errors reading lzo-compressed files from Hadoop
Hey Dmitriy, This is very interesting (and worrisome in a way!) I'll try to take a look this afternoon. -Todd On Thu, Apr 1, 2010 at 12:16 AM, Dmitriy Ryaboy dmit...@twitter.com wrote: Hi folks, We write a lot of lzo-compressed files to HDFS -- some via scribe, some using internal tools. Occasionally, we discover that the created lzo files cannot be read from HDFS -- they get through some (often large) portion of the file, and then fail with the following stack trace: Exception in thread main java.lang.InternalError: lzo1x_decompress_safe returned: at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method) at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303) at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122) at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223) at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74) at java.io.InputStream.read(InputStream.java:85) at com.twitter.twadoop.jobs.LzoReadTest.main(LzoReadTest.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) The initial thought is of course that the lzo file is corrupt -- however, plain-jane lzop is able to read these files. Moreover, if we pull the files out of hadoop, uncompress them, compress them again, and put them back into HDFS, we can usually read them from HDFS as well. 
We've been thinking that this strange behavior is caused by a bug in the hadoop-lzo libraries (we use the version with Twitter and Cloudera fixes, on github: http://github.com/kevinweil/hadoop-lzo ) However, today I discovered that using the exact same environment, codec, and InputStreams, we can successfully read from the local file system, but cannot read from HDFS. This appears to point at possible issues in the FSDataInputStream or further down the stack. Here's a small test class that tries to read the same file from HDFS and from the local FS, and the output of running it on our cluster. We are using the CDH2 distribution. https://gist.github.com/e1bf7e4327c7aef56303 Any ideas on what could be going on? Thanks, -Dmitriy -- Todd Lipcon Software Engineer, Cloudera
Re: java.io.IOException: Function not implemented
Hi Edson, I noticed that only the h01 nodes are running 2.6.32.9, the other broken DNs are 2.6.32.10. Is there some reason you are running a kernel that is literally 2 weeks old? I wouldn't be at all surprised if there were a bug here, or some issue with your Debian unstable distribution... -Todd On Tue, Mar 30, 2010 at 3:54 PM, Edson Ramiro erlfi...@gmail.com wrote: Hi all, Thanks for help Todd and Steve, I configured Hadoop (0.20.2) again and I'm getting the same error (Function not implemented). Do you think it's a Hadoop bug? This is the situation: I've 28 nodes where just four are running the datanode. In all other nodes the tasktracker in running ok. The NN and JT are running ok. The configuration of the machines is the same, its a nfs shared home. In all machines the Java version is 1.6.0_17. This is the kernel version of the nodes, note that are two versions and in both the datanode doesn't work. Just in the h0* machines. ram...@lcpad:~/hadoop-0.20.2$ ./bin/slaves.sh uname -a | sort a01: Linux a01 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a02: Linux a02 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a03: Linux a03 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a04: Linux a04 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a05: Linux a05 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a06: Linux a06 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a07: Linux a07 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a09: Linux a09 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux a10: Linux a10 2.6.27.11 #4 Fri Jan 16 22:32:46 BRST 2009 x86_64 GNU/Linux ag06: Linux ag06 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux ag07: Linux ag07 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux bl02: Linux bl02 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux bl03: Linux bl03 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 
GNU/Linux bl04: Linux bl04 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux bl06: Linux bl06 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux bl07: Linux bl07 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux ct02: Linux ct02 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux ct03: Linux ct03 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux ct04: Linux ct04 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux ct06: Linux ct06 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux h01: Linux h01 2.6.32.9 #2 SMP Sat Mar 6 19:09:13 BRT 2010 x86_64 GNU/Linux h02: Linux h02 2.6.32.9 #2 SMP Sat Mar 6 19:09:13 BRT 2010 x86_64 GNU/Linux h03: Linux h03 2.6.32.9 #2 SMP Sat Mar 6 19:09:13 BRT 2010 x86_64 GNU/Linux h04: Linux h04 2.6.32.9 #2 SMP Sat Mar 6 19:09:13 BRT 2010 x86_64 GNU/Linux sd02: Linux sd02 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux sd05: Linux sd05 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux sd06: Linux sd06 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux sd07: Linux sd07 2.6.32.10 #1 SMP Tue Mar 16 10:17:30 BRT 2010 x86_64 GNU/Linux These are the java processes running on each client. Just the h0* machines are running ok. ram...@lcpad:~/hadoop-0.20.2$ ./bin/slaves.sh pgrep -lc java | sort a01: 1 a02: 1 a03: 1 a04: 1 a05: 1 a06: 1 a07: 1 a09: 1 a10: 1 ag06: 1 ag07: 1 bl02: 1 bl03: 1 bl04: 1 bl06: 1 bl07: 1 ct02: 1 ct03: 1 ct04: 1 ct06: 1 h01: 2 h02: 2 h03: 2 h04: 2 sd02: 1 sd05: 1 sd06: 1 sd07: 1 This is my configuration: ram...@lcpad:~/hadoop-0.20.2$ cat conf/*site*
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://lcpad:9000</value>
  </property>
</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>lcpad:9001</value>
  </property>
</configuration>
Thanks in Advance, Edson Ramiro On 30 March 2010 05:58, Steve Loughran ste...@apache.org wrote: Edson Ramiro wrote: I'm not involved with Debian community :( I think you are now... -- Todd Lipcon Software Engineer, Cloudera
Re: java.io.IOException: Function not implemented
Hi Edson, What operating system are you on? What kernel version? Thanks -Todd On Mon, Mar 29, 2010 at 12:01 PM, Edson Ramiro erlfi...@gmail.com wrote: Hi all, I'm trying to install Hadoop on a cluster, but I'm getting this error. I'm using java version 1.6.0_17 and hadoop-0.20.1+169.56.tar.gz from Cloudera. Its running in a NFS home shared between the nodes and masters. The NameNode works well, but all nodes try to connect and fail. Any Idea ? Thanks in Advance. == logs/hadoop-ramiro-datanode-a05.log == 2010-03-29 15:56:00,168 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 0 time(s). 2010-03-29 15:56:01,172 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 1 time(s). 2010-03-29 15:56:02,176 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 2 time(s). 2010-03-29 15:56:03,180 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 3 time(s). 2010-03-29 15:56:04,184 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 4 time(s). 2010-03-29 15:56:05,188 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 5 time(s). 2010-03-29 15:56:06,192 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 6 time(s). 2010-03-29 15:56:07,196 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 7 time(s). 2010-03-29 15:56:08,200 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 8 time(s). 2010-03-29 15:56:09,204 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 9 time(s). 
2010-03-29 15:56:09,204 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to lcpad/192.168.1.51:9000 failed on local exception: java.io.IOException: Function not implemented
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
  at org.apache.hadoop.ipc.Client.call(Client.java:743)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
  at $Proxy4.getProtocolVersion(Unknown Source)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
  at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
  at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
  at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:278)
  at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:225)
  at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1309)
  at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1264)
  at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1272)
  at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
Caused by: java.io.IOException: Function not implemented
  at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
  at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:68)
  at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
  at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
  at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
  at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:407)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
  at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
  at org.apache.hadoop.ipc.Client.call(Client.java:720)
  ... 13 more
Edson Ramiro -- Todd Lipcon Software Engineer, Cloudera
Re: hadoop-append feature not in stable release?
Hi Gokul, You're correct that all of the stable released versions of Hadoop have a buggy implementation of append, and thus dfs.support.append is disabled in 0.20. The new implementation of append has been tracked in HDFS-265 and is now complete in trunk - just a few more tests are being done on it at this point. Major props to the team at Yahoo for the work here! We'll have to wait some time before this new implementation is available in an Apache release - see the ongoing release thread on -general for more information on the timeline. Regarding ports of append into a 0.20 branch, we will be working on adding just hflush() functionality to our distribution (CDH) in CDH3, for the benefit of HBase. This distribution should be available within the next couple of months. The patches to track are HDFS-200, HDFS-142, and a number of other bug fixes on top of those. Please get in touch with me off list if you're interested in testing development builds with this functionality before it is generally available. Thanks -Todd On Mon, Mar 29, 2010 at 9:01 PM, Gokulakannan M gok...@huawei.com wrote: Hi, I am new to Hadoop. The following questions popped up in my mind and I couldn't get answers on the web. I found that in hdfs-default.xml, the property dfs.support.append has been set to false by default, with the description "Does HDFS allow appends to files? This is currently set to false because there are bugs in the append code and is not supported in any production cluster". So, is there a way to resolve this issue? Will any existing patches (like HADOOP-1700, http://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel) make hadoop-append stable? From HADOOP-1700, I can see that this feature has been enabled and updated in trunk. But why is it not enabled in the stable Hadoop release? 
Thanks, Gokul -- Todd Lipcon Software Engineer, Cloudera
Re: java.io.IOException: Function not implemented
Hey Edson, Unfortunately I'm not sure what's going on here - for whatever reason, the kernel isn't allowing Java NIO to use epoll, and thus the IPC framework from Hadoop isn't working correctly. I don't think this is a Hadoop-specific bug. Does this issue occur on all of the nodes? -Todd On Mon, Mar 29, 2010 at 2:26 PM, Edson Ramiro erlfi...@gmail.com wrote: I'm not involved with the Debian community :( ram...@h02:~/hadoop$ cat /proc/sys/fs/epoll/max_user_watches 3373957 and the Java is not OpenJDK. The version is: ram...@lcpad:/usr/lib/jvm/java-6-sun$ java -version java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode) Edson Ramiro On 29 March 2010 17:14, Todd Lipcon t...@cloudera.com wrote: Hi Edson, It looks like for some reason your kernel does not have epoll enabled. It's very strange, since your kernel is very recent (in fact, bleeding edge!) Can you check the contents of /proc/sys/fs/epoll/max_user_watches Are you involved with the Debian community? This sounds like a general Java bug. Can you also please verify that you're using the Sun JVM and not OpenJDK (the Debian folks like OpenJDK, but it has subtle issues with Hadoop)? You'll have to add a non-free repository and install sun-java6-jdk -Todd On Mon, Mar 29, 2010 at 1:05 PM, Edson Ramiro erlfi...@gmail.com wrote: I'm using Linux h02 2.6.32.9 #2 SMP Sat Mar 6 19:09:13 BRT 2010 x86_64 GNU/Linux ram...@h02:~/hadoop$ cat /etc/debian_version squeeze/sid Thanks for the reply Edson Ramiro On 29 March 2010 16:56, Todd Lipcon t...@cloudera.com wrote: Hi Edson, What operating system are you on? What kernel version? Thanks -Todd On Mon, Mar 29, 2010 at 12:01 PM, Edson Ramiro erlfi...@gmail.com wrote: Hi all, I'm trying to install Hadoop on a cluster, but I'm getting this error. I'm using java version 1.6.0_17 and hadoop-0.20.1+169.56.tar.gz from Cloudera. It's running in an NFS home shared between the nodes and masters. 
The NameNode works well, but all nodes try to connect and fail. Any idea? Thanks in advance. == logs/hadoop-ramiro-datanode-a05.log == 2010-03-29 15:56:00,168 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 0 time(s). [retries 1 through 9, one per second, elided - identical to the retry log quoted above] 
2010-03-29 15:56:09,204 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to lcpad/192.168.1.51:9000 failed on local exception: java.io.IOException: Function not implemented [stack trace elided - identical to the one quoted above, with sun.nio.ch.EPollArrayWrapper.epollCreate failing]
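The failing call in the trace above is the kernel's epoll_create, which Java NIO's selector implementation depends on. A quick way to probe for that facility outside the JVM is a few lines of Python (a diagnostic sketch, not part of Hadoop):

```python
import select

def epoll_available():
    # Probes the same kernel facility Java NIO's EPollSelector uses.
    # If epoll_create fails here, Hadoop's IPC selectors will fail too.
    if not hasattr(select, "epoll"):  # non-Linux build of Python
        return False
    try:
        ep = select.epoll()
        ep.close()
        return True
    except OSError:  # e.g. ENOSYS, surfacing as "Function not implemented"
        return False

print("epoll available:", epoll_available())
```

If this prints False on the affected nodes but True elsewhere, the problem lies in the kernel or its configuration rather than in Hadoop.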
Re: Why must I wait for NameNode?
There's a bit of an issue if you have no data in your HDFS -- 0 blocks out of 0 is considered 100% reported, so the NN leaves safe mode even if there are no DNs talking to it yet. For a fix, please see HDFS-528, included in Cloudera's CDH2. Thanks -Todd On Fri, Mar 19, 2010 at 10:29 AM, Bill Habermaas b...@habermaas.us wrote: At startup, the namenode goes into 'safe' mode to wait for all data nodes to send block reports on the data they are holding. This is normal for hadoop and necessary to make sure all replicated data is accounted for across the cluster. It is the nature of the beast to work this way for good reasons. Bill -Original Message- From: Nick Klosterman [mailto:nklos...@ecn.purdue.edu] Sent: Friday, March 19, 2010 1:21 PM To: common-user@hadoop.apache.org Subject: Why must I wait for NameNode? What is the namenode doing upon startup? I have to wait about 1 minute and watch for the namenode dfs usage to drop from 100%, otherwise the install is unusable. Is this typical? Is something wrong with my install? I've been attempting the pseudo-distributed tutorial example for a while trying to get it to work. I finally discovered that the namenode upon startup is 100% in use and I need to wait about 1 minute before I can use it. Is this typical of hadoop installations? This isn't entirely clear in the tutorial. I believe that a note should be added if this is typical. This error caused me to get WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: SOMEFILE could only be replicated to 0 nodes, instead of 1 I had written a script to do all of the steps right in a row. Now with a 1 minute wait things work. Is my install atypical, or am I doing something wrong that is causing this needed wait time? Thanks, Nick -- Todd Lipcon Software Engineer, Cloudera
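The edge case Todd describes - an empty namesystem trivially satisfying the block-report threshold - is easy to see in a toy model of the safe-mode check (a simplified sketch, not the actual FSNamesystem logic):

```python
def leaves_safemode(reported_blocks, total_blocks, threshold=0.9990):
    # The NN waits until the reported fraction of blocks reaches the
    # threshold. With total_blocks == 0, zero reports already satisfy
    # it, so the NN exits safe mode before any DataNode has checked in.
    return reported_blocks >= total_blocks * threshold

print(leaves_safemode(0, 0))         # True: empty namesystem exits immediately
print(leaves_safemode(9948, 10000))  # False: ratio 0.9948 is below 0.9990
```

The second call mirrors Mike's situation in the thread below: a ratio of 0.9948 keeps the NN in safe mode until more replicas report in.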
Re: Fair scheduler fairness question
On Wed, Mar 10, 2010 at 9:18 AM, Allen Wittenauer awittena...@linkedin.comwrote: On 3/10/10 9:14 AM, Neo Anderson javadeveloper...@yahoo.co.uk wrote: At the moment I use hadoop 0.20.2 and I can not find code that relates to 'preempt' function; however, I read the jira MAPREDUCE-551 saying preempt function is already been fixed at version 0.20.0. MR-551 says fixed in 0.21 at the top. Reading the text shows that patches are available if you want to patch your own build of 0.20. If you'd rather not patch your own build of Hadoop, the fair scheduler preemption feature is also available in CDH2: http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.56.tar.gz -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: can't start namenode
Hi Mike, Since you removed the edits, you restored to an earlier version of the namesystem. Thus, any files that were deleted since the last checkpoint will have come back. But, the blocks will have been removed from the datanodes. So, the NN is complaining since there are some files that have missing blocks. That is to say, some of your files are corrupt (ie unreadable because the data is gone but the metadata is still there) In order to force it out of safemode, you can run hadoop dfsadmin -safemode leave You should also run hadoop fsck in order to determine which files are broken, and then probably use the -delete option to remove their metadata. Thanks -Todd On Thu, Mar 4, 2010 at 11:37 AM, mike anderson saidthero...@gmail.comwrote: Removing edits.new and starting worked, though it didn't seem that happy about it. It started up nonetheless, in safe mode. Saying that The ratio of reported blocks 0.9948 has not reached the threshold 0.9990. Safe mode will be turned off automatically. Unfortunately this is holding up the restart of hbase. About how long does it take to exit safe mode? is there anything I can do to expedite the process? On Thu, Mar 4, 2010 at 1:54 PM, Todd Lipcon t...@cloudera.com wrote: Sorry, I actually meant ls -l from name.dir/current/ Having only one dfs.name.dir isn't recommended - after you get your system back up and running I would strongly suggest running with at least two, preferably with one on a separate server via NFS. Thanks -Todd On Thu, Mar 4, 2010 at 9:05 AM, mike anderson saidthero...@gmail.com wrote: We have a single dfs.name.dir directory, in case it's useful the contents are: [m...@carr name]$ ls -l total 8 drwxrwxr-x 2 mike mike 4096 Mar 4 11:18 current drwxrwxr-x 2 mike mike 4096 Oct 8 16:38 image On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon t...@cloudera.com wrote: Hi Mike, Was your namenode configured with multiple dfs.name.dir settings? If so, can you please reply with ls -l from each dfs.name.dir? 
Thanks -Todd On Thu, Mar 4, 2010 at 8:57 AM, mike anderson saidthero...@gmail.com wrote: Our hadoop cluster went down last night when the namenode ran out of hard drive space. Trying to restart fails with this exception (see below). Since I don't really care that much about losing a days worth of data or so I'm fine with blowing away the edits file if that's what it takes (we don't have a secondary namenode to restore from). I tried removing the edits file from the namenode directory, but then it complained about not finding an edits file. I touched a blank edits file and I got the exact same exception. Any thoughts? I googled around a bit, but to no avail. -mike 2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310 2010-03-04 10:50:44,772 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: carr.projectlounge.com/10.0.16.91:54310 2010-03-04 http://carr.projectlounge.com/10.0.16.91:54310%0A2010-03-04 10:50:44,773 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null 2010-03-04 10:50:44,774 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext 2010-03-04 10:50:44,816 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=pubget,pubget 2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup 2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true 2010-03-04 10:50:44,823 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext 2010-03-04 10:50:44,825 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean 2010-03-04 10:50:44,849 INFO 
org.apache.hadoop.hdfs.server.common.Storage: Number of files = 2687 2010-03-04 10:50:45,092 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 7 2010-03-04 10:50:45,095 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 347821 loaded in 0 seconds. 2010-03-04 10:50:45,104 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /mnt/hadoop/name/current/edits of size 4653 edits # 39 loaded in 0 seconds. 2010-03-04 10:50:45,114 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException
Re: Sun JVM 1.6.0u18
On Thu, Feb 25, 2010 at 11:09 AM, Scott Carey sc...@richrelevance.com wrote: On Feb 15, 2010, at 9:54 PM, Todd Lipcon wrote: Hey all, Just a note that you should avoid upgrading your clusters to 1.6.0u18. We've seen a lot of segfaults or bus errors on the DN when running with this JVM - Stack found the same thing on one of his clusters as well. Have you seen this for 32-bit, 64-bit, or both? If 64-bit, was it with -XX:+UseCompressedOops? Just 64-bit, no compressed oops. But I haven't tested other variables. Any idea if there are Sun bugs open for the crashes? I opened one, yes. I think Stack opened a separate one. Haven't heard back. I have found some notes that suggest that -XX:-ReduceInitialCardMarks will work around some known crash problems with 6u18, but that may be unrelated. Yep, I think that is probably a likely workaround as well. For now I'm recommending a downgrade to our clients, rather than introducing cryptic XX flags :) Lastly, I assume that Java 6u17 should work the same as 6u16, since it is a minor patch over 6u16, whereas 6u18 includes a new version of Hotspot. Can anyone confirm that? I haven't heard anything bad about u17 either. But since we know 16 to be very good and nothing important is new in 17, I like to recommend 16 still. -Todd
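For anyone who prefers trying the flag workaround over downgrading, it would be added to the JVM options Hadoop passes to its daemons - the exact placement below (hadoop-env.sh) is an assumption; adapt it to your deployment:

```shell
# hadoop-env.sh: apply the suspected 6u18 workaround flag to the daemons
export HADOOP_OPTS="$HADOOP_OPTS -XX:-ReduceInitialCardMarks"
```

As the thread notes, this flag is only a suspected workaround for the 6u18 crashes; downgrading to 6u16 is the safer path.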
Re: java.net.SocketException: Network is unreachable
Hi Neo, See this bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=560044 as well as the discussion here: http://issues.apache.org/jira/browse/HADOOP-6056 Thanks -Todd On Wed, Feb 24, 2010 at 9:16 AM, neo anderson javadeveloper...@yahoo.co.uk wrote: While running the example program ('hadoop jar *example*jar pi 2 2'), I encounter a 'Network is unreachable' problem (at $HADOOP_HOME/logs/userlogs/.../stderr), as below: Exception in thread main java.io.IOException: Call to /127.0.0.1:port failed on local exception: java.net.SocketException: Network is unreachable at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) at org.apache.hadoop.ipc.Client.call(Client.java:742) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383) at org.apache.hadoop.mapred.Child.main(Child.java:64) Caused by: java.net.SocketException: Network is unreachable at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304) at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176) at org.apache.hadoop.ipc.Client.getConnection(Client.java:859) at org.apache.hadoop.ipc.Client.call(Client.java:719) ... 6 more Initially, it seemed to me to be a firewall issue, but after disabling iptables the example program still cannot execute correctly. Command for disabling iptables: 
iptables -P INPUT ACCEPT iptables -P FORWARD ACCEPT iptables -P OUTPUT ACCEPT iptables -X iptables -F When starting up the hadoop cluster (start-dfs.sh and start-mapred.sh), it looks like the namenode was correctly started, because the namenode log contains ... org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/111.222.333.5:10010 ... org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/111.222.333.4:10010 ... org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/111.222.333.3:10010 Also, in the datanode ... INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /111.222.333.4:34539, dest: /111.222.333.5:50010, bytes: 4, op: HDFS_WRITE, ... INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /111.222.333.4:51610, dest: /111.222.333.3:50010, bytes: 118, op: HDFS_WRITE, cliID: ... ... The command 'hadoop fs -ls' can list the data uploaded to HDFS without a problem, and jps shows the necessary processes are running. name node: 7710 SecondaryNameNode 7594 NameNode 8038 JobTracker data nodes: 3181 TaskTracker 3000 DataNode Environment: Debian squeeze, hadoop 0.20.1, jdk 1.6.x I searched online and couldn't find the root cause. Is there anything that may cause such an issue? Or any place I can check for more detailed information? Thanks for the help. -- View this message in context: http://old.nabble.com/java.net.SocketException%3A-Network-is-unreachable-tp27714253p27714253.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
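For readers hitting the same symptom: the Debian bug Todd links concerns the distribution setting net.ipv6.bindv6only=1, which breaks Java networking, and HADOOP-6056 covers the Hadoop side. Two workarounds commonly suggested in those threads (verify against the linked discussions before applying) are:

```shell
# Revert the sysctl Debian changed (as root; persist it under /etc/sysctl.d):
sysctl -w net.ipv6.bindv6only=0

# Or keep the JVM on the IPv4 stack, e.g. via conf/hadoop-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
```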
Re: On CDH2, (Cloudera EC2) No valid local directories in property: mapred.local.dir
Hi Saptarshi, Can you please ssh into the JobTracker node and check that this directory is mounted, writable by the hadoop user, and not full? -Todd On Fri, Feb 19, 2010 at 2:13 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote: Hello, Not sure if I should post this here or on Cloudera's message board, but here goes. When I run EC2 using the latest CDH2 and Hadoop 0.20 (by setting the env variables via hadoop-ec2), and launch a job hadoop jar ... I get the following error 10/02/19 17:04:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid local directories in property: mapred.local.dir at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:975) at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:279) at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:256) at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:240) at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3026) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960) at org.apache.hadoop.ipc.Client.call(Client.java:740) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at org.apache.hadoop.mapred.$Proxy0.submitJob(Unknown Source) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:841) at 
org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.godhuli.f.RHMR.submitAndMonitorJob(RHMR.java:195) but the value of mapred.local.dir is /mnt/hadoop/mapred/local Any ideas?
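Todd's checklist - mounted, writable by the hadoop user, not full - can be run mechanically against every path in mapred.local.dir. A small sketch (the function and its defaults are hypothetical helpers, not a Hadoop API):

```python
import os
import shutil
import tempfile

def check_local_dirs(dirs_value, min_free_bytes=1 << 30):
    """Return a problem description for each unusable path in a
    comma-separated mapred.local.dir-style value."""
    problems = {}
    for d in dirs_value.split(","):
        d = d.strip()
        if not os.path.isdir(d):
            problems[d] = "missing (not mounted?)"
        elif not os.access(d, os.W_OK):
            problems[d] = "not writable by this user"
        elif shutil.disk_usage(d).free < min_free_bytes:
            problems[d] = "low on free space"
    return problems

# Demo: a directory that exists and is writable vs. one that does not exist.
ok_dir = tempfile.mkdtemp()
print(check_local_dirs(ok_dir + ",/no/such/dir", min_free_bytes=0))
```

Run it on the JobTracker as the same user the daemon runs as, since writability depends on the user.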
Re: Namenode data loss
Hi Xiao, Are you sure that your secondary namenode was properly running before the data-loss event? Did you configure multiple dfs.name.dirs for your NN? -Todd P.S. I moved this thread to common-user - probably a better place than -general. On Sun, Feb 21, 2010 at 8:20 PM, xiao yang yangxiao9...@gmail.com wrote: My namenode failed, and I can't get the data back on this machine. I have looked into the data directory on the secondary namenode, but the name directory is empty. What should I do? Thanks! Xiao
Re: why not zookeeper for the namenode
On Fri, Feb 19, 2010 at 12:41 AM, Thomas Koch tho...@koch.ro wrote: Hi, yesterday I read the documentation for ZooKeeper and the ZK contrib BookKeeper. From what I read, I thought that BookKeeper would be the ideal enhancement for the namenode, to make it distributed and therefore finally highly available. Now I searched whether work in that direction has already started, and found out that apparently a totally different approach has been chosen: http://issues.apache.org/jira/browse/HADOOP-4539 Since I'm new to Hadoop, I trust your decision. However I'd be glad if somebody could satisfy my curiosity: I didn't work on that particular design, but I'll do my best to answer your questions below: - Why wasn't ZooKeeper (with BookKeeper) chosen? Especially since it seems to do a similar job already in HBase. HBase does not use BookKeeper, currently. Rather, it just uses ZK for election and some small amount of metadata tracking. It therefore is only storing a small amount of data in ZK, whereas the Hadoop NN would have to store many GB worth of namesystem data. I don't think anyone has tried putting such a large amount of data in ZK yet, and being the first to do something is never without problems :) Additionally, when this design was made, BookKeeper was very new. It's still in development, as I understand it. - Isn't it the case that with HADOOP-4539 clients can only connect to one namenode at a time, leaving the burden of all reads and writes on that one's shoulders? Yes. - Isn't it the case that ZooKeeper would be more network-efficient? It requires only a majority of nodes to receive a change, while HADOOP-4539 seems to require all backup nodes to receive a change before it's persisted. Potentially. However, all backup nodes is usually just 1. 
In our experience, and the experience of most other Hadoop deployments I've spoken with, the primary factors decreasing NN availability are *not* system crashes, but rather lack of online upgrade capability, slow restart time for planned restarts, etc. Adding a hot standby can help with the planned upgrade situation, but two standbys doesn't give you much reliability above one. In a datacenter, the failure correlations are generally such that racks either fail independently, or the entire DC has lost power. So, there aren't a lot of cases where 3 NN replicas would buy you much over 2. -Todd Thanks for any explanation, Thomas Koch, http://www.koch.ro
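The network-efficiency point can be made concrete with the acknowledgement counts each design needs before a change is durable (a toy sketch of the arithmetic, not either system's actual code):

```python
def zk_acks_needed(ensemble_size):
    # Quorum replication: durable once a strict majority acknowledges.
    return ensemble_size // 2 + 1

def backup_acks_needed(num_backups):
    # Primary/backup edit streaming: every backup must receive the edit.
    return num_backups

print(zk_acks_needed(3), backup_acks_needed(2))  # 2 2
```

With a single backup node (the common case Todd mentions), the two schemes cost about the same; the quorum approach only pulls ahead at higher replica counts.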
Re: Hadoop Streaming File-not-found error on Cloudera's training VM
Are you passing the python script to the cluster using the -file option? eg -mapper foo.py -file foo.py Thanks -Todd On Wed, Feb 17, 2010 at 7:45 PM, Dan Starr dsta...@gmail.com wrote: Hi, I've tried posting this to Cloudera's community support site, but the community website getsatisfaction.com returns various server errors at the moment. I believe the following is an issue related to my environment within Cloudera's Training virtual machine. Despite having success running Hadoop streaming on other Hadoop clusters and on Cloudera's Training VM in local mode, I'm currently getting an error when attempting to run a simple Hadoop streaming job in the normal queue based mode on the Training VM. I'm thinking the error described below is an issue related to the worker node not recognizing the python reference in the script's top shebang line. The hadoop command I am executing is: hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-0.20.1+133-streaming.jar -mapper blah.py -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input test_input/* -output output Where the test_input directory contains 3 UNIX formatted, single line files: training-vm: 3$ hadoop dfs -ls /user/training/test_input/ Found 3 items -rw-r--r-- 1 training supergroup 11 2010-02-17 10:48 /user/training/test_input/file1 -rw-r--r-- 1 training supergroup 11 2010-02-17 10:48 /user/training/test_input/file2 -rw-r--r-- 1 training supergroup 11 2010-02-17 10:48 /user/training/test_input/file3 training-vm: 3$ hadoop dfs -cat /user/training/test_input/* test_line1 test_line2 test_line3 And where blah.py looks like (UNIX formatted): #!/usr/bin/python import sys for line in sys.stdin: print line The resulting Hadoop-Streaming error is: java.io.IOException: Cannot run program blah.py: java.io.IOException: error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:459) at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214) ... 
I get the same error when placing the python script on the HDFS, and then using this in the hadoop command: ... -mapper hdfs:///user/training/blah.py ... One suggestion found online, which may not be relevant to Cloudera's distribution, mentions that the first line of the hadoop-streaming python script (the shebang line) may not describe an applicable path for the system. The solution mentioned is to use: ... -mapper python blah.py ... in the Hadoop streaming command. This doesn't seem to work correctly for me, since I find that the lines from the input data files are also parsed by the Python interpreter. But this does reveal that python is available on the worker node when using this technique. I have also tried without success the '-mapper blah.py' technique using shebang lines: #!/usr/bin/env python, although on the training VM Python is installed under /usr/bin/python. Maybe the issue is something else. Any suggestions or insights will be helpful.
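For reference, with the -mapper blah.py -file blah.py submission Todd suggests, the mapper itself can stay as simple as Dan's script. A sketch in Python 3 syntax (the rstrip avoids the doubled newlines that printing raw stdin lines produces):

```python
import sys

def identity_map(lines):
    # A streaming mapper is just a line filter: stdin in, stdout out.
    # Strip the trailing newline so print() does not emit a second one.
    for line in lines:
        yield line.rstrip("\n")

# As a real mapper this would iterate sys.stdin; a literal list stands in here:
print(list(identity_map(["test_line1\n", "test_line2\n"])))
```

Shipping the file with -file is what places blah.py in each task's working directory, which is why the bare -mapper blah.py reference then resolves.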
Re: Reducer stuck at pending state
Hi Song, What version are you running? How much memory have you allocated to the reducers in mapred.child.java.opts? -Todd On Tue, Feb 16, 2010 at 4:01 PM, Song Liu lamfeeli...@gmail.com wrote: Sorry, seems no attachment is allowed, I paste it here:
Jobid     Priority  User    Name    Map %    Maps Total  Maps Done  Reduce %  Reduces Total  Reduces Done  Scheduling Info
job_2...  NORMAL    sl9885  TF/IDF  100.00%  26          26         0.00%     1              0             NA
job_2...  NORMAL    sl9885  Rank    100.00%  22          22         0.00%     1              0             NA
job_2...  NORMAL    sl9885  TF/IDF  100.00%  20          20         0.00%     1              0             NA
The format is horrible, sorry for that, but it's the best I can do :( BTW, I guess it should not be my program's problem, since I have tested it on some other clusters before. Regards Song Liu On Tue, Feb 16, 2010 at 11:51 PM, Song Liu lamfeeli...@gmail.com wrote: Hi all, I recently met a problem where, sometimes, a reducer hangs in the pending state at 0% complete. It seems all the mappers are completely done, and just when it is about to start the reducer, the reducer gets stuck, without any warnings or errors, staying in the pending state. I have a cluster with 12 nodes, but this situation only appears when the scale of data is large (2GB or more); smaller cases never hit this problem. Has anyone met this issue before? I searched JIRA; someone reported this issue before, but no solution was given. ( https://issues.apache.org/jira/browse/MAPREDUCE-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12647230#action_12647230 ) The typical case of this issue is captured in the attachment. Regards Song Liu
Re: Problem with large .lzo files
On Mon, Feb 15, 2010 at 8:07 AM, Steve Kuo kuosen...@gmail.com wrote: On Sun, Feb 14, 2010 at 12:46 PM, Todd Lipcon t...@cloudera.com wrote: By the way, if all files have been indexed, DistributedLzoIndexer does not detect that, and hadoop throws an exception complaining that the input dir (or file) does not exist. I work around this by catching the exception. Just fixed that in my github repo. Thanks for the bug report. - It's possible to sacrifice parallelism by having hadoop work on each .lzo file without indexing. This worked well until the file size exceeded 30G, when an array indexing exception got thrown. Apparently the code processed the file in chunks and stored the references to the chunks in an array. When the number of chunks was greater than a certain number (around 256 was my recollection), an exception was thrown. - My current workaround is to increase the number of reducers to keep the .lzo file size low. I would like to get advice on how people handle large .lzo files. Any pointers on the cause of the stack trace below and the best way to resolve it are greatly appreciated. Is this reproducible every time? If so, is it always at the same point in the LZO file that it occurs? It's at the same point. Do you know how to print out the lzo index for the task? I only print out the input file now. You should be able to downcast the InputSplit to FileSplit, if you're using the new API. From there you can get the start and length of the split. Would it be possible to download that lzo file to your local box and use lzop -d to see if it decompresses successfully? That way we can isolate whether it's a compression bug or a decompression one. Both the Java LzoDecompressor and lzop -d were able to decompress the file correctly. As a matter of fact, my job does not index .lzo files now but processes each as a whole, and it works. Interesting. If you can somehow make a reproducible test case, I'd be happy to look into this. Thanks -Todd