Hadoop Quickstart (was Re: [ANNOUNCE] Hadoop release 0.18.3 available)
Hi! On Fri, Jan 30, 2009 at 12:05 PM, Anum Ali wrote: > Hi, > > Need some guidance on getting started with Hadoop installation and system setup. I am a newbie to Hadoop. Our system OS is Fedora 8; should I start from a stable release of Hadoop, or get the development version from SVN (from the contribute site)? This might help you: http://hadoop.apache.org/core/docs/current/quickstart.html Also, if you are looking to play around, either the stable release or the SVN version should be fine. Best, Amit > > Thank you > > On Thu, Jan 29, 2009 at 7:38 PM, Nigel Daley wrote: >> Release 0.18.3 fixes many critical bugs in 0.18.2. >> For Hadoop release details and downloads, visit: >> http://hadoop.apache.org/core/releases.html >> Hadoop 0.18.3 Release Notes are at >> http://hadoop.apache.org/core/docs/r0.18.3/releasenotes.html >> Thanks to all who contributed to this release! >> Nigel -- Amit Kumar Saha http://amitksaha.blogspot.com http://amitsaha.in.googlepages.com/ *Bangalore Open Java Users Group*: http://www.bojug.in
Re: Netbeans/Eclipse plugin
On Tue, Jan 27, 2009 at 2:52 AM, Aaron Kimball wrote: > The Eclipse plugin (which, btw, is now part of Hadoop core in src/contrib/) > is currently inoperable. The DFS viewer works, but the job submission code > is broken. I have started a conversation with 3 other community members to work on the NetBeans plugin. You can track the progress at http://wiki.netbeans.org/Nbhadoop. Best, Amit > - Aaron > On Sun, Jan 25, 2009 at 9:07 PM, Amit k. Saha wrote: >> On Sun, Jan 25, 2009 at 9:32 PM, Edward Capriolo >> wrote: >> > On Sun, Jan 25, 2009 at 10:57 AM, vinayak katkar >> wrote: >> >> Does anyone know of a NetBeans or Eclipse plugin for Hadoop Map-Reduce jobs? I want >> >> to make a plugin for NetBeans. >> >> http://vinayakkatkar.wordpress.com >> >> -- >> >> Vinayak Katkar >> >> Sun Campus Ambassador >> >> Sun Microsystems, India >> >> COEP >> > There is an Eclipse plugin: >> http://www.alphaworks.ibm.com/tech/mapreducetools >> > Seems like some work is being done on NetBeans: >> > https://nbhadoop.dev.java.net/ >> I started this project. But well, it's caught up in the requirements gathering phase. >> @Vinayak, let's take this offline and discuss. What do you think? >> Thanks, Amit >> > The world needs more NetBeans love. >> Definitely :-) >> -- >> Amit Kumar Saha >> http://amitksaha.blogspot.com >> http://amitsaha.in.googlepages.com/ >> *Bangalore Open Java Users Group*: http://www.bojug.in -- Amit Kumar Saha http://amitksaha.blogspot.com http://amitsaha.in.googlepages.com/ *Bangalore Open Java Users Group*: http://www.bojug.in
Re: Netbeans/Eclipse plugin
On Sun, Jan 25, 2009 at 9:32 PM, Edward Capriolo wrote: > On Sun, Jan 25, 2009 at 10:57 AM, vinayak katkar > wrote: >> Does anyone know of a NetBeans or Eclipse plugin for Hadoop Map-Reduce jobs? I want >> to make a plugin for NetBeans. >> http://vinayakkatkar.wordpress.com >> -- >> Vinayak Katkar >> Sun Campus Ambassador >> Sun Microsystems, India >> COEP > There is an Eclipse plugin: http://www.alphaworks.ibm.com/tech/mapreducetools > Seems like some work is being done on NetBeans: > https://nbhadoop.dev.java.net/ I started this project. But well, it's caught up in the requirements gathering phase. @Vinayak, let's take this offline and discuss. What do you think? Thanks, Amit > The world needs more NetBeans love. Definitely :-) -- Amit Kumar Saha http://amitksaha.blogspot.com http://amitsaha.in.googlepages.com/ *Bangalore Open Java Users Group*: http://www.bojug.in
Re: Why does Hadoop need ssh access to master and slaves?
On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer wrote: > Hi all, > we've made our first steps in evaluating hadoop. The setup of 2 VMs as a > hadoop grid was very easy and works fine. > Now our operations team wonders why hadoop has to be able to connect to > the master and slaves via password-less ssh?! Can anyone give us an > answer to this question? 1. The start/stop scripts need a way to connect to the remote hosts (the slaves and the secondary master) to launch the daemons there, and SSH is the secure way to do it. 2. It has to be password-less so that the scripts can log in automatically. -Amit > Thanks & Regards > Matthias -- Amit Kumar Saha http://amitksaha.blogspot.com http://amitsaha.in.googlepages.com/ *Bangalore Open Java Users Group*: http://www.bojug.in
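For reference, the usual setup is a sketch like the following (assumes OpenSSH; this is the localhost case from the Hadoop quickstart, since on a real cluster you would append the master's public key to `~/.ssh/authorized_keys` on every slave instead):

```shell
# Generate a passphrase-less key pair on the master (OpenSSH assumed).
mkdir -p ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the key -- here for localhost; for a cluster, append id_rsa.pub
# to ~/.ssh/authorized_keys on every host named in conf/slaves.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

After distributing the key, `ssh <slave> hostname` from the master should return without a password prompt, and bin/start-all.sh can then reach every node.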
Re: Seeking Someone to Review Hadoop Article
On Wed, Nov 5, 2008 at 3:17 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote: > Done. I also added a link to the article that Amit Kumar Saha wrote > just a few weeks ago for linux.com. Thank you, Tom :-) -Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Re: Seeking Someone to Review Hadoop Article
On Mon, Nov 3, 2008 at 7:27 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote: > The article I've written about Hadoop has just been published: > > http://www.ociweb.com/jnb/jnbNov2008.html > > I'd like to again thank Mafish Liu and Amit Kumar Saha for reviewing > my draft and offering suggestions for helping me improve it. I hope > the article is compelling, clear and technically accurate. However, > if you notice anything in need of correction, please contact me > offlist and I will address it ASAP. Nice article. Thanks for the opportunity, Tom! -Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Re: JNI Crash using hadoop
Hi! On Sat, Oct 25, 2008 at 6:48 PM, lamfeeling <[EMAIL PROTECTED]> wrote: > Dear all: > I'm new to Hadoop, and I want to migrate my existing C++ project to Hadoop using JNI. > All the features seem OK, except one method. > When it is invoked, Hadoop gives me an error message: bad_alloc. > I googled this message; it tells me this is a common problem when your memory is used up, but my memory is not full yet. > Are there some limitations on memory in Hadoop, especially when using JNI methods? > This program has been tested millions of times, so the problem should not be in my C++ program. > Could anyone give me an answer? Thanks a lot!! Consider using 'Pipes', Hadoop's C++ interface: http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html -Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Article on Apache Hadoop
Hello, I have just started exploring Hadoop purely as a hobby. I love writing and hence have published a dummies-style article on Apache Hadoop titled "Hands-on Hadoop for cluster computing". It's available at http://www.linux.com/feature/150395 I am very thankful to all the folks on this list for helping me clear up my initial doubts. Thanks, Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Re: mysql in hadoop
Hi Deepak, On Mon, Oct 20, 2008 at 10:13 PM, Deepak Diwakar <[EMAIL PROTECTED]> wrote: > Hi all, > > I am sure someone must have tried a MySQL connection from Hadoop, but I am > running into a problem. > Basically, I am not getting how to include the JDBC connector jar on the classpath > in the hadoop run command, or whether there is any other way to > incorporate the JDBC connector jar into the main jar which we run using > $hadoop-home/bin/hadoop. > > Please help. > > Thanks in advance, Just inquisitive: what application on Hadoop are you working on which uses MySQL? Thanks, Amit > -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
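On the classpath question itself, two common approaches are sketched below; every file and class name here is illustrative, not from the original mail. Jars placed under lib/ inside the job jar are added to the task classpath, and HADOOP_CLASSPATH extends the client-side classpath of the bin/hadoop command:

```shell
# A sketch -- all names below are illustrative placeholders.
# Option 1: bundle the connector inside the job jar under lib/; jars found
# in the job jar's lib/ directory end up on the task classpath.
mkdir -p build/lib
cp mysql-connector-java-bin.jar build/lib/   # the real JDBC connector jar
jar cf myjob.jar -C build .                  # plus your compiled job classes

# Option 2: put the connector on the client classpath via HADOOP_CLASSPATH
# (e.g. set it in conf/hadoop-env.sh) before submitting:
export HADOOP_CLASSPATH=/path/to/mysql-connector-java-bin.jar
bin/hadoop jar myjob.jar com.example.MyJob input output
```

Option 1 is the more portable of the two, since the dependency travels with the job jar to every task tracker.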
Re: Need reboot the whole system if adding new datanodes?
On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote: > It seems that we need to restart the whole hadoop system in order to add new > nodes inside the cluster. Any solution for us that needs no rebooting? From what I know so far, you have to start the HDFS daemon (which reads the 'slaves' file) to 'let it know' which are the data nodes. So every time you add a new DataNode, I believe you will have to restart the daemon, which is like re-initiating the NameNode. Hope I am not very wrong :-) Best, Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Re: Are There Books of Hadoop/Pig?
On Wed, Oct 15, 2008 at 4:10 AM, Steve Gao <[EMAIL PROTECTED]> wrote: > Does anybody know if there are books about hadoop or pig? The wiki and manual > are kind of ad-hoc and hard to comprehend; for example, "I want to know how to > apply patches to my Hadoop, but can't find how to do it" -- that kind of thing. > > Would anybody help? Thanks. http://oreilly.com/catalog/9780596521998/ HTH, Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
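On the side question of applying patches: patches attached to the project JIRA are conventionally applied from the top of the unpacked source tree with `patch -p0`. A self-contained toy illustration of the mechanic (the file and patch here are made up, not a real Hadoop patch):

```shell
# Toy demonstration of 'patch -p0' -- the same mechanic used to apply a
# HADOOP-XXXX.patch from the top of an unpacked Hadoop source tree.
mkdir -p /tmp/patch-demo && cd /tmp/patch-demo
printf 'hello\n' > greeting.txt
cat > fix.patch <<'EOF'
--- greeting.txt
+++ greeting.txt
@@ -1 +1 @@
-hello
+world
EOF
patch -p0 < fix.patch
cat greeting.txt    # now contains "world"
```

For Hadoop itself the same two commands apply: cd into the source tree root, then `patch -p0 < HADOOP-XXXX.patch`.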
Use of 'dfs.replication'
Hi! What does the value of the property "dfs.replication" determine? Say I have 3 nodes: a Namenode, a Job Tracker, and a node acting as both task tracker and data node. What should my "dfs.replication" be? Thanks. Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
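For anyone finding this thread later: dfs.replication sets how many copies of each HDFS block the filesystem keeps. The default is 3, but replication cannot exceed the number of live datanodes, so with a single datanode (as in the setup described above) every block would stay under-replicated unless it is set to 1. A minimal hadoop-site.xml fragment for that case (a sketch):

```xml
<!-- hadoop-site.xml: keep one copy of each block, since there is
     only a single datanode in this setup -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```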
Re: Typical Configuration of a task tracker
On Sun, Oct 12, 2008 at 1:44 AM, Amit k. Saha <[EMAIL PROTECTED]> wrote: > On Sun, Oct 12, 2008 at 12:15 AM, Amit k. Saha <[EMAIL PROTECTED]> wrote: >> Hi! >> I am setting up a Hadoop cluster for 'domestic' purposes to play around. I have 3 nodes: Namenode, Job tracker and a task tracker (10.10.10.1, 10.10.10.2, 10.10.10.3). >> My Namenode and Job tracker are set up successfully as I can view the web administration panel. >> However, though my job tracker shows: >> 10.10.10.3: starting tasktracker, logging to /home/amit/hadoop/hadoop-0.17.2.1/bin/../logs/hadoop-amit-tasktracker-lenny-2.out >> there is no task tracker process running on 10.10.10.3 and hence the number of "Live Nodes" seems to be 0. >> I have kept the hadoop-site.xml file on my task tracker empty. I am not sure what to fill in there. >> Is that the reason there is no task tracker process running? > Well, I figured out that I need to fill in the "dfs.datanode.*" information in hadoop-site.xml. So, here is the file: > <configuration> <property> <name>dfs.datanode.address</name> <value>10.10.10.3:50090</value> </property> <property> <name>dfs.datanode.http.address</name> <value>10.10.10.3:50075</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration> > The log says: "2008-10-12 01:16:26,960 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.RuntimeException: Not a host:port pair: local at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121) at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:768) at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:799) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2266) 2008-10-12 01:16:26,970 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:" > What is going wrong? > Help appreciated! I have solved the problem. Some observations in a later mail :-) Best, Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Re: Typical Configuration of a task tracker
On Sun, Oct 12, 2008 at 12:15 AM, Amit k. Saha <[EMAIL PROTECTED]> wrote: > Hi! > I am setting up a Hadoop cluster for 'domestic' purposes to play around. I have 3 nodes: Namenode, Job tracker and a task tracker (10.10.10.1, 10.10.10.2, 10.10.10.3). > My Namenode and Job tracker are set up successfully as I can view the web administration panel. > However, though my job tracker shows: > 10.10.10.3: starting tasktracker, logging to /home/amit/hadoop/hadoop-0.17.2.1/bin/../logs/hadoop-amit-tasktracker-lenny-2.out > there is no task tracker process running on 10.10.10.3 and hence the number of "Live Nodes" seems to be 0. > I have kept the hadoop-site.xml file on my task tracker empty. I am not sure what to fill in there. > Is that the reason there is no task tracker process running? Well, I figured out that I need to fill in the "dfs.datanode.*" information in hadoop-site.xml. So, here is the file: <configuration> <property> <name>dfs.datanode.address</name> <value>10.10.10.3:50090</value> </property> <property> <name>dfs.datanode.http.address</name> <value>10.10.10.3:50075</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration> The log says: "2008-10-12 01:16:26,960 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.RuntimeException: Not a host:port pair: local at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121) at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:768) at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:799) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2266) 2008-10-12 01:16:26,970 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:" What is going wrong? Help appreciated! Best Regards, Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
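For later readers: the "Not a host:port pair: local" error comes from the TaskTracker reading mapred.job.tracker, which defaults to the literal value "local" when unset, so the tasktracker's hadoop-site.xml also needs the namenode and jobtracker addresses. A fragment along these lines (a sketch reusing the IPs from the mail; the 9000/9001 ports are the common examples from the docs of that era, not values from this thread):

```xml
<!-- hadoop-site.xml on the tasktracker node: point it at the
     namenode and the jobtracker -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://10.10.10.1:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>10.10.10.2:9001</value>
</property>
```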
Typical Configuration of a task tracker
Hi! I am setting up a Hadoop cluster for 'domestic' purposes to play around. I have 3 nodes: Namenode, Job tracker and a task tracker (10.10.10.1, 10.10.10.2, 10.10.10.3). My Namenode and Job tracker are set up successfully as I can view the web administration panel. However, though my job tracker shows: 10.10.10.3: starting tasktracker, logging to /home/amit/hadoop/hadoop-0.17.2.1/bin/../logs/hadoop-amit-tasktracker-lenny-2.out there is no task tracker process running on 10.10.10.3 and hence the number of "Live Nodes" seems to be 0. I have kept the hadoop-site.xml file on my task tracker empty. I am not sure what to fill in there. Is that the reason there is no task tracker process running? Thanks, Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Re: Newbie doubt: Where are the files/directories?
On Sat, Oct 11, 2008 at 3:14 PM, Miles Osborne <[EMAIL PROTECTED]> wrote: > data under Hadoop is stored as blocks and is not visible using normal > Unix commands such as "ls" etc. to see your files, use > > hadoop dfs -ls Thanks, That does it! > > your files will actually be stored as follows: >> > Specify directories for dfs.name.dir and dfs.data.dir in > conf/hadoop-site.xml. These are used to hold distributed filesystem > data on the master node and slave nodes respectively. Note that > dfs.data.dir may contain a space- or comma-separated list of directory > names, so that data may be stored on multiple devices. Thanks Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha
Newbie doubt: Where are the files/directories?
Hi! I am just getting started with Hadoop in 'pseudo-distributed' mode. My FS is formatted on /tmp/hadoop-amit/ I have started the daemons and have created a 'input' directory using the DFS shell. Now my question is: where does it 'physically' live? My initial guess was that it would be in /tmp/hadoop-amit/dfs/data/. But I don't see it. The web-based filesystem browser shows the following directories: tmp and /user/amit/input. Where do they physically live? Thanks a ton. Best, Amit -- Amit Kumar Saha http://blogs.sun.com/amitsaha/ http://amitsaha.in.googlepages.com/ Skype: amitkumarsaha