Hadoop Quickstart (was Re: [ANNOUNCE] Hadoop release 0.18.3 available)

2009-01-29 Thread Amit k. Saha
Hi!

On Fri, Jan 30, 2009 at 12:05 PM, Anum Ali  wrote:
> Hi,
>
>
> Need some guidance on getting started with Hadoop installation and
> system setup. I am a newbie to Hadoop. Our system OS is Fedora 8;
> should I start from a stable release of Hadoop or from the development
> version in SVN (from the contribute site)?

This might help you: http://hadoop.apache.org/core/docs/current/quickstart.html

Also, if you are just looking to play around, either the stable release
or the SVN trunk should be fine.
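
For a first local test, the stable-release route is roughly this (a
sketch following the quickstart page; adjust the version number to
whatever you download):

  tar xzf hadoop-0.18.3.tar.gz
  cd hadoop-0.18.3
  mkdir input
  cp conf/*.xml input
  bin/hadoop jar hadoop-0.18.3-examples.jar grep input output 'dfs[a-z.]+'
  cat output/*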

Best,
Amit
>
>
>
> Thank You
>
>
>
>
> On Thu, Jan 29, 2009 at 7:38 PM, Nigel Daley  wrote:
>
>> Release 0.18.3 fixes many critical bugs in 0.18.2.
>>
>> For Hadoop release details and downloads, visit:
>> http://hadoop.apache.org/core/releases.html
>>
>> Hadoop 0.18.3 Release Notes are at
>> http://hadoop.apache.org/core/docs/r0.18.3/releasenotes.html
>>
>> Thanks to all who contributed to this release!
>>
>> Nigel
>>
>



-- 
Amit Kumar Saha
http://amitksaha.blogspot.com
http://amitsaha.in.googlepages.com/
*Bangalore Open Java Users Group*: http://www.bojug.in


Re: Netbeans/Eclipse plugin

2009-01-26 Thread Amit k. Saha
On Tue, Jan 27, 2009 at 2:52 AM, Aaron Kimball  wrote:
> The Eclipse plugin (which, btw, is now part of Hadoop core in src/contrib/)
> is currently inoperable. The DFS viewer works, but the job submission code
> is broken.

I have started a conversation with three other community members to work
on the NetBeans plugin. You can track the progress at
http://wiki.netbeans.org/Nbhadoop.

Best,
Amit


>
> - Aaron
>
> On Sun, Jan 25, 2009 at 9:07 PM, Amit k. Saha  wrote:
>
>> On Sun, Jan 25, 2009 at 9:32 PM, Edward Capriolo 
>> wrote:
>> > On Sun, Jan 25, 2009 at 10:57 AM, vinayak katkar 
>> wrote:
>> >> Does anyone know of a NetBeans or Eclipse plugin for Hadoop
>> >> Map-Reduce jobs? I want to build a plugin for NetBeans.
>> >>
>> >> http://vinayakkatkar.wordpress.com
>> >> --
>> >> Vinayak Katkar
>> >> Sun Campus Ambassador
>> >> Sun Microsystems, India
>> >> COEP
>> >>
>> >
>> > There is an Eclipse plugin:
>> > http://www.alphaworks.ibm.com/tech/mapreducetools
>> >
>> > Seems like some work is being done on NetBeans:
>> > https://nbhadoop.dev.java.net/
>>
>> I started this project, but it's caught up in the
>> requirements-gathering phase.
>>
>> @ Vinayak,
>>
>> Let's take this offline and discuss. What do you think?
>>
>>
>> Thanks,
>> Amit
>>
>> >
>> > The world needs more netbeans love.
>> >
>>
>> Definitely :-)
>>
>>
>> --
>> Amit Kumar Saha
>> http://amitksaha.blogspot.com
>> http://amitsaha.in.googlepages.com/
>> *Bangalore Open Java Users Group*: http://www.bojug.in
>>
>



-- 
Amit Kumar Saha
http://amitksaha.blogspot.com
http://amitsaha.in.googlepages.com/
*Bangalore Open Java Users Group*: http://www.bojug.in


Re: Netbeans/Eclipse plugin

2009-01-25 Thread Amit k. Saha
On Sun, Jan 25, 2009 at 9:32 PM, Edward Capriolo  wrote:
> On Sun, Jan 25, 2009 at 10:57 AM, vinayak katkar  
> wrote:
>> Does anyone know of a NetBeans or Eclipse plugin for Hadoop Map-Reduce
>> jobs? I want to build a plugin for NetBeans.
>>
>> http://vinayakkatkar.wordpress.com
>> --
>> Vinayak Katkar
>> Sun Campus Ambassador
>> Sun Microsystems, India
>> COEP
>>
>
> There is an Eclipse plugin: http://www.alphaworks.ibm.com/tech/mapreducetools
>
> Seems like some work is being done on NetBeans:
> https://nbhadoop.dev.java.net/

I started this project, but it's caught up in the requirements-gathering
phase.

@ Vinayak,

Let's take this offline and discuss. What do you think?


Thanks,
Amit

>
> The world needs more netbeans love.
>

Definitely :-)


-- 
Amit Kumar Saha
http://amitksaha.blogspot.com
http://amitsaha.in.googlepages.com/
*Bangalore Open Java Users Group*: http://www.bojug.in


Re: Why does Hadoop need ssh access to master and slaves?

2009-01-21 Thread Amit k. Saha
On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer
 wrote:
> Hi all,
>
> we've taken our first steps in evaluating Hadoop. The setup of 2 VMs as
> a Hadoop grid was very easy and works fine.
>
> Now our operations team wonders why Hadoop has to be able to connect to
> the master and slaves via password-less SSH. Can anyone give us an
> answer to this question?

1. There has to be a way to connect to the remote hosts (the slaves and
the secondary master) to start and stop the daemons, and SSH is the
secure way to do it.
2. It has to be password-less so that those logins can happen
automatically, without anyone typing a password for every host.
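
A typical way to set that up (assuming OpenSSH) is to generate a key
without a passphrase on the master and append the public key to
~/.ssh/authorized_keys on every slave (shown here for the local host):

  ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Note that SSH is only used by the control scripts (start-all.sh and
friends); the daemons themselves talk to each other over their own RPC
ports.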

-Amit


>
> Thanks & Regards
> Matthias
>



-- 
Amit Kumar Saha
http://amitksaha.blogspot.com
http://amitsaha.in.googlepages.com/
*Bangalore Open Java Users Group*: http://www.bojug.in


Re: Seeking Someone to Review Hadoop Article

2008-11-04 Thread Amit k. Saha
On Wed, Nov 5, 2008 at 3:17 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote:
> Done.  I also added a link to the article that Amit Kumar Saha wrote
> just a few weeks ago for linux.com.

Thank you, Tom :-)

-Amit

-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: Seeking Someone to Review Hadoop Article

2008-11-02 Thread Amit k. Saha
On Mon, Nov 3, 2008 at 7:27 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote:
> The article I've written about Hadoop has just been published:
>
>   http://www.ociweb.com/jnb/jnbNov2008.html
>
> I'd like to again thank Mafish Liu and Amit Kumar Saha for reviewing
> my draft and offering suggestions for helping me improve it.  I hope
> the article is compelling, clear and technically accurate.  However,
> if you notice anything in need of correction, please contact me
> offlist and I will address it ASAP.

Nice article.

Thanks for the opportunity, Tom!

-Amit
-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: JNI Crash using hadoop

2008-10-25 Thread Amit k. Saha
Hi!


On Sat, Oct 25, 2008 at 6:48 PM, lamfeeling <[EMAIL PROTECTED]> wrote:
>
>
>
>  Dear all:
>    I'm new to Hadoop, and I want to migrate my existing project (written
> in C++) to Hadoop using JNI.
>    All the features seem OK, except one method.
>    When it is invoked, Hadoop gives me an error message: bad_alloc.
>    I googled this message; it is said to be a common problem when memory
> is exhausted, but my memory is not full yet.
>
>    Are there some memory limitations in Hadoop, especially when using JNI
> methods?
>
>    This program has been tested millions of times, so the problem should
> not be in my C++ program.
>    Could anyone give me an answer? Thanks a lot!

Consider using 'Pipes':
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html
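
Launching a Pipes job from the command line looks roughly like this (a
sketch: it assumes your C++ binary has already been copied into HDFS,
and the paths are made up):

  bin/hadoop pipes -input inputDir -output outputDir \
      -program path/to/your-cpp-binary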

-Amit

-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Article on Apache Hadoop

2008-10-23 Thread Amit k. Saha
Hello,

I have just started exploring Hadoop, purely out of hobbyist interest.
I love writing and hence have published a beginner-style article on
Apache Hadoop titled "Hands-on Hadoop for cluster computing". It's
available at http://www.linux.com/feature/150395

I am very thankful to all the folks on this list for clearing up my
initial doubts.

Thanks,
Amit

-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: mysql in hadoop

2008-10-21 Thread Amit k. Saha
Hi Deepak,

On Mon, Oct 20, 2008 at 10:13 PM, Deepak Diwakar <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am sure someone must have tried connecting to MySQL from Hadoop, but I
> am running into a problem.
> Basically, I cannot figure out how to put the JDBC connector jar on the
> classpath of the hadoop run command, or whether there is another way to
> incorporate the JDBC connector jar into the main jar which we run using
> $HADOOP_HOME/bin/hadoop.
>
> Please help me.
>
> Thanks in advance,

Just curious: what application on Hadoop are you working on that uses
MySQL?
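
In the meantime, two approaches that should work (a sketch, with made-up
paths): put the connector jar on the client classpath via the
HADOOP_CLASSPATH environment variable, or bundle it inside your job jar
under a lib/ directory, which gets unpacked onto the task classpath:

  # client side:
  export HADOOP_CLASSPATH=/path/to/mysql-connector-java.jar
  # or ship it inside the job jar (tasks pick up jars under lib/):
  jar uf myjob.jar lib/mysql-connector-java.jar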

Thanks,
Amit
>



-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: Need reboot the whole system if adding new datanodes?

2008-10-14 Thread Amit k. Saha
On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote:
> It seems that we need to restart the whole Hadoop system in order to add
> new nodes to the cluster. Is there any solution that avoids the restart?

From what I know so far, you have to start the HDFS daemon (which
reads the 'slaves' file) to 'let it know' which hosts are the data
nodes. So every time you add a new DataNode, I believe you will have to
restart the daemon, which is like re-initiating the NameNode.

Hope I am not very wrong :-)
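
If a full restart turns out to be unnecessary, I would expect starting
the daemons on the new node by itself to be enough, assuming its conf/
already points at the master:

  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start tasktracker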

Best,
Amit

-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: Are There Books of Hadoop/Pig?

2008-10-14 Thread Amit k. Saha
On Wed, Oct 15, 2008 at 4:10 AM, Steve Gao <[EMAIL PROTECTED]> wrote:
> Does anybody know if there are books about Hadoop or Pig? The wiki and
> manual are kind of ad hoc and hard to comprehend; for example, "I want
> to know how to apply patches to my Hadoop, but can't find how to do it"
> and that kind of thing.
>
> Would anybody help? Thanks.

http://oreilly.com/catalog/9780596521998/

HTH,
Amit
>
>
>
>



-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha



Use of 'dfs.replication'

2008-10-11 Thread Amit k. Saha
Hi!
What does the value of the "dfs.replication" property determine?

Say I have 3 nodes: a name node, a job tracker, and a task tracker cum
data node. What should my "dfs.replication" be?

Thanks.
Amit

-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: Typical Configuration of a task tracker

2008-10-11 Thread Amit k. Saha
On Sun, Oct 12, 2008 at 1:44 AM, Amit k. Saha <[EMAIL PROTECTED]> wrote:
> On Sun, Oct 12, 2008 at 12:15 AM, Amit k. Saha <[EMAIL PROTECTED]> wrote:
>> Hi!
>>
>> I am setting up a Hadoop cluster for 'domestic' purposes, to play
>> around. I have 3 nodes: a Namenode, a Job tracker, and a task tracker
>> (10.10.10.1, 10.10.10.2, 10.10.10.3).
>>
>> My Namenode and Job tracker are set up successfully, as I can view the
>> web administration panels.
>>
>> However, though my job tracker shows:
>>
>> 10.10.10.3: starting tasktracker, logging to
>> /home/amit/hadoop/hadoop-0.17.2.1/bin/../logs/hadoop-amit-tasktracker-lenny-2.out
>>
>> there is no task tracker process running on 10.10.10.3 and hence the
>> number of "Live Nodes" seems to be 0.
>>
>> I have kept the hadoop-site.xml file on my task tracker empty. I am
>> not sure what to fill in there.
>>
>> Is that the reason there is no task tracker process running?
>
> Well, I figured out that I need to fill in the "dfs.datanode.*"
> properties in hadoop-site.xml. So, here is the file:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
>  <property>
>    <name>dfs.datanode.address</name>
>    <value>10.10.10.3:50090</value>
>  </property>
>
>  <property>
>    <name>dfs.datanode.http.address</name>
>    <value>10.10.10.3:50075</value>
>  </property>
>
>  <property>
>    <name>dfs.replication</name>
>    <value>1</value>
>  </property>
>
> </configuration>
>
> The log says: "2008-10-12 01:16:26,960 ERROR
> org.apache.hadoop.mapred.TaskTracker: Can not start task tracker
> because java.lang.RuntimeException: Not a host:port pair: local
>at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:768)
>at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:799)
>at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2266)
>
> 2008-10-12 01:16:26,970 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTDOWN_MSG:
> "
>
> What is going wrong?
>
> Help appreciated!

I have solved the problem. Some observations in a later mail :-)
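
For anyone hitting the same error: "Not a host:port pair: local" means
the TaskTracker read mapred.job.tracker at its default value of "local".
Pointing it at the JobTracker in hadoop-site.xml along these lines
should fix it (the host and port below are just examples):

 <property>
   <name>mapred.job.tracker</name>
   <value>10.10.10.2:9001</value>
 </property>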

Best,
Amit
-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: Typical Configuration of a task tracker

2008-10-11 Thread Amit k. Saha
On Sun, Oct 12, 2008 at 12:15 AM, Amit k. Saha <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I am setting up a Hadoop cluster for 'domestic' purposes, to play
> around. I have 3 nodes: a Namenode, a Job tracker, and a task tracker
> (10.10.10.1, 10.10.10.2, 10.10.10.3).
>
> My Namenode and Job tracker are set up successfully, as I can view the
> web administration panels.
>
> However, though my job tracker shows:
>
> 10.10.10.3: starting tasktracker, logging to
> /home/amit/hadoop/hadoop-0.17.2.1/bin/../logs/hadoop-amit-tasktracker-lenny-2.out
>
> there is no task tracker process running on 10.10.10.3 and hence the
> number of "Live Nodes" seems to be 0.
>
> I have kept the hadoop-site.xml file on my task tracker empty. I am
> not sure what to fill in there.
>
> Is that the reason there is no task tracker process running?

Well, I figured out that I need to fill in the "dfs.datanode.*"
properties in hadoop-site.xml. So, here is the file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

 <property>
   <name>dfs.datanode.address</name>
   <value>10.10.10.3:50090</value>
 </property>

 <property>
   <name>dfs.datanode.http.address</name>
   <value>10.10.10.3:50075</value>
 </property>

 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>

</configuration>

The log says: "2008-10-12 01:16:26,960 ERROR
org.apache.hadoop.mapred.TaskTracker: Can not start task tracker
because java.lang.RuntimeException: Not a host:port pair: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:768)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:799)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2266)

2008-10-12 01:16:26,970 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
"

What is going wrong?

Help appreciated!


Best Regards,
Amit
-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Typical Configuration of a task tracker

2008-10-11 Thread Amit k. Saha
Hi!

I am setting up a Hadoop cluster for 'domestic' purposes, to play
around. I have 3 nodes: a Namenode, a Job tracker, and a task tracker
(10.10.10.1, 10.10.10.2, 10.10.10.3).

My Namenode and Job tracker are set up successfully, as I can view the
web administration panels.

However, though my job tracker shows:

10.10.10.3: starting tasktracker, logging to
/home/amit/hadoop/hadoop-0.17.2.1/bin/../logs/hadoop-amit-tasktracker-lenny-2.out

there is no task tracker process running on 10.10.10.3 and hence the
number of "Live Nodes" seems to be 0.

I have kept the hadoop-site.xml file on my task tracker empty. I am
not sure what to fill in there.

Is that the reason there is no task tracker process running?

Thanks,
Amit
-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Re: Newbie doubt: Where are the files/directories?

2008-10-11 Thread Amit k. Saha
On Sat, Oct 11, 2008 at 3:14 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> data under Hadoop is stored as blocks and is not visible using normal
> Unix commands such as "ls". To see your files, use:
>
> hadoop dfs -ls

Thanks, that does it!

>
> your files will actually be stored as follows:
>>
> Specify directories for dfs.name.dir and dfs.data.dir in
> conf/hadoop-site.xml. These are used to hold distributed filesystem
> data on the master node and slave nodes respectively. Note that
> dfs.data.dir may contain a space- or comma-separated list of directory
> names, so that data may be stored on multiple devices.
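
For the archives, those entries in conf/hadoop-site.xml would look
something like this (the paths below are just examples):

 <property>
   <name>dfs.name.dir</name>
   <value>/home/amit/dfs/name</value>
 </property>

 <property>
   <name>dfs.data.dir</name>
   <value>/home/amit/dfs/data</value>
 </property>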

Thanks
Amit

-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha


Newbie doubt: Where are the files/directories?

2008-10-11 Thread Amit k. Saha
Hi!

I am just getting started with Hadoop in 'pseudo-distributed' mode. My
FS is formatted on /tmp/hadoop-amit/

I have started the daemons and have created a 'input' directory using
the DFS shell. Now my question is: where does it 'physically' live?

My initial guess was that it would be in /tmp/hadoop-amit/dfs/data/.
But I don't see it.

The web-based filesystem browser shows the following directories: tmp
and /user/amit/input.

Where do they physically live?

Thanks a ton.

Best,
Amit


-- 
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha