Re: Hadoop install

2012-02-18 Thread Keith Wiley
I always use the Cloudera packages, CDH3 I think it's called...but it isn't the 
latest by a long shot.  It's still .20.  I think Hadoop is nearly to .23, although 
I'm not up on those kinds of details.  I mentioned Cloudera's 
distribution because it falls into place pretty smoothly.  For example, a few 
weeks ago I downloaded and installed it on a Mac in a few hours (and ran an 
example), and then installed it on a Tier 3 Linux VM and had it running examples 
there too.

On Feb 18, 2012, at 06:24, Mohit Anchlia wrote:

 What's the best way or guide to install the latest Hadoop? Is the latest Hadoop
 still .20, which is what comes up in a Google search? Could someone point me to
 the latest Hadoop distribution? I also need Pig and Mahout's XmlInputFormat.



Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com

It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow.
   --  Keith Wiley




Re: Hadoop install

2012-02-18 Thread Mohit Anchlia
Thanks. Do I have to do anything special to get the Mahout XmlInputFormat and
Pig working with the new release of Hadoop?

On Sat, Feb 18, 2012 at 6:42 AM, Tom Deutsch tdeut...@us.ibm.com wrote:

 Mohit - one place to start is here:

 http://hadoop.apache.org/common/releases.html#Download

 The release notes, as always, are well worth reading.

 
 Tom Deutsch
 Program Director
 Information Management
 Big Data Technologies
 IBM
 3565 Harbor Blvd
 Costa Mesa, CA 92626-1420
 tdeut...@us.ibm.com




 Mohit Anchlia mohitanch...@gmail.com
 02/18/2012 06:24 AM
 Please respond to: common-user@hadoop.apache.org
 To: common-user@hadoop.apache.org
 cc:
 Subject: Hadoop install

 What's the best way or guide to install the latest Hadoop? Is the latest Hadoop
 still .20, which is what comes up in a Google search? Could someone point me to
 the latest Hadoop distribution? I also need Pig and Mahout's XmlInputFormat.




Hi, I'm a graduate student and I have one question: multiple Hadoop install.

2011-03-01 Thread Sungho Jeon
Hi, I'm a graduate student and my major is computer science, data mining.
Is it possible to install multiple Hadoop instances on one node?


I mean, I want to install several Hadoop instances that have different confs.
Specifically, one Hadoop has 5 datanodes and the other Hadoop has 10 datanodes.


Of course I can control the number of datanodes by changing the conf and restarting.
But without changing the conf, is it possible to install multiple Hadoop instances on one node?

Thanks


Re: Hi, I'm a graduate student and I have one question: multiple Hadoop install.

2011-03-01 Thread Harsh J
Hello,

On Tue, Mar 1, 2011 at 5:49 PM, Sungho Jeon sdev...@gmail.com wrote:
 Hi, I'm a graduate student and my major is computer science, data mining.
 Is it possible to install multiple Hadoop instances on one node?

It is possible, but what would you gain from this? One DN can handle
several disks in parallel just fine, I think.

 I mean, I want to install several Hadoop instances that have different confs.
 Specifically, one Hadoop has 5 datanodes and the other Hadoop has 10 datanodes.

This is possible with a bunch of datanode configuration tweaks (data
directories, IPC and HTTP ports, logging directories, etc.).

 Of course I can control the number of datanodes by changing the conf and restarting.
 But without changing the conf, is it possible to install multiple Hadoop instances on one node?

Not possible without a certain set of configuration changes per
instance. You can, however, keep a set of separate conf directories and use
`hadoop-daemon.sh --config conf_dir_for_this_instance start datanode`
to start each of the DNs.
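
To make that concrete, here is a hedged sketch (the conf directory names,
ports, and log locations are invented for illustration): give each instance a
conf dir whose hdfs-site.xml overrides dfs.data.dir, dfs.datanode.address,
dfs.datanode.http.address, and dfs.datanode.ipc.address, then start them one
at a time:

# start two local datanodes, each with its own conf, logs, and pid files
for i in 1 2; do
  export HADOOP_LOG_DIR=/var/log/hadoop/dn$i   # hadoop-daemon.sh honors these,
  export HADOOP_PID_DIR=/var/run/hadoop/dn$i   # keeping the daemons apart
  bin/hadoop-daemon.sh --config /etc/hadoop/conf.dn$i start datanode
done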

-- 
Harsh J
www.harshj.com


Re: Hi, I'm a graduate student and I have one question: multiple Hadoop install.

2011-03-01 Thread Matthew Foley
Hi Sungho,
Here is a recipe for how to run multiple nodes on a single server, posted to 
this list on Sept. 15:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3c8a898c33-dc4e-418c-adc0-5689d434b...@yahoo-inc.com%3E

For v22 and later, the world has been split into three parts; where there was 
formerly HADOOP_HOME, there are now HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, and 
HADOOP_MAPRED_HOME, and in the default configuration each of them has its own 
conf/ subdirectory.  However, it is acceptable to pile all the contents of 
the three conf directories into a single conf directory somewhere else (the 
only name conflict is configuration.xsl, which can be shared), set an 
environment variable $HADOOP_CONF_DIR to point to it, and pass that value in 
with the --config option whenever you launch processes with bin/hadoop or 
bin/hdfs.
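
The merge itself is only a few lines of shell; here is a hedged sketch (the
directory names are illustrative, not prescribed):

# collect the three conf/ directories into one merged conf dir
mkdir -p ~/hadoop-conf
cp $HADOOP_COMMON_HOME/conf/* ~/hadoop-conf/
cp $HADOOP_HDFS_HOME/conf/* ~/hadoop-conf/    # configuration.xsl collides; one shared copy is fine
cp $HADOOP_MAPRED_HOME/conf/* ~/hadoop-conf/
export HADOOP_CONF_DIR=~/hadoop-conf
# then pass it explicitly on every launch, e.g.
#   bin/hdfs --config $HADOOP_CONF_DIR ...
#   bin/hadoop --config $HADOOP_CONF_DIR ...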

Now, the above recipe assumes you want multiple nodes from ONE cluster running 
on a single server.  I suggest you start with that and get it working, so you 
understand the hdfs-site.xml file and how it is used.

You seem to be asking to run multiple CLUSTERS on a single server.  I believe 
the same mechanism will work (pointing different node invocations at 
different config directories), but you will need to make several more changes 
in the $HADOOP_CONF_DIR/hdfs-site.xml files, to create different namenode 
configurations as well as the different datanode configurations addressed in 
the recipe.  Please look at the documentation for which parameters to change.

A couple comments:
-  You probably can't run two namenodes simultaneously on the same server, 
unless it has a huge amount of memory and you don't care about performance.  
But you can keep two different configurations stored, and run them at different 
times.
-  If the ONLY difference between the two clusters is the number of datanodes, you 
actually don't have to have different namenode configurations.  You can just 
configure 10 datanodes, and then sometimes run only 5 of them (clearing storage 
in between test runs, of course, so it doesn't look like you lost half your 
stored blocks!).  This is because a namenode has no configuration for which or 
how many datanodes to expect; it simply accepts registration from any 
datanode that initiates communication with it.
-  Your statement "I can control the number of datanodes by changing the conf and 
restarting" is therefore not entirely correct.  Each datanode launched has to be 
pointed at its own config, but there is no place in the config to define how many 
datanodes to launch.  (This is partly because running multiple nodes on a single 
server is not considered normal for a production environment, even though it is 
useful for a test environment.)  You may be thinking of the slaves file, which 
is used by some launch scripts, but that is a tool to assist users in launching 
clusters, not part of the namenode configuration, and it is also not really 
oriented toward launching multiple nodes on a single server, if you read the scripts.

If you want launch scripts to help you locally launch different numbers of 
nodes with different configs, you'll have to write them yourself, but they're 
really easy.  They just consist of multiple lines that look like

$HADOOP_COMMON_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR \
    --script $HADOOP_HDFS_HOME/bin/hdfs start datanode|namenode [args]

with different values of $HADOOP_CONF_DIR for each line.
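
For instance, here is a hedged sketch of a start script for five local
datanodes (the conf-directory layout is invented):

#!/bin/sh
# launch five datanodes, each pointed at its own conf directory
for i in 1 2 3 4 5; do
  $HADOOP_COMMON_HOME/bin/hadoop-daemon.sh --config /etc/hadoop/conf.dn$i \
      --script $HADOOP_HDFS_HOME/bin/hdfs start datanode
done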

The same lines with stop instead of start will give you a well-behaved kill 
script.
As always, you have to start and stop each node as the appropriate userId so the 
daemons have the read/write and I/O permissions they need.

Hope this helps,
--Matt


On Mar 1, 2011, at 4:19 AM, Sungho Jeon wrote:

Hi, I'm a graduate student and my major is computer science, data mining.
Is it possible to install multiple Hadoop instances on one node?


I mean, I want to install several Hadoop instances that have different confs.
Specifically, one Hadoop has 5 datanodes and the other Hadoop has 10 datanodes.


Of course I can control the number of datanodes by changing the conf and restarting.
But without changing the conf, is it possible to install multiple Hadoop instances on one node?

Thanks