Re: Map-Reduce Slow Down

2009-04-17 Thread jason hadoop
Assuming you are on a Linux box, on both machines
verify that the servers are listening on the ports you expect via
netstat -a -n -t -p
-a show sockets accepting connections
-n do not translate IP addresses to host names
-t only list TCP sockets
-p list the pid/process name

On the machine 192.168.0.18
you should have a socket bound to 0.0.0.0:54310 with a process of java, and
the pid should be the pid of your namenode process.

On the remote machine you should be able to *telnet 192.168.0.18 54310* and
have it connect
*Connected to 192.168.0.18.
Escape character is '^]'.
*

If the netstat shows the socket accepting and the telnet does not connect,
then something is blocking the TCP packets between the machines: one or both
machines has a firewall, an intervening router has a firewall, or there is
some routing problem.
The command /sbin/iptables -L will normally list the firewall rules, if any,
for a Linux machine.


You should be able to use telnet to verify that you can connect from the
remote machine.
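
If telnet is not installed on the remote box, a minimal Java check along these
lines does the same thing (just a sketch; the class name is made up, and the
host/port are the namenode values from above):

import java.net.InetSocketAddress;
import java.net.Socket;

// Try to open a TCP connection to the namenode's IPC port.
public class PortCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "192.168.0.18";       // namenode host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 54310; // namenode IPC port
        Socket s = new Socket();
        try {
            s.connect(new InetSocketAddress(host, port), 5000);         // 5 second timeout
            System.out.println("Connected to " + host + ":" + port);
        } finally {
            s.close();
        }
    }
}

It should print "Connected to ..." wherever the telnet test would connect, and
throw an exception wherever telnet would hang or be refused.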

On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra mnage...@asu.edu wrote:

 Thanks! I ll see what I can find out.

 On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop jason.had...@gmail.com
 wrote:

  The firewall was run at system startup, I think there was a
  /etc/sysconfig/iptables file present which triggered the firewall.
  I don't currently have access to any centos 5 machines so I can't easily
  check.
 
 
 
  On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop jason.had...@gmail.com
  wrote:
 
   The kickstart script was something that the operations staff was using
 to
   initialize new machines, I never actually saw the script, just figured
  out
   that there was a firewall in place.
  
  
  
   On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra mnage...@asu.edu
  wrote:
  
   Jason: the kickstart script - was it something you wrote or is it run
  when
   the system turns on?
   Mithila
  
   On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra mnage...@asu.edu
   wrote:
  
Thanks Jason! Will check that out.
Mithila
   
   
On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop 
 jason.had...@gmail.com
   wrote:
   
Double check that there is no firewall in place.
At one point a bunch of new machines were kickstarted and placed in
 a
cluster and they all failed with something similar.
It turned out the kickstart script turned enabled the firewall with
 a
   rule
that blocked ports in the 50k range.
It took us a while to even think to check that was not a part of
 our
normal
machine configuration
   
On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra 
 mnage...@asu.edu
  
wrote:
   
 Hi Aaron
 I will look into that thanks!

 I spoke to the admin who overlooks the cluster. He said that the
   gateway
 comes in to the picture only when one of the nodes communicates
  with
   a
node
 outside of the cluster. But in my case the communication is
 carried
   out
 between the nodes which all belong to the same cluster.

 Mithila

 On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball 
 aa...@cloudera.com
  
wrote:

  Hi,
 
  I wrote a blog post a while back about connecting nodes via a
   gateway.
 See
 

   
  
 
 http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
 
  This assumes that the client is outside the gateway and all
  datanodes/namenode are inside, but the same principles apply.
   You'll
just
  need to set up ssh tunnels from every datanode to the namenode.
 
  - Aaron
 
 
  On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari 
rphul...@yahoo-inc.com
 wrote:
 
  Looks like your NameNode is down .
  Verify if hadoop process are running (   jps should show you
 all
   java
  running process).
  If your hadoop process are running try restarting your hadoop
   process
.
  I guess this problem is due to your fsimage not being correct
 .
  You might have to format your namenode.
  Hope this helps.
 
  Thanks,
  --
  Ravi
 
 
  On 4/15/09 10:15 AM, Mithila Nagendra mnage...@asu.edu
  wrote:
 
  The log file runs into thousands of line with the same message
   being
  displayed every time.
 
  On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra 
   mnage...@asu.edu
  wrote:
 
   The log file : hadoop-mithila-datanode-node19.log.2009-04-14
  has
the
   following in it:
  
   2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
  STARTUP_MSG:
  
 /
   STARTUP_MSG: Starting DataNode
   STARTUP_MSG:   host = node19/127.0.0.1
   STARTUP_MSG:   args = []
   STARTUP_MSG:   version = 0.18.3
   STARTUP_MSG:   build =
  
   https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r
   736250; compiled by 'ndaley' on 

If I chain two MapReduce jobs, can I avoid saving the intermediate output?

2009-04-17 Thread 王红宝
As the title says.


Thank You!
imcaptor


Problem with using different username

2009-04-17 Thread Puri, Aseem
Hi

            I am running a Hadoop cluster on Windows. I have 4 datanodes. 3
datanodes have the same username, so they always start. But one datanode has a
different username. When I run the command $bin/start-all.sh, the master tries to
find $bin/hadoop-daemon.sh using the master's username instead of the username
under which the file actually is. Where should I make a change so that the
master finds the file under the different username?

 

Thanks & Regards

Aseem Puri

Project Trainee

Honeywell Technology Solutions Lab

Bangalore

 



Re: Question about the classpath setting for bin/hadoop jar

2009-04-17 Thread Sharad Agarwal

 I noticed that the bin/hadoop jar command doesn't add the jar being 
 executed to the classpath. Is this deliberate and what is the reasoning? The 
 result is that resources in the jar are not accessible from the system class 
 loader. Rather they are only available from the thread context class loader 
 and the class loader of the main class.
In the map and reduce tasks' JVM, job libraries are added to the system 
classloader. However, for other processes only the framework code is present in the 
system classloader. If you are seeing this as a problem in your client-side code, you 
can use Configuration#getClassByName(String name) instead of Class.forName() 
for loading your job-related classes.
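
For example, a minimal client-side sketch (com.example.MyMapper is a made-up
name here, standing in for a class that only exists in your job jar):

import org.apache.hadoop.conf.Configuration;

public class LoadJobClass {
    public static void main(String[] args) throws ClassNotFoundException {
        Configuration conf = new Configuration();
        // Class.forName("com.example.MyMapper") resolves against the caller's
        // classloader and can miss classes that only the thread context
        // classloader set up by "bin/hadoop jar" can see; getClassByName
        // consults that context classloader instead.
        Class<?> c = conf.getClassByName("com.example.MyMapper");
        System.out.println("Loaded " + c.getName());
    }
}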


Re: Map-Reduce Slow Down

2009-04-17 Thread Mithila Nagendra
Thanks Jason! This helps a lot. I'm planning to talk to my network admin
tomorrow. I'm hoping he'll be able to fix this problem.
Mithila

On Fri, Apr 17, 2009 at 9:00 AM, jason hadoop jason.had...@gmail.comwrote:

 Assuming you are on a linux box, on both machines
 verify that the servers are listening on the ports you expect via
 netstat -a -n -t -p
 -a show sockets accepting connections
 -n do not translate ip addresses to host names
 -t only list tcp sockets
 -p list the pid/process name

 on the machine 192.168.0.18
 you should have sockets bound to 0.0.0.0:54310 with a process of java, and
 the pid should be the pid of your namenode process.

 On the remote machine you should be able to *telnet 192.168.0.18 54310* and
 have it connect
 *Connected to 192.168.0.18.
 Escape character is '^]'.
 *

 If the netstat shows the socket accepting and the telnet does not connect,
 then something is blocking the TCP packets between the machines. one or
 both
 machines has a firewall, an intervening router has a firewall, or there is
 some routing problem
 the command /sbin/iptables -L will normally list the firewall rules, if any
 for a linux machine.


 You should be able to use telnet to verify that you can connect from the
 remote machine.

 On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra mnage...@asu.edu
 wrote:

  Thanks! I ll see what I can find out.
 
  On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop jason.had...@gmail.com
  wrote:
 
   The firewall was run at system startup, I think there was a
   /etc/sysconfig/iptables file present which triggered the firewall.
   I don't currently have access to any centos 5 machines so I can't
 easily
   check.
  
  
  
   On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop jason.had...@gmail.com
   wrote:
  
The kickstart script was something that the operations staff was
 using
  to
initialize new machines, I never actually saw the script, just
 figured
   out
that there was a firewall in place.
   
   
   
On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra mnage...@asu.edu
   wrote:
   
Jason: the kickstart script - was it something you wrote or is it
 run
   when
the system turns on?
Mithila
   
On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra mnage...@asu.edu
 
wrote:
   
 Thanks Jason! Will check that out.
 Mithila


 On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop 
  jason.had...@gmail.com
wrote:

 Double check that there is no firewall in place.
 At one point a bunch of new machines were kickstarted and placed
 in
  a
 cluster and they all failed with something similar.
 It turned out the kickstart script turned enabled the firewall
 with
  a
rule
 that blocked ports in the 50k range.
 It took us a while to even think to check that was not a part of
  our
 normal
 machine configuration

 On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra 
  mnage...@asu.edu
   
 wrote:

  Hi Aaron
  I will look into that thanks!
 
  I spoke to the admin who overlooks the cluster. He said that
 the
gateway
  comes in to the picture only when one of the nodes communicates
   with
a
 node
  outside of the cluster. But in my case the communication is
  carried
out
  between the nodes which all belong to the same cluster.
 
  Mithila
 
  On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball 
  aa...@cloudera.com
   
 wrote:
 
   Hi,
  
   I wrote a blog post a while back about connecting nodes via a
gateway.
  See
  
 

   
  
 
 http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
  
   This assumes that the client is outside the gateway and all
   datanodes/namenode are inside, but the same principles apply.
You'll
 just
   need to set up ssh tunnels from every datanode to the
 namenode.
  
   - Aaron
  
  
   On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari 
 rphul...@yahoo-inc.com
  wrote:
  
   Looks like your NameNode is down .
   Verify if hadoop process are running (   jps should show you
  all
java
   running process).
   If your hadoop process are running try restarting your
 hadoop
process
 .
   I guess this problem is due to your fsimage not being
 correct
  .
   You might have to format your namenode.
   Hope this helps.
  
   Thanks,
   --
   Ravi
  
  
   On 4/15/09 10:15 AM, Mithila Nagendra mnage...@asu.edu
   wrote:
  
   The log file runs into thousands of line with the same
 message
being
   displayed every time.
  
   On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra 
mnage...@asu.edu
   wrote:
  
The log file :
 hadoop-mithila-datanode-node19.log.2009-04-14
   has
 the
following in it:
   
2009-04-14 10:08:11,499 INFO
 

Re: Sometimes no map tasks are run - X are complete and N-X are pending, none running

2009-04-17 Thread Sharad Agarwal


 The last map task is forrever in the pending queue - is this is issue my
 setup/config or do others have the problem?
Do you mean the leftover maps are not scheduled at all? What do you see in 
the jobtracker logs?


Re: Sometimes no map tasks are run - X are complete and N-X are pending, none running

2009-04-17 Thread Jothi Padmanabhan


On 4/17/09 12:26 PM, Sharad Agarwal shara...@yahoo-inc.com wrote:

 
 
 The last map task is forrever in the pending queue - is this is issue my
 setup/config or do others have the problem?
 Do you mean the left over maps are not at all scheduled ? What do you see in
 jobtracker logs ?

Also, in the JT UI, please check how many maps are marked as running while
this map is still pending.



Re: If I chain two MapReduce jobs, can I avoid saving the intermediate output?

2009-04-17 Thread Shengkai Zhu
Could you give a more detailed description?

On Fri, Apr 17, 2009 at 2:21 PM, 王红宝 imcap...@gmail.com wrote:

 as the tittle.


 Thank You!
 imcaptor




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: Datanode Setup

2009-04-17 Thread jpe30

OK, I have my hosts file set up the way you told me, and I changed my replication
factor to 1.  The thing that I don't get is this line from the datanodes...

STARTUP_MSG:   host = java.net.UnknownHostException: myhost: myhost

If I have my hadoop-site.xml set up correctly, with the correct address, it
should work, right?  It seems like the datanodes aren't getting an IP address
to use, and I'm not sure why.


jpe30 wrote:
 
 That helps a lot actually.  I will try setting up my hosts file tomorrow
 and make the other changes you suggested.
 
 Thanks!
 
 
 
 Mithila Nagendra wrote:
 
 Hi,
 The replication factor has to be set to 1. Also for you dfs and job
 tracker
 configuration you should insert the name of the node rather than the i.p
 address.
 
 For instance:
   <value>192.168.1.10:54310</value>
 
 can be:
 
   <value>master:54310</value>
 
 The nodes can be renamed by renaming them in the hosts files in /etc
 folder.
 It should look like the following:
 
 # Do not remove the following line, or various programs
 # that require network functionality will fail.
 127.0.0.1   localhost.localdomain   localhost   node01
 192.168.0.1 node01
 192.168.0.2 node02
 192.168.0.3 node03
 
 Hope this helps
 Mithila
 
 On Wed, Apr 15, 2009 at 9:40 PM, jpe30 jpotte...@gmail.com wrote:
 

 I'm setting up a Hadoop cluster and I have the name node and job tracker
 up
 and running.  However, I cannot get any of my datanodes or tasktrackers
 to
 start.  Here is my hadoop-site.xml file...



 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <!-- Put site-specific property overrides in this file. -->

 <configuration>

 <property>
   <name>hadoop.tmp.dir</name>
   <value>/home/hadoop/h_temp</value>
   <description>A base for other temporary directories.</description>
 </property>

 <property>
   <name>dfs.data.dir</name>
   <value>/home/hadoop/data</value>
 </property>

 <property>
   <name>fs.default.name</name>
   <value>192.168.1.10:54310</value>
   <description>The name of the default file system.  A URI whose
   scheme and authority determine the FileSystem implementation.  The
   uri's scheme determines the config property (fs.SCHEME.impl) naming
   the FileSystem implementation class.  The uri's authority is used to
   determine the host, port, etc. for a filesystem.</description>
   <final>true</final>
 </property>

 <property>
   <name>mapred.job.tracker</name>
   <value>192.168.1.10:54311</value>
   <description>The host and port that the MapReduce job tracker runs
   at.  If local, then jobs are run in-process as a single map
   and reduce task.
   </description>
 </property>

 <property>
   <name>dfs.replication</name>
   <value>0</value>
   <description>Default block replication.
   The actual number of replications can be specified when the file is
   created.
   The default is used if replication is not specified in create time.
   </description>
 </property>

 </configuration>


 and here is the error I'm getting...




 2009-04-15 14:00:48,208 INFO org.apache.hadoop.dfs.DataNode:
 STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = java.net.UnknownHostException: myhost: myhost
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.18.3
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
 736250;
 compiled by 'ndaley' on Thu Jan 22 23:12:0$
 /
 2009-04-15 14:00:48,355 ERROR org.apache.hadoop.dfs.DataNode:
 java.net.UnknownHostException: myhost: myhost
at java.net.InetAddress.getLocalHost(InetAddress.java:1353)
at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:185)
at
 org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:249)
 at org.apache.hadoop.dfs.DataNode.init(DataNode.java:223)
 at
 org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:3071)
at
 org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:3026)
at
 org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:3034)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3156)

 2009-04-15 14:00:48,356 INFO org.apache.hadoop.dfs.DataNode:
 SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down DataNode at java.net.UnknownHostException:
 myhost: myhost
 /

 --
 View this message in context:
 http://www.nabble.com/Datanode-Setup-tp23064660p23064660.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.


 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Datanode-Setup-tp23064660p23100910.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Problem with using different username

2009-04-17 Thread Alex Loddengaard
I don't think you can tell start-all.sh to log in as a different user on
certain nodes.  Why not just create the same user on the fourth node?

An alternative would be to start the fourth node manually via the
hadoop-daemon.sh script.  Here's an example:

bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker

Those commands should be run on the fourth node.  This allows you to bypass
the SSH step that start-all.sh does.

Alex

On Thu, Apr 16, 2009 at 11:24 PM, Puri, Aseem aseem.p...@honeywell.comwrote:

 Hi

I am running Hadoop Cluster on windows. I have 4 datnodes. 3
 data node have same username so they always start. But one datanode have
 different username. When I run command $bin/start-all.sh master tries to
 find $bin/Hadoop-demon.sh giving master username instead of username in
 which file is there. Please where should I make change so master find
 file on the different user name.



 Thanks  Regards

 Aseem Puri

 Project Trainee

 Honeywell Technology Solutions Lab

 Bangalore






Re: Sometimes no map tasks are run - X are complete and N-X are pending, none running

2009-04-17 Thread Saptarshi Guha
I forgot to examine the logs; I will next time it happens.
Thank you.
BTW, no maps are running. Only a few are pending and the rest are complete.
Saptarshi Guha


On Fri, Apr 17, 2009 at 3:06 AM, Jothi Padmanabhan joth...@yahoo-inc.comwrote:



 On 4/17/09 12:26 PM, Sharad Agarwal shara...@yahoo-inc.com wrote:

 
 
  The last map task is forrever in the pending queue - is this is issue my
  setup/config or do others have the problem?
  Do you mean the left over maps are not at all scheduled ? What do you see
 in
  jobtracker logs ?

 Also in the JT UI, please check on how many maps are marked as running,
 when
 this map is still pending?




Ec2 instability

2009-04-17 Thread Rakhi Khatwani
Hi,
It's been several days since we have been trying to stabilize
hadoop/hbase on an EC2 cluster, but we have failed to do so.
We still come across frequent region server failures, scanner timeout
exceptions, OS-level deadlocks, etc...

And today, while doing a list of the tables in hbase, I got the following
exception:

hbase(main):001:0 list
09/04/17 13:57:18 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:19 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:20 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
available yet, Z...
09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
not be reached after 1 tries, giving up.
09/04/17 13:57:21 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:22 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:23 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
available yet, Z...
09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
not be reached after 1 tries, giving up.
09/04/17 13:57:26 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:27 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:28 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
available yet, Z...
09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
not be reached after 1 tries, giving up.
09/04/17 13:57:29 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:30 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:31 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
available yet, Z...
09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
not be reached after 1 tries, giving up.
09/04/17 13:57:34 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 0 time(s).
09/04/17 13:57:35 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 1 time(s).
09/04/17 13:57:36 INFO ipc.HBaseClass: Retrying connect to server: /
10.254.234.32:60020. Already tried 2 time(s).
09/04/17 13:57:36 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
available yet, Z...

But if I check the UI, the hbase master is still on (tried refreshing it
several times).


And I have been getting a lot of exceptions from time to time, including
region servers going down (which happens very frequently, due to which there
is heavy data loss... that too on production data), scanner timeout
exceptions, cannot-allocate-memory exceptions, etc.

I am working on an Amazon EC2 Large cluster with 6 nodes,
with each node having the following hardware configuration:

   - Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores
   with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit
   platform


I am using hadoop-0.19.0 and hbase 0.19.0 (resynced to all the nodes and
made sure that there is a symbolic link to hadoop-site from hbase/conf)

Following is my configuration on hadoop-site.xml
<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hadoop</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://domU-12-31-39-00-E5-D2.compute-1.internal:50001</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>domU-12-31-39-00-E5-D2.compute-1.internal:50002</value>
</property>

<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

<property>
  <name>dfs.client.block.write.retries</name>
  <value>3</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>

I have given it a high value since the RAM on each node is 7 GB... not sure of this
setting though.
**I got a Cannot Allocate Memory exception after making this setting (got it
for the first time).
After going through the archives, someone suggested enabling memory
overcommit... not sure of it though.**
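
(A rough calculation, not verified on this cluster: with
mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
both set to 3, a busy tasktracker can run up to 6 child JVMs at once, and at
-Xmx4096m each that is on the order of 6 x 4 GB = 24 GB of heap on a 7.5 GB
instance. Even a single 4 GB child that forks a shell needs the kernel to be
willing to overcommit memory, so a much smaller -Xmx, or setting
vm.overcommit_memory=1, would be the usual things to try.)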

<property>

Re: Ec2 instability

2009-04-17 Thread Rakhi Khatwani
Hi,
 This is the exception I have been getting from the MapReduce job:

java.io.IOException: Cannot run program bash: java.io.IOException:
error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:321)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at 
org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1199)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot
allocate memory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 10 more



On Fri, Apr 17, 2009 at 10:09 PM, Rakhi Khatwani
rakhi.khatw...@gmail.comwrote:

 Hi,
 Its been several days since we have been trying to stabilize
 hadoop/hbase on ec2 cluster. but failed to do so.
 We still come across frequent region server fails, scanner timeout
 exceptions and OS level deadlocks etc...

 and 2day while doing a list of tables on hbase i get the following
 exception:

 hbase(main):001:0 list
 09/04/17 13:57:18 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:19 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:20 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
 available yet, Z...
 09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:21 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:22 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:23 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
 available yet, Z...
 09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:26 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:27 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:28 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
 available yet, Z...
 09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:29 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:30 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:31 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
 available yet, Z...
 09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:34 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:35 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:36 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:36 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020 not
 available yet, Z...

 but if i check on the UI, hbase master is still on, (tried refreshing it
 several times).


 and i have been getting a lot of exceptions from time to time including
 region servers going down (which happens very frequently due to which there
 is heavy data loss... that too on production data), scanner timeout
 exceptions, cannot allocate memory exceptions etc.

 I am working on amazon ec2 Large cluster with 6 nodes...
 with each node having the hardware configuration as follows:

- Large Instance 7.5 GB of memory, 4 EC2 

Re: Datanode Setup

2009-04-17 Thread Mithila Nagendra
You have to make sure that you can ssh between the nodes. Also check the
hosts file in the /etc folder. Both the master and the slave must have each
other's machines defined in it. Refer to my previous mail.
Mithila

On Fri, Apr 17, 2009 at 7:18 PM, jpe30 jpotte...@gmail.com wrote:


 ok, I have my hosts file setup the way you told me, I changed my
 replication
 factor to 1.  The thing that I don't get is this line from the datanodes...

 STARTUP_MSG:   host = java.net.UnknownHostException: myhost: myhost

 If I have my hadoop-site.xml setup correctly, with the correct address it
 should work right?  It seems like the datanodes aren't getting an IP
 address
 to use, and I'm not sure why.


 jpe30 wrote:
 
  That helps a lot actually.  I will try setting up my hosts file tomorrow
  and make the other changes you suggested.
 
  Thanks!
 
 
 
  Mithila Nagendra wrote:
 
  Hi,
  The replication factor has to be set to 1. Also for you dfs and job
  tracker
  configuration you should insert the name of the node rather than the i.p
  address.
 
  For instance:
    <value>192.168.1.10:54310</value>
 
  can be:
 
    <value>master:54310</value>
 
  The nodes can be renamed by renaming them in the hosts files in /etc
  folder.
  It should look like the following:
 
  # Do not remove the following line, or various programs
  # that require network functionality will fail.
  127.0.0.1   localhost.localdomain   localhost   node01
  192.168.0.1 node01
  192.168.0.2 node02
  192.168.0.3 node03
 
  Hope this helps
  Mithila
 
  On Wed, Apr 15, 2009 at 9:40 PM, jpe30 jpotte...@gmail.com wrote:
 
 
  I'm setting up a Hadoop cluster and I have the name node and job
 tracker
  up
  and running.  However, I cannot get any of my datanodes or tasktrackers
  to
  start.  Here is my hadoop-site.xml file...
 
 
 
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

  <!-- Put site-specific property overrides in this file. -->

  <configuration>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/h_temp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>192.168.1.10:54310</value>
    <description>The name of the default file system.  A URI whose
    scheme and authority determine the FileSystem implementation.  The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class.  The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
    <final>true</final>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.10:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at.  If local, then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>0</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is
    created.
    The default is used if replication is not specified in create time.
    </description>
  </property>

  </configuration>
 
 
  and here is the error I'm getting...
 
 
 
 
  2009-04-15 14:00:48,208 INFO org.apache.hadoop.dfs.DataNode:
  STARTUP_MSG:
  /
  STARTUP_MSG: Starting DataNode
  STARTUP_MSG:   host = java.net.UnknownHostException: myhost: myhost
  STARTUP_MSG:   args = []
  STARTUP_MSG:   version = 0.18.3
  STARTUP_MSG:   build =
  https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r
  736250;
  compiled by 'ndaley' on Thu Jan 22 23:12:0$
  /
  2009-04-15 14:00:48,355 ERROR org.apache.hadoop.dfs.DataNode:
  java.net.UnknownHostException: myhost: myhost
 at java.net.InetAddress.getLocalHost(InetAddress.java:1353)
 at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:185)
 at
  org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:249)
  at org.apache.hadoop.dfs.DataNode.init(DataNode.java:223)
  at
  org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:3071)
 at
  org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:3026)
 at
  org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:3034)
 at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3156)
 
  2009-04-15 14:00:48,356 INFO org.apache.hadoop.dfs.DataNode:
  SHUTDOWN_MSG:
  /
  SHUTDOWN_MSG: Shutting down DataNode at java.net.UnknownHostException:
  myhost: myhost
  /
 
  --
  View this message in context:
  http://www.nabble.com/Datanode-Setup-tp23064660p23064660.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 
 
 
 

 --
 View 

RE: Ec2 instability

2009-04-17 Thread Ted Coyle
Rakhi,
I'd suggest going to 0.19.1 for both hbase and hadoop.

We had so many problems with 0.19.0 on EC2 that we couldn't use it.
We're having problems with name resolution and the generic startup scripts with
the 0.19.1 release, but they're not a show stopper.

Ted


-Original Message-
From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com] 
Sent: Friday, April 17, 2009 12:45 PM
To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org
Subject: Re: Ec2 instability

Hi,
 this is the exception i have been getting @ the mapreduce

java.io.IOException: Cannot run program bash: java.io.IOException:
error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathF
orWrite(LocalDirAllocator.java:321)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllo
cator.java:124)
at
org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFi
le.java:61)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java
:1199)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot
allocate memory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 10 more



On Fri, Apr 17, 2009 at 10:09 PM, Rakhi Khatwani
rakhi.khatw...@gmail.comwrote:

 Hi,
 Its been several days since we have been trying to stabilize
 hadoop/hbase on ec2 cluster. but failed to do so.
 We still come across frequent region server fails, scanner timeout
 exceptions and OS level deadlocks etc...

 and 2day while doing a list of tables on hbase i get the following
 exception:

 hbase(main):001:0 list
 09/04/17 13:57:18 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:19 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:20 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
not
 available yet, Z...
 09/04/17 13:57:20 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:21 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:22 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:23 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
not
 available yet, Z...
 09/04/17 13:57:23 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:26 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:27 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:28 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
not
 available yet, Z...
 09/04/17 13:57:28 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:29 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:30 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:31 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
not
 available yet, Z...
 09/04/17 13:57:31 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
could
 not be reached after 1 tries, giving up.
 09/04/17 13:57:34 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 0 time(s).
 09/04/17 13:57:35 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 1 time(s).
 09/04/17 13:57:36 INFO ipc.HBaseClass: Retrying connect to server: /
 10.254.234.32:60020. Already tried 2 time(s).
 09/04/17 13:57:36 INFO ipc.HbaseRPC: Server at /10.254.234.32:60020
not
 available yet, Z...

 but if i check on the UI, hbase master is still on, (tried refreshing
it
 

Re: Datanode Setup

2009-04-17 Thread jpe30



Mithila Nagendra wrote:
 
 You have to make sure that you can ssh between the nodes. Also check the
 file hosts in /etc folder. Both the master and the slave much have each
 others machines defined in it. Refer to my previous mail
 Mithila
 
 


I have SSH set up correctly, and here is the /etc/hosts file on node6 of the
datanodes.

#ip-address   hostname.domain.org   hostname
127.0.0.1   localhost.localdomain   localhost node6
192.168.1.10master
192.168.1.1 node1
192.168.1.2 node2
192.168.1.3 node3
192.168.1.4 node4
192.168.1.5 node5
192.168.1.6 node6

I have the slaves file on each machine set as node1 to node6, and each
masters file set to master except for the master itself.  Still, I keep
getting that same error in the datanodes...
-- 
View this message in context: 
http://www.nabble.com/Datanode-Setup-tp23064660p23101738.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Map-Reduce Slow Down

2009-04-17 Thread Mithila Nagendra
Hey Jason
The problem is fixed! :) My network admin had messed something up! Now it
works! Thanks for your help!

Mithila

On Thu, Apr 16, 2009 at 11:58 PM, Mithila Nagendra mnage...@asu.edu wrote:

 Thanks Jason! This helps a lot. I m planning to talk to my network admin
 tomorrow. I hoping he ll be able to fix this problem.
 Mithila


 On Fri, Apr 17, 2009 at 9:00 AM, jason hadoop jason.had...@gmail.comwrote:

 Assuming you are on a linux box, on both machines
 verify that the servers are listening on the ports you expect via
 netstat -a -n -t -p
 -a show sockets accepting connections
 -n do not translate ip addresses to host names
 -t only list tcp sockets
 -p list the pid/process name

 on the machine 192.168.0.18
 you should have sockets bound to 0.0.0.0:54310 with a process of java,
 and
 the pid should be the pid of your namenode process.

 On the remote machine you should be able to *telnet 192.168.0.18 54310*
 and
 have it connect
 *Connected to 192.168.0.18.
 Escape character is '^]'.
 *

 If the netstat shows the socket accepting and the telnet does not connect,
 then something is blocking the TCP packets between the machines. one or
 both
 machines has a firewall, an intervening router has a firewall, or there is
 some routing problem
 the command /sbin/iptables -L will normally list the firewall rules, if
 any
 for a linux machine.


 You should be able to use telnet to verify that you can connect from the
 remote machine.

 On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra mnage...@asu.edu
 wrote:

  Thanks! I ll see what I can find out.
 
  On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop jason.had...@gmail.com
  wrote:
 
   The firewall was run at system startup, I think there was a
   /etc/sysconfig/iptables file present which triggered the firewall.
   I don't currently have access to any centos 5 machines so I can't
 easily
   check.
  
  
  
   On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop jason.had...@gmail.com
   wrote:
  
The kickstart script was something that the operations staff was
 using
  to
initialize new machines, I never actually saw the script, just
 figured
   out
that there was a firewall in place.
   
   
   
On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra mnage...@asu.edu
   wrote:
   
Jason: the kickstart script - was it something you wrote or is it
 run
   when
the system turns on?
Mithila
   
On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra 
 mnage...@asu.edu
wrote:
   
 Thanks Jason! Will check that out.
 Mithila


 On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop 
  jason.had...@gmail.com
wrote:

 Double check that there is no firewall in place.
 At one point a bunch of new machines were kickstarted and placed
 in
  a
 cluster and they all failed with something similar.
 It turned out the kickstart script turned enabled the firewall
 with
  a
rule
 that blocked ports in the 50k range.
 It took us a while to even think to check that was not a part of
  our
 normal
 machine configuration

 On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra 
  mnage...@asu.edu
   
 wrote:

  Hi Aaron
  I will look into that thanks!
 
  I spoke to the admin who overlooks the cluster. He said that
 the
gateway
  comes in to the picture only when one of the nodes
 communicates
   with
a
 node
  outside of the cluster. But in my case the communication is
  carried
out
  between the nodes which all belong to the same cluster.
 
  Mithila
 
  On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball 
  aa...@cloudera.com
   
 wrote:
 
   Hi,
  
   I wrote a blog post a while back about connecting nodes via
 a
gateway.
  See
  
 

   
  
 
 http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
  
   This assumes that the client is outside the gateway and all
   datanodes/namenode are inside, but the same principles
 apply.
You'll
 just
   need to set up ssh tunnels from every datanode to the
 namenode.
  
   - Aaron
  
  
   On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari 
 rphul...@yahoo-inc.com
  wrote:
  
   Looks like your NameNode is down .
   Verify if hadoop process are running (   jps should show
 you
  all
java
   running process).
   If your hadoop process are running try restarting your
 hadoop
process
 .
   I guess this problem is due to your fsimage not being
 correct
  .
   You might have to format your namenode.
   Hope this helps.
  
   Thanks,
   --
   Ravi
  
  
   On 4/15/09 10:15 AM, Mithila Nagendra mnage...@asu.edu
   wrote:
  
   The log file runs into thousands of line with the same
 message
being
   displayed every time.
  
   On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra 

Re: Datanode Setup

2009-04-17 Thread Mithila Nagendra
You should have the conf/slaves file on the master node set to master, node01,
node02, and so on, and the masters file on the master set to master. Also, in the
/etc/hosts file, get rid of 'node6' in the line '127.0.0.1
localhost.localdomain   localhost node6' on all your nodes. Ensure that the
/etc/hosts file contains the same information on all nodes. Also, the
hadoop-site.xml files on all nodes should have master:portno for HDFS and the
job tracker.
Once you do this, restart hadoop.
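
The java.net.UnknownHostException in the DataNode log comes from
InetAddress.getLocalHost() (it is at the top of the stack trace), so a quick
sanity check on each node is a tiny program along these lines (just a sketch;
the class name is made up):

import java.net.InetAddress;

public class HostCheck {
    public static void main(String[] args) throws Exception {
        // Same lookup the DataNode does via DNS.getDefaultHost at startup.
        InetAddress addr = InetAddress.getLocalHost();
        System.out.println("hostname = " + addr.getHostName());
        System.out.println("address  = " + addr.getHostAddress());
    }
}

If this throws UnknownHostException, the machine's hostname still does not
resolve through /etc/hosts and the datanode will keep failing the same way.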

On Fri, Apr 17, 2009 at 10:04 AM, jpe30 jpotte...@gmail.com wrote:




 Mithila Nagendra wrote:
 
  You have to make sure that you can ssh between the nodes. Also check the
  file hosts in /etc folder. Both the master and the slave much have each
  others machines defined in it. Refer to my previous mail
  Mithila
 
 


 I have SSH setup correctly and here is the /etc/hosts file on node6 of the
 datanodes.

 #ip-address   hostname.domain.org   hostname
 127.0.0.1   localhost.localdomain   localhost node6
 192.168.1.10master
 192.168.1.1 node1
 192.168.1.2 node2
 192.168.1.3 node3
 192.168.1.4 node4
 192.168.1.5 node5
 192.168.1.6 node6

 I have the slaves file on each machine set as node1 to node6, and each
 masters file set to master except for the master itself.  Still, I keep
 getting that same error in the datanodes...
 --
 View this message in context:
 http://www.nabble.com/Datanode-Setup-tp23064660p23101738.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-17 Thread Bradford Stephens
There's definitely a false dichotomy to this paper, and I think it's a
tad disingenuous. It's titled "A Comparison of Approaches to Large-Scale
Data Analysis", when it should be titled "A Comparison of
Parallel RDBMSs to MapReduce for RDBMS-specific Problems". There's
little surprise that the people who wrote the paper have been
gunning for Hadoop for quite a while -- they've written papers
before which describe MR as a "Big Step Backwards". Not to mention the
primary authors are a CTO of Vertica, a parallel DB company, and a
lead tech from Microsoft.

We all know MapReduce is not meant for non-parallelizable, non-indexed
tasks like O(1) access to data,table joins, grepping indexed stuff,
etc. MapReduce excels at highly parallelizable tasks, like keyword and
document indexing, web crawling, gene sequencing, etc.

What would have been *great*, and what I'm working on a whitepaper
for, is a study on what classes of problems are ideal for parallel
RDBMs, what are ideal for MapReduce, and then performance timing on
those solutions.

The study is about as useful as if I had written "Comparison of
Approaches to Operating System File Allocation Table Management", and
then compared SQL and Ext3.

Yes, I'm in one of *those* moods today :)

Cheers,
Bradford

On Wed, Apr 15, 2009 at 8:22 AM, Jonathan Gray jl...@streamy.com wrote:
 I agree with you, Andy.

 This seems to be a great look into what Hadoop MapReduce is not good at.

 Over in the HBase world, we constantly deal with comparisons like this to
 RDBMSs, trying to determine if one is better than the other.  It's a false
 choice and completely depends on the use case.

 Hadoop is not suited for random access, joins, dealing with subsets of
 your data; ie. it is not a relational database!  It's designed to
 distribute a full scan of a large dataset, placing tasks on the same nodes
 as the data its processing.  The emphasis is on task scheduling, fault
 tolerance, and very large datasets, low-latency has not been a priority.
 There are no indexes to speak of, it's completely orthogonal to what it
 does, so of course there is an enormous disparity in cases where that
 makes sense.  Yes, B-Tree indexes are a wonderful breakthrough in data
 technology :)

 In short, I'm using Hadoop (HDFS and MapReduce) for a broad spectrum of
 applications including batch log processing, web crawling, and number of
 machine learning and natural language processing jobs... These may not be
 tasks that DBMS-X or Vertica would be good at, if even capable of them,
 but all things that I would include under Large-Scale Data Analysis.

 Would have been really interesting to see how things like Pig, Hive, and
 Cascading would stack up against DBMS-X/Vertica for very complex,
 multi-join/sort/etc queries, across a broad spectrum of use cases and
 dataset/result sizes.

 There are a wide variety of solutions to the problems out there.  It's
 important to know the strengths and weaknesses of each, so a bit
 unfortunate that this paper set the stage as it did.

 JG

 On Wed, April 15, 2009 6:44 am, Andy Liu wrote:
 Not sure if comparing Hadoop to databases is an apples to apples
 comparison.  Hadoop is a complete job execution framework, which
 collocates the data with the computation.  I suppose DBMS-X and Vertica do
 that to some certain extent, by way of SQL, but you're restricted to that.
 If you want
 to say, build a distributed web crawler, or a complex data processing
 pipeline, Hadoop will schedule those processes across a cluster for you,
 while Vertica and DBMS-X only deal with the storage of the data.

 The choice of experiments seemed skewed towards DBMS-X and Vertica.  I
 think everybody is aware that Map-Reduce is inefficient for handling
 SQL-like
 queries and joins.

 It's also worth noting that I think 4 out of the 7 authors either
 currently or at one time work with Vertica (or c-store, the precursor to
 Vertica).


 Andy


 On Tue, Apr 14, 2009 at 10:16 AM, Guilherme Germoglio
 germog...@gmail.comwrote:


 (Hadoop is used in the benchmarks)


 http://database.cs.brown.edu/sigmod09/


 There is currently considerable enthusiasm around the MapReduce
 (MR) paradigm for large-scale data analysis [17]. Although the
 basic control flow of this framework has existed in parallel SQL
 database management systems (DBMS) for over 20 years, some have called
 MR a dramatically new computing model [8, 17]. In
 this paper, we describe and compare both paradigms. Furthermore, we
 evaluate both kinds of systems in terms of performance and development
 complexity. To this end, we define a benchmark consisting of a
 collection of tasks that we have run on an open source version of MR as
 well as on two parallel DBMSs. For each task, we measure each system’s
 performance for various degrees of parallelism on a cluster of 100
 nodes. Our results reveal some interesting trade-offs. Although the
 process to load data into and tune the execution of parallel DBMSs took
 much longer than the MR 

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-17 Thread Bradford Stephens
OK, we've got 3 people... that's enough for a party? :)

Surely there must be dozens more of you guys out there... c'mon,
accelerate your knowledge! Join us in Seattle!



On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
 Greetings,

 Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
 with me in the Seattle area? I can donate some facilities, etc. -- I
 also always have topics to speak about :)

 Cheers,
 Bradford