Re: Amazon Elastic MapReduce

2009-04-06 Thread Patrick A.

Are intermediate results stored in S3 as well?

Also, any plans to support HTable?



Chris K Wensel-2 wrote:
 
 
 FYI
 
 Amazon's new Hadoop offering:
 http://aws.amazon.com/elasticmapreduce/
 
 And Cascading 1.0 supports it:
 http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html
 
 cheers,
 ckw
 
 --
 Chris K Wensel
 ch...@wensel.net
 http://www.cascading.org/
 http://www.scaleunlimited.com/
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Amazon-Elastic-MapReduce-tp22842658p22911128.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Amazon Elastic MapReduce

2009-04-06 Thread Peter Skomoroch
Intermediate results can be stored in HDFS on the EC2 machines, or in S3
using s3n... performance is better if you store them in HDFS:

 -input, s3n://elasticmapreduce/samples/similarity/lastfm/input/,
 -output, hdfs:///home/hadoop/output2/,
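A minimal sketch of assembling those step arguments in code. The helper below is purely illustrative (it is not part of any Amazon or Hadoop API); only the S3 sample path and the HDFS output path come from the snippet above:

```python
# Hypothetical sketch: build the argument list for a job flow step,
# reading input from S3 via s3n and writing intermediate output to HDFS
# (the faster option, per the note above).

def step_args(input_uri, output_uri, extra=None):
    """Assemble the -input/-output argument list for one step."""
    args = ["-input", input_uri, "-output", output_uri]
    if extra:
        args.extend(extra)
    return args

args = step_args(
    "s3n://elasticmapreduce/samples/similarity/lastfm/input/",
    "hdfs:///home/hadoop/output2/",
)
```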



On Mon, Apr 6, 2009 at 11:27 AM, Patrick A. patrickange...@gmail.com wrote:


 Are intermediate results stored in S3 as well?

 Also, any plans to support HTable?



 Chris K Wensel-2 wrote:
 
 
  FYI
 
  Amazon's new Hadoop offering:
  http://aws.amazon.com/elasticmapreduce/
 
  And Cascading 1.0 supports it:
  http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html
 
  cheers,
  ckw
 
  --
  Chris K Wensel
  ch...@wensel.net
  http://www.cascading.org/
  http://www.scaleunlimited.com/
 
 
 





-- 
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch


Re: Amazon Elastic MapReduce

2009-04-03 Thread Steve Loughran

Brian Bockelman wrote:


On Apr 2, 2009, at 3:13 AM, zhang jianfeng wrote:

seems like I should pay additional money, so why not configure a Hadoop cluster in EC2 by myself? This has already been automated using scripts.




Not everyone has a support team or an operations team or enough time to 
learn how to do it themselves.  You're basically paying for the fact 
that the only thing you need to know to use Hadoop is:

1) Be able to write the Java classes.
2) Press the "go" button on a webpage somewhere.

You could use Hadoop with little-to-zero systems knowledge (and without 
institutional support), which would always make some researchers happy.


Brian


True, but this way nobody gets the opportunity to learn how to do it 
themselves, which can be a tactical error one comes to regret further 
down the line. By learning the pain of cluster management today, you get 
to keep it under control as your data grows.


I am curious what bug patches AWS will supply, for they have been very 
silent on their Hadoop work to date.


Re: Amazon Elastic MapReduce

2009-04-03 Thread Tim Wintle
On Fri, 2009-04-03 at 11:19 +0100, Steve Loughran wrote:
 True, but this way nobody gets the opportunity to learn how to do it 
 themselves, which can be a tactical error one comes to regret further 
 down the line. By learning the pain of cluster management today, you get 
 to keep it under control as your data grows.

Personally I don't want to have to learn (and especially not support in
production) the EC2 / S3 part, so it does sound appealing.

On a side note, I'd hope that at some point they give some control over
the priority of the overall job - on the level of "you can boot up these
machines whenever you want" versus "boot up these machines now" - that
should let them manage the load on their hardware and reduce costs
(which I'd obviously expect them to pass on to the users of low-priority
jobs). I'm not sure how that would fit into the "give me 10 nodes"
method at the moment.

 
 I am curious what bug patches AWS will supply, for they have been very 
 silent on their Hadoop work to date.

I'm hoping it will involve the security of EC2 images, but I'm not expecting it.





Re: Amazon Elastic MapReduce

2009-04-03 Thread Stuart Sierra
On Thu, Apr 2, 2009 at 4:13 AM, zhang jianfeng zjf...@gmail.com wrote:
 seems like I should pay additional money, so why not configure a Hadoop
 cluster in EC2 by myself? This has already been automated using scripts.

Personally, I'm excited about this.  They're charging a tiny fraction
above the standard EC2 rate.  I like that the cluster shuts down
automatically when the job completes -- you don't have to sit around
and watch it.  Yeah, you can automate that, but it's one more thing to
think about.

-Stuart


Re: Amazon Elastic MapReduce

2009-04-03 Thread Lukáš Vlček
I may be wrong, but I would welcome this. As far as I understand, the hot
topic in cloud computing these days is standardization, and I would be
happy if Hadoop came to be considered a standard for cloud computing
architecture. The more Amazon pushes Hadoop, the more it could be accepted
by other players in this market (and the better for customers switching
from one cloud provider to another). Just my 2 cents.
Regards,
Lukas

On Fri, Apr 3, 2009 at 4:36 PM, Stuart Sierra
the.stuart.sie...@gmail.com wrote:

 On Thu, Apr 2, 2009 at 4:13 AM, zhang jianfeng zjf...@gmail.com wrote:
  seems like I should pay additional money, so why not configure a Hadoop
  cluster in EC2 by myself? This has already been automated using scripts.

 Personally, I'm excited about this.  They're charging a tiny fraction
 above the standard EC2 rate.  I like that the cluster shuts down
 automatically when the job completes -- you don't have to sit around
 and watch it.  Yeah, you can automate that, but it's one more thing to
 think about.

 -Stuart




-- 
http://blog.lukas-vlcek.com/


RE: Amazon Elastic MapReduce

2009-04-03 Thread Ricky Ho
I disagree. This is like arguing that everyone should learn everything 
themselves, otherwise they won't know how to do anything.

A better situation is to have the algorithm designer focus on how to break 
their algorithm down into Map/Reduce form and test it out immediately, rather 
than requiring them to learn all the admin aspects of Hadoop, which becomes a 
hurdle that keeps them from moving fast.

Rgds,
Ricky

-Original Message-
From: Steve Loughran [mailto:ste...@apache.org] 
Sent: Friday, April 03, 2009 2:19 AM
To: core-user@hadoop.apache.org
Subject: Re: Amazon Elastic MapReduce

Brian Bockelman wrote:
 
 On Apr 2, 2009, at 3:13 AM, zhang jianfeng wrote:
 
 seems like I should pay additional money, so why not configure a Hadoop
 cluster in EC2 by myself? This has already been automated using scripts.


 
 Not everyone has a support team or an operations team or enough time to 
 learn how to do it themselves.  You're basically paying for the fact 
 that the only thing you need to know to use Hadoop is:
 1) Be able to write the Java classes.
 2) Press the "go" button on a webpage somewhere.
 
 You could use Hadoop with little-to-zero systems knowledge (and without 
 institutional support), which would always make some researchers happy.
 
 Brian

True, but this way nobody gets the opportunity to learn how to do it 
themselves, which can be a tactical error one comes to regret further 
down the line. By learning the pain of cluster management today, you get 
to keep it under control as your data grows.

I am curious what bug patches AWS will supply, for they have been very 
silent on their Hadoop work to date.


Re: Amazon Elastic MapReduce

2009-04-02 Thread zhang jianfeng
Does it support Pig?


On Thu, Apr 2, 2009 at 3:47 PM, Chris K Wensel ch...@wensel.net wrote:


 FYI

 Amazon's new Hadoop offering:
 http://aws.amazon.com/elasticmapreduce/

 And Cascading 1.0 supports it:
 http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html

 cheers,
 ckw

 --
 Chris K Wensel
 ch...@wensel.net
 http://www.cascading.org/
 http://www.scaleunlimited.com/




Re: Amazon Elastic MapReduce

2009-04-02 Thread Miles Osborne
... and only in the US

Miles

2009/4/2 zhang jianfeng zjf...@gmail.com:
 Does it support Pig?


 On Thu, Apr 2, 2009 at 3:47 PM, Chris K Wensel ch...@wensel.net wrote:


 FYI

 Amazon's new Hadoop offering:
 http://aws.amazon.com/elasticmapreduce/

 And Cascading 1.0 supports it:
 http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html

 cheers,
 ckw

 --
 Chris K Wensel
 ch...@wensel.net
 http://www.cascading.org/
 http://www.scaleunlimited.com/






-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: Amazon Elastic MapReduce

2009-04-02 Thread zhang jianfeng
seems like I should pay additional money, so why not configure a Hadoop
cluster in EC2 by myself? This has already been automated using scripts.





On Thu, Apr 2, 2009 at 4:09 PM, Miles Osborne mi...@inf.ed.ac.uk wrote:

 ... and only in the US

 Miles

 2009/4/2 zhang jianfeng zjf...@gmail.com:
  Does it support Pig?
 
 
  On Thu, Apr 2, 2009 at 3:47 PM, Chris K Wensel ch...@wensel.net wrote:
 
 
  FYI
 
  Amazon's new Hadoop offering:
  http://aws.amazon.com/elasticmapreduce/
 
  And Cascading 1.0 supports it:
  http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html
 
  cheers,
  ckw
 
  --
  Chris K Wensel
  ch...@wensel.net
  http://www.cascading.org/
  http://www.scaleunlimited.com/
 
 
 



 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.



Announcing Amazon Elastic MapReduce

2009-04-02 Thread Sirota, Peter
Dear Hadoop community,

We are excited today to introduce the public beta of Amazon Elastic MapReduce, 
a web service that enables developers to easily and cost-effectively process 
vast amounts of data. It utilizes a hosted Hadoop (0.18.3) running on the 
web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and 
Amazon Simple Storage Service (Amazon S3).

Using Amazon Elastic MapReduce, you can instantly provision as much or as 
little capacity as you like to perform data-intensive tasks for applications 
such as web indexing, data mining, log file analysis, machine learning, 
financial analysis, scientific simulation, and bioinformatics research.  Amazon 
Elastic MapReduce lets you focus on crunching or analyzing your data without 
having to worry about time-consuming set-up, management or tuning of Hadoop 
clusters or the compute capacity upon which they sit.

Working with the service is easy: Develop your processing application using our 
samples or by building your own, upload your data to Amazon S3, use the AWS 
Management Console or APIs to specify the number and type of instances you 
want, and click "Create Job Flow." We do the rest, running Hadoop over the 
number of specified instances, providing progress monitoring, and delivering 
the output to Amazon S3.
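The workflow described above can be sketched roughly in code. The following is a hedged illustration of the kind of job-flow description a "Create Job Flow" request might carry; the field names, instance types, and bucket paths are assumptions for illustration only, not the service's actual schema (consult the Elastic MapReduce API documentation for that):

```python
# Hedged sketch of a job flow description. All field names and paths are
# illustrative placeholders, not the real Elastic MapReduce request schema.
import json

job_flow = {
    "Name": "sample-log-analysis",
    "Instances": {
        "InstanceCount": 4,            # number of EC2 instances requested
        "MasterInstanceType": "m1.small",
        "SlaveInstanceType": "m1.small",
    },
    "Steps": [{
        "Name": "wordcount",
        "HadoopJarStep": {
            "Jar": "s3n://my-bucket/wordcount.jar",  # hypothetical bucket
            "Args": ["-input", "s3n://my-bucket/input/",
                     "-output", "s3n://my-bucket/output/"],
        },
    }],
}

# The service would receive something like this serialized request body.
request_body = json.dumps(job_flow)
```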

We will be posting several patches to Hadoop today and are hoping to become a 
part of this exciting community moving forward.

We hope this new service will prove a powerful tool for your data processing 
needs and become a great development platform for building sophisticated data 
processing applications. You can sign up and start using the service today at 
http://aws.amazon.com/elasticmapreduce.

Our forums are available for any questions or feature suggestions: 
http://developer.amazonwebservices.com/connect/forum.jspa?forumID=52

Sincerely,

The Amazon Web Services Team



Re: Amazon Elastic MapReduce

2009-04-02 Thread Brian Bockelman


On Apr 2, 2009, at 3:13 AM, zhang jianfeng wrote:

seems like I should pay additional money, so why not configure a Hadoop
cluster in EC2 by myself? This has already been automated using scripts.





Not everyone has a support team or an operations team or enough time  
to learn how to do it themselves.  You're basically paying for the  
fact that the only thing you need to know to use Hadoop is:

1) Be able to write the Java classes.
2) Press the "go" button on a webpage somewhere.

You could use Hadoop with little-to-zero systems knowledge (and  
without institutional support), which would always make some  
researchers happy.


Brian





On Thu, Apr 2, 2009 at 4:09 PM, Miles Osborne mi...@inf.ed.ac.uk  
wrote:



... and only in the US

Miles

2009/4/2 zhang jianfeng zjf...@gmail.com:

Does it support Pig?


On Thu, Apr 2, 2009 at 3:47 PM, Chris K Wensel ch...@wensel.net  
wrote:




FYI

Amazon's new Hadoop offering:
http://aws.amazon.com/elasticmapreduce/

And Cascading 1.0 supports it:
http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html

cheers,
ckw

--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/








--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.





Re: Amazon Elastic MapReduce

2009-04-02 Thread Chris K Wensel

You should check out the new pricing.

On Apr 2, 2009, at 1:13 AM, zhang jianfeng wrote:

seems like I should pay additional money, so why not configure a Hadoop
cluster in EC2 by myself? This has already been automated using scripts.






On Thu, Apr 2, 2009 at 4:09 PM, Miles Osborne mi...@inf.ed.ac.uk  
wrote:



... and only in the US

Miles

2009/4/2 zhang jianfeng zjf...@gmail.com:

Does it support Pig?


On Thu, Apr 2, 2009 at 3:47 PM, Chris K Wensel ch...@wensel.net  
wrote:




FYI

Amazon's new Hadoop offering:
http://aws.amazon.com/elasticmapreduce/

And Cascading 1.0 supports it:
http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html

cheers,
ckw

--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/








--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/



Re: Amazon Elastic MapReduce

2009-04-02 Thread Kevin Peterson
So if I understand correctly, this is an automated system to bring up a
Hadoop cluster on EC2, import some data from S3, run a job flow, write the
data back to S3, and bring down the cluster?

This seems like a pretty good deal. At the pricing they are offering, unless
I'm able to keep a cluster at more than about 80% capacity 24/7, it'll be
cheaper to use this new service.
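The break-even utilization behind that "about 80%" figure can be checked with back-of-envelope arithmetic. The rates below are illustrative placeholders, not Amazon's actual prices; the point is only the shape of the comparison:

```python
# Back-of-envelope sketch of the break-even utilization described above.
# Prices are illustrative assumptions, not quoted AWS rates: say a small
# instance costs 0.10 USD/hr on plain EC2 and the Elastic MapReduce
# surcharge is 0.015 USD per instance-hour on top of that.
ec2_rate = 0.10        # USD per instance-hour, self-managed cluster
emr_surcharge = 0.015  # USD per instance-hour added by the service

def monthly_cost_self_managed(nodes):
    """Run a cluster 24/7 yourself; you pay even while it sits idle."""
    hours = 24 * 30
    return nodes * hours * ec2_rate

def monthly_cost_emr(nodes, utilization):
    """Pay (EC2 + surcharge) only for the hours jobs actually run."""
    hours = 24 * 30 * utilization
    return nodes * hours * (ec2_rate + emr_surcharge)

# Break-even: u * (ec2_rate + surcharge) = ec2_rate, so with these
# assumed prices the always-on cluster only wins above roughly 87%
# utilization -- in the same ballpark as the 80% estimate above.
break_even = ec2_rate / (ec2_rate + emr_surcharge)
```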

Does this use an existing Hadoop job control API, or do I need to write my
flows to conform to Amazon's API?


Re: Amazon Elastic MapReduce

2009-04-02 Thread Peter Skomoroch
Kevin,

The API accepts any arguments you can pass in the standard jobconf for
Hadoop 0.18.3; it is pretty easy to convert an existing job flow to a JSON
job description that will run on the service.
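As a rough sketch of that conversion: jobconf key/value pairs flatten into the argument list a job description carries. The -jobconf flag matches Hadoop 0.18-era streaming usage; the helper name and sample keys are illustrative, not part of the service's API:

```python
# Hedged sketch: flatten jobconf-style settings into the argument list
# of a step. '-jobconf key=value' is the Hadoop 0.18 streaming form;
# everything else here is an illustrative assumption.
def jobconf_to_args(conf):
    """Flatten {'mapred.reduce.tasks': '10'} into ['-jobconf', 'k=v', ...]."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["-jobconf", "%s=%s" % (key, value)])
    return args

args = jobconf_to_args({"mapred.reduce.tasks": "10",
                        "mapred.map.tasks": "40"})
```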

-Pete

On Thu, Apr 2, 2009 at 2:44 PM, Kevin Peterson kpeter...@biz360.com wrote:

 So if I understand correctly, this is an automated system to bring up a
 hadoop cluster on EC2, import some data from S3, run a job flow, write the
 data back to S3, and bring down the cluster?

 This seems like a pretty good deal. At the pricing they are offering,
 unless
 I'm able to keep a cluster at more than about 80% capacity 24/7, it'll be
 cheaper to use this new service.

 Does this use an existing Hadoop job control API, or do I need to write my
 flows to conform to Amazon's API?




-- 
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch