Thanks for the link, Sean.

Whenever we looked into recovering wasted compute cycles (e.g. by letting
a job scheduler like Sun Grid Engine fire off jobs during downtime), we
found that the hassle of administering such a heterogeneous environment
wasn't worth it. Maybe running workers as browser applets, with the
virtual environment that implies, will make that easier.

If you're running in an applet without HDFS, doesn't that mean you're
moving both data and computation to the machine, as opposed to moving
"computation to the data"? Would this be a big issue for Mahout? For
example, if you're running k-means and 90% of your machines are
workstations that would otherwise be idle, wouldn't you need to transfer
roughly 90% of your dataset to the various clients? Each client might
only receive a small fraction, but that 90% still has to be shipped out
of your central storage. It seems like network bottlenecks could easily
swamp the benefit of using workstation cycles.
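
To put very rough numbers on that worry, here's a back-of-envelope
sketch. Every constant below is an assumption I picked for illustration
(a 100 GB input, a shared 1 Gbit/s office LAN, ten k-means passes over
the data); none of it is a measurement of MapFreeduce or Hadoop:

// Back-of-envelope only; all constants are assumptions, not measurements.
public class DataShippingEstimate {
    public static void main(String[] args) {
        double datasetGb = 100.0;       // assumed input size in GB
        double remoteFraction = 0.9;    // share of workers with no local copy
        double lanGbitPerSec = 1.0;     // assumed shared office LAN bandwidth
        int iterations = 10;            // assumed number of k-means passes

        // Data that must leave central storage on each pass if workers
        // hold nothing locally.
        double shippedGbPerPass = datasetGb * remoteFraction;
        double secondsPerPass = shippedGbPerPass * 8 / lanGbitPerSec; // GB -> gigabits

        System.out.printf("Shipped per pass: %.0f GB%n", shippedGbPerPass);
        System.out.printf("Transfer time per pass: ~%.0f s (~%.0f min)%n",
                secondsPerPass, secondsPerPass / 60);
        System.out.printf("Over %d passes: ~%.1f hours spent just moving data%n",
                iterations, iterations * secondsPerPass / 3600.0);
    }
}

Under those made-up numbers you'd spend roughly two hours per job just
moving bytes over the LAN, which is exactly the cost a data-local HDFS
cluster is designed to avoid.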

J

On Sun, 2011-05-15 at 18:09 +0100, Sean Owen wrote:
> Hi all, in my travels I've come across a small interesting startup that I
> thought might be of interest to the user@ audience. It's MapFreeduce (
> http://mapfreeduce.com/), and they're spinning an interesting twist on
> MapReduce. They've constructed a simplified MapReduce API, one for which
> workers are able to run as Java applets in the browser sandbox.
> 
> It's interesting for two reasons, I can tell you, after playing with it
> myself. One, I think it's interesting as it asks whether a simpler version
> of MapReduce than what you get in Hadoop is viable. That is -- it's not
> Hadoop. Can you do something interesting without, say, direct access to
> HDFS? Combiners? Custom InputFormats? And two, since it can fairly
> automatically turn office PCs with a browser into safe background MR
> workers, it might let organizational skunk-works create a cluster for
> cheap out of truly unused cycles to do something interesting.
> 
> I managed to reconstruct parts of the recommender pipeline on this framework
> without too much modification. It is possible to 'port' some parts of Mahout
> to this framework, if not all. MapReduce fans will probably enjoy taking a
> look at what they can get away with in a browser sandbox.
> 
> From a conversation with their founder I know they'd really like feedback
> and testers. Here's their pitch and plea for beta users in their own words.
> (I have no affiliation with or interest in the company.)
> 
> 
> *"MapFreeduce.com is a Washington DC-based startup making Big Data
> accessible to everyone. Our software service enables users to quickly and
> easily build a mapreduce cluster from the spare CPU-cycles of available
> computers without installing or configuring any software. To add a node to
> your MapFreeduce cluster and increase its power, you simply click on a link
> from any idle computer. You can scale your cluster to thousands of nodes to
> perform computation- and data-intensive tasks such as web indexing, data
> mining, business analytics, data warehousing, machine learning, financial
> analysis, scientific simulation, and bioinformatics research. MapFreeduce
> allows you to focus on crunching your data without having to worry about
> either the cost and complexity of setting up a traditional hardware cluster
> or the perpetual fees charged per hour and per node by common cloud
> providers.
> 
> We are looking for individuals who would be interested in joining our free,
> private beta test and/or providing feedback on our service."*
