Re: [VOTE] accept Tashi into the Incubator

2008-09-04 Thread David O'Hallaron
The voting appears to have ended, with 14+ votes and zero negative
votes, and three volunteers to be mentors (thank you!):

1. Matthieu Riou ([EMAIL PROTECTED])
2. Craig L Russell ([EMAIL PROTECTED])
3. Paul Fremantle ([EMAIL PROTECTED])

What is the next step for admission into the incubator?

Dave O

On Thu, Aug 14, 2008 at 10:11 PM, Matthieu Riou [EMAIL PROTECTED] wrote:
 So shouldn't this vote get tallied now? Seems that we're well past the 72
 hours.

 Matthieu

 On Thu, Aug 14, 2008 at 12:44 PM, Matt Hogstrom [EMAIL PROTECTED] wrote:

 +1


 On Aug 4, 2008, at 1:48 PM, Doug Cutting wrote:

  Please vote on accepting Tashi into the Incubator.

 Tashi's proposal is at:

  http://wiki.apache.org/incubator/TashiProposal

 Thanks!

 Doug


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]




-- 
-- David O'Hallaron
-- Director, Intel Research Pittsburgh
-- Assoc Prof of CS and ECE, Carnegie Mellon University
-- http://www.cs.cmu.edu/~droh




Re: [PROPOSAL] Tashi

2008-07-23 Thread David O'Hallaron
No worries. I've removed the entry on the wiki version of the proposal
at http://wiki.apache.org/incubator/TashiProposal. It now reads
simply:

Initially, there will be one committer each from Carnegie Mellon and
Intel Research:
 * Michael Stroucken ([EMAIL PROTECTED])
 * Michael Ryan ([EMAIL PROTECTED])




Dave


On Tue, Jul 22, 2008 at 8:36 PM, William A. Rowe, Jr.
[EMAIL PROTECTED] wrote:
 Doug Cutting wrote:

 Noel J. Bergman wrote:

 With respect to "Initially, we plan to start with one committer each from
 Carnegie Mellon and Intel Research, with a Yahoo committer to be
 determined later," that's awkwardly phrased.  It appears to imply a
 corporate representative doing commits for hidden people, something
 that we consider to be an anti-pattern.

 That is not the intent.  The intent is for Yahoo! to assign someone to
 work on this project as a direct contributor.  But that person has not yet
 been identified.

 Whoever is to be actively involved in development
 should be on the committer list.  If it is just a community of The
 Two Michaels, fine, but the wording should be rephrased.

 +1 If Y! does not name someone soon, then that entry should be removed.  A
 committer from Y! can always be added later, based on merit.

 No; the entry should be removed now.

 Yahoo the Company cannot place a reservation on a place at the table for
 an unnamed body.  This is not how the ASF works.  Nor are committers
 expressed as delegates of the institutions they work for/study at.
 This places a cloud over the acceptance of this particular project, and
 I would encourage everyone to be sure their mentors/champions review the
 text before posting a proposal for incubation.

 Bill




Re: [PROPOSAL] Tashi

2008-07-16 Thread David O'Hallaron
Matthieu,

Thanks very much. I'll add you as a proposed mentor on the wiki proposal.

http://wiki.apache.org/incubator/TashiProposal

Dave

On Tue, Jul 15, 2008 at 3:00 PM, Matthieu Riou [EMAIL PROTECTED] wrote:
 Okay, sounds good to me, thanks for the clarifications.

 Also if you need another mentor, you can count me in.

 Cheers,
 Matthieu




Re: [PROPOSAL] Tashi

2008-07-15 Thread David O'Hallaron
Matthieu,

  * The sponsoring entity is the Incubator so I'm guessing you're shooting
 for graduating as a TLP. What kind of interactions do you foresee with
 Hadoop for example?

We talked with Doug Cutting about whether to shoot for a TLP or a
subproject under Hadoop. We decided ultimately to go the TLP route
because Tashi and Hadoop are at different levels in the stack.
However, we see Hadoop as one of the important applications running on
Tashi virtual clusters, so the two projects are quite complementary.

  * IIUC Tashi is only about management, not about the underlying
 storage/computing technology. Is that correct? And if so which ones do you
 plan to integrate with?

Yes, that's correct. We plan to integrate with the major VMMs, such as
Xen, Linux KVM and VMware. For storage, we're looking at integrating
with HDFS, PVFS and later pNFS when it becomes more mature. An
important goal is to provide the hooks and interfaces that allow any
DFS and VMM vendor to integrate with the system.
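
One common way to provide such hooks is a driver registry: core code asks for a VMM backend by name, and each backend registers itself. The sketch below assumes that design; `VmmDriver`, `register_vmm`, `get_vmm`, and `FakeKvmDriver` are hypothetical names for illustration, not part of the Tashi POC.

```python
from __future__ import annotations

from typing import Protocol


class VmmDriver(Protocol):
    """Hypothetical minimal contract each VMM backend (Xen, KVM, VMware, ...) would satisfy."""

    def boot(self, image: str) -> str: ...


# Hypothetical registry mapping a backend name to the driver class that implements it.
_VMM_DRIVERS: dict[str, type] = {}


def register_vmm(name: str):
    """Class decorator: record a driver under `name` so new VMMs plug in without core changes."""

    def decorator(cls: type) -> type:
        _VMM_DRIVERS[name] = cls
        return cls

    return decorator


def get_vmm(name: str) -> VmmDriver:
    """Instantiate the driver registered under `name`, or raise ValueError if there is none."""
    try:
        driver_cls = _VMM_DRIVERS[name]
    except KeyError:
        raise ValueError(f"no VMM driver registered under {name!r}") from None
    return driver_cls()


@register_vmm("kvm")
class FakeKvmDriver:
    """Placeholder driver; a real one would call the hypervisor's own management tools."""

    def boot(self, image: str) -> str:
        return f"kvm: booted {image}"


if __name__ == "__main__":
    print(get_vmm("kvm").boot("disk.img"))
```

With this layout, adding support for another VMM means writing one new decorated class; the code that calls `get_vmm` does not change.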

  * I can't help asking for more technical details. What's the
 implementation language for your POC code? What are the non-proprietary
 interfaces you're thinking of?

The POC code is a couple of thousand lines of original Python code.
The major components are a cluster manager (cm), which runs on one of
the cluster nodes; a node manager (nm), which runs on each of the other
physical cluster nodes; a simple db on the cluster manager for
configuration data; and some client utilities.
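
A minimal sketch of how these components could fit together follows, assuming a least-loaded placement rule and an in-memory SQLite database standing in for the cm's simple db. The names `ClusterManager`, `NodeManager`, `create_vm`, and `where_is` are hypothetical; this is not the POC code.

```python
from __future__ import annotations

import sqlite3


class NodeManager:
    """Hypothetical node manager (nm): one per physical node, tracks the VMs it hosts."""

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname
        self.vms: set[str] = set()

    def start_vm(self, vm_name: str) -> None:
        # A real nm would ask the local VMM to boot the VM; here we only record it.
        self.vms.add(vm_name)


class ClusterManager:
    """Hypothetical cluster manager (cm): places VMs and keeps configuration in a simple db."""

    def __init__(self, node_managers: list[NodeManager]) -> None:
        self.nodes = {nm.hostname: nm for nm in node_managers}
        # An in-memory SQLite database stands in for the cm's configuration db.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE vms (name TEXT PRIMARY KEY, host TEXT NOT NULL)")

    def create_vm(self, vm_name: str) -> str:
        if self.where_is(vm_name) is not None:
            raise ValueError(f"VM {vm_name!r} already exists")
        # Illustrative placement rule: the node hosting the fewest VMs
        # (ties go to the node listed first).
        host = min(self.nodes, key=lambda h: len(self.nodes[h].vms))
        self.nodes[host].start_vm(vm_name)
        with self.db:  # commits the transaction on success
            self.db.execute("INSERT INTO vms (name, host) VALUES (?, ?)", (vm_name, host))
        return host

    def where_is(self, vm_name: str) -> str | None:
        row = self.db.execute("SELECT host FROM vms WHERE name = ?", (vm_name,)).fetchone()
        return row[0] if row else None


if __name__ == "__main__":
    cm = ClusterManager([NodeManager("node1"), NodeManager("node2")])
    print(cm.create_vm("vm-a"))  # node1
    print(cm.create_vm("vm-b"))  # node2
    print(cm.where_is("vm-a"))   # node1
```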

We're really thinking hard about interfaces now, but don't have clear
definitions yet. The kinds of things we are thinking about are:
interfaces between the client and cm, and between the cm and nm, for
manipulating VMs (starting, stopping, and migrating them); interfaces
for some kind of event/messaging system for monitoring and reporting
system state to the cm and client; interfaces between the cm/nm and
the storage system to allow the cm to do storage-aware scheduling of
VMs; interfaces to power management and system management hardware
features; and possibly interfaces for federating different Tashi
clusters.
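
Since these interfaces were still undefined, the following is only one possible shape for two of them: a cm-to-nm interface for starting, stopping, and migrating VMs, and a publish/subscribe event channel for reporting VM state. All names (`VmLifecycle`, `EventBus`, `InMemoryNode`, the `"vm-state"` topic) are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical publish/subscribe channel that reports state changes to the cm and clients."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


class VmLifecycle(ABC):
    """Hypothetical cm -> nm interface for starting, stopping, and migrating VMs."""

    @abstractmethod
    def start(self, vm: str) -> None: ...

    @abstractmethod
    def stop(self, vm: str) -> None: ...

    @abstractmethod
    def migrate(self, vm: str, target: VmLifecycle) -> None: ...


class InMemoryNode(VmLifecycle):
    """Stand-in node manager that only records state; a real one would drive a VMM."""

    def __init__(self, name: str, bus: EventBus) -> None:
        self.name = name
        self.bus = bus
        self.vms: set[str] = set()

    def _report(self, vm: str, state: str) -> None:
        self.bus.publish("vm-state", {"vm": vm, "node": self.name, "state": state})

    def start(self, vm: str) -> None:
        self.vms.add(vm)
        self._report(vm, "running")

    def stop(self, vm: str) -> None:
        self.vms.discard(vm)
        self._report(vm, "stopped")

    def migrate(self, vm: str, target: VmLifecycle) -> None:
        # Naive stop-and-restart ("cold") migration; live migration would instead
        # copy memory state while the VM keeps running.
        self.stop(vm)
        target.start(vm)


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("vm-state", print)
    node1, node2 = InMemoryNode("node1", bus), InMemoryNode("node2", bus)
    node1.start("vm-a")
    node1.migrate("vm-a", node2)
```

An abstract base class forces every VMM-specific node manager to implement the same three operations, which is one way to keep the cm independent of the underlying VMM.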

Thanks,

Dave




Re: [PROPOSAL] Tashi

2008-07-11 Thread David O'Hallaron
Noel,

I've fixed the wording in the wiki text to clarify the initial committers.

As for other companies, we've been talking with the storage group at HP.
We haven't approached the others yet, but would certainly welcome them.

Thanks!

Dave
http://wiki.apache.org/incubator/TashiProposal

***
David,

I just reviewed http://wiki.apache.org/incubator/TashiProposal.
Interesting.  Has anyone been in touch with VMware, XenSource, Google,
Amazon, et al. to invite them to participate?

With respect to "Initially, we plan to start with one committer each from
Carnegie Mellon and Intel Research, with a Yahoo committer to be determined
later," that's awkwardly phrased.  It appears to imply a corporate
representative doing commits for hidden people, something that we consider
to be an anti-pattern.  Whoever is to be actively involved in development
should be on the committer list.  If it is just a community of The Two
Michaels, fine, but the wording should be rephrased.

--- Noel






[PROPOSAL] Tashi

2008-07-10 Thread David O'Hallaron
This is a proposal to enter the Incubator.

See http://wiki.apache.org/incubator/TashiProposal for the most
up-to-date version.

We're looking forward to comments from the community.

Thanks!
Dave


---cut--
= Tashi Proposal =

A proposal to the Apache Software Foundation Incubator PMC by

David O'Hallaron^*+^, Michael Kozuch^*^, Michael Ryan^*^, Steven
Schlosser^*^, Jim Cipar^+^, Greg Ganger^+^, Garth Gibson^+^, Julio
Lopez^+^, Michael Stroucken^+^, Wittawat Tantisiriroj^+^, Doug
Cutting^#^, Jay Kistler^#^, Thomas Kwan^#^

^*^Intel Research Pittsburgh, ^+^Carnegie Mellon University, ^#^Yahoo!


July 10, 2008


== 1. Abstract ==


Tashi is a cluster management system for cloud computing on Big Data.

== 2. Proposal ==

The Tashi project aims to build a software infrastructure for cloud
computing on massive internet-scale datasets (what we call ''Big
Data''). The idea is to build a cluster management system that enables
the Big Data that are stored in a cluster/data center to be accessed,
shared, manipulated, and computed on by remote users in a convenient,
efficient, and safe manner.  The system aims to provide the following
basic capabilities:

(a) ''On-demand provisioning of storage and compute resources.'' Users
request a number of compute nodes, which can be either virtual or
physical machines, and a set of disk images to boot up on the nodes.
In response they receive their own persistent logical cluster of
compute and storage nodes, which they can then manage and use.

(b) ''Extensible end-to-end system management.'' Tashi will define
open non-proprietary interfaces for management tasks such as
observation, inference, planning, and actuation. This will keep the
system vendor-neutral and allow different research and development
groups to plug in different implementations of different management
modules.

(c) ''Cooperative storage and compute management.''  The system will
define new non-proprietary interfaces and methods that will allow
compute and storage management to work together in concert.

(d) ''Flexible storage models.'' The system will support a range of
different storage models, such as network-attached storage, per-node
storage, and hybrids, to allow developers, researchers, and large
scale cluster/data center operators to experiment with different kinds
of file systems.

(e) ''Flexible machine models.'' The system will support different
machine models.  In particular, it will be VMM-agnostic, able to run
different virtual machine monitors such as KVM and Xen. Also, in order
to address the cluster squatting problem (when clusters are balkanized
by users who reserve and hold nodes for their exclusive use) the
system will support a novel bi-modal booting capability, in which
virtual machine and physical machine instances can boot from the same
disk image.
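
Capabilities (c) and (d) can be made concrete with a small example of storage-aware placement: given which nodes hold replicas of the data blocks a VM will read (information a DFS such as HDFS can report), place the VM on the node with free capacity that stores the most of those blocks locally. This is a minimal sketch under assumptions; the function `place_vm`, its inputs, and the scoring rule are hypothetical, since the proposal does not specify a scheduling policy.

```python
from __future__ import annotations

from collections import Counter


def place_vm(block_locations: dict[str, list[str]], free_slots: dict[str, int]) -> str | None:
    """Hypothetical storage-aware placement.

    block_locations: block ID -> nodes holding a replica of that block.
    free_slots: node -> number of additional VMs it can host.
    Returns the chosen node, or None if no node has free capacity.
    """
    # How many of the needed blocks each node stores locally (unknown nodes count as 0).
    local_blocks = Counter(node for nodes in block_locations.values() for node in nodes)
    candidates = [node for node, slots in free_slots.items() if slots > 0]
    if not candidates:
        return None
    # Most local blocks first, then most free slots, then node name as a deterministic tie-break.
    return min(candidates, key=lambda n: (-local_blocks[n], -free_slots[n], n))


if __name__ == "__main__":
    blocks = {"blk1": ["node1", "node2"], "blk2": ["node2", "node3"], "blk3": ["node2"]}
    print(place_vm(blocks, {"node1": 2, "node2": 1, "node3": 4}))  # node2: holds all three blocks
    print(place_vm(blocks, {"node1": 2, "node2": 0, "node3": 4}))  # node3: node2 is full, node3 has more free slots than node1
```

Preferring data locality first and free capacity second is only one simple policy; a real scheduler might also weigh network topology or power state.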

== 3. Rationale and Approach ==

Digital media, pervasive sensing, web authoring, mobile computing,
scientific and medical instruments, physical simulations, and virtual
worlds are all delivering vast new datasets relating to every aspect
of our lives. A growing fraction of this Big Data is going unused or
being underexploited due to the overwhelming scale of the data
involved.  Effective sharing, understanding, and use of this new
wealth of raw information poses one of the great challenges for the
new century.

In order to compute on this emerging Big Data, many research and
development groups are purchasing their own racks of compute and
storage servers. The goal of the Tashi project is to develop a layer
of utility software that turns these raw racks of servers into easily
managed cloud computers that will allow remote users to share and
explore their Big Data.

To our knowledge there are no open source projects addressing cluster
management for Big Data applications. We need a project such as Tashi
for a number of reasons: (1) No cloud computing cluster management
systems have tackled the problem of having both compute and storage
management working together in concert, which we believe will be
necessary to support Big Data. (2) We need non-proprietary interfaces
for cloud computing, and open source is the way to develop these. For
example, Google's new App Engine and Amazon's web services require
people to build to proprietary APIs, so that their applications are
no longer vendor neutral, but are tied to a particular service
provider. (3) We need an extensible system that can serve as a
platform to stimulate research in cluster management for cloud
computing.

The Tashi system is targeted at two (not always distinct) communities:

(1) As a production system for organizations who want to offer medium
to large scale clusters to their users. For example, many companies
and university departments are purchasing such clusters, and a system
like Tashi would help them provide their users