Re: Questions about Myriad background

2016-05-30 Thread Swapnil Daingade
Great questions, Dave!

Here is what I think:
1. The current way an organization would run Yarn and Mesos is to have two
separate clusters with dedicated resources (compute, storage, networking).
This, I feel, is not the best use of the clusters as resources cannot be
shared. You could have one cluster starved for resources while the other
remains idle.

Better resource sharing and utilization
With Myriad, we delegate all Yarn resource management to Mesos. You can
have one Mesos cluster, one instance of the DFS (HDFS, MapRFS etc) where
you can run any Mesos application (including Yarn).
For example, imagine you are running a bunch of webservers during the day
and you want to analyse the web server logs at night using yarn, when web
traffic is lower. You can reduce the number of web server instances,
returning resources to Mesos, which in turn could be used for launching and
expanding a yarn cluster.
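
To make the day/night example concrete, here is a toy sketch in plain
Python. It is not Myriad or Mesos code: the SharedPool class, the app names
and the CPU numbers are all made up for illustration, and a real Mesos
cluster hands out resources through offers rather than direct calls like
these.

```python
class SharedPool:
    """Toy stand-in for one Mesos cluster: a single pool of CPUs shared by all apps."""

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        self.allocated = {}  # app name -> CPUs currently held

    def free_cpus(self):
        return self.total_cpus - sum(self.allocated.values())

    def acquire(self, app, cpus):
        """Grant up to `cpus` from whatever is free; return the amount granted."""
        granted = min(cpus, self.free_cpus())
        self.allocated[app] = self.allocated.get(app, 0) + granted
        return granted

    def release(self, app, cpus):
        """Give up to `cpus` held by `app` back to the pool; return the amount released."""
        released = min(cpus, self.allocated.get(app, 0))
        self.allocated[app] = self.allocated.get(app, 0) - released
        return released


pool = SharedPool(total_cpus=100)

# Daytime: the web tier holds most of the cluster, yarn gets the rest.
pool.acquire("webservers", 80)
pool.acquire("yarn", 20)

# Night: shrink the web tier, then grow the yarn cluster into the freed space.
pool.release("webservers", 60)
night_grant = pool.acquire("yarn", 60)
```

With two separate clusters, the 60 CPUs released by the web tier would sit
idle at night. Here they go straight back into the pool that yarn draws from.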

Multi tenancy
Things become more interesting when you launch multiple tenant yarn
clusters on Mesos which can expand and contract dynamically along with
other mesos applications. Let's extend the earlier example. You still have
one physical cluster running Mesos and the DFS. Resources are being shared
between webservers and two tenant yarn clusters (say one for the
Engineering department and another for Finance). At the end of a quarter,
the finance department could be allocated more resources by shutting down a
few nodemanagers from the engineering yarn cluster, or a few web servers
and launching new nodemanagers for the finance yarn cluster.

Yarn as a service
Say your org grows big and you need more yarn clusters (dev, test, prod,
finance). You can still have one physical cluster running Mesos and a single
DFS instance. The physical cluster scales as you add new nodes to it. The
new resources become available to all the Mesos applications, including
webservers and multiple tenant yarn clusters. Each individual tenant yarn
cluster can expand and shrink dynamically. You might also want to run
multiple versions of yarn in your clusters (say 2.7 in prod vs 3.0 in dev).
One could easily do this using docker or a binary distribution. Other
ideas have also been floated around about complete isolation (compute,
storage, networking) between yarn clusters using docker.

Fine grained scaling
Another cool thing that Myriad provides is fine grained scaling (fgs for
short). With fgs, each yarn cluster has certain guaranteed capacity.
However it can also utilize resources beyond its guaranteed capacity if
they are available in the cluster. This improves resource utilization when
the physical cluster is lightly loaded.
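
The guaranteed-plus-borrowed idea can be sketched in a few lines of Python.
This is only an illustration of the policy described above, not how Myriad
implements fgs: the allocate function, the cluster names and the numbers are
hypothetical, and handing out spare capacity in alphabetical order is an
arbitrary choice made to keep the example deterministic.

```python
def allocate(total, guaranteed, demand):
    """Toy allocation: guarantees first, then idle capacity to whoever wants more.

    total      -- capacity of the physical cluster
    guaranteed -- cluster name -> guaranteed capacity
    demand     -- cluster name -> capacity the cluster would like to use
    """
    # Step 1: every cluster gets what it asks for, up to its guarantee.
    alloc = {name: min(demand[name], guaranteed[name]) for name in guaranteed}
    idle = total - sum(alloc.values())

    # Step 2: capacity nobody claimed under a guarantee is lent out.
    for name in sorted(guaranteed):
        extra = min(demand[name] - alloc[name], idle)
        alloc[name] += extra
        idle -= extra
    return alloc


guaranteed = {"eng": 40, "finance": 40}

# Lightly loaded: finance is mostly idle, so eng can go beyond its guarantee.
light = allocate(100, guaranteed, {"eng": 70, "finance": 10})

# Heavily loaded: finance uses its full guarantee, so eng can borrow less.
heavy = allocate(100, guaranteed, {"eng": 70, "finance": 60})
```

In the lightly loaded case eng ends up with 70 (30 above its guarantee). In
the heavily loaded case it gets 60, because only the 20 units not covered by
any guarantee are left to borrow.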

2. Not many that I can think of. If MapReduce is all that you need to run,
then MR1 on Mesos might work well. You might not have access to fgs. I
think one of the reasons that YARN came into being was that the Hadoop
community wanted to separate the cluster resource management part from the
application part (MapReduce). With Myriad, any new application that runs on
Yarn also automatically runs on Mesos without the need for a Mesos
framework to be written for it.

3. I feel data locality needs to be better addressed. Multiple ideas have
been floated around, like running multiple nodemanagers per node in fgs
mode to take advantage of data locality.

4. I think this will be clearer once 3 is addressed.

Hope this helps.

Regards
Swapnil


On Mon, May 30, 2016 at 12:02 AM, Dave Webb wrote:

> Hi,
>
> I have read about Mesos [1], Yarn [2] and Myriad, but I couldn't find an
> explicit answer to a few general questions. First of all, I don't have an
> actual cluster with a business usecase to solve, but I'm interested in the
> technologies and motivation behind these systems.
>
> From my understanding Myriad is a Mesos Framework (just like Marathon,
> Spark, ...) which acts as a "wrapper" around Yarn. This enables a dynamic
> coexistence of Yarn and Mesos on the same cluster which was originally not
> possible.
> However, from a theoretical standpoint, Yarn and Mesos appear to be - in
> general - only different variations of the same thing: Resource Negotiators
> in a cluster environment.
> This leads to the first question:
>
> (1) Why would you want to run Mesos and Yarn together?
> What would be the disadvantages of choosing only one of them?
> One valid argument might be that there are Mesos Frameworks / Yarn
> Applications which you don't want to port to Yarn / Mesos and vice versa.
> Myriad would allow you to use Mesos (and all frameworks built for it), but
> still use all Yarn applications.
>
> Nevertheless, in many cases I would suspect that even though there surely
> are interesting Yarn applications, the most prominent example is MapReduce.
> However, MapReduce v1 has been ported to a Mesos Framework [1, 3] several
> years ago.
> This leads to the second question:
>
> (2) What are the advantages of running MapReduce v2 using Yarn via Myriad
> on a Mesos Cluster instead of 

Podling Report Reminder - June 2016

2016-05-30 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 15 June 2016, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, June 1st).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.

This should be appended to the Incubator Wiki page at:

http://wiki.apache.org/incubator/May2016

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC