Re: [VOTE] Release Apache Parquet Format 2.3.0 RC2

2015-02-17 Thread Ryan Blue

On 02/16/2015 11:16 AM, Ted Dunning wrote:

I have downloaded this release and verified the signature and md5
checksum.  The SHA checksum is a binary file and it isn't obvious to me how
to check it.  How was it produced?

I have verified that the LICENSE and NOTICE files are present and seem
well-formed.  I have verified that the DISCLAIMER is present and
well-formed.

I was unable to compile the code, apparently due to a thrift version
mismatch of some kind.  The README does not specify pre-requisites to
compilation, but simply states that [mvn package] should suffice.  It did
not suffice for me.  Here is the error message:

[ERROR] thrift failed output:

[ERROR] thrift failed error: !!! Unrecognized option: -out
Usage: thrift [options] file



Painful as it is to say this, I am -1 on this release unless somebody can
explain some simple process to resolve my issues.  Those issues are:

1) What format is used for the SHA checksum?  How is it expected that I
should check it?

2) What are the true pre-requisites for compilation?

I would be willing to upgrade my vote on this release if the checksum is
corrected and the README is updated for future releases (and I am told how
to compile things now).  Since the only change that I am requesting to this
release is the checksum file, it should be possible to move forward without
cutting a new release file.


As Julien noted, you need to have thrift 0.7.0 installed. You also need 
protoc 2.5.0 installed for protobuf. I'll add more information in the 
README for this.
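
One way to confirm which tool versions are on the PATH before running mvn package is sketched below. It assumes that `thrift -version` and `protoc --version` print the version number as the last whitespace-separated token of their output; the function names are illustrative and not part of Parquet.

```python
import subprocess

# Versions named in this thread as needed to build the release.
REQUIRED = {"thrift": "0.7.0", "protoc": "2.5.0"}

# Flag each tool accepts to print its version.
VERSION_FLAG = {"thrift": "-version", "protoc": "--version"}


def parse_version(output):
    """Return the last whitespace-separated token of the output, or None if empty."""
    tokens = output.split()
    return tokens[-1] if tokens else None


def installed_version(tool):
    """Run the tool and return the version it reports, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [tool, VERSION_FLAG[tool]],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        # Covers the tool being absent from PATH or not executable.
        return None
    return parse_version(result.stdout or result.stderr)


def check_prerequisites():
    """Map each required tool to (found_version, matches_required)."""
    report = {}
    for tool, wanted in REQUIRED.items():
        found = installed_version(tool)
        report[tool] = (found, found == wanted)
    return report
```

Calling check_prerequisites() returns a dict keyed by tool name; an entry whose second element is False points at the tool that is missing or at a different version.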


In the meantime, if you take a look at the .travis.yml file, you can 
see the preparation steps used to run CI jobs. That's what I usually 
point people at instead of maintaining it in two places, but I agree 
that shouldn't be the solution.


Thanks for taking a look at this!

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] [PROPOSAL] Myriad for Apache Incubator

2015-02-17 Thread Henry Saputra
Oh it is painless =)

From what I have seen, having just a dev@ list early on helps ramp up
development quickly.

@Adam and @Ted, IMHO once the transition is over and the project has
one release under the ASF, adding a user@ list would be beneficial.

- Henry

On Tue, Feb 17, 2015 at 9:59 PM, Adam Bordelon a...@mesosphere.io wrote:
 Good point. I'm fine with starting with just a dev@ first, and then we can
 add user@ if/when dev becomes too noisy.
 I assume adding a new mailing list is relatively painless.

 On Tue, Feb 17, 2015 at 9:52 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Tue, Feb 17, 2015 at 9:38 PM, Henry Saputra henry.sapu...@gmail.com
 wrote:

  @Adam and @Ted, as with any new incubator project, we always check
  whether you need user@ so early in the process.
  It would probably be better to have all discussion in dev@ early in
  incubation.
 

 Henry,

 This is a good question to ask (and I have asked it in the past).

 I think that Myriad is in, or nearly in, production here and there already.
 That means that a user@ list might well be useful.





Re: [DISCUSS] [PROPOSAL] Myriad for Apache Incubator

2015-02-17 Thread Naresh Agarwal
Looks interesting. Looking forward to this.

Thanks
Naresh

On Wed, Feb 18, 2015 at 11:08 AM, Henry Saputra henry.sapu...@gmail.com
wrote:

 I love this project and the idea. I tried to hack on it a couple of years
 ago but could not make it work.

 Looking forward to seeing it in the ASF Incubator for sure.

 @Adam and @Ted, as with any new incubator project, we always check
 whether you need user@ so early in the process.
 It would probably be better to have all discussion in dev@ early in incubation.

 - Henry


Re: [DISCUSS] [PROPOSAL] Myriad for Apache Incubator

2015-02-17 Thread Henry Saputra
I love this project and the idea. I tried to hack on it a couple of years
ago but could not make it work.

Looking forward to seeing it in the ASF Incubator for sure.

@Adam and @Ted, as with any new incubator project, we always check
whether you need user@ so early in the process.
It would probably be better to have all discussion in dev@ early in incubation.

- Henry

On Fri, Feb 13, 2015 at 5:06 PM, Adam Bordelon a...@mesosphere.io wrote:
 Hello friends,

 The Myriad team and I would like to propose the Myriad project for
 inclusion in the Apache Incubator.
 Full text of the proposal is below. I can add it to the incubator wiki as
 well, if desired.
 Please review and discuss. If there are no major concerns, I will call for
 a Vote after a week.

 Cheers,
 -Adam-
 me@apache

 ==
 Apache Myriad Proposal

 * Abstract
 Myriad enables co-existence of Apache Hadoop YARN and Apache Mesos together
 on the same cluster and allows dynamic resource allocations across both
 Hadoop and other applications running on the same physical data center
 infrastructure.

 * Proposal
 The vision of Myriad is to provide a comprehensive framework to ensure
 Apache Hadoop YARN and Apache Mesos can interoperate with minimal changes
 on either side and prevent the static fragmentation of data center
 resources.

 * Background
 Project Myriad is the first resource management framework that allows big
 data developers to run YARN-based Hadoop jobs alongside other applications
 and services in production. eBay Inc., MapR, and Mesosphere jointly built
 Myriad (available on GitHub at https://github.com/mesos/myriad) with the
 vision of freeing big data jobs from siloed clusters and consolidating
 infrastructure into a single pool of resources for greater utilization and
 operational efficiency. Several companies including Twitter have expressed
 interest in Myriad and have begun testing it.

 * Rationale
 Many Hadoop users are building larger clusters (data lake/data hub
 architectures) that support multiple workloads - made possible by the
 advent of Apache Hadoop YARN. As the clusters grow in size and importance,
 they become an important application within the broader datacenter. At the
 same time, Apache Mesos enables efficient resource isolation and sharing
 across distributed applications for the broader data center, for instance
 MPI, Spark, long-running web services, build/test infrastructure,
 traditional Linux applications/scripts, and others (including arbitrary
 Docker images).

 Myriad aims to enable co-existence of Apache Hadoop YARN and Apache Mesos
 on the same physical data center resources, reducing fragmentation of data
 center resources.

 * Project Goals
 ** Initial Goals
 - Run Myriad alongside Apache Hadoop YARN and Apache Mesos to allow
 policy-based allocation of data center resources across Apache Hadoop and
 other distributed applications
 - Ensure YARN-based execution frameworks work without any changes when
 running alongside Myriad. YARN applications will continue to interact and
 run on top of YARN and can choose to be unaware of Myriad.
 - Ensure Mesos-based execution frameworks work without any changes when
 running alongside Myriad. Mesos applications will continue to interact and
 run on Mesos and can choose to be unaware of Myriad.
 - Provide isolation for multi-tenancy.
   - Use Linux cgroups (and optionally Docker-like technologies to ease
 packaging, deployment and broader isolation) so that multiple YARN clusters
 can run in their own space and are isolated from each other. YARN’s RM and
 NMs are dockerized.
 - Myriad should be able to manage full YARN lifecycle:
   - Bring up YARN (RM, NM)
   - Scale Up/Down YARN
   - Release resources and shut down YARN

 ** Longer Term Goals
 - Allow fine-grained dynamic allocation of resources to Hadoop including
 the ability to scale up and scale down the cluster.
   - Provide different policies to allow downsizing running applications on
 Hadoop when resources are taken away from it.
   - Provide a framework so the downsizing policy is pluggable and users can
 write their own implementations.
 - Allow multiple versions of Apache Hadoop to run on the same physical
 infrastructure
 - Allow workload portability - ability to migrate YARN workloads across
 various cloud infrastructures seamlessly (e.g. GCE, AWS, etc.)
 - Security:
   - Authentication Requirements:
     - Support basic CRAM-MD5 password authentication between Myriad and
 Mesos. Additional authentication mechanisms may be supported in the future.
     - Traditional user authentication with Hadoop’s HTTP web-consoles
 should work as usual.
   - Authorization:
     - Only authorized users are allowed to launch YARN clusters.  Mesos
 allows specifying which framework principal is allowed to register as a
 particular role.
   - Encryption on wire:
     - All control traffic to/from Myriad/Mesos
     - Logs
   - Audits (where to store them)
     - Log all major 

Re: [VOTE] Release Apache Parquet Format 2.3.0 RC2

2015-02-17 Thread Ted Dunning
On Tue, Feb 17, 2015 at 5:02 PM, Ryan Blue b...@cloudera.com wrote:

 In the meantime, if you take a look at the .travis.yml file, you can see
 the preparation steps used to run CI jobs. That's what I usually point
 people at instead of maintaining it in two places, but I agree that
 shouldn't be the solution.


Does that answer the question about the checksum?
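
For the checksum question, one way to work out what a digest file contains is to hash the release artifact locally and compare the result against the file both as raw bytes and as hex text. The sketch below is only that: the thread does not say which SHA variant or encoding was used, so the candidate algorithms are an assumption, and the function names and file paths are hypothetical.

```python
import hashlib
from pathlib import Path

# Assumed candidates; the thread does not name the SHA variant.
CANDIDATES = ("sha1", "sha256", "sha512")


def compute_digests(path, algorithms=CANDIDATES, chunk_size=1 << 20):
    """Hash the file once, feeding every chunk to one hasher per algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return hashers


def identify_checksum(artifact_path, checksum_path):
    """Return (algorithm, encoding) if the checksum file matches, else None."""
    recorded = Path(checksum_path).read_bytes()
    # Hex digests are often wrapped or grouped with spaces, so drop whitespace.
    text = recorded.decode("ascii", errors="ignore")
    compact = "".join(text.split()).lower()
    for name, h in compute_digests(artifact_path).items():
        if recorded == h.digest():
            return name, "binary"   # file holds the raw digest bytes
        if h.hexdigest() in compact:
            return name, "hex"      # file holds the digest as hex text
    return None
```

A result of None means neither encoding of any candidate digest matched, which would point at either a different algorithm or a bad checksum file.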


Re: [DISCUSS] [PROPOSAL] Myriad for Apache Incubator

2015-02-17 Thread Ted Dunning
On Tue, Feb 17, 2015 at 9:38 PM, Henry Saputra henry.sapu...@gmail.com
wrote:

 @Adam and @Ted, as with any new incubator project, we always check
 whether you need user@ so early in the process.
 It would probably be better to have all discussion in dev@ early in incubation.


Henry,

This is a good question to ask (and I have asked it in the past).

I think that Myriad is in, or nearly in, production here and there already.
That means that a user@ list might well be useful.


Re: [DISCUSS] [PROPOSAL] Myriad for Apache Incubator

2015-02-17 Thread Adam Bordelon
Good point. I'm fine with starting with just a dev@ first, and then we can
add user@ if/when dev becomes too noisy.
I assume adding a new mailing list is relatively painless.

On Tue, Feb 17, 2015 at 9:52 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Tue, Feb 17, 2015 at 9:38 PM, Henry Saputra henry.sapu...@gmail.com
 wrote:

  @Adam and @Ted, as with any new incubator project, we always check
  whether you need user@ so early in the process.
  It would probably be better to have all discussion in dev@ early in
  incubation.
 

 Henry,

 This is a good question to ask (and I have asked it in the past).

 I think that Myriad is in, or nearly in, production here and there already.
 That means that a user@ list might well be useful.