Re: [VOTE] Recommend Apache Rya graduation to Top Level Project resolution to the Board

2019-09-11 Thread Madhawa Kasun Gunasekara
+1

Thanks,
Madhawa


On Thu, Sep 12, 2019 at 8:08 AM Justin Mclean 
wrote:

> Hi,
>
> +1 and good luck with your journey as a TLP.
>
> Thanks,
>
> Justin


Re: [VOTE] Accept Science Data Analytics Platform (SDAP) into Apache Incubator WAS Re: [DISCUSS] Accept Science Data Analytics Platform (SDAP) into Apache Incubator

2017-10-17 Thread Madhawa Kasun Gunasekara
Here is my +1

Thanks,
Madhawa

On Tue, Oct 17, 2017 at 4:04 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
> Having secured a mentorship team consisting of the following IPMC Members,
> I am happy to open a formal VOTE thread on accepting the Science Data
> Analytics Platform (SDAP) into Apache Incubator.
>
>- Lewis John McGibbney (lewi...@apache.org)
>- Raphael Bircher (bircher at apache dot org)
>- Suneel Marthi (smarthi at apache dot org)
>
> Thank you to both Raphael and Suneel for coming forward. :)
> The VOTE will be open for at least 72 hours.
>
> [ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
> [ ] +/-0 ... just because
> [ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache
> Incubator... because
>
> Thanks in advance to all participants.
> Lewis
>
> P.S. Here is a binding +1 from me
>
> On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney  >
> wrote:
>
> > Hi Folks,
> > I would like to open a DISCUSS thread on the topic of accepting the
> > Science Data Analytics Platform (SDAP)
> > <https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
> > I am CC'ing Thomas Huang from NASA JPL who I have been working with to
> > build community around a kick-ass set of software projects under the SDAP
> > umbrella.
> > At this stage we would very much appreciate critical feedback from the
> > general@ community. We are also open to mentors who may have an interest in the
> > project proposal.
> > The proposal is pasted below.
> > Thanks in advance,
> > Lewis
> >
> > = Abstract =
> > The Science Data Analytics Platform (SDAP) establishes an integrated data
> > analytic center for Big Science problems. It focuses on technology
> > integration, advancement and maturity.
> >
> > = Proposal =
> > SDAP currently represents a collaboration between NASA Jet Propulsion
> > Laboratory (JPL), Florida State University (FSU), the National Center for
> > Atmospheric Research (NCAR), and George Mason University (GMU). SDAP brings
> > together a number of big data technologies, including the NASA-funded
> > OceanXtremes (anomaly detection and ocean science), NEXUS (deep data
> > analytic platform), DOMS (distributed in-situ to satellite matchup), MUDROD
> > (search relevancy and discovery) and VQSS (Virtualized Quality Screening
> > Service) under a single umbrella. Within the original Incubator proposal,
> > VQSS will not be included; however, it is anticipated that a future source
> > code donation will cover VQSS.
> >
> > = Background and Rationale =
> > SDAP is a technology software solution currently geared to better enable
> > scientists involved in advancing the study of the Earth's physical
> > oceanography. With increasing global temperature, warming of the ocean, and
> > melting ice sheets and glaciers, the impacts can be observed in changes
> > ranging from anomalous ocean temperature and circulation patterns, to
> > increasingly extreme weather events and stronger/more frequent hurricanes,
> > to sea level rise and storm surges affecting coastlines, and may involve
> > drastic changes and shifts in marine ecosystems. Ocean science communities
> > rely on data distributed through data centers such as JPL's Physical
> > Oceanography Distributed Active Archive Center (PO.DAAC) to conduct their
> > research. In typical investigations, oceanographers follow a traditional
> > workflow for using datasets: search, evaluate, download, and apply tools
> > and algorithms to look for trends. While this workflow has historically
> > worked very well for the oceanographic community, it cannot scale when the
> > research involves massive amounts of data. NASA's Surface Water and Ocean
> > Topography (SWOT) mission, scheduled to launch in April of 2021, is
> > expected to generate over 20 PB of data for a nominal 3-year mission. This
> > will challenge all existing NASA Earth Science data archival/distribution
> > paradigms. It will no longer be feasible for Earth scientists to download
> > and analyze such volumes of data. SDAP was therefore developed primarily as
> > a web-service platform for big ocean data science at the PO.DAAC, with open
> > source solutions used to enable fast analysis of oceanographic data. SDAP
> > has been developed collaboratively between JPL, FSU, NCAR, and GMU and is
> > rapidly maturing to become the generic platform for the next generation of
> > big science data solutions. The platform is an orchestration of several
> > previously funded NASA big ocean data solutions using cloud technology,
> > which include data analysis (NEXUS), anomaly detection (OceanXtremes),
> > matchup (DOMS), subsetting, discovery (MUDROD), and visualization (VQSS).
> > SDAP will enable web-accessible, fast data analysis directly on huge
> > scientific data archives to minimize data movement and provide access,
> > including subsetting, only to the relevant data.
> >
> > = Science Data Analytics 

Re: [VOTE] Livy to enter Apache Incubator

2017-05-31 Thread Madhawa Kasun Gunasekara
+1 (non binding)

Madhawa

On Thu, Jun 1, 2017 at 6:23 AM, Raphael Bircher 
wrote:

> +1 (binding)
>
>
> On .05.2017, 15:03, Sean Busbey wrote:
>
>> Hi folks!
>>
>> I'm calling a vote to accept "Livy" into the Apache Incubator.
>>
>> The full proposal is available below, and is also available in the wiki:
>>
>> https://wiki.apache.org/incubator/LivyProposal
>>
>> For additional context, please see the discussion thread:
>>
>> https://s.apache.org/incubator-livy-proposal-thread
>>
>> Please cast your vote:
>>
>> [ ] +1, bring Livy into Incubator
>> [ ] -1, do not bring Livy into Incubator, because...
>>
>> The vote will be open for at least 72 hours, and only votes from the
>> Incubator PMC are binding.
>>
>> I start with my vote:
>> +1
>>
>> 
>>
>> = Abstract =
>>
>> Livy is a web service that exposes a REST interface for managing
>> long-running Apache Spark contexts in your cluster. With Livy, new
>> applications can be built on top of Apache Spark that require fine-grained
>> interaction with many Spark contexts.
>>
>> = Proposal =
>>
>> Livy is an open-source REST service for Apache Spark. Livy enables client
>> applications to submit Spark applications and retrieve results without
>> being co-located with the Spark cluster.
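As a minimal illustration of what that looks like on the wire, the sketch
below starts a session through the REST interface. It assumes a Livy server
at localhost:8998; the /sessions endpoint shape follows Livy's public REST
documentation, and everything else (host, port, session kind) is an
assumption, not part of the proposal.

import json
import requests

LIVY_URL = "http://localhost:8998"  # assumed local Livy server

# Ask the Livy server to start a new Spark session on the cluster.
resp = requests.post(
    LIVY_URL + "/sessions",
    data=json.dumps({"kind": "pyspark"}),
    headers={"Content-Type": "application/json"},
)
session = resp.json()
print("session id:", session["id"], "state:", session["state"])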
>>
>> We propose to contribute the Livy codebase and associated artifacts (e.g.
>> documentation, web-site content, etc.) to the Apache Software Foundation.
>>
>> = Background =
>>
>> Apache Spark is a fast and general purpose distributed compute engine,
>> with
>> a versatile API. It enables processing of large quantities of static data
>> distributed over a cluster of machines, as well as processing of
>> continuous
>> streams of data. It is the preferred distributed data processing engine
>> for
>> data engineering, stream processing and data science workloads. Each Spark
>> application uses a construct called the SparkContext, which is the
>> application’s connection or entry point to the Spark engine. Each Spark
>> application will have its own SparkContext.
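As a quick illustration of that entry point, here is a minimal local PySpark
sketch (assumes a local pyspark installation; no cluster manager is needed in
local mode):

from pyspark import SparkContext

# Each application owns exactly one SparkContext: its connection to the engine.
sc = SparkContext("local[2]", "sparkcontext-demo")
print(sc.parallelize(range(100)).sum())  # a trivial distributed computation
sc.stop()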
>>
>> Livy enables clients to interact with one or more Spark sessions through
>> the Livy Server, which acts as a proxy layer. Livy clients have
>> fine-grained control over the lifecycle of the Spark sessions, as well as
>> the ability to submit jobs and retrieve results, all over HTTP. Clients
>> have two modes of interaction: (1) an RPC client API, available in Java and
>> Python, which allows results to be retrieved as Java or Python objects,
>> with serialization and deserialization of the results handled by the Livy
>> framework; and (2) an HTTP-based API that allows submission of code
>> snippets and retrieval of the results in different formats.
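A hedged sketch of the second, HTTP-based mode: submitting a code snippet to
an existing session and polling for its result. The /statements endpoint
shape follows Livy's public REST documentation; the host, the session id, and
the polling interval are assumptions.

import json
import time
import requests

LIVY_URL = "http://localhost:8998"                # assumed local Livy server
STATEMENTS = LIVY_URL + "/sessions/0/statements"  # assumes session 0 exists and is idle

# Submit a snippet of PySpark code for execution inside the remote session.
resp = requests.post(
    STATEMENTS,
    data=json.dumps({"code": "sc.parallelize(range(100)).sum()"}),
    headers={"Content-Type": "application/json"},
)
statement_id = resp.json()["id"]

# Poll until the snippet finishes, then print its output payload.
while True:
    statement = requests.get(STATEMENTS + "/" + str(statement_id)).json()
    if statement["state"] == "available":
        print(statement["output"])
        break
    time.sleep(1)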
>>
>> Multi-tenant resource allocation and security: Livy enables multiple
>> independent Spark sessions to be managed simultaneously. Multiple clients
>> can also interact simultaneously with the same Spark session and share the
>> resources of that Spark session. Livy can also enforce secure,
>> authenticated
>> communication between the clients and their respective Spark sessions.
>>
>> More information on Livy can be found at the existing open source website:
>> http://livy.io/
>>
>> = Rationale =
>>
>> Users want to use Spark’s powerful processing engine and API as the data
>> processing backend for interactive applications. However, the job
>> submission
>> and application interaction mechanisms built into Apache Spark are
>> insufficient and cumbersome for multi-user interactive applications.
>>
>> The primary mechanism for applications to submit Spark jobs is via
>> spark-submit
>> (http://spark.apache.org/docs/latest/submitting-applications.html),
>> which is
>> available as a command line tool as well as a programmatic API. However,
>> spark-submit has the following limitations that make it difficult to build
>> interactive applications. It is slow: each invocation of spark-submit
>> involves a setup phase where cluster resources are acquired, new processes
>> are forked, etc.; this setup phase runs for many seconds, or even minutes,
>> and hence is too slow for interactive applications. It is also cumbersome
>> and lacks flexibility: application code and dependencies have to be
>> pre-compiled and submitted as jars, and cannot be submitted interactively.
>>
>> Apache Spark comes with an ODBC/JDBC server, which can be used to submit
>> SQL
>> queries to Spark. However, this solution is limited to SQL and does not
>> allow the client to leverage the rest of the Spark API, such as RDDs,
>> MLlib
>> and Streaming.
>>
>> A third way of using Spark is via its command-line shell, which allows the
>> interactive submission of snippets of Spark code. However, the shell
>> entails
>> running Spark code on the client machine and hence is not a viable
>> mechanism
>> for remote clients to submit Spark jobs.
>>
>> Livy solves the limitations of the above three mechanisms, and provides
>> the
>> full Spark API as a 

Re: [VOTE] Accept CarbonData into the Apache Incubator

2016-05-27 Thread Madhawa Kasun Gunasekara
+1

Thanks,
Madhawa

On Fri, May 27, 2016 at 11:16 AM, Jean-Baptiste Onofré 
wrote:

> Hi Jim,
>
> good point. Let me try to explain this "gap" based on my discussion with
> the team:
>
> 1. Some people have been involved mostly in architecture and design rather
> than directly in code. That's why they are part of the initial committer
> list, even though they didn't really provide "visible" code on GitHub.
>
> 2. Some people are no longer involved in the project. That's why they don't
> appear on the initial committer list.
>
> Regards
> JB
>
>
> On 05/26/2016 05:45 PM, Jim Jagielski wrote:
>
>> I am trying to align the list of initial committers with
>> the list of current/active contributors, according to
>> Github, and I am seeing people proposed who have not
>> contributed anything and people NOT proposed who seem
>> to be kinda active...
>>
>> Sooo. -0
>>
>> On May 25, 2016, at 4:24 PM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> following the discussion thread, I'm now calling a vote to accept
>>> CarbonData into the Incubator.
>>>
>>> [ ] +1 Accept CarbonData into the Apache Incubator
>>> [ ] +0 Abstain
>>> [ ] -1 Do not accept CarbonData into the Apache Incubator, because ...
>>>
>>> This vote is open for 72 hours.
>>>
>>> The proposal follows, you can also access the wiki page:
>>> https://wiki.apache.org/incubator/CarbonDataProposal
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>> = Apache CarbonData =
>>>
>>> == Abstract ==
>>>
>>> Apache CarbonData is a new Apache Hadoop native file format for faster
>>> interactive query. It uses advanced columnar storage, index, compression
>>> and encoding techniques to improve computing efficiency, which in turn
>>> helps speed up queries by an order of magnitude over petabytes of data.
>>>
>>> CarbonData github address: https://github.com/HuaweiBigData/carbondata
>>>
>>> == Background ==
>>>
>>> Huawei is an ICT solution provider committed to enhancing customer
>>> experiences for telecom carriers, enterprises, and consumers on big data.
>>> In order to satisfy the following customer requirements, we created a new
>>> Hadoop native file format:
>>>
>>> * Support interactive OLAP-style queries over big data in seconds.
>>> * Support fast queries on individual records which require touching all
>>> fields.
>>> * Provide fast data loading and support incremental loads within minutes.
>>> * Support HDFS so that customers can leverage existing Hadoop clusters.
>>> * Support time-based data retention.
>>>
>>> Based on these requirements, we investigated existing file formats in
>>> the Hadoop ecosystem, but we could not find a suitable solution satisfying
>>> all of the requirements at the same time, so we started designing
>>> CarbonData.
>>>
>>> == Rationale ==
>>>
>>> CarbonData contains multiple modules, which are classified into two
>>> categories:
>>>
>>> 1. CarbonData file format: the core implementation of the file format,
>>> covering the columnar layout, index, dictionary, encoding and compression,
>>> the API for reading/writing, etc.
>>> 2. CarbonData integration with big data processing frameworks such as
>>> Apache Spark, Apache Hive, etc. Apache Beam is also planned, to abstract
>>> the execution runtime.
>>>
>>> === CarbonData File Format ===
>>>
>>> The CarbonData file format is a columnar store on HDFS. It has many
>>> features that a modern columnar format has, such as being splittable,
>>> compression schemes, complex data types, etc. In addition, CarbonData has
>>> the following unique features:
>>>
>>>  Indexing 
>>>
>>> In order to support fast interactive query, CarbonData leverages indexing
>>> technology to reduce I/O scans. CarbonData files store data along with the
>>> index; the index is not stored separately but is contained within the
>>> CarbonData file itself. The current implementation supports 3 types of
>>> indexing:
>>>
>>> 1. Multi-dimensional key (B+ tree index)
>>> Data blocks are written in sequence to the disk, and within each data
>>> block each column block is written in sequence. Finally, the metadata
>>> block for the file is written with information about the byte position of
>>> each block in the file, the min-max statistics index, and the start and
>>> end MDK of each data block. Since the entire data in the file is in sorted
>>> order, the start and end MDK of each data block can be used to construct a
>>> B+ tree, and the file can be logically represented as a B+ tree with the
>>> data blocks as leaf nodes (on disk) and the remaining non-leaf nodes in
>>> memory.
>>> 2. Inverted index
>>> Inverted indexes are widely used in search engines. This index helps the
>>> processing/query engine do filtering inside one HDFS block. Furthermore,
>>> query acceleration for count-distinct-like operations is made possible by
>>> combining bitmaps and the inverted index at query time.
>>> 3. MinMax index
>>> For all columns, a min-max index is created so that the processing/query
>>> engine can skip scan
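To make the min-max idea concrete, here is an illustrative sketch in plain
Python (not CarbonData source code): each block keeps per-column min/max
statistics, and a point filter reads only the blocks whose range could
contain the value.

# Each "block" carries min/max statistics for one column plus its rows.
blocks = [
    {"min": 1,  "max": 30, "rows": [3, 17, 29]},
    {"min": 31, "max": 60, "rows": [31, 42, 58]},
    {"min": 61, "max": 90, "rows": [64, 77, 90]},
]

def blocks_to_scan(blocks, value):
    # Prune every block whose [min, max] range cannot contain `value`.
    return [b for b in blocks if b["min"] <= value <= b["max"]]

# A filter on value == 42 touches one block instead of three.
print(blocks_to_scan(blocks, 42))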

Re: [VOTE] Graduate Zeppelin from the Incubator

2016-04-18 Thread Madhawa Kasun Gunasekara
+1 (non-binding)

Madhawa

On Mon, Apr 18, 2016 at 5:05 PM, Woonsan Ko  wrote:

> +1 (nonbinding)
>
> Woonsan
> On Apr 16, 2016 5:01 AM, "moon soo Lee"  wrote:
>
> > Hi,
> >
> > Apache Zeppelin started incubating about a year and 4 months ago
> > (2014-12-23) and the members of the community think that it is ready to
> > graduate from the incubator to be a TLP.
> >
> > Since its inception, the Zeppelin community has made 3 releases,
> > recruited 4 PPMC members, and resolved 500+ issues [1] with 90+
> > contributors [2]. Now the community is very open, active and continuously
> > growing.
> >
> > The Apache Zeppelin community has discussed and voted on graduation to
> > top-level project. The vote passed with 22 +1 votes (9 binding) and no 0
> > or -1 votes.
> >
> > Incubation Status:
> > http://incubator.apache.org/projects/zeppelin.html
> > Maturity Assessment:
> >
> >
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Apache+Zeppelin+Project+Maturity+Model
> > Discussion:
> > https://s.apache.org/gLi0
> > https://s.apache.org/GhqY (continue)
> > Vote:
> > https://s.apache.org/7hCK
> > Result:
> > https://s.apache.org/1rJD
> >
> > Please vote on the resolution pasted below to graduate Apache Zeppelin
> > from the incubator to top level project.
> >
> > [ ] +1 Graduate Apache Zeppelin from the Incubator.
> > [ ] +0 Don't care.
> > [ ] -1 Don't graduate Apache Zeppelin from the Incubator because
> >
> > This vote will be open for at least 72 hours.
> > Many thanks to our mentors and everyone else for the support,
> >
> > [1] https://s.apache.org/eswD
> > [2] https://s.apache.org/gi3o
> >
> > Apache Zeppelin top-level project resolution:
> > 
> >
> > WHEREAS, the Board of Directors deems it to be in the best
> > interests of the Foundation and consistent with the
> > Foundation's purpose to establish a Project Management
> > Committee charged with the creation and maintenance of
> > open-source software, for distribution at no charge to
> > the public, related to a collaborative data analytics and
> > visualization tool for general-purpose data processing systems.
> >
> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> > Committee (PMC), to be known as the "Apache Zeppelin Project",
> > be and hereby is established pursuant to Bylaws of the
> > Foundation; and be it further
> >
> > RESOLVED, that the Apache Zeppelin Project be and hereby is
> > responsible for the creation and maintenance of software
> > related to a collaborative data analytics and
> > visualization tool for general-purpose data processing systems; and be it
> > further
> >
> > RESOLVED, that the office of "Vice President, Apache Zeppelin" be
> > and hereby is created, the person holding such office to
> > serve at the direction of the Board of Directors as the chair
> > of the Apache Zeppelin Project, and to have primary responsibility
> > for management of the projects within the scope of
> > responsibility of the Apache Zeppelin Project; and be it further
> >
> > RESOLVED, that the persons listed immediately below be and
> > hereby are appointed to serve as the initial members of the
> > Apache Zeppelin Project:
> >
> > * Alexander Bezzubov 
> > * Anthony Corbacho 
> > * Damien Corneau 
> > * Felix Cheung 
> > * Jongyoul Lee 
> > * Kevin Sangwoo Kim 
> > * Lee Moon Soo 
> > * Mina Lee 
> > * Prabhjyot Singh 
> >
> > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Lee Moon Soo
> > be appointed to the office of Vice President, Apache Zeppelin, to
> > serve in accordance with and subject to the direction of the
> > Board of Directors and the Bylaws of the Foundation until
> > death, resignation, retirement, removal or disqualification,
> > or until a successor is appointed; and be it further
> >
> > RESOLVED, that the initial Apache Zeppelin PMC be and hereby is
> > tasked with the creation of a set of bylaws intended to
> > encourage open development and increased participation in the
> > Apache Zeppelin Project; and be it further
> >
> > RESOLVED, that the Apache Zeppelin Project be and hereby
> > is tasked with the migration and rationalization of the Apache
> > Incubator Zeppelin podling; and be it further
> >
> > RESOLVED, that all responsibilities pertaining to the Apache
> > Incubator Zeppelin podling encumbered upon the Apache Incubator
> > Project are hereafter discharged.
> >
>


Re: [VOTE] Accept SystemML into Apache Incubator

2015-10-28 Thread Madhawa Kasun Gunasekara
+1

Madhawa

On Wed, Oct 28, 2015 at 10:33 AM, Luciano Resende 
wrote:

> On Tue, Oct 27, 2015 at 9:52 PM, Luciano Resende 
> wrote:
>
> >
> > After initial discussion, please vote on the acceptance of SystemML
> > Project for incubation at the Apache Incubator. The full proposal is
> > available at the end of this message and on the wiki at:
> >
> > https://wiki.apache.org/incubator/SystemML
> > 
> >
> > Please cast your votes:
> >
> > [ ] +1, bring SystemML into Incubator
> > [ ] +0, I don't care either way
> > [ ] -1, do not bring SystemML into Incubator, because...
> >
> > The vote is open for the next 72 hours and only votes from the
> > Incubator PMC are binding.
> >
> >
> > = SystemML =
> >
> > == Abstract ==
> >
> > SystemML provides declarative large-scale machine learning (ML) that aims
> > at flexible specification of ML algorithms and automatic generation of
> > hybrid runtime plans ranging from single node, in-memory computations, to
> > distributed computations on Apache Hadoop MapReduce and Apache Spark. ML
> > algorithms are expressed in an R-like syntax that includes linear algebra
> > primitives, statistical functions, and ML-specific constructs. This
> > high-level language significantly increases the productivity of data
> > scientists as it provides (1) full flexibility in expressing custom
> > analytics, and (2) data independence from the underlying input formats and
> > physical data representations. Automatic optimization according to data
> > characteristics (such as distribution on the disk file system and
> > sparsity) as well as processing characteristics of the distributed
> > environment (such as the number of nodes, CPU, and memory per node)
> > ensures both efficiency and scalability.
> >
> > == Proposal ==
> >
> > The goal of SystemML is to create a commercially friendly, scalable and
> > extensible machine learning framework for data scientists to create or
> > extend machine learning algorithms using a declarative syntax. The machine
> > learning framework enables data scientists to develop algorithms locally
> > without the need for a distributed cluster, and to scale up and scale out
> > the execution of these algorithms to distributed Apache Hadoop MapReduce
> > or Apache Spark clusters.
> >
> > == Background ==
> >
> > SystemML started as a research project in the IBM Almaden Research Center
> > around 2007 aiming to enable data scientists to develop machine learning
> > algorithms independent of data and cluster characteristics.
> >
> > == Rationale ==
> >
> > SystemML enables the specification of machine learning algorithms using a
> > declarative machine learning (DML) language. DML includes linear algebra
> > primitives, statistical functions, and additional constructs. This
> > high-level language significantly increases the productivity of data
> > scientists as it provides (1) full flexibility in expressing custom
> > analytics and (2) data independence from the underlying input formats and
> > physical data representations.
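For a flavor of what such a declarative script computes, here is an
illustrative analogue in NumPy (this is not DML and not SystemML API; in
SystemML the equivalent high-level script would be compiled into single-node
or distributed plans automatically):

import numpy as np

# Toy data standing in for matrices a DML script would read from files.
X = np.random.rand(100, 5)  # feature matrix
y = np.random.rand(100, 1)  # label vector

# Ordinary least squares via the normal equations: beta = (X'X)^-1 X'y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta.ravel())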
> >
> > SystemML computations can be executed in a variety of different modes. It
> > supports single node in-memory computations and large-scale distributed
> > cluster computations. This allows the user to quickly prototype new
> > algorithms in local environments but automatically scale to large data
> > sizes as well without changing the algorithm implementation.
> >
> > Algorithms specified in DML are dynamically compiled and optimized based
> > on data and cluster characteristics using rule-based and cost-based
> > optimization techniques. The optimizer automatically generates hybrid
> > runtime execution plans ranging from in-memory single-node execution to
> > distributed computations on Apache Spark or Apache Hadoop MapReduce. This
> > ensures both efficiency and scalability. Automatic optimization reduces or
> > eliminates the need to hand-tune distributed runtime execution plans and
> > system configurations.
> >
> > == Initial Goals ==
> >
> > The initial goal in moving SystemML to the Apache Incubator is to broaden
> > the community and foster contributions from data scientists to develop new
> > machine learning algorithms and enhance the existing ones. Ultimately,
> > this may lead to the creation of an industry standard for specifying
> > machine learning algorithms.
> >
> > == Current Status ==
> >
> > The initial code has been developed at the IBM Almaden Research Center in
> > California and has recently been made available on GitHub under the Apache
> > Software License 2.0. The project currently supports single-node
> > (in-memory) computation as well as distributed computations utilizing
> > Apache Hadoop MapReduce or Apache Spark clusters.
> >
> > === Meritocracy ===
> >
> > We plan to invest in supporting a meritocracy. We will discuss the
> > requirements in an open forum. Several companies have already expressed
> > interest in