Re: [DISCUSS] Spark-Kernel Incubator Proposal

2015-11-30 Thread Sree V
Hi David & All,
The 'spark-kernel/torii' is a "good to have" tool.
Pardon me, I am not the judge in any way. After going through this thread and
the referred links, it seems that giving it decent publicity within Apache Spark
(perhaps providing a link, etc.) would be sufficient for its survival and
evolution, instead of going through the entire Apache incubation process.
I am not undermining the incubation in any way. But consider the prep work needed
(license/trademark, project rename, package rename) to make 'spark-kernel'
eligible for incubation; and once in incubation, it needs to keep up with the
progress of Apache Zeppelin (which is already incubating).

Oh! That also makes me ask: can Apache Zeppelin & spark-kernel/torii be
combined into one?!


Either way, count me in for any help required with 'spark-kernel/torii'.
Thanking you.
With Regards,
Sree
 


On Monday, November 30, 2015 4:13 PM, Julien Le Dem  
wrote:
 

 Sorry for the late reply.
FYI there is an opensource project called torii already:
https://vestorly.github.io/torii/
Whether there is a trademark or not, I'd recommend a name that does not
collide with another project.

On Wed, Nov 25, 2015 at 9:00 PM, Luciano Resende 
wrote:

> Thanks for all your feedback, we have updated the proposal with the
> following :
>
> - Renamed the project to Torii
> - Added new mentors that volunteered during the discussion
>
> Below is an updated proposal, which I will be calling for a vote shortly.
>
> = Torii =
>
> == Abstract ==
> Torii provides applications with a mechanism to interactively and remotely
> access Apache Spark.
>
> == Proposal ==
> Torii enables interactive applications to access Apache Spark clusters.
> More specifically:
>  * Applications can send code-snippets and libraries for execution by Spark
>  * Applications can be deployed separately from Spark clusters and
> communicate with Torii using the provided Torii client
>  * Execution results and streaming data can be sent back to calling
> applications
>  * Applications no longer have to be network connected to the workers on a
> Spark cluster because Torii acts as each application’s proxy
>  * Work has started on enabling Torii to support languages in addition to
> Scala, namely Python (with PySpark), R (with SparkR), and SQL (with
> SparkSQL)
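The interaction model in the bullets above (an application hands a code snippet to a remote kernel and gets the result back, with state carried across snippets) can be sketched as follows. This is a toy stand-in: the class and method names are hypothetical and are not the real Torii client API.

```python
# Toy stand-in for the snippet-submission flow described above.
# A real client would ship each snippet over the network to the kernel,
# which executes it on the Spark driver; here we exec locally to show
# only the request/response shape and the session state that persists
# between snippets.

class ToyKernelClient:
    """Hypothetical client simulating remote snippet execution."""

    def __init__(self):
        # State persists across snippets, like an interactive shell session.
        self._namespace = {}

    def execute(self, snippet: str):
        exec(snippet, self._namespace)
        # By convention in this sketch, a snippet exposes its answer as `result`.
        return self._namespace.get("result")

client = ToyKernelClient()
client.execute("nums = list(range(10))")          # first snippet defines state
print(client.execute("result = sum(n * n for n in nums)"))  # 285
```

The point of the sketch is the second call: it reuses `nums` from the first snippet, which is what distinguishes a kernel-style session from one-shot spark-submit jobs.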
>
> == Background & Rationale ==
> Apache Spark provides applications with a fast and general purpose
> distributed computing engine that supports static and streaming data,
> tabular and graph representations of data, and an extensive set of
> machine learning libraries. Consequently, a wide variety of applications
> will be written for Spark and there will be interactive applications that
> require relatively frequent function evaluations, and batch-oriented
> applications that require one-shot or only occasional evaluation.
>
> Apache Spark provides two mechanisms for applications to connect with
> Spark. The primary mechanism launches applications on Spark clusters using
> spark-submit (
> http://spark.apache.org/docs/latest/submitting-applications.html); this
> requires developers to bundle their application code plus any dependencies
> into JAR files, and then submit them to Spark. A second mechanism is an
> ODBC/JDBC API (
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
> )
> which enables applications to issue SQL queries against SparkSQL.
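The spark-submit flow described above typically looks like the following; the class name, master URL, and JAR path are placeholders, and the application must first be bundled (with its dependencies) into a JAR, e.g. via `sbt assembly` or `mvn package`.

```shell
# Submit a pre-built application JAR to a Spark cluster.
# Everything after the flags is the application JAR and its arguments.
spark-submit \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  my-app-assembly.jar arg1 arg2
```

It is this bundle-then-fork cycle for every change that the proposal characterizes as cumbersome for interactive use.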
>
> Our experience when developing interactive applications, such as analytic
> applications integrated with Notebooks, to run against Spark was that the
> spark-submit mechanism was overly cumbersome and slow (requiring JAR
> creation and forking processes to run spark-submit), and the SQL interface
> was too limiting and did not offer easy access to components other than
> SparkSQL, such as streaming. The most promising mechanism provided by
> Apache Spark was the command-line shell (
> http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
> )
> which enabled us to execute code snippets and dynamically control the tasks
> submitted to a Spark cluster. Spark does not provide the command-line
> shell as a consumable service but it provided us with the starting point
> from which we developed Torii.
>
> == Current Status ==
> Torii was first developed by a small team working on an internal IBM
> Spark-related project in July 2014. In recognition of its likely general
> utility to Spark users and developers, in November 2014 the Torii project
> was moved to GitHub and made available under the Apache License V2.
>
> == Meritocracy ==
> The current developers are familiar with the meritocratic open source
> development process at Apache. As the project has gathered interest at
> GitHub the developers have actively started a process to invite additional
> developers into the project, and we have at least one new developer who is
> ready to contribute code to the project.
>
> == Community ==
> We 

Re: [DISCUSS] Metron incubator proposal

2015-11-30 Thread Sree V
+1. A perfect candidate (Metron/OpenSOC) for apache.org.
Thanking you.
With Regards,
Sree


On Monday, November 30, 2015 4:07 PM, P. Taylor Goetz  
wrote:
 

 I'm interested as well, particularly given the ties to Storm.

I'd be happy to volunteer as mentor and/or committer if it would be welcome. I 
have some familiarity with both projects (obviously one more so than the other 
;) ).

-Taylor

> On Nov 30, 2015, at 1:15 PM, larry mccay  wrote:
> 
> This is an interesting proposal that seems like it would build a community
> where an open one doesn't really exist at the moment.
> A project like this needs a healthy community to survive and scale with the
> pace of changes in attacks.
> I for one would be interested in lending a hand as a contributor or
> committer - if that would be welcomed.
> 
> 
>> On Mon, Nov 30, 2015 at 11:55 AM, Owen O'Malley  wrote:
>> 
>> Hi all,
>> 
>> We'd like to start a discussion proposing creating Metron as an incubator
>> podling. The proposal is on the wiki here:
>> https://wiki.apache.org/incubator/MetronProposal
>> 
>> I would call your attention to the background section in particular. The
>> condensed version is that the original code base (OpenSOC) was created by a
>> company (Cisco) that put it on github as ALv2, but then hasn't been working
>> on it. We posted a message to the OpenSOC support group a month ago
>> proposing a move to Apache and got
>> a single positive response.
>> 
>> The text of the proposal is included below for easy quoting during
>> discussion.
>> 
>> Thanks,
>>  Owen
>> 
>> = Apache Metron Proposal =
>> 
>> == Abstract ==
>> 
>> The Metron project is an open source project dedicated to providing an
>> extensible and scalable advanced security analytics tool. It has strong
>> foundations in the Apache Hadoop ecosystem.
>> 
>> == Proposal ==
>> 
>> Metron integrates a variety of open source big data technologies in order
>> to offer a centralized tool for security monitoring and analysis. Metron
>> provides capabilities for log aggregation, full packet capture indexing,
>> storage, advanced behavioral analytics and data enrichment, while applying
>> the most current threat-intelligence information to security telemetry
>> within a single platform.
>> 
>> Metron can be divided into 4 areas:
>> 
>>  1. '''A mechanism to capture, store, and normalize any type of security
>> telemetry at extremely high rates.''' Because security telemetry is
>> constantly being generated, it requires a method for ingesting the data at
>> high speeds and pushing it to various processing units for advanced
>> computation and analytics.
>>  1. '''Real time processing and application of enrichments''' such as
>> threat intelligence, geolocation, and DNS information to telemetry being
>> collected. The immediate application of this information to incoming
>> telemetry provides the context and situational awareness, as well as the
>> “who” and “where” information that is critical for investigation.
>>  1. '''Efficient information storage''' based on how the information will
>> be used:
>>    a. Logs and telemetry are stored such that they can be efficiently
>> mined and analyzed for concise security visibility
>>    a. The ability to extract and reconstruct full packets helps an analyst
>> answer questions such as who the true attacker was, what data was leaked,
>> and where that data was sent
>>    a. Long-term storage not only increases visibility over time, but also
>> enables advanced analytics such as machine learning techniques to be used
>> to create models on the information. Incoming data can then be scored
>> against these stored models for advanced anomaly detection.
>>  1. '''An interface that gives a security investigator a centralized view
>> of data and alerts passed through the system.''' Metron’s interface
>> presents alert summaries with threat intelligence and enrichment data
>> specific to that alert on one single page. Furthermore, advanced search
>> capabilities and full packet extraction tools are presented to the analyst
>> for investigation without the need to pivot into additional tools.
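As a rough illustration of item 2 above (real-time application of enrichments), here is a minimal sketch. The lookup tables and field names are invented for the example; a real Metron-style pipeline would consult live geolocation, DNS, and threat-intelligence sources inside a stream-processing topology.

```python
# Minimal telemetry-enrichment sketch, in the spirit of item 2 above.
# GEO_LOOKUP and THREAT_INTEL are hypothetical stand-ins for live
# enrichment sources.

GEO_LOOKUP = {"203.0.113.7": {"country": "AU", "city": "Sydney"}}
THREAT_INTEL = {"203.0.113.7": {"listed": True, "feed": "example-feed"}}

def enrich(event: dict) -> dict:
    """Attach geo and threat-intel context to a raw telemetry event."""
    ip = event.get("src_ip")
    enriched = dict(event)  # leave the raw event untouched
    enriched["geo"] = GEO_LOOKUP.get(ip, {})
    enriched["threat_intel"] = THREAT_INTEL.get(ip, {"listed": False})
    return enriched

event = {"src_ip": "203.0.113.7", "dst_port": 443}
print(enrich(event)["threat_intel"]["listed"])  # True
```

The value of doing this at ingest time, as the proposal argues, is that the "who" and "where" context arrives attached to the alert instead of being reconstructed later by the analyst.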
>> 
>> Big data is a natural fit for powerful security analytics. The Metron
>> framework integrates a number of elements from the Hadoop ecosystem to
>> provide a scalable platform for security analytics, incorporating such
>> functionality as full-packet capture, stream processing, batch processing,
>> real-time search, and telemetry aggregation. With Metron, our goal is to
>> tie big data into security analytics and drive towards an extensible
>> centralized platform to effectively enable rapid detection and rapid
>> response for advanced security threats.
>> 
>> == Background ==
>> 
>> OpenSOC was developed by Cisco over the last two years and pushed out to
>> Github (https://github.com/OpenSOC/opensoc) under the ALv2. However, the

Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-30 Thread Sree V
+1 (non-binding)
Thanking you.
With Regards,
Sree
 


On Monday, November 30, 2015 9:33 AM, stack  wrote:
 

 +1 (binding)
St.Ack
On Nov 24, 2015 11:33 AM, "Todd Lipcon"  wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
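The horizontal-partitioning-with-replication idea mentioned above can be sketched in a few lines. The range boundaries, server names, and round-robin replica placement here are invented for illustration; Kudu's actual tablet and Raft machinery is far richer.

```python
# Illustrative sketch of horizontal (range) partitioning with replication.
# Tablet boundaries and the 3-replica placement are made up for the example.

TABLET_BOUNDARIES = [100, 200, 300]  # keys < 100 -> tablet 0, < 200 -> 1, ...
SERVERS = ["ts1", "ts2", "ts3", "ts4", "ts5"]

def tablet_for_key(key: int) -> int:
    """Find which tablet owns a primary key via its range boundaries."""
    for i, upper in enumerate(TABLET_BOUNDARIES):
        if key < upper:
            return i
    return len(TABLET_BOUNDARIES)  # keys >= the last boundary

def replicas_for_tablet(tablet: int, n_replicas: int = 3):
    """Place n replicas on distinct servers (round-robin for the sketch)."""
    return [SERVERS[(tablet + r) % len(SERVERS)] for r in range(n_replicas)]

print(tablet_for_key(150))     # 1
print(replicas_for_tablet(1))  # ['ts2', 'ts3', 'ts4']
```

Because each partition is replicated independently (by Raft consensus in Kudu's case), losing one server only requires re-electing leaders for the tablets it hosted, which is what yields the low mean-time-to-recovery the proposal describes.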
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
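To make the "happy medium" concrete, here is a toy in-memory columnar table that supports both row-level upserts by primary key and whole-column scans. It is purely illustrative and bears no relation to Kudu's storage format or API.

```python
# Toy columnar table illustrating the combination described above:
# row-level inserts/updates by primary key (the HBase-like access path)
# plus sequential whole-column scans (the Parquet-like access path).

class ToyColumnarTable:
    def __init__(self, columns):
        self._columns = {c: [] for c in columns}
        self._index = {}  # primary key -> row position

    def upsert(self, key, row: dict):
        if key in self._index:
            # Update in place rather than appending a new version.
            pos = self._index[key]
            for col, val in row.items():
                self._columns[col][pos] = val
        else:
            self._index[key] = len(next(iter(self._columns.values())))
            for col in self._columns:
                self._columns[col].append(row[col])

    def scan(self, column):
        # Sequential read of one column -- the analytics-friendly path.
        return list(self._columns[column])

t = ToyColumnarTable(["id", "metric"])
t.upsert(1, {"id": 1, "metric": 10})
t.upsert(2, {"id": 2, "metric": 20})
t.upsert(1, {"id": 1, "metric": 15})  # in-place update, not an append
print(t.scan("metric"))  # [15, 20]
```

The third upsert is the interesting case: an HDFS file format would have to rewrite or append, while a pure row store would pay for it on every later scan; serving both operations cheaply is the gap the proposal says Kudu fills.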
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> === Releases ===
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial 

Re: [VOTE] Accept Torii into Apache Incubator

2015-11-30 Thread Sree V
+1 (non-binding)
Thanking you.
With Regards,
Sree


On Monday, November 30, 2015 3:21 PM, Reynold Xin  
wrote:
 

 +1

> On Dec 1, 2015, at 2:08 AM, Luciano Resende  wrote:
> 
> And of course, here is my +1 (binding).
> 
> On Thu, Nov 26, 2015 at 7:33 AM, Luciano Resende 
> wrote:
> 
>> After initial discussion (under the name Spark-Kernel), please vote on
>> the acceptance of Torii Project for incubation at the Apache Incubator.
>> The full proposal is
>> available at the end of this message and on the wiki at :
>> 
>> https://wiki.apache.org/incubator/ToriiProposal
>> 
>> Please cast your votes:
>> 
>> [ ] +1, bring Torii into Incubator
>> [ ] +0, I don't care either way
>> [ ] -1, do not bring Torii into Incubator, because...
>> 
>> Due to the long holiday weekend in the US, I will leave the vote open until
>> December 1st.
>> 
>> 
>> = Torii =
>> 
>> == Abstract ==
>> Torii provides applications with a mechanism to interactively and remotely
>> access Apache Spark.
>> 
>> == Proposal ==
>> Torii enables interactive applications to access Apache Spark clusters.
>> More specifically:
>> * Applications can send code-snippets and libraries for execution by Spark
>> * Applications can be deployed separately from Spark clusters and
>> communicate with Torii using the provided Torii client
>> * Execution results and streaming data can be sent back to calling
>> applications
>> * Applications no longer have to be network connected to the workers on a
>> Spark cluster because Torii acts as each application’s proxy
>> * Work has started on enabling Torii to support languages in addition to
>> Scala, namely Python (with PySpark), R (with SparkR), and SQL (with
>> SparkSQL)
>> 
>> == Background & Rationale ==
>> Apache Spark provides applications with a fast and general purpose
>> distributed computing engine that supports static and streaming data,
>> tabular and graph representations of data, and an extensive set of
>> machine learning libraries. Consequently, a wide variety of applications
>> will be written for Spark and there will be interactive applications that
>> require relatively frequent function evaluations, and batch-oriented
>> applications that require one-shot or only occasional evaluation.
>> 
>> Apache Spark provides two mechanisms for applications to connect with
>> Spark. The primary mechanism launches applications on Spark clusters using
>> spark-submit (
>> http://spark.apache.org/docs/latest/submitting-applications.html); this
>> requires developers to bundle their application code plus any dependencies
>> into JAR files, and then submit them to Spark. A second mechanism is an
>> ODBC/JDBC API (
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
>> which enables applications to issue SQL queries against SparkSQL.
>> 
>> Our experience when developing interactive applications, such as analytic
>> applications integrated with Notebooks, to run against Spark was that the
>> spark-submit mechanism was overly cumbersome and slow (requiring JAR
>> creation and forking processes to run spark-submit), and the SQL interface
>> was too limiting and did not offer easy access to components other than
>> SparkSQL, such as streaming. The most promising mechanism provided by
>> Apache Spark was the command-line shell (
>> http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell)
>> which enabled us to execute code snippets and dynamically control the tasks
>> submitted to a Spark cluster. Spark does not provide the command-line
>> shell as a consumable service but it provided us with the starting point
>> from which we developed Torii.
>> 
>> == Current Status ==
>> Torii was first developed by a small team working on an internal IBM
>> Spark-related project in July 2014. In recognition of its likely general
>> utility to Spark users and developers, in November 2014 the Torii project
>> was moved to GitHub and made available under the Apache License V2.
>> 
>> == Meritocracy ==
>> The current developers are familiar with the meritocratic open source
>> development process at Apache. As the project has gathered interest at
>> GitHub the developers have actively started a process to invite additional
>> developers into the project, and we have at least one new developer who is
>> ready to contribute code to the project.
>> 
>> == Community ==
>> We started building a community around Torii project when we moved it to
>> GitHub about one year ago. Since then we have grown to about 70 people, and
>> there are regular requests and suggestions from the community. We believe
>> that providing Apache Spark application developers with a general-purpose
>> and interactive API holds a lot of community potential, especially
>> considering possible tie-ins with Notebooks and the data science community.
>> 
>> == Core Developers ==
>> The core developers of the project are currently all from 

Re: [VOTE] Accept Impala into the Apache Incubator

2015-11-30 Thread Sree V
+1 (non-binding)
Thanking you.
With Regards,
Sree


On Monday, November 30, 2015 9:34 AM, stack  wrote:
 

 +1 (binding)
St.Ack
On Nov 24, 2015 1:04 PM, "Henry Robinson"  wrote:

> Hi -
>
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
>
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
>
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one missed committer from the list and to
> correct some issues with the dependency list.
>
> Please cast your votes as follows:
>
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))
>
> As with the concurrent Kudu vote, I propose leaving the vote open for a
> full seven days (to close at Tuesday, December 1st at noon PST), due to the
> upcoming US holiday.
>
> Thanks,
> Henry
>
> 
>
> = Abstract =
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
>
> = Proposal =
>
> We propose to contribute the Impala codebase and associated artifacts (e.g.
> documentation, web-site content etc.) to the Apache Software Foundation
> with the intent of forming a productive, meritocratic and open community
> around Impala’s continued development, according to the ‘Apache Way’.
>
> Cloudera owns several trademarks regarding Impala, and proposes to transfer
> ownership of those trademarks in full to the ASF.
>
> = Background =
> Engineers at Cloudera developed Impala and released it as an
> Apache-licensed open-source project in Fall 2012. Impala was written as a
> brand-new, modern C++ SQL engine targeted from the start for data stored in
> Apache Hadoop clusters.
>
> Impala’s most important benefit to users is high-performance, making it
> extremely appropriate for common enterprise analytic and business
> intelligence workloads. This is achieved by a number of software
> techniques, including: native support for data stored in HDFS and related
> filesystems, just-in-time compilation and optimization of individual query
> plans, high-performance C++ codebase and massively-parallel distributed
> architecture. In benchmarks, Impala is routinely amongst the very highest
> performing SQL query engines.
>
> = Rationale =
>
> Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> remains by far the most common interface for interacting with data in both
> traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> need, as evidenced by the eager adoption of Impala and other SQL engines in
> enterprise contexts, for a query engine that offers the familiar SQL
> interface, but that has been specifically designed to operate in massive,
> distributed clusters rather than in traditional, fixed-hardware,
> warehouse-specific deployments. Impala is one such query engine.
>
> We believe that the ASF is the right venue to foster an open-source
> community around Impala’s development. We expect that Impala will benefit
> from more productive collaboration with related Apache projects, and under
> the auspices of the ASF will attract talented contributors who will push
> Impala’s development forward at pace.
>
> We believe that the timing is right for Impala’s development to move
> wholesale to the ASF: Impala is well-established, has been Apache-licensed
> open-source for more than three years, and the core project is relatively
> stable. We are excited to see where an ASF-based community can take Impala
> from this strong starting point.
>
> = Initial Goals =
> Our initial goals are as follows:
>
>  * Establish ASF-compatible engineering practices and workflows
>  * Refactor and publish existing internal build scripts and test
> infrastructure, in order to make them usable by any community member.
>  * Transfer source code, documentation and associated artifacts to the ASF.
>  * Grow the user and developer communities
>
> = Current Status =
>
> Impala is developed as an Apache-licensed open-source project. The source
> code is available at http://github.com/cloudera/Impala, and developer
> documentation is at https://github.com/cloudera/Impala/wiki. The majority
> of commits to the project have come from Cloudera-employed developers, but
> we have accepted some contributions from individuals from other
> organizations.
>
> All code reviews are done via a public instance of the Gerrit review tool
> at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> list. All patches must be reviewed before they are accepted into the
> codebase, via a voting mechanism that is similar to that used on Apache
> projects such as Hadoop and HBase.
>
> Before a 

Request to add

2015-04-23 Thread Sree V
Hi,
Please add me to the contributors page:
https://wiki.apache.org/incubator/ContributorsGroup

My wiki id is: SreeVaddi
I have been contributing to Drill, Calcite, Ranger, Kafka and Spark, from
their beginnings in incubation.

Thanking you.

With Regards
Sree Vaddi
650.213.2707 M
 

site content question

2014-10-28 Thread Sree V
Hi,
What is the recommended way to set up an Apache incubator site
(CMS, svnpubsub, markdown, twiki, ...)?
My understanding is: site content in SVN and project sources in Apache Git
(cloned to GitHub). What does Apache recommend?
When does Apache allow for a variation?
Is variation an option in the first place?
When is it OK to have all my sources and site content in both SVN and in
Apache Git?
Thanking you.

With Regards
Sree