The Apache Software Foundation Announces Apache™ Spark™ v1.0

Sally Khudairi Fri, 30 May 2014 03:09:07 -0700

>> NOTE: this announcement is also available online at http://s.apache.org/VEc


Open Source large-scale, flexible, "Hadoop Swiss Army Knife" cluster computing 
framework offers enhanced data analysis and richer integration with other 
Apache projects 

Forest Hill, MD – 30 May 2014 – The Apache Software Foundation (ASF), the 
all-volunteer developers, stewards, and incubators of more than 170 Open Source 
projects and initiatives, announced today the availability of Apache Spark 
v1.0, the super-fast, Open Source large-scale data processing and advanced 
analytics engine. 

Apache Spark has been dubbed a "Hadoop Swiss Army knife" for its remarkable 
speed and ease of use, allowing developers to quickly write applications in 
Java, Scala, or Python, using its built-in set of over 80 high-level operators. 
With Spark, programs can run up to 100x faster than Apache Hadoop MapReduce in 
memory. 

"1.0 is a huge milestone for the fast-growing Spark community. Every 
contributor and user who's helped bring Spark to this point should feel proud 
of this release," said Matei Zaharia, Vice President of Apache Spark. 

Apache Spark is well suited for machine learning, interactive queries, and 
stream processing. It is 100% compatible with Hadoop’s Distributed File System 
(HDFS), HBase, Cassandra, and any other Hadoop storage system, making existing 
data immediately usable in Spark. In addition, Spark supports SQL queries, 
streaming data, and complex analytics such as machine learning and graph 
algorithms out-of-the-box. 

New in v1.0, Apache Spark offers strong API stability guarantees 
(backward-compatibility throughout the 1.X series), a new Spark SQL component 
for accessing structured data, as well as richer integration with other Apache 
projects (Hadoop YARN, Hive, and Mesos). 

Patrick Wendell, software engineer at Databricks and Apache Spark 1.0 release 
manager, explained, "In addition to providing long-term stability for Spark's 
core APIs, this release contains several new features. Spark 1.0 adds a 
unified submission tool for deploying applications on a local machine, Mesos, 
YARN, or a dedicated cluster. We've added a new module, Spark SQL, to provide 
schema-aware data modeling and SQL language support in Spark. Spark's machine 
learning library, MLlib, has been enhanced with several new algorithms. Spark’s 
streaming and graph libraries have also seen major updates. Across the board, 
we've focused on building tools to empower the data scientists, statisticians 
and engineers who must grapple with large data sets every day." 

Spark was originally developed at UC Berkeley's AMPLab, and its ease of use has 
made it a go-to solution for both small and large enterprise environments 
across a wide range of industries, with adopters including Alibaba, ClearStory 
Data, Cloudera, Databricks, IBM, Intel, MapR, Ooyala, and Yahoo. Not 
only are organizations rapidly adopting and deploying Apache Spark, many 
contributors are committing code to the project as well. 

"Apache Spark is an important big data technology in delivering a high 
performance analytics solution for the IT industry and satisfying the 
fast-growing customer demand," said Michael Greene, Vice President and General 
Manager of System Technologies and Optimization at Intel. "Intel is proud to 
participate in its development and we congratulate the community on this 
release." 

"At NASA, we're really excited to leverage Spark and its highly interactive 
analytic capabilities, and the speedups offered by 1.0 along with Spark SQL are 
going to help out critical projects looking at measurement of Snow in the 
Western US and also on projects related to Regional Climate Modeling and in 
Model Evaluation for the U.S. National Climate Assessment related Activities," 
said Chris Mattmann, an ASF Director, Chief Architect, Instrument and Science 
Data Systems Section at NASA JPL, and Adjunct Associate Professor at the 
University of Southern California. "I'm looking forward to designing 
Spark-related projects in my Software Architectures and in my Search Engines 
courses at USC as well. The community is one of our most active at the ASF and 
the interest has really peaked and these guys are doing a great job." 

"We're continuing to see very fast growth — 102 individuals have contributed 
patches to this release over the past four months, which is our highest number 
of contributors ever," added Zaharia. 

Availability and Oversight
As with all Apache products, Apache Spark software is released under the Apache 
License v2.0, and is overseen by a self-selected team of active contributors to 
the project. A Project Management Committee (PMC) guides the Project’s 
day-to-day operations, including community development and product releases. 
For documentation and ways to become involved with Apache Spark, visit 
http://spark.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than one 
hundred and seventy leading Open Source projects, including Apache HTTP Server, 
the world's most popular Web server software. Through the ASF's meritocratic 
process known as "The Apache Way," more than 400 individual Members and 3,500 
Committers successfully collaborate to develop freely available 
enterprise-grade software, benefiting millions of users worldwide: thousands of 
software solutions are distributed under the Apache License; and the community 
actively participates in ASF mailing lists, mentoring initiatives, and 
ApacheCon, the Foundation's official user conference, trainings, and expo. The 
ASF is a US 501(c)(3) charitable organization, funded by individual donations 
and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, 
Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt 
Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo.
For more information, visit http://www.apache.org/ or follow @TheASF on 
Twitter. 

"Apache", "Spark", "Apache Spark", and "ApacheCon" are trademarks of The Apache 
Software Foundation. All other brands and trademarks are the property of their 
respective owners.

# # #

