Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Scott
+1

On Mon, Mar 11, 2024 at 4:11 AM yangjie01 
wrote:

> +1
>
>
>
> Jie Yang
>
>
>
> *From:* Haejoon Lee
> *Date:* Monday, March 11, 2024, 17:09
> *To:* Gengliang Wang
> *Cc:* dev
> *Subject:* Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark
>
>
>
> +1
>
>
>
> On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang  wrote:
>
> Hi all,
>
> I'd like to start the vote for SPIP: Structured Logging Framework for
> Apache Spark
>
>
> References:
>
> - JIRA ticket
> - SPIP doc
> - Discussion thread
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks!
>
> Gengliang Wang
>
>


Spark+AI Summit 2018 (promo code within)

2018-04-23 Thread Scott walent
Spark+AI Summit is only 6 weeks away. Keynotes this year include talks from
Tesla, Apple, Databricks, Andreessen Horowitz and many more!

Use code "SparkList" and save 15% when registering at
http://databricks.com/sparkaisummit

We hope to see you there.

-Scott


Spark+AI Summit 2018 - San Francisco June 4-6, 2018

2018-03-05 Thread Scott walent
Early Bird pricing ends on Friday.  Book now to save $200+

Full agenda is available: www.databricks.com/sparkaisummit


Futures timeout exception in executor logs

2017-09-14 Thread Simon Scott
Hi,

Just wondering if anybody has any insights on SPARK-14140: Futures timeout 
exception in executor logs?

We are seeing the exact same exception during a long-running iterative 
application on a Spark Standalone cluster, v 2.1.
At the same time as the exception appears on an executor, the driver dies 
without any logged message.

Any advice much appreciated

Thanks
Simon Scott
Viavi Solutions


4 days left to submit your abstract to Spark Summit SF

2017-02-02 Thread Scott walent
We are just 4 days away from closing the CFP for Spark Summit 2017.

We have expanded the tracks in SF to include sessions that focus on AI,
Machine Learning and a 60 min deep dive track with technical demos.

Submit your presentation today and join us for the 10th Spark Summit!
Hurry, the CFP closes on February 6th!

https://spark-summit.org/2017/call-for-presentations/


CFP for Spark Summit San Francisco closes on Feb. 6

2017-01-27 Thread Scott walent
In June, the 10th Spark Summit will take place in San Francisco at Moscone
West. We have expanded our CFP to include more topics and deep-dive
technical sessions.

Take center stage in front of your fellow Spark enthusiasts. Submit your
presentation and join us for the big ten. The CFP closes on February 6th!

Submit your abstracts at https://spark-summit.org/2017


Spark Summit East in Boston ‒ 20% off Code

2017-01-25 Thread Scott walent
There are less than two weeks to go until Spark Summit East 2017, happening
February 7-9 at the Hynes Convention Center in downtown Boston. It will be
the largest Spark Summit conference ever held on the East Coast, and we
hope to see you there. Sign up at https://spark-summit.org/east-2017 and
use promo code "SPARK17" to save 20% on a two-day pass.

The program will explore the future of Apache Spark and the latest
developments in data science, artificial intelligence, machine learning and
more with 110+ community talks in seven different tracks. There will also
be a dozen keynote presentations featuring leading experts from the Broad
Institute, Databricks, Forrester, IBM, Intel, UC Berkeley’s new RISE Lab
and other organizations.

You’re also invited to participate in the various networking activities
associated with the conference, like the pre-conference Meetup with the
Boston Apache Users Group and the Women in Big Data luncheon.

View the full schedule and register to attend at
https://spark-summit.org/east-2017. We look forward to seeing you there.


RE: Associating user objects with SparkContext/SparkStreamingContext

2016-06-27 Thread Simon Scott
“Move the functions you are passing”: yes, this is what I had already done, and 
what I hope to avoid.

Thank you, however, for the reminder about @transient. With that I am able to 
create a function value that includes the non-serializable state as a 
@transient val, which at least packages the solution closer to the code that 
causes the problem.
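
For anyone finding this thread later, here is a minimal sketch of that shape. It assumes plain Java serialization (which is what checkpointing relies on), top-level definitions compiled with Scala 2.12 or later, and no Spark dependency. The names DriverOnlyState, TaggingFunction, LeakyFunction and TransientSketch are hypothetical and only for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for driver-only state; deliberately NOT Serializable.
class DriverOnlyState(val label: String)

// A function value that carries the state as a @transient field, so Java
// serialization skips the field instead of failing on it.
class TaggingFunction(@transient val state: DriverOnlyState)
    extends (String => String) with Serializable {
  def apply(record: String): String = {
    // A @transient field is null after deserialization, so guard the access.
    val tag = if (state != null) state.label else "unknown"
    s"$tag:$record"
  }
}

// Same shape without @transient, for comparison.
class LeakyFunction(val state: DriverOnlyState)
    extends (String => String) with Serializable {
  def apply(record: String): String = s"${state.label}:$record"
}

object TransientSketch {
  // Returns true if plain Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

TransientSketch.serializes should accept a TaggingFunction and reject a LeakyFunction built from the same state. The null guard matters because the state does not come back when the function is restored from a checkpoint.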

Cheers
Simon

From: Evan Sparks [mailto:evan.spa...@gmail.com]
Sent: 24 June 2016 16:12
To: Simon Scott <simon.sc...@viavisolutions.com>
Cc: dev@spark.apache.org
Subject: Re: Associating user objects with SparkContext/SparkStreamingContext

I would actually think about this the other way around. Move the functions you 
are passing to the streaming jobs out to their own object if possible. Spark's 
closure capture rules are necessarily far reaching and serialize the object 
that contains these methods, which is a common cause of the problem you're 
seeing.

Another option is to mark the non-serializable state as "@transient" if it is 
never accessed by the worker processes.
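
A rough sketch of the first suggestion, assuming top-level definitions compiled with Scala 2.12 or later (where function literals compile to serializable lambdas) and plain Java serialization rather than Spark itself. StreamingJob, RecordFunctions and ClosureCaptureSketch are hypothetical names used only for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper object: functions defined here need nothing from
// the class that builds the streaming job.
object RecordFunctions {
  def tag(record: String): String = "tagged:" + record
}

// Hypothetical driver-side class, deliberately NOT Serializable.
class StreamingJob(prefix: String) {
  // Uses the member `prefix`, so the closure captures `this`.
  def capturing: String => String = record => prefix + record

  // Only calls into RecordFunctions, so nothing from this class is captured.
  def standalone: String => String = record => RecordFunctions.tag(record)
}

object ClosureCaptureSketch {
  // Returns true if plain Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

Under those assumptions, serializing `capturing` fails because it drags the enclosing StreamingJob along, while `standalone` serializes cleanly.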

On Jun 24, 2016, at 1:23 AM, Simon Scott 
<simon.sc...@viavisolutions.com> wrote:
Hi,

I am developing a streaming application using checkpointing on Spark 1.5.1

I have just run into a NotSerializableException because some of the state that 
my streaming functions need cannot be serialized. This state is only used in 
the driver process; it is the checkpointing that requires the serialization.

So I am considering moving that state into a Scala “object” – i.e. global 
singleton that must be mutable to allow the state to be set at application 
start.

I would prefer to be able to create immutable state and attach it to either the 
SparkContext or SparkStreamingContext but I can’t find an API for that.

Does anybody else think this is a good idea? Is there a better way? Or would 
such an API be a useful enhancement to Spark?

Thanks in advance
Simon

Research Developer
Viavi Solutions


Associating user objects with SparkContext/SparkStreamingContext

2016-06-24 Thread Simon Scott
Hi,

I am developing a streaming application using checkpointing on Spark 1.5.1

I have just run into a NotSerializableException because some of the state that 
my streaming functions need cannot be serialized. This state is only used in 
the driver process; it is the checkpointing that requires the serialization.

So I am considering moving that state into a Scala "object" - i.e. global 
singleton that must be mutable to allow the state to be set at application 
start.

I would prefer to be able to create immutable state and attach it to either the 
SparkContext or SparkStreamingContext but I can't find an API for that.

Does anybody else think this is a good idea? Is there a better way? Or would 
such an API be a useful enhancement to Spark?

Thanks in advance
Simon

Research Developer
Viavi Solutions


Agenda Announced for Spark Summit 2016 in San Francisco

2016-04-06 Thread Scott walent
Spark Summit 2016 (www.spark-summit.org/2016) will be held from June 6-8 at
the Union Square Hilton in San Francisco, and the recently released agenda
features a stellar lineup of community talks led by top engineers,
architects, data scientists, researchers, entrepreneurs and analysts from
UC Berkeley, Duke, Microsoft, Netflix, Oracle, Bloomberg, Viacom, Airbnb,
Uber, CareerBuilder and, of course, Databricks. There’s also a full day of
hands-on Spark training, with courses for both beginners and advanced
users.

As the excitement around Spark continues to grow, and the rapid adoption
rate shows no signs of slowing down, Spark Summit is growing, too. More
than 2,500 participants are expected at the San Francisco conference,
making it the largest event yet.

Join us in June to learn more about data engineering and data science at
scale, spend time with other members of the Spark community, attend
community meetups, revel in social activities associated with the Summit,
and enjoy the beautiful city by the bay.

Developer Day: (June 7)
Aimed at a highly technical audience, this day will focus on topics about
Spark dealing with memory management, performance, optimization, scale, and
integration with the ecosystem, including dedicated tracks and sessions
covering:
- Keynotes focusing on what’s new with Spark, where Spark is heading, and
technical trends within Big Data
- Five technical tracks, including Developer, Data Science, Spark
Ecosystem, Use Cases & Experiences, and Research
- Office hours from the Spark project leads at the Expo Hall Theater

Enterprise Day: (June 8)
For anyone interested in understanding how Spark is used in the enterprise,
this day will include:
- Keynotes from leading vendors contributing to Spark and enterprise use
cases
- Full day-long track of enterprise talks featuring use cases and a vendor
panel
- Four technical tracks for continued learning from Developer Day

With more than 90 sessions, you’ll be able to pick and choose the topics
that best suit your interests and expertise.

Get Tickets Online
Registration (www.spark-summit.org/2016/register/) is open now, and you can
save $200 when you buy tickets before April 8th.

We hope to see you at Spark Summit 2016 in San Francisco. Follow
@spark_summit and #SparkSummit for updates.

-Spark Summit Organizers


Spark in Production - Use Cases

2016-02-08 Thread Scott walent
Spark Summit East is just 10 days away and we are almost sold out! One of
the highlights this year will focus on how Spark is being used across
businesses to solve both big and small data needs. Check out the full
agenda here: https://spark-summit.org/east-2016/schedule/

Use "ApacheList" for 30% off at registration.

We wanted to highlight a few talks, including keynotes from:
- Chris D'Agostino: Vice President Digital and US Card Servicing Technology
and Engineering at Capital One
- Matei Zaharia: CTO and Co-founder of Databricks
- Seshu Adunuthula: Head of Analytics Infrastructure at eBay

The keynotes are just the start of the summit. The community submitted over
200 talks and we narrowed it down to 60 to be presented in NYC. Here is
just a sampling:
- Top 5 Mistakes When Writing Spark Applications from Cloudera
- Structuring Spark: DataFrames, Datasets, and Streaming from Databricks
- TopNotch: Systematically Quality Controlling Big Data from BlackRock
- Distributed Time Travel for Feature Generation by Netflix

This will be our only summit on the east coast this year. Register today to
guarantee a seat! https://spark-summit.org/east-2016/


Spark Summit East - Full Schedule Available

2016-01-18 Thread Scott walent
Join the Apache Spark community at the 2nd annual Spark Summit East from
February 16-18, 2016 in New York City.

We will kick things off with a Spark update from Matei Zaharia followed by
over 60 talks that were selected by the program committee. The agenda this
year includes enterprise talks from Microsoft, Bloomberg and Comcast as
well as the popular developer, data science, research and application
tracks.  See the full agenda at https://spark-summit.org/east-2016/schedule.


If you are new to Spark or looking to improve on your knowledge of the
technology, we are offering three levels of Spark Training: Spark
Essentials, Advanced Exploring Wikipedia with Spark, and Data Science with
Spark. Visit https://spark-summit.org/east-2016/schedule/spark-training for
details.

Space is limited and we anticipate selling out, so register now! Use promo
code "ApacheListEast" to save 20% when registering before January 29, 2016.
Register at https://spark-summit.org/register.

We look forward to seeing you there.

Scott and the Summit Organizers


Spark Summit East 2016 CFP - Closing in 5 days

2015-11-18 Thread Scott walent
Hi Spark Devs and Users,

The CFP for Spark Summit East 2016 (https://spark-summit.org) is closing
this weekend. As the leading event for Apache Spark, this is the chance to
both share key learnings and to gain insights from the creators of Spark,
developers, vendors and peers who are using Spark.

We are looking for presenters who would like to showcase how Spark and its
related technologies are used in a variety of ways, including Applications,
Developer, Research, Data Science and our new in 2016 track, Enterprise.

Don’t wait! The call for presentations for Spark Summit East closes in less
than a week, November 22nd at 11:59 pm PST. Please visit our submission
page (https://spark-summit.org/east-2016/) for additional details.

Regards,
Spark Summit Organizers


Spark Summit 2015 - June 15-17 - Dev list invite

2015-05-14 Thread Scott walent
Join the Apache Spark community at the fourth Spark Summit in San
Francisco on June 15, 2015. At Spark Summit 2015 you will hear keynotes
from NASA, the CIA, Toyota, Databricks, AWS, Intel, MapR, IBM, Cloudera,
Hortonworks, Timeful, O'Reilly, and Andreessen Horowitz. 260 talk proposals
were submitted by the community, and 55 were accepted. This year you’ll
hear about Spark in use at companies including Uber, Airbnb, Netflix,
Taobao, Red Hat, Edmunds, Oracle and more. See the full agenda at
http://spark-summit.org/2015.

If you are new to Spark or looking to improve on your knowledge of the
technology, we have three levels of Spark Training: Intro to Spark,
Advanced DevOps with Spark, and Data Science with Spark. Space is limited
and we will sell out, so register now. Use promo code DevList15 to save
15% when registering before June 1, 2015. Register at
http://spark-summit.org/2015/register.

I look forward to seeing you there.

Best,
Scott and the Spark Summit Organizers


Spark Summit East - March 18-19 - NYC

2015-02-10 Thread Scott walent
The inaugural Spark Summit East, an event to bring the Apache Spark
community together, will be in New York City on March 18, 2015. We are
excited about the growth of Spark and to bring the event to the east coast.

At Spark Summit East you can look forward to hearing from Matei Zaharia,
Databricks CEO Ion Stoica, representatives from Palantir, Goldman Sachs,
Baidu, Salesforce, Cloudera, Box, and many others. (See the full agenda at
http://spark-summit.org/east/2015)  All of these companies are utilizing
Spark. Come see what their experience has been and get a chance to talk
with some of the creators and committers.

If you are new to Spark or looking to improve on your knowledge of the
technology, there will be three levels of Spark Training: Intro to Spark,
Advanced Spark Training, and Data Science with Spark.

Space is limited, but we want to make sure those active in the community
are aware of this new event in NYC. Use promo code DevList15 for 15%
off your registration fee when registering before March 1, 2015.

Register at http://spark-summit.org/east/2015/register

Looking forward to seeing you there!

Best,
Scott and the Spark Summit Organizers


Spark Summit East CFP - 5 days until deadline

2014-12-01 Thread Scott walent
The inaugural Spark Summit East (spark-summit.org/east), an event to bring
the Apache Spark community together, will be in New York City on March 18,
2015. The call for submissions is currently open, but will close this
Friday, December 5, at 11:59pm PST. The summit is looking for talks that
will cover topics including applications, development, research, and data
science.

At the Summit you can look forward to hearing from committers, developers,
CEOs, and companies who are solving real-world big data challenges with
Spark.

All submissions will be reviewed by a Program Committee that is made up of
the creators, top committers and individuals who have heavily contributed
to the Spark project. No speaker slots are being sold to sponsors in an
effort to keep the Summit a community-driven event.

To submit your abstracts please visit: spark-summit.org/east/2015/cfp

Looking forward to seeing you there!

Best,
Scott and the Spark Summit Organizers