Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Denny Lee
+1 (non-binding)


On Thu, Apr 25, 2024 at 19:26 Xinrong Meng  wrote:

> +1
>
> On Thu, Apr 25, 2024 at 2:08 PM Holden Karau 
> wrote:
>
>> +1
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Thu, Apr 25, 2024 at 11:18 AM Maciej  wrote:
>>
>>> +1
>>>
>>> Best regards,
>>> Maciej Szymkiewicz
>>>
>>> Web: https://zero323.net
>>> PGP: A30CEF0C31A501EC
>>>
>>> On 4/25/24 6:21 PM, Reynold Xin wrote:
>>>
>>> +1
>>>
>>> On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale
>>>  
>>> wrote:
>>>
 +1

 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun 
 wrote:

> FYI, there is a proposal to drop Python 3.8 because its EOL is October
> 2024.
>
> https://github.com/apache/spark/pull/46228
> [SPARK-47993][PYTHON] Drop Python 3.8
>
> Since it's still alive and there will be an overlap between the
> lifecycle of Python 3.8 and Apache Spark 4.0.0, please give us your
> feedback on the PR, if you have any concerns.
>
> From my side, I agree with this decision.
>
> Thanks,
> Dongjoon.
>
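
The mechanics of such a drop are mostly packaging and CI changes. A rough,
hypothetical sketch of the packaging side (the actual change lives in the PR
above; the package name and exact version floor below are placeholders):

    # Hypothetical packaging sketch, not the actual Spark change: dropping
    # Python 3.8 amounts to raising the python_requires floor so that pip
    # refuses to install the package on an EOL interpreter.
    from setuptools import setup

    setup(
        name="example-package",      # placeholder name
        version="4.0.0.dev0",        # placeholder version
        python_requires=">=3.9",     # was ">=3.8" before the drop
    )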



Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Denny Lee
+1 (non-binding)

On Sat, Apr 13, 2024 at 7:49 PM huaxin gao  wrote:

> +1
>
> On Sat, Apr 13, 2024 at 4:36 PM L. C. Hsieh  wrote:
>
>> +1
>>
>> On Sat, Apr 13, 2024 at 4:12 PM Hyukjin Kwon 
>> wrote:
>> >
>> > +1
>> >
>> > On Sun, Apr 14, 2024 at 7:46 AM Chao Sun  wrote:
>> >>
>> >> +1.
>> >>
>> >> This feature is very helpful for guarding against correctness issues,
>> such as null results due to invalid input or math overflows. It’s been
>> there for a while now and it’s a good time to enable it by default as Spark
>> enters the next major release.
>> >>
>> >> On Sat, Apr 13, 2024 at 3:27 PM Dongjoon Hyun 
>> wrote:
>> >>>
>> >>> I'll start from my +1.
>> >>>
>> >>> Dongjoon.
>> >>>
>> >>> On 2024/04/13 22:22:05 Dongjoon Hyun wrote:
>> >>> > Please vote on SPARK-44444 to use ANSI SQL mode by default.
>> >>> > The technical scope is defined in the following PR which is
>> >>> > one line of code change and one line of migration guide.
>> >>> >
>> >>> > - DISCUSSION:
>> >>> > https://lists.apache.org/thread/ztlwoz1v1sn81ssks12tb19x37zozxlz
>> >>> > - JIRA: https://issues.apache.org/jira/browse/SPARK-44444
>> >>> > - PR: https://github.com/apache/spark/pull/46013
>> >>> >
>> >>> > The vote is open until April 17th 1AM (PST) and passes
>> >>> > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >>> >
>> >>> > [ ] +1 Use ANSI SQL mode by default
>> >>> > [ ] -1 Do not use ANSI SQL mode by default because ...
>> >>> >
>> >>> > Thank you in advance.
>> >>> >
>> >>> > Dongjoon
>> >>> >
>> >>>
>> >>>
>>
>>
>>
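
Chao Sun's point above about guarding against correctness issues can be seen
directly with the spark.sql.ansi.enabled flag that this vote flips on by
default. A minimal PySpark sketch, assuming a local Spark installation:

    # With ANSI mode off, invalid arithmetic silently yields NULL; with it
    # on, the same query fails loudly at execution time.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT 1/0 AS r").show()       # r is NULL under legacy mode

    spark.conf.set("spark.sql.ansi.enabled", "true")
    try:
        spark.sql("SELECT 1/0 AS r").show()   # raises under ANSI mode
    except Exception as e:
        print(type(e).__name__)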


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Denny Lee
+1 (non-binding)


On Mon, Apr 1, 2024 at 9:24 AM Hussein Awala  wrote:

> +1 (non-binding). As for the difference it will make: it will also
> simplify package maintenance and make it easy to release a bug fix or new
> feature without needing to wait for a PySpark release.
>
> On Mon, Apr 1, 2024 at 4:56 PM Chao Sun  wrote:
>
>> +1
>>
>> On Sun, Mar 31, 2024 at 10:31 PM Hyukjin Kwon 
>> wrote:
>>
>>> Oh I didn't send the discussion thread out as it's pretty simple,
>>> non-invasive and the discussion was sort of done as part of the Spark
>>> Connect initial discussion ..
>>>
>>> On Mon, Apr 1, 2024 at 1:59 PM Mridul Muralidharan 
>>> wrote:
>>>

 Can you point me to the SPIP's discussion thread, please?
 I was not able to find it, but I was on vacation, and so might have
 missed this …


 Regards,
 Mridul

>>>
 On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee
  wrote:

> +1
>
> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Pure Python Package in PyPI
>> (Spark Connect)
>>
>> JIRA 
>> Prototype 
>> SPIP doc
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks.
>>
>
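
For context on what the SPIP changes: the existing pyspark distribution
bundles the full JVM-based Spark, while a pure Python Spark Connect client
only needs to speak the Connect protocol to a remote server. A sketch of the
intended thin-client workflow (the sc:// scheme is Spark Connect's documented
connection URL; the host below is a placeholder):

    # Thin-client sketch: no local JVM or full Spark distribution needed;
    # the heavy lifting happens on the remote Spark Connect server.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .remote("sc://connect.example.com:15002")  # placeholder host
             .getOrCreate())
    spark.range(5).filter("id % 2 == 0").show()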


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Denny Lee
+1 (non-binding)

On Sun, Mar 10, 2024 at 23:36 Gengliang Wang  wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Structured Logging Framework for
> Apache Spark
>
> References:
>
>- JIRA ticket 
>- SPIP doc
>
> 
>- Discussion thread
>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks!
>
> Gengliang Wang
>
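
The specifics live in the SPIP doc linked above. The general idea of
structured logging, illustrated generically below with plain Python logging
(this is a concept sketch, not Spark's actual framework), is to emit each log
record as a machine-queryable JSON object with stable fields rather than
free-form text:

    # Generic structured-logging illustration (not Spark's implementation):
    # every record becomes a JSON object, so logs can be loaded and queried
    # like a table.
    import json
    import logging

    class JsonFormatter(logging.Formatter):
        def format(self, record):
            return json.dumps({
                "ts": self.formatTime(record),
                "level": record.levelname,
                "msg": record.getMessage(),
                "context": getattr(record, "context", {}),
            })

    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("demo")
    logger.addHandler(handler)
    logger.warning("executor lost", extra={"context": {"executor_id": "7"}})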


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Denny Lee
+1

On Tue, Sep 26, 2023 at 10:52 Maciej  wrote:

> +1
>
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
> On 9/26/23 17:12, Michel Miotto Barbosa wrote:
>
> +1
>
> A disposição | At your disposal
>
> Michel Miotto Barbosa
> https://www.linkedin.com/in/michelmiottobarbosa/
> mmiottobarb...@gmail.com
> +55 11 984 342 347
>
>
>
>
> On Tue, Sep 26, 2023 at 11:44 AM Herman van Hovell
>   wrote:
>
>> +1
>>
>> On Tue, Sep 26, 2023 at 10:39 AM yangjie01 
>>  wrote:
>>
>>> +1
>>>
>>>
>>>
>>> *From:* Yikun Jiang 
>>> *Date:* Tuesday, September 26, 2023 18:06
>>> *To:* dev 
>>> *Cc:* Hyukjin Kwon , Ruifeng Zheng <
>>> ruife...@apache.org>
>>> *Subject:* Re: [VOTE] Updating documentation hosted for EOL and
>>> maintenance releases
>>>
>>>
>>>
>>> +1, I believe it is a wise choice to update the EOL policy of the
>>> document based on the real demands of community users.
>>>
>>>
>>> Regards,
>>>
>>> Yikun
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Sep 26, 2023 at 1:06 PM Ruifeng Zheng 
>>> wrote:
>>>
>>> +1
>>>
>>>
>>>
>>> On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to start the vote for updating documentation hosted for EOL
>>> and maintenance releases to improve the usability here, and in order for
>>> end users to read the proper and correct documentation.
>>>
>>>
>>> For discussion thread, please refer to
>>> https://lists.apache.org/thread/1675rzxx5x4j2x03t9x0kfph8tlys0cx
>>> .
>>>
>>>
>>>
>>>
>>> Here is one example:
>>> - https://github.com/apache/spark/pull/42989
>>> 
>>>
>>> - https://github.com/apache/spark-website/pull/480
>>> 
>>>
>>>
>>>
>>> Starting with my own +1.
>>>
>>>


Re: First Time contribution.

2023-09-17 Thread Denny Lee
Hi Ram,

We have some good guidance at
https://spark.apache.org/contributing.html

HTH!
Denny


On Sun, Sep 17, 2023 at 17:18 ram manickam  wrote:

>
>
>
> Hello All,
> Recently joined this community and would like to contribute. Is there a
> guideline or recommendation on tasks that can be picked up by a first-timer,
> or a starter task?
>
> Tried looking at stack overflow tag: apache-spark
> , couldn't find
> any information for first time contributors.
>
> Looking forward to learning and contributing.
>
> Thanks
> Ram
>


Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Denny Lee
+1 (non-binding)

On Fri, Jul 7, 2023 at 00:50 Maciej  wrote:

> +0
>
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
> On 7/6/23 17:41, Xiao Li wrote:
>
> +1
>
> Xiao
>
>> Hyukjin Kwon wrote on Wednesday, July 5, 2023 at 17:28:
>
>> +1.
>>
>> See https://youtu.be/yj7XlTB1Jvc?t=604 :-).
>>
>> On Thu, 6 Jul 2023 at 09:15, Allison Wang
>> 
>>  wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Python Data Source API.
>>>
>>> The high-level summary for the SPIP is that it aims to introduce a
>>> simple API in Python for Data Sources. The idea is to enable Python
>>> developers to create data sources without learning Scala or dealing with
>>> the complexities of the current data source APIs. This would make Spark
>>> more accessible to the wider Python developer community.
>>>
>>> References:
>>>
>>>- SPIP doc
>>>
>>> 
>>>- JIRA ticket 
>>>- Discussion thread
>>>
>>>
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because __.
>>>
>>> Thanks,
>>> Allison
>>>
>>
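
To make the proposal concrete, here is a rough sketch of the kind of
pure-Python source the SPIP describes. Everything below (class shape, method
names, registration) is illustrative only; the SPIP doc defines the real API:

    # Illustrative sketch only; names and signatures are hypothetical.
    class FibonacciSource:
        """A toy batch source yielding the first n Fibonacci numbers."""

        def __init__(self, options):
            self.n = int(options.get("n", 10))

        def schema(self):
            return "idx INT, value BIGINT"

        def read(self):
            a, b = 0, 1
            for i in range(self.n):
                yield (i, a)
                a, b = b, a + b

    # Hypothetical registration/usage, mirroring spark.read.format(...):
    # spark.dataSource.register("fibonacci", FibonacciSource)
    # spark.read.format("fibonacci").option("n", 20).load().show()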


Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Denny Lee
Slightly biased, but per my conversations - this would be awesome to have!

On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari 
wrote:

> I would definitely use it - if it's available :)
>
> On Mon, 19 Jun 2023, 21:56 Jacek Laskowski,  wrote:
>
>> Hi Allison and devs,
>>
>> Although I was against this idea at first sight (probably because I'm a
>> Scala dev), I think it could work as long as there are people who'd be
>> interested in such an API. Were there any? I'm just curious. I've seen no
>> emails requesting it.
>>
>> I also doubt that Python devs would like to work on new data sources but
>> support their wishes wholeheartedly :)
>>
>> Regards,
>> Jacek Laskowski
>> 
>> "The Internals Of" Online Books 
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> 
>>
>>
>> On Fri, Jun 16, 2023 at 6:14 AM Allison Wang
>>  wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to start a discussion on “Python Data Source API”.
>>>
>>> This proposal aims to introduce a simple API in Python for Data Sources.
>>> The idea is to enable Python developers to create data sources without
>>> having to learn Scala or deal with the complexities of the current data
>>> source APIs. The goal is to make a Python-based API that is simple and easy
>>> to use, thus making Spark more accessible to the wider Python developer
>>> community. This proposed approach is based on the recently introduced
>>> Python user-defined table functions with extensions to support data sources.
>>>
>>> *SPIP Doc*:
>>> https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing
>>>
>>> *SPIP JIRA*: https://issues.apache.org/jira/browse/SPARK-44076
>>>
>>> Looking forward to your feedback.
>>>
>>> Thanks,
>>> Allison
>>>
>>
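
Since the proposal builds on the recently introduced Python user-defined
table functions, here is a minimal UDTF example for readers who have not seen
that API (assuming the Python UDTF API as it shipped in Spark 3.5, and an
active SparkSession; the proposed data sources extend this pattern):

    # A Python UDTF: a class whose eval() yields rows.
    from pyspark.sql.functions import lit, udtf

    @udtf(returnType="num: int, squared: int")
    class SquareNumbers:
        def eval(self, start: int, end: int):
            for num in range(start, end + 1):
                yield (num, num * num)

    SquareNumbers(lit(1), lit(3)).show()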


Re: JDK version support policy?

2023-06-06 Thread Denny Lee
+1 on dropping Java 8 in Spark 4.0, saying this as a fan of the fast-paced
(positive) updates to Arrow, eh?!

On Tue, Jun 6, 2023 at 4:02 PM Sean Owen  wrote:

> I haven't followed this discussion closely, but I think we could/should
> drop Java 8 in Spark 4.0, which is up next after 3.5?
>
> On Tue, Jun 6, 2023 at 2:44 PM David Li  wrote:
>
>> Hello Spark developers,
>>
>> I'm from the Apache Arrow project. We've discussed Java version support
>> [1], and crucially, whether to continue supporting Java 8 or not. As Spark
>> is a big user of Arrow in Java, I was curious what Spark's policy here was.
>>
>> If Spark intends to stay on Java 8, for instance, we may also want to
>> stay on Java 8 or otherwise provide some supported version of Arrow for
>> Java 8.
>>
>> We've seen dependencies dropping or planning to drop support. gRPC may
>> drop Java 8 at any time [2], possibly this September [3], which may affect
>> Spark (due to Spark Connect). And today we saw that Arrow had issues
>> running tests with Mockito on Java 20, but we couldn't update Mockito since
>> it had dropped Java 8 support. (We pinned the JDK version in that CI
>> pipeline for now.)
>>
>> So at least, I am curious if Arrow could start the long process of
>> migrating Java versions without impacting Spark, or if we should continue
>> to cooperate. Arrow Java doesn't see quite so much activity these days, so
>> it's not quite critical, but it's possible that these dependency issues
>> will start to affect us more soon. And looking forward, Java is working on
>> APIs that should also allow us to ditch the --add-opens flag requirement
>> too.
>>
>> [1]: https://lists.apache.org/thread/phpgpydtt3yrgnncdyv4qdq1gf02s0yj
>> [2]:
>> https://github.com/grpc/proposal/blob/master/P5-jdk-version-support.md
>> [3]: https://github.com/grpc/grpc-java/issues/9386
>>
>


Re: [CONNECT] New Clients for Go and Rust

2023-05-24 Thread Denny Lee
+1 on separate repo allowing different APIs to run at different speeds and
ensuring they get community support.

On Wed, May 24, 2023 at 00:37 Hyukjin Kwon  wrote:

> I think we can just start this with a separate repo.
> I am fine with the second option too but in this case we would have to
> triage which language to add into the main repo.
>
> On Fri, 19 May 2023 at 22:28, Maciej  wrote:
>
>> Hi,
>>
>> Personally, I'm strongly against the second option and have some
>> preference towards the third one (or maybe a mix of the first one and the
>> third one).
>>
>> The project is already pretty large as-is and, with an extremely
>> conservative approach towards removal of APIs, it only tends to grow over
>> time. Making it even larger is not going to make things more maintainable
>> and is likely to create an entry barrier for new contributors (that's
>> similar to Jia's arguments).
>>
>> Moreover, we've seen quite a few different language clients over the
>> years and all but one or two survived while none is particularly active, as
>> far as I'm aware.  Taking responsibility for more clients, without being
>> sure that we have resources to maintain them and there is enough community
>> around them to make such effort worthwhile, doesn't seem like a good idea.
>>
>> --
>> Best regards,
>> Maciej Szymkiewicz
>>
>> Web: https://zero323.net
>> PGP: A30CEF0C31A501EC
>>
>>
>>
>> On 5/19/23 14:57, Jia Fan wrote:
>>
>> Hi,
>>
>> Thanks for the contribution!
>> I prefer (1). There are a few reasons:
>>
>> 1. Different repository can maintain independent versions, different
>> release times, and faster bug fix releases.
>>
>> 2. Different languages have different build tools. Putting them in one
>> repository will make the main repository more and more complicated, and it
>> will become extremely difficult to perform a complete build in the main
>> repository.
>>
>> 3. Different repository will make CI configuration and execute easier,
>> and the PR and commit lists will be clearer.
>>
>> 4. Other projects also have different clients governed in separate
>> repositories, like ClickHouse, which uses different repositories for JDBC,
>> ODBC, and C++. Please refer to:
>> https://github.com/ClickHouse/clickhouse-java
>> https://github.com/ClickHouse/clickhouse-odbc
>> https://github.com/ClickHouse/clickhouse-cpp
>>
>> PS: I'm looking forward to the javascript connect client!
>>
>> Thanks Regards
>> Jia Fan
>>
>> Martin Grund wrote on Friday, May 19, 2023 at 20:03:
>>
>>> Hi folks,
>>>
>>> When Bo (thanks for the time and contribution) started the work on
>>> https://github.com/apache/spark/pull/41036 he started the Go client
>>> directly in the Spark repository. In the meantime, I was approached by
>>> other engineers who are willing to contribute to working on a Rust client
>>> for Spark Connect.
>>>
>>> Now one of the key questions is where should these connectors live and
>>> how we manage expectations most effectively.
>>>
>>> At the high level, there are two approaches:
>>>
>>> (1) "3rd party" (non-JVM / Python) clients should live in separate
>>> repositories owned and governed by the Apache Spark community.
>>>
>>> (2) All clients should live in the main Apache Spark repository in the
>>> `connector/connect/client` directory.
>>>
>>> (3) Non-native (Python, JVM) Spark Connect clients should not be part of
>>> the Apache Spark repository and governance rules.
>>>
>>> Before we iron out exactly how we mark these clients as experimental
>>> and how we align their release process etc. with Spark, my suggestion would
>>> be to get a consensus on this first question.
>>>
>>> Personally, I'm fine with (1) and (2) with a preference for (2).
>>>
>>> Would love to get feedback from other members of the community!
>>>
>>> Thanks
>>> Martin
>>>
>>>
>>>
>>>
>>


Re: Slack for Spark Community: Merging various threads

2023-04-06 Thread Denny Lee
Thanks Dongjoon, but I don't think this is misleading, insofar as this is
not a *self-service process* but an invite process, which admittedly I did
not state explicitly in my previous thread.  And thanks for the invite to
the-ASF Slack - I just joined :)

Saying this, I do completely agree with your two assertions:

   - *Shall we narrow down our focus on comparing the ASF Slack vs another
   3rd-party Slack because all of us agree that this is important?*
   - Yes, I do agree that is an important aspect, all else being equal.

   - *I'm wondering what ASF misses here if Apache Spark PMC invites all
   remaining subscribers of `user@spark` and `dev@spark` mailing lists.*
   - The key question here is: do PMC members have the bandwidth to invite
   everyone in user@ and dev@? There is a lot of overhead in maintaining
   this, so my key concern is whether we have enough volunteers to manage
   it. Note, I'm willing to help with this process as well; it was just
   more a matter of there being a lot of folks to approve.
   - A reason why we may want to consider Spark's own Slack is that we can
   potentially create different channels within Slack to more easily group
   messages (e.g. different threads for troubleshooting, RDDs, streaming,
   etc.). Again, we'd need someone to manage this so that we don't have an
   out-of-control number of channels.

WDYT?



On Wed, Apr 5, 2023 at 10:50 PM Dongjoon Hyun 
wrote:

> Thank you so much, Denny.
> Yes, let me comment on a few things.
>
> >  - While there is an ASF Slack <https://infra.apache.org/slack.html>, it
> >requires an @apache.org email address
>
> 1. This sounds a little misleading because we can see `guest` accounts in
> the same link. People can be invited by "Invite people to ASF" link. I
> invited you, Denny, and attached the screenshots.
>
> >   using linen.dev as its Slack archive (so we can surpass the 90 days
> limit)
>
> 2. The official Foundation-supported Slack workspace preserves all
> messages.
> (the-asf.slack.com)
>
> > Why: Allows for the community to have the option to communicate with each
> > other using Slack; a pretty popular async communication.
>
> 3. ASF foundation not only allows but also provides the option to
> communicate with each other using Slack as of today.
>
> Given the above (1) and (3), I don't think we asked the right questions
> during most of the parts.
>
> 1. Shall we narrow-down our focus on comparing the ASF Slack vs another
> 3rd-party Slack because all of us agree that this is important?
> 2. I'm wondering what ASF misses here if Apache Spark PMC invites all
> remaining subscribers of `user@spark` and `dev@spark` mailing lists.
>
> Thanks,
> Dongjoon.
>
> [image: invitation.png]
> [image: invited.png]
>
> On Wed, Apr 5, 2023 at 7:23 PM Denny Lee  wrote:
>
>> There have been a number of threads discussing creating a Slack for the
>> Spark community that I'd like to try to help reconcile.
>>
>> Topic: Slack for Spark
>>
>> Why: Allows for the community to have the option to communicate with each
>> other using Slack; a pretty popular async communication.
>>
>> Discussion points:
>>
>>- There are other ASF projects that use Slack including Druid
>><https://druid.apache.org/community/>, Parquet
>><https://parquet.apache.org/community/>, Iceberg
>><https://iceberg.apache.org/community/>, and Hudi
>><https://hudi.apache.org/community/get-involved/>
>>- Flink <https://flink.apache.org/community/> is also using Slack and
>>using linen.dev as its Slack archive (so we can surpass the 90 days
>>limit) which is also Google searchable (Delta Lake
>><https://www.linen.dev/s/delta-lake/> is also using this service as
>>well)
>>- While there is an ASF Slack <https://infra.apache.org/slack.html>,
>>it requires an @apache.org email address to use which is quite
>>limiting which is why these (and many other) OSS projects are using the
>>free-tier Slack
>>- It does require managing Slack properly as Slack free edition
>>limits you to approx 100 invites.  One of the ways to resolve this is to
>>create a bit.ly link so we can manage the invites without regularly
>>updating the website with the new invite link.
>>
>> Are there any other points of discussion that we should add here?  I'm
>> glad to work with whomever to help manage the various aspects of Slack
>> (code of conduct, linen.dev and search/archive process, invite
>> management, etc.).
>>
>> HTH!
>> Denny
>>
>>
>>


Slack for Spark Community: Merging various threads

2023-04-05 Thread Denny Lee
There have been a number of threads discussing creating a Slack for the
Spark community that I'd like to try to help reconcile.

Topic: Slack for Spark

Why: Allows for the community to have the option to communicate with each
other using Slack; a pretty popular async communication.

Discussion points:

   - There are other ASF projects that use Slack including Druid
   , Parquet
   , Iceberg
   , and Hudi
   
   - Flink  is also using Slack and
   using linen.dev as its Slack archive (so we can surpass the 90 days
   limit) which is also Google searchable (Delta Lake
    is also using this service as well)
   - While there is an ASF Slack , it
   requires an @apache.org email address to use which is quite limiting
   which is why these (and many other) OSS projects are using the free-tier
   Slack
   - It does require managing Slack properly as Slack free edition limits
   you to approx 100 invites.  One of the ways to resolve this is to create a
   bit.ly link so we can manage the invites without regularly updating the
   website with the new invite link.

Are there any other points of discussion that we should add here?  I'm glad
to work with whomever to help manage the various aspects of Slack (code of
conduct, linen.dev and search/archive process, invite management, etc.).

HTH!
Denny


Re: Slack for PySpark users

2023-04-03 Thread Denny Lee
>>>>> On Thu, Mar 30, 2023 at 9:10 PM Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> I'm reading through the page "Briefing: The Apache Way", and in the
>>>>>> section of "Open Communications", restriction of communication inside ASF
>>>>>> INFRA (mailing list) is more about code and decision-making.
>>>>>>
>>>>>> https://www.apache.org/theapacheway/#what-makes-the-apache-way-so-hard-to-define
>>>>>>
>>>>>> It's unavoidable if "users" prefer to use an alternative
>>>>>> communication mechanism rather than the user mailing list. Before Stack
>>>>>> Overflow days, there had been a meaningful number of questions around 
>>>>>> user@.
>>>>>> It's just impossible to let them go back and post to the user mailing 
>>>>>> list.
>>>>>>
>>>>>> We just need to make sure it is not the purpose of employing Slack to
>>>>>> move all discussions about developments, direction of the project, etc
>>>>>> which must happen in dev@/private@. The purpose of Slack thread here
>>>>>> does not seem to aim to serve the purpose.
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 31, 2023 at 7:00 AM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Good discussions and proposals all around.
>>>>>>>
>>>>>>> I have used slack in anger on a customer site before. For small and
>>>>>>> medium size groups it is good and affordable. Alternatives have been
>>>>>>> suggested as well so those who like investigative search can agree and 
>>>>>>> come
>>>>>>> up with a freebie one.
>>>>>>> I am inclined to agree with Bjorn that this slack has more social
>>>>>>> dimensions than the mailing list. It is akin to a sports club using
>>>>>>> WhatsApp groups for communication. Remember we were originally looking 
>>>>>>> for
>>>>>>> space for webinars, including Spark on LinkedIn that Denny Lee
>>>>>>> suggested.
>>>>>>> I think Slack and mailing groups can coexist happily. On a more serious
>>>>>>> note, when I joined the user group back in 2015-2016, there was a lot of
>>>>>>> traffic. Currently we hardly get many mails daily, fewer than 5. So having
>>>>>>> a slack-type medium may improve members' participation.
>>>>>>>
>>>>>>> so +1 for me as well.
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Lead Solutions Architect/Engineering Lead
>>>>>>> Palantir Technologies Limited
>>>>>>>
>>>>>>>
>>>>>>>view my Linkedin profile
>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>
>>>>>>>
>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>> for any loss, damage or destruction of data or any other property which 
>>>>>>> may
>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>> damages
>>>>>>> arising from such loss, damage or destruction.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 30 Mar 2023 at 22:19, Denny Lee 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1.
>>>>>>>>
>>>>>>>> To Shani’s point, there are multiple OSS projects that use the free
>>>>>>>> Slack version - top of mind include Delta, Presto, Flink, Trino, 
>>>>>>>> Datahub,
>>>>>>>> MLflow, etc.
>>>>>>>>
>>>>>>>> On Thu, Mar 30, 2023 at 14:15  wrote:
>>>>>>

Re: Slack for PySpark users

2023-03-30 Thread Denny Lee
ng
>>>> list because we didn't set up any rule here yet.
>>>>
>>>> To Xiao. I understand what you mean. That's the reason why I added
>>>> Matei from your side.
>>>> > I did not see an objection from the ASF board.
>>>>
>>>> There is on-going discussion about the communication channels outside
>>>> ASF email which is specifically concerning Slack.
>>>> Please hold on any official action for this topic. We will know how to
>>>> support it seamlessly.
>>>>
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>>>>
>>>>> Hi, Dongjoon,
>>>>>
>>>>> The other communities (e.g., Pinot, Druid, Flink) created their own
>>>>> Slack workspaces last year. I did not see an objection from the ASF board.
>>>>> At the same time, Slack workspaces are very popular and useful in most
>>>>> non-ASF open source communities. TBH, we are kind of late. I think we can
>>>>> do the same in our community?
>>>>>
>>>>> We can follow the guide when the ASF has an official process for ASF
>>>>> archiving. Since our PMC are the owner of the slack workspace, we can make
>>>>> a change based on the policy. WDYT?
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>> Dongjoon Hyun wrote on Thursday, March 30, 2023 at 09:03:
>>>>>
>>>>>> Hi, Xiao and all.
>>>>>>
>>>>>> (cc Matei)
>>>>>>
>>>>>> Please hold on the vote.
>>>>>>
>>>>>> There is a concern expressed by ASF board because recent Slack
>>>>>> activities created an isolated silo outside of ASF mailing list archive.
>>>>>>
>>>>>> We need to establish a way to embrace it back to ASF archive before
>>>>>> starting anything official.
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li 
>>>>>> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> + @dev@spark.apache.org 
>>>>>>>
>>>>>>> This is a good idea. The other Apache projects (e.g., Pinot, Druid,
>>>>>>> Flink) have created their own dedicated Slack workspaces for faster
>>>>>>> communication. We can do the same in Apache Spark. The Slack workspace 
>>>>>>> will
>>>>>>> be maintained by the Apache Spark PMC. I propose to initiate a vote for 
>>>>>>> the
>>>>>>> creation of a new Apache Spark Slack workspace. Does that sound good?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Xiao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mich Talebzadeh wrote on Tuesday, March 28, 2023 at 07:07:
>>>>>>>
>>>>>>>> I created one at slack called pyspark
>>>>>>>>
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Lead Solutions Architect/Engineering Lead
>>>>>>>> Palantir Technologies Limited
>>>>>>>>
>>>>>>>>
>>>>>>>>view my Linkedin profile
>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>
>>>>>>>>
>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>>> for any loss, damage or destruction of data or any other property 
>>>>>>>> which may
>>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>>> damages
>>>>>>>> arising from such loss, damage or destruction.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 28 Mar 2023 at 03:52, asma zgolli 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 good idea, I d like to join as well.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 28, 2023 at 04:09, Winston Lai
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Please let us know when the channel is created. I'd like to join
>>>>>>>>>> :)
>>>>>>>>>>
>>>>>>>>>> Thank You & Best Regards
>>>>>>>>>> Winston Lai
>>>>>>>>>> --
>>>>>>>>>> *From:* Denny Lee 
>>>>>>>>>> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
>>>>>>>>>> *To:* Hyukjin Kwon 
>>>>>>>>>> *Cc:* keen ; u...@spark.apache.org <
>>>>>>>>>> u...@spark.apache.org>
>>>>>>>>>> *Subject:* Re: Slack for PySpark users
>>>>>>>>>>
>>>>>>>>>> +1 I think this is a great idea!
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Yeah, actually I think we should better have a slack channel so
>>>>>>>>>> we can easily discuss with users and developers.
>>>>>>>>>>
>>>>>>>>>> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I really like *Slack *as communication channel for a tech
>>>>>>>>>> community.
>>>>>>>>>> There is a Slack workspace for *delta lake users* (
>>>>>>>>>> https://go.delta.io/slack) that I enjoy a lot.
>>>>>>>>>> I was wondering if there is something similar for PySpark users.
>>>>>>>>>>
>>>>>>>>>> If not, would there be anything wrong with creating a new
>>>>>>>>>> Slack workspace for PySpark users? (when explicitly mentioning that 
>>>>>>>>>> this is
>>>>>>>>>> *not* officially part of Apache Spark)?
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Asma ZGOLLI
>>>>>>>>>
>>>>>>>>> Ph.D. in Big Data - Applied Machine Learning
>>>>>>>>>
>>>>>>>>>
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>>
>


Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
What we can do is get into the habit of compiling the list on LinkedIn but
making sure this list is shared and broadcast here, eh?!

As well, when we broadcast the videos, we can do this using zoom/jitsi/
riverside.fm as well as simulcasting this on LinkedIn. This way you can
view directly on the former without ever logging in with a user ID.

HTH!!

On Wed, Mar 15, 2023 at 4:30 PM Mich Talebzadeh 
wrote:

> Understood, Nitin. It would be wrong to act against one's conviction. I am
> sure we can find a way around providing the contents.
>
> Regards
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 15 Mar 2023 at 22:34, Nitin Bhansali 
> wrote:
>
>> Hi Mich,
>>
>> Thanks for your prompt response ... much appreciated. I know how to and
>> can create login IDs on such sites, but I took a conscious decision some
>> 20 years ago (and I would be going against my principles) not to be on such
>> sites. Hence I had asked: is there any other way I can join/view a
>> recording of the webinar?
>>
>> Anyways not to worry.
>>
>> Thanks & Regards
>>
>> Nitin.
>>
>>
>> On Wednesday, 15 March 2023 at 20:37:55 GMT, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>
>> Hi Nitin,
>>
>> LinkedIn is more of a professional medium. FYI, I am only a member of
>> LinkedIn, no Facebook, etc. There is no reason for you NOT to create a
>> profile for yourself on LinkedIn :)
>>
>>
>> https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en
>>
>> see you there as well.
>>
>> Best of luck.
>>
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali 
>> wrote:
>>
>> Hello Mich,
>>
>> My apologies ... but I am not on any such social/professional sites.
>> Is there any other way to access such webinars/classes?
>>
>> Thanks & Regards
>> Nitin.
>>
>> On Wednesday, 15 March 2023 at 18:26:51 GMT, Denny Lee <
>> denny.g@gmail.com> wrote:
>>
>>
>> Thanks Mich for tackling this!  I encourage everyone to add to the list
>> so we can have a comprehensive list of topics, eh?!
>>
>> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
>> wrote:
>>
>> Hi all,
>>
>> Thanks to @Denny Lee for giving access to
>>
>> https://www.linkedin.com/company/apachespark/
>>
>> and contribution from @asma zgolli 
>>
>> You will see my post at the bottom. Please add anything else on topics to
>> the list as a comment.
>>
>> We will then put them together in an article perhaps. Comments and
>> contributions are welcome.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
Thanks Mich for tackling this!  I encourage everyone to add to the list so
we can have a comprehensive list of topics, eh?!

On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
wrote:

> Hi all,
>
> Thanks to @Denny Lee for giving access to
>
> https://www.linkedin.com/company/apachespark/
>
> and contribution from @asma zgolli 
>
> You will see my post at the bottom. Please add anything else on topics to
> the list as a comment.
>
> We will then put them together in an article perhaps. Comments and
> contributions are welcome.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead,
> Palantir Technologies Limited
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
> wrote:
>
>> Hi Denny,
>>
>> That Apache Spark Linkedin page
>> https://www.linkedin.com/company/apachespark/ looks fine. It also allows
>> a wider audience to benefit from it.
>>
>> +1 for me
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:
>>
>>> In the past, we've been using the Apache Spark LinkedIn page
>>> <https://www.linkedin.com/company/apachespark/> and group to broadcast
>>> these types of events - if you're cool with this?  Or we could go through
>>> the process of submitting and updating the current
>>> https://spark.apache.org or request to leverage the original Spark
>>> confluence page <https://cwiki.apache.org/confluence/display/SPARK>.
>>>  WDYT?
>>>
>>> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Well that needs to be created first for this purpose. The appropriate
>>>> name etc. to be decided. Maybe @Denny Lee   can
>>>> facilitate this as he offered his help.
>>>>
>>>>
>>>> cheers
>>>>
>>>>
>>>>
>>>>view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>>>>
>>>>> Hello Mich,
>>>>>
>>>>> Can you please provide the link for the confluence page?
>>>>>
>>>>> Many thanks
>>>>> Asma
>>>>> Ph.D. in Big Data - Applied Machine Learning
>>>>>
>>>>> Le lun. 13 mars 2023 à 17:21, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> a écrit :
>>>>>
>>>>>> Apologies I missed the list.
>>>>>>
>>>>>> To move forward I selected these topics from the thread "Online
>>>>>> classes for spark topics".
>>>>>>
>>>>>> To take this further I propose a confluence page to be set up.
>>>>>>
>>>>>>
>>>>>>1. Spark UI
>>>>>>2. Dynamic allocation

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Denny Lee
In the past, we've been using the Apache Spark LinkedIn page
<https://www.linkedin.com/company/apachespark/> and group to broadcast
these types of events - if you're cool with this?  Or we could go through
the process of submitting and updating the current https://spark.apache.org
or request to leverage the original Spark confluence page
<https://cwiki.apache.org/confluence/display/SPARK>. WDYT?

On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh 
wrote:

> Well that needs to be created first for this purpose. The appropriate name
> etc. to be decided. Maybe @Denny Lee   can
> facilitate this as he offered his help.
>
>
> cheers
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>
>> Hello Mich,
>>
>> Can you please provide the link for the confluence page?
>>
>> Many thanks
>> Asma
>> Ph.D. in Big Data - Applied Machine Learning
>>
>> Le lun. 13 mars 2023 à 17:21, Mich Talebzadeh 
>> a écrit :
>>
>>> Apologies I missed the list.
>>>
>>> To move forward I selected these topics from the thread "Online classes
>>> for spark topics".
>>>
>>> To take this further I propose a confluence page to be set up.
>>>
>>>
>>>1. Spark UI
>>>2. Dynamic allocation
>>>3. Tuning of jobs
>>>4. Collecting spark metrics for monitoring and alerting
>>>5.  For those who prefer to use Pandas API on Spark since the
>>>release of Spark 3.2, What are some important notes for those users? For
>>>example, what are the additional factors affecting the Spark performance
>>>using Pandas API on Spark? How to tune them in addition to the 
>>> conventional
>>>Spark tuning methods applied to Spark SQL users.
>>>6. Spark internals and/or comparing spark 3 and 2
>>>7. Spark Streaming & Spark Structured Streaming
>>>8. Spark on notebooks
>>>9. Spark on serverless (for example Spark on Google Cloud)
>>>10. Spark on k8s
>>>
>>> Opinions and how-tos are welcome
>>>
>>>
>>>view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh 
>>> wrote:
>>>
>>>> Hi guys
>>>>
>>>> To move forward I selected these topics from the thread "Online classes
>>>> for spark topics".
>>>>
>>>> To take this further I propose a confluence page to be set up.
>>>>
>>>> Opinions and how-tos are welcome
>>>>
>>>> Cheers
>>>>
>>>>
>>>>
>>>>view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>>
>>
>>
>>


Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-07 Thread Denny Lee
Woohoo, Holden!  I’m in (hopefully my schedule allows me) - PST.

On Tue, Feb 7, 2023 at 15:34 Andrew Melo  wrote:

> I'm Central US time (AKA UTC -6:00)
>
> On Tue, Feb 7, 2023 at 5:32 PM Holden Karau  wrote:
> >
> > Awesome, I guess I should have asked folks for timezones that they’re in.
> >
> > On Tue, Feb 7, 2023 at 3:30 PM Andrew Melo 
> wrote:
> >>
> >> Hello Holden,
> >>
> >> We are interested in Spark on k8s and would like the opportunity to
> >> speak with devs about what we're looking for slash better ways to use
> >> spark.
> >>
> >> Thanks!
> >> Andrew
> >>
> >> On Tue, Feb 7, 2023 at 5:24 PM Holden Karau 
> wrote:
> >> >
> >> > Hi Folks,
> >> >
> >> > It seems like we could maybe use some additional shared context
> around Spark on Kube so I’d like to try and schedule a virtual coffee
> session.
> >> >
> >> > Who all would be interested in virtual adventures around Spark on
> Kube development?
> >> >
> >> > No pressure if the idea of hanging out in a virtual chat with coffee
> and Spark devs does not sound like your thing, just trying to make
> something informal so we can have a better understanding of everyone’s
> goals here.
> >> >
> >> > Cheers,
> >> >
> >> > Holden :)
> >> > --
> >> > Twitter: https://twitter.com/holdenkarau
> >> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
>


Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-22 Thread Denny Lee
+1 (non-binding)

On Wed, Sep 21, 2022 at 10:33 PM Ankit Gupta  wrote:

> +1
>
> Regards,
>
> Ankit Prakash Gupta
>
> On Thu, Sep 22, 2022 at 10:38 AM Yang,Jie(INF) 
> wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> Regards,
>>
>> Yang Jie
>>
>>
>>
>> *From:* Gengliang Wang 
>> *Date:* Thursday, September 22, 2022 12:22
>> *To:* Xiangrui Meng 
>> *Cc:* Kent Yao , Hyukjin Kwon ,
>> dev 
>> *Subject:* Re: [VOTE] SPIP: Support Docker Official Image for Spark
>>
>>
>>
>> +1
>>
>>
>>
>> On Wed, Sep 21, 2022 at 7:26 PM Xiangrui Meng  wrote:
>>
>> +1
>>
>>
>>
>> On Wed, Sep 21, 2022 at 6:53 PM Kent Yao  wrote:
>>
>> +1
>>
>>
>>
>> *Kent Yao*
>>
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>
>> *a spark enthusiast*
>>
>> *kyuubi* is a unified multi-tenant JDBC interface for large-scale data
>> processing and analytics, built on top of *Apache Spark*.
>> *spark-authorizer* A Spark SQL extension which provides SQL Standard
>> Authorization for *Apache Spark*.
>> *spark-postgres* A library for reading data from and transferring data to
>> Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
>> *itatchi* A library that brings useful functions from various modern
>> database management systems to *Apache Spark*.
>>
>>
>>
>>
>>
>> ---- Replied Message ----
>> From: Hyukjin Kwon 
>> Date: 09/22/2022 09:43
>> To: dev 
>> Subject: Re: [VOTE] SPIP: Support Docker Official Image for Spark
>>
>> Starting with my +1.
>>
>>
>>
>> On Thu, 22 Sept 2022 at 10:41, Hyukjin Kwon  wrote:
>>
>> Hi all,
>>
>> I would like to start a vote for SPIP: "Support Docker Official Image for
>> Spark"
>>
>> The goal of the SPIP is to add a Docker Official Image (DOI)
>> 
>> to ensure the Spark Docker images
>> meet the quality standards for Docker images, to provide these Docker
>> images for users
>> who want to use Apache Spark via Docker image.
>>
>> Please also refer to:
>>
>> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Support
>> Docker Official Image for Spark
>> 
>>
>> - SPIP doc: SPIP: Support Docker Official Image for Spark
>> 
>>
>> - JIRA: SPARK-40513
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>>
>>
>>
>>


Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-18 Thread Denny Lee
+1 (non-binding).

This is a great idea and we should definitely do this.  Count me in to help
as well, eh?! :)

On Sun, Sep 18, 2022 at 7:24 PM bo zhaobo 
wrote:

> +1 (non-binding)
>
> This will bring a good experience to customers. So excited about this.
> ;-)
>
>> Yuming Wang wrote on Monday, September 19, 2022 at 10:18:
>
>> +1.
>>
>> On Mon, Sep 19, 2022 at 9:44 AM Kent Yao  wrote:
>>
>>> +1
>>>
>>> Gengliang Wang wrote on Monday, September 19, 2022 at 09:23:
>>> >
>>> > +1, thanks for the work!
>>> >
>>> > On Sun, Sep 18, 2022 at 6:20 PM Hyukjin Kwon 
>>> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> On Mon, 19 Sept 2022 at 09:15, Yikun Jiang 
>>> wrote:
>>> >>>
>>> >>> Hi, all
>>> >>>
>>> >>>
>>> >>> I would like to start the discussion for supporting Docker Official
>>> Image for Spark.
>>> >>>
>>> >>>
>>> >>> This SPIP is proposed to add a Docker Official Image (DOI) to ensure
>>> the Spark Docker images meet the quality standards for Docker images, and
>>> to provide these Docker images for users who want to use Apache Spark via
>>> a Docker image.
>>> >>>
>>> >>>
>>> >>> There are also several Apache projects that release Docker Official
>>> Images, such as flink, storm, solr, zookeeper, and httpd (with 50M+ to
>>> 1B+ downloads each). The huge download statistics show the real demand
>>> from users, and the support in other Apache projects suggests we should
>>> be able to do it as well.
>>> >>>
>>> >>>
>>> >>> After support:
>>> >>>
>>> >>> The Dockerfile will still be maintained by the Apache Spark
>>> community and reviewed by Docker.
>>> >>>
>>> >>> The images will be maintained by the Docker community to ensure the
>>> quality standards for Docker images of the Docker community.
>>> >>>
>>> >>>
>>> >>> It will also reduce the Apache Spark community's extra Docker image
>>> maintenance effort (such as frequent rebuilding and image security
>>> updates).
>>> >>>
>>> >>>
>>> >>> See more in SPIP DOC:
>>> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
>>> >>>
>>> >>>
>>> >>> cc: Ruifeng (co-author) and Hyukjin (shepherd)
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>> Yikun
>>>
>>>
>>>
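
For end users, the practical effect of a DOI is simply a trusted image to
pull. A small sketch (assuming the docker-py SDK and a local Docker daemon;
"spark" is the Docker Official Image name this SPIP eventually produced on
Docker Hub):

    # Pull the Spark Docker Official Image and confirm it is available.
    import docker

    client = docker.from_env()
    image = client.images.pull("spark", tag="latest")  # docker pull spark:latest
    print(image.tags)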


Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-28 Thread Denny Lee
+1 (non-binding)

On Sun, Mar 28, 2021 at 9:06 PM 郑瑞峰  wrote:

> +1 (non-binding)
>
>
> -- Original Message --
> *From:* "Maxim Gekk" ;
> *Sent:* Monday, March 29, 2021, 2:08 AM
> *To:* "Matei Zaharia";
> *Cc:* "Gengliang Wang";"Mridul Muralidharan"<
> mri...@gmail.com>;"Xiao Li";"Spark dev list"<
> dev@spark.apache.org>;"Takeshi Yamamuro";
> *Subject:* Re: [VOTE] SPIP: Support pandas API layer on PySpark
>
> +1 (non-binding)
>
> On Sun, Mar 28, 2021 at 8:53 PM Matei Zaharia 
> wrote:
>
>> +1
>>
>> Matei
>>
>> On Mar 28, 2021, at 1:45 AM, Gengliang Wang  wrote:
>>
>> +1 (non-binding)
>>
>> On Sun, Mar 28, 2021 at 11:12 AM Mridul Muralidharan 
>> wrote:
>>
>>> +1
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Mar 27, 2021 at 6:09 PM Xiao Li  wrote:
>>>
 +1

 Xiao

 Takeshi Yamamuro wrote on Friday, March 26, 2021 at 4:14 PM:

> +1 (non-binding)
>
> On Sat, Mar 27, 2021 at 4:53 AM Liang-Chi Hsieh 
> wrote:
>
>> +1 (non-binding)
>>
>>
>> rxin wrote
>> > +1. Would open up a huge persona for Spark.
>> >
>> > On Fri, Mar 26 2021 at 11:30 AM, Bryan Cutler <
>>
>> > cutlerb@
>>
>> >  > wrote:
>> >
>> >>
>> >> +1 (non-binding)
>> >>
>> >>
>> >> On Fri, Mar 26, 2021 at 9:49 AM Maciej <
>>
>> > mszymkiewicz@
>>
>> >  > wrote:
>> >>
>> >>
>> >>> +1 (nonbinding)
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>>
>>
>
> --
> ---
> Takeshi Yamamuro
>

>>
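
This SPIP (porting the Koalas project into PySpark) later shipped as
pyspark.pandas in Spark 3.2. A minimal sketch of what the layer looks like,
assuming Spark 3.2+:

    # pandas-style syntax executed on Spark.
    import pyspark.pandas as ps

    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    print(psdf.a.mean())       # pandas-like API...
    psdf.to_spark().show()     # ...interoperable with Spark DataFrames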


Re: I'm going to be out starting Nov 5th

2020-10-31 Thread Denny Lee
Best wishes Holden! :)

On Sat, Oct 31, 2020 at 11:00 Dongjoon Hyun  wrote:

> Take care, Holden! I believe everything goes well.
>
> Bests,
> Dongjoon.
>
> On Sat, Oct 31, 2020 at 10:24 AM Reynold Xin  wrote:
>
>> Take care Holden and best of luck with everything!
>>
>>
>> On Sat, Oct 31 2020 at 10:21 AM, Holden Karau 
>> wrote:
>>
>>> Hi Folks,
>>>
>>> Just a heads up so folks working on decommissioning or other areas I've
>>> been active in don't block on me, I'm going to be out for at least a week
>>> and possibly more starting on November 5th. If there is anything that folks
>>> want me to review before then please let me know and I'll make the time for
>>> it. If you are curious I've got more details at
>>> http://blog.holdenkarau.com/2020/10/taking-break-surgery.html
>>>
>>> Happy Sparking Everyone,
>>>
>>> Holden :)
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>


Re: [vote] Apache Spark 3.0 RC3

2020-06-07 Thread Denny Lee
+1 (non-binding)

On Sun, Jun 7, 2020 at 3:21 PM Jungtaek Lim 
wrote:

> I'm seeing the effort to include the correctness issue SPARK-28067 [1]
> in 3.0.0 via SPARK-31894 [2]. That doesn't seem to be a regression, so it
> technically doesn't block the release. While it'd be good to weigh its
> worth (it requires some Structured Streaming users to discard their state,
> so it may be less frightening to require it in a major version upgrade), it
> looks to be optional to include SPARK-28067 in 3.0.0.
>
> Besides, I see all blockers look to be resolved, thanks all for the
> amazing efforts!
>
> +1 (non-binding) if the decision of SPARK-28067 is "later".
>
> 1. https://issues.apache.org/jira/browse/SPARK-28067
> 2. https://issues.apache.org/jira/browse/SPARK-31894
>
> On Mon, Jun 8, 2020 at 5:23 AM Matei Zaharia 
> wrote:
>
>> +1
>>
>> Matei
>>
>> On Jun 7, 2020, at 6:53 AM, Maxim Gekk  wrote:
>>
>> +1 (non-binding)
>>
>> On Sun, Jun 7, 2020 at 2:34 PM Takeshi Yamamuro 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> I don't see any ongoing PR to fix critical bugs in my area.
>>> Bests,
>>> Takeshi
>>>
>>> On Sun, Jun 7, 2020 at 7:24 PM Mridul Muralidharan 
>>> wrote:
>>>
 +1

 Regards,
 Mridul

 On Sat, Jun 6, 2020 at 1:20 PM Reynold Xin  wrote:

> Apologies for the mistake. The vote is open till 11:59pm Pacific time
> on Mon June 9th.
>
> On Sat, Jun 6, 2020 at 1:08 PM Reynold Xin 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark
>> version 3.0.0.
>>
>> The vote is open until [DUE DAY] and passes if a majority +1 PMC
>> votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.0.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.0.0-rc3 (commit
>> 3fdfce3120f307147244e5eaf46d61419a723d50):
>> https://github.com/apache/spark/tree/v3.0.0-rc3
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1350/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-docs/
>>
>> The list of bug fixes going into 3.0.0 can be found at the following
>> URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>>
>> This release is using the release script of the tag v3.0.0-rc3.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks. In Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
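
As a concrete version of the PySpark route described above, a quick
smoke-test sketch once the RC is installed into a fresh virtual env (a
sketch, not an official verification procedure):

    # Quick RC smoke test in a clean virtualenv.
    import pyspark
    from pyspark.sql import SparkSession

    print(pyspark.__version__)            # expect "3.0.0"
    spark = SparkSession.builder.master("local[2]").getOrCreate()
    assert spark.range(100).selectExpr("sum(id) AS s").first()["s"] == 4950
    spark.stop()
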
>> ===
>> What should happen to JIRA tickets still targeting 3.0.0?
>> ===
>>
>> The current list of open tickets targeted at 3.0.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.0.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>


Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Denny Lee
+1 (non-binding)

On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon  wrote:

> The proposal itself seems good as the set of factors to consider. Thanks, Michael.
>
> Several concerns mentioned look good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are
> marked as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc. and the implication of each is still
> effective. If it's for stable APIs, it makes sense to me as well.
>
> > ... can we expand on 'when' an API change can occur ?  Since we are
> proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to divert from semver,
> the delta compared to semver will have to be clarified to avoid different
> personal interpretations of the somewhat general principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
> Apache Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
>
>
> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>
>> +1 (non-binding)
>>
>> Bests,
>> Takeshi
>>
>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>> gengliang.w...@databricks.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Gengliang
>>>
>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
>>> wrote:
>>>
 +1 as well.

 Matei

 On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:

 +1 (binding), assuming that this is for public stable APIs, not APIs
 that are marked as unstable, evolving, etc.

 On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:

> +1 (non-binding)
>
> Michael's section on the trade-offs of maintaining / removing an API is
> one of the best reads I have seen on this mailing list. Enthusiastic +1
>
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
> wrote:
> >
> > This new policy has a good intention, but can we narrow down on the
> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that there already exists a reverting PR to bring back Spark
> 1.4 and 1.5 APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, and
> that's nice.
> >
> > However, for the other cases, it sounds like `recommending older
> APIs as much as possible` due to the following.
> >
> >  > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy, and should aim
> not to mislead users and 3rd-party library developers into saying
> "older is better".
> >
> > Technically, I'm wondering who will use new APIs in their examples
> (in books and on StackOverflow) if they always need to write an additional
> warning like `this only works on 2.4.0+`.
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan 
> wrote:
> >>
> >> I am in broad agreement with the proposal; like any developer, I prefer
> >> stable, well-designed APIs :-)
> >>
> >> Can we tie the proposal to the stability guarantees given by Spark and
> >> reasonable expectations from users? In my opinion, an unstable or
> >> evolving API could change, while an experimental API which has been
> >> around for ages should be handled more conservatively. Which brings
> >> into question how the stability guarantees specified by annotations
> >> interact with the proposal.
> >>
> >> Also, can we expand on 'when' an API change can occur, since we are
> >> proposing to diverge from semver? Patch release? Minor release? Only
> >> major releases? Based on the 'impact' of the API? Stability guarantees?
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
> mich...@databricks.com> wrote:
> >> >
> >> > I'll start off the vote with a strong +1 (binding).
> >> >
> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
> mich...@databricks.com> wrote:
> >> >>
> >> >> I propose to add the following text to Spark's Semantic
> Versioning policy and adopt it as the rubric that should be used when
> deciding to break APIs (even at major versions such as 3.0).
> >> >>
> >> >>
> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
> this is a procedural vote, the measure will pass if there are more
> favourable votes than unfavourable ones. PMC votes are binding, but the
> community is encouraged to add their voice to the discussion.
> >> >>
> >> >>
> >> >> [ ] +1 - Spark should adopt this policy.
> >> >>
> >> >> [ ] -1  - Spark should not adopt this policy.
> >> >>
> >> >>
> >> >> 
> >> >>
> >> >>
> >> >> Considerations When Breaking APIs
> >> >>
> >> >> The Spark 

Re: Block a user from spark-website who repeatedly open the invalid same PR

2020-01-26 Thread Denny Lee
+1

On Sun, Jan 26, 2020 at 09:59 Nicholas Chammas 
wrote:

> +1
>
> I think y'all have shown this person more patience than is merited by
> their behavior.
>
> On Sun, Jan 26, 2020 at 5:16 AM Takeshi Yamamuro 
> wrote:
>
>> +1
>>
>> Bests,
>> Takeshi
>>
>> On Sun, Jan 26, 2020 at 3:05 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I am thinking about opening an infra ticket to block the @DataWanderer
>>>  user from the spark-website
>>> repository, who repeatedly opens invalid PRs.
>>>
>>> The PR is about fixing documentation in the released version 2.4.4, and
>>> it should be fixed in the spark repository. Sean and I explained this
>>> multiple times, but this user opens the same PR repeatedly, which brings
>>> overhead to the devs.
>>>
>>> See the PRs below:
>>>
>>> https://github.com/apache/spark-website/pull/257
>>> https://github.com/apache/spark-website/pull/256
>>> https://github.com/apache/spark-website/pull/255
>>> https://github.com/apache/spark-website/pull/254
>>> https://github.com/apache/spark-website/pull/250
>>> https://github.com/apache/spark-website/pull/249
>>>
>>> If there is no objection, and this guy opens the PR again, I am going to
>>> open an infra ticket to block this guy from the spark-website repo to
>>> prevent such behaviours.
>>>
>>> Please let me know if you guys have any concerns.
>>>
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Denny Lee
+1

On Fri, May 31, 2019 at 17:58 Holden Karau  wrote:

> +1
>
> On Fri, May 31, 2019 at 5:41 PM Bryan Cutler  wrote:
>
>> +1 and the draft sounds good
>>
>> On Thu, May 30, 2019, 11:32 AM Xiangrui Meng  wrote:
>>
>>> Here is the draft announcement:
>>>
>>> ===
>>> Plan for dropping Python 2 support
>>>
>>> As many of you already know, the Python core development team and many
>>> widely used Python packages like Pandas and NumPy will drop Python 2
>>> support on or before 2020/01/01.
>>> since Spark 1.4 release in 2015. However, maintaining Python 2/3
>>> compatibility is an increasing burden and it essentially limits the use of
>>> Python 3 features in Spark. Given the end of life (EOL) of Python 2 is
>>> coming, we plan to eventually drop Python 2 support as well. The current
>>> plan is as follows:
>>>
>>> * In the next major release in 2019, we will deprecate Python 2 support.
>>> PySpark users will see a deprecation warning if Python 2 is used. We will
>>> publish a migration guide for PySpark users to migrate to Python 3.
>>> * We will drop Python 2 support in a future release in 2020, after
>>> Python 2 EOL on 2020/01/01. PySpark users will see an error if Python 2 is
>>> used.
>>> * For releases that support Python 2, e.g., Spark 2.4, their patch
>>> releases will continue supporting Python 2. However, after Python 2 EOL, we
>>> might not take patches that are specific to Python 2.
>>> ===
>>>
>>> Sean helped make a pass. If it looks good, I'm going to upload it to
>>> the Spark website and announce it here. Let me know if you think we should do a
>>> VOTE instead.
>>>
>>> On Thu, May 30, 2019 at 9:21 AM Xiangrui Meng  wrote:
>>>
 I created https://issues.apache.org/jira/browse/SPARK-27884 to track
 the work.

 On Thu, May 30, 2019 at 2:18 AM Felix Cheung 
 wrote:

> We don’t usually reference a future release on the website
>
> > Spark website and state that Python 2 is deprecated in Spark 3.0
>
> I suspect people will then ask when Spark 3.0 is coming out.
> Might need to provide some clarity on that.
>

 We can say the "next major release in 2019" instead of Spark 3.0. Spark
 3.0 timeline certainly requires a new thread to discuss.


>
>
> --
> *From:* Reynold Xin 
> *Sent:* Thursday, May 30, 2019 12:59:14 AM
> *To:* shane knapp
> *Cc:* Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen; Wenchen
> Fen; Xiangrui Meng; dev; user
> *Subject:* Re: Should python-2 be supported in Spark 3.0?
>
> +1 on Xiangrui’s plan.
>
> On Thu, May 30, 2019 at 7:55 AM shane knapp 
> wrote:
>
>> I don't have a good sense of the overhead of continuing to support
>>> Python 2; is it large enough to consider dropping it in Spark 3.0?
>>>
>>> from the build/test side, it will actually be pretty easy to
>> continue support for python2.7 for spark 2.x as the feature sets won't be
>> expanding.
>>
>
>> that being said, i will be cracking a bottle of champagne when i can
>> delete all of the ansible and anaconda configs for python2.x.  :)
>>
>
 On the development side, in a future release that drops Python 2
 support we can remove code that maintains python 2/3 compatibility and
 start using python 3 only features, which is also quite exciting.


>
>> shane
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-01-29 Thread Denny Lee
+1

yay - let's do it!

On Tue, Jan 29, 2019 at 6:28 AM Xiangrui Meng  wrote:

> Hi all,
>
> I want to call for a vote of SPARK-25994
> . It introduces a new
> DataFrame-based component to Spark, which supports property graph
> construction, Cypher queries, and graph algorithms. The proposal
> 
> was made available on user@
> 
> and dev@
> 
>  to
> collect input. You can also find a sketch design doc attached to
> SPARK-26028 .
>
> The vote will be up for the next 72 hours. Please reply with your vote:
>
> +1: Yeah, let's go forward and implement the SPIP.
> +0: Don't really care.
> -1: I don't think this is a good idea because of the following technical
> reasons.
>
> Best,
> Xiangrui
>
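
For readers skimming the vote, a purely illustrative sketch of the kind of API the
SPIP describes: building a property graph from DataFrames and running a Cypher query
over it. Every identifier below (PropertyGraph, create, cypher, the df accessor) is
hypothetical, not the proposed interface; see the SPIP and design doc for the actual
design:

  // All identifiers here are hypothetical, for illustration only.
  val nodes = spark.read.parquet("/data/graph/nodes")         // placeholder paths
  val rels  = spark.read.parquet("/data/graph/relationships")
  val graph = PropertyGraph.create(nodes, rels)               // hypothetical constructor
  val result = graph.cypher(
    "MATCH (a:User)-[:FOLLOWS]->(b:User) RETURN a.id, b.id")  // a Cypher query
  result.df.show()                                            // hypothetical result wrapper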


Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Denny Lee
+1


On Wed, Jan 9, 2019 at 4:30 AM Dongjoon Hyun 
wrote:

> +1
>
> Bests,
> Dongjoon.
>
> On Tue, Jan 8, 2019 at 6:30 PM Wenchen Fan  wrote:
>
>> +1
>>
>> On Wed, Jan 9, 2019 at 3:37 AM DB Tsai  wrote:
>>
>>> +1
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> --
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 0x5CED8B896A6BDFA0
>>>
>>> On Tue, Jan 8, 2019 at 11:14 AM Dongjoon Hyun 
>>> wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.3.
>>> >
>>> > The vote is open until January 11 11:30AM (PST) and passes if a
>>> majority +1 PMC votes are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.2.3
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.2.3-rc1 (commit
>>> 4acb6ba37b94b90aac445e6546426145a5f9eba2):
>>> > https://github.com/apache/spark/tree/v2.2.3-rc1
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1295
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
>>> >
>>> > The list of bug fixes going into 2.2.3 can be found at the following
>>> URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks; in Java/Scala you
>>> > can add the staging repository to your project's resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out-of-date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.2.3?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.2.3 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.2.3
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Denny Lee
+1

On Wed, Oct 31, 2018 at 12:54 PM Chitral Verma 
wrote:

> +1
>
> On Wed, 31 Oct 2018 at 11:56, Reynold Xin  wrote:
>
>> +1
>>
>> Look forward to the release!
>>
>>
>>
>> On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.4.0.
>>>
>>> The vote is open until November 1 PST and passes if a majority +1 PMC
>>> votes are cast, with
>>> a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.4.0-rc5 (commit
>>> 0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
>>> https://github.com/apache/spark/tree/v2.4.0-rc5
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1291
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/
>>>
>>> The list of bug fixes going into 2.4.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala you
>>> can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.4.0?
>>> ===
>>>
>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.4.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>


Re: welcome a new batch of committers

2018-10-03 Thread Denny Lee
Congratulations!

On Wed, Oct 3, 2018 at 05:26 Dongjin Lee  wrote:

> Congratulations to ALL!!
>
> - Dongjin
>
> On Wed, Oct 3, 2018 at 7:48 PM Jack Kolokasis 
> wrote:
>
>> Congratulations to all !!
>>
>> -Iacovos
>>
>> On 03/10/2018 12:54 PM, Ted Yu wrote:
>>
>> Congratulations to all !
>>
>>  Original message 
>> From: Jungtaek Lim  
>> Date: 10/3/18 2:41 AM (GMT-08:00)
>> To: Marco Gaido  
>> Cc: dev  
>> Subject: Re: welcome a new batch of committers
>>
>> Congrats all! You all deserved it.
>> On Wed, 3 Oct 2018 at 6:35 PM Marco Gaido  wrote:
>>
>>> Congrats you all!
>>>
>>> On Wed, Oct 3, 2018 at 11:29 AM, Liang-Chi Hsieh <
>>> vii...@gmail.com> wrote:
>>>

 Congratulations to all new committers!


 rxin wrote
 > Hi all,
 >
 > The Apache Spark PMC has recently voted to add several new committers
 to
 > the project, for their contributions:
 >
 > - Shane Knapp (contributor to infra)
 > - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
 > - Kazuaki Ishizaki (contributor to Spark SQL)
 > - Xingbo Jiang (contributor to Spark Core and SQL)
 > - Yinan Li (contributor to Spark on Kubernetes)
 > - Takeshi Yamamuro (contributor to Spark SQL)
 >
 > Please join me in welcoming them!





 --
 Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>> --
>> Iacovos Kolokasis
>> Email: koloka...@ics.forth.gr
>> Postgraduate Student CSD, University of Crete
>> Researcher in CARV Lab ICS FORTH
>>
>> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> slideshare: 
> www.slideshare.net/dongjinleekr
> *
>


Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-30 Thread Denny Lee
+1 (non-binding)


On Sat, Sep 29, 2018 at 10:24 AM Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:

> +1
>
> Stavros
>
> On Sat, Sep 29, 2018 at 5:59 AM, Sean Owen  wrote:
>
>> +1, with comments:
>>
>> There are 5 critical issues for 2.4, and no blockers:
>> SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4
>> SPARK-25325 ML, Graph 2.4 QA: Update user guide for new features & APIs
>> SPARK-25319 Spark MLlib, GraphX 2.4 QA umbrella
>> SPARK-25326 ML, Graph 2.4 QA: Programming guide update and migration guide
>> SPARK-25323 ML 2.4 QA: API: Python API coverage
>>
>> Xiangrui, is SPARK-25378 important enough we need to get it into 2.4?
>>
>> I found two issues resolved for 2.4.1 that got into this RC, so marked
>> them as resolved in 2.4.0.
>>
>> I checked the licenses and notice and they look correct now in source
>> and binary builds.
>>
>> The 2.12 artifacts are as I'd expect.
>>
>> I ran all tests for 2.11 and 2.12 and they pass with -Pyarn
>> -Pkubernetes -Pmesos -Phive -Phadoop-2.7 -Pscala-2.12.
>>
>>
>>
>>
>> On Thu, Sep 27, 2018 at 10:00 PM Wenchen Fan  wrote:
>> >
>> > Please vote on releasing the following candidate as Apache Spark
>> version 2.4.0.
>> >
>> > The vote is open until October 1 PST and passes if a majority +1 PMC
>> votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.4.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.4.0-rc2 (commit
>> 42f25f309e91c8cde1814e3720099ac1e64783da):
>> > https://github.com/apache/spark/tree/v2.4.0-rc2
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1287
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-docs/
>> >
>> > The list of bug fixes going into 2.4.0 can be found at the following
>> URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.0
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks; in Java/Scala you
>> > can add the staging repository to your project's resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with an out-of-date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.4.0?
>> > ===
>> >
>> > The current list of open tickets targeted at 2.4.0 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 2.4.0
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-20 Thread Denny Lee
+1

On Thu, Sep 20, 2018 at 9:55 AM Xiao Li  wrote:

> +1
>
>
> On Wed, Sep 19, 2018 at 1:17 PM, John Zhuge  wrote:
>
>> +1 (non-binding)
>>
>> Built on Ubuntu 16.04 with Maven flags: -Phadoop-2.7 -Pmesos -Pyarn
>> -Phive-thriftserver -Psparkr -Pkinesis-asl -Phadoop-provided
>>
>> java version "1.8.0_181"
>> Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
>>
>>
>> On Wed, Sep 19, 2018 at 2:31 AM Takeshi Yamamuro 
>> wrote:
>>
>>> +1
>>>
>>> I also checked `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
>>> -Phive-thriftserve` on the openjdk below/macOSv10.12.6
>>>
>>> $ java -version
>>> java version "1.8.0_181"
>>> Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
>>>
>>>
>>> On Wed, Sep 19, 2018 at 10:45 AM Dongjoon Hyun 
>>> wrote:
>>>
 +1.

 I tested with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
 -Phive-thriftserve` on OpenJDK(1.8.0_181)/CentOS 7.5.

 I hit the following test case failure once during testing, but it's not
 persistent.

 KafkaContinuousSourceSuite
 ...
 subscribing topic by name from earliest offsets (failOnDataLoss:
 false) *** FAILED ***

 Thank you, Saisai.

 Bests,
 Dongjoon.

 On Mon, Sep 17, 2018 at 6:48 PM Saisai Shao 
 wrote:

> +1 from my own side.
>
> Thanks
> Saisai
>
> On Tue, Sep 18, 2018 at 9:34 AM, Wenchen Fan  wrote:
>
>> +1. All the blocker issues are resolved in 2.3.2 AFAIK.
>>
>> On Tue, Sep 18, 2018 at 9:23 AM Sean Owen  wrote:
>>
>>> +1 . Licenses and sigs check out as in previous 2.3.x releases. A
>>> build from source with most profiles passed for me.
>>> On Mon, Sep 17, 2018 at 8:17 AM Saisai Shao 
>>> wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version 2.3.2.
>>> >
>>> > The vote is open until September 21 PST and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.3.2
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see
>>> http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.3.2-rc6 (commit
>>> 02b510728c31b70e6035ad541bfcdc2b59dcd79a):
>>> > https://github.com/apache/spark/tree/v2.3.2-rc6
>>> >
>>> > The release files, including signatures, digests, etc. can be
>>> found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc6-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> >
>>> https://repository.apache.org/content/repositories/orgapachespark-1286/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc6-docs/
>>> >
>>> > The list of bug fixes going into 2.3.2 can be found at the
>>> following URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>> >
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by
>>> taking
>>> > an existing Spark workload and running on this release candidate,
>>> then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks; in Java/Scala
>>> > you can add the staging repository to your project's resolvers and
>>> > test with the RC (make sure to clean up the artifact cache before/after
>>> > so you don't end up building with an out-of-date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.3.2?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.3.2 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for
>>> "Target Version/s" = 2.3.2
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility
>>> should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release 

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-18 Thread Denny Lee
+1 (non-binding)
On Tue, Jul 17, 2018 at 23:04 John Zhuge  wrote:

> +1 (non-binding)
>
> On Mon, Jul 16, 2018 at 8:04 PM Saisai Shao 
> wrote:
>
>> I will put my +1 on this RC.
>>
>> For the test failure fix, I will include it if there's another RC.
>>
>> On Mon, Jul 16, 2018 at 10:47 PM, Sean Owen  wrote:
>>
> OK, hm, will try to get to the bottom of it. But if others can build this
>>> module successfully, I give a +1. The test failure is inevitable here and
>>> should not block the release.
>>>
>>> On Sun, Jul 15, 2018 at 9:39 PM Saisai Shao 
>>> wrote:
>>>
>> Hi Sean,

 I just did a clean build with mvn/sbt on 2.3.2, and I didn't hit the
 errors you pasted here. I'm not sure how it happens.

 On Mon, Jul 16, 2018 at 6:30 AM, Sean Owen  wrote:

>>> Looks good to me, with the following caveats.
>
> First see the discussion on
> https://issues.apache.org/jira/browse/SPARK-24813 ; the
> flaky HiveExternalCatalogVersionsSuite will probably fail all the time
> right now. That's not a regression and is a test-only issue, so don't 
> think
> it must block the release. However if this fix holds up, and we need
> another RC, worth pulling in for sure.
>
> Also is anyone seeing this while building and testing the Spark SQL +
> Kafka module? I see this error even after a clean rebuild. I sort of get
> what the error is saying but can't figure out why it would only happen at
> test/runtime. Haven't seen it before.
>
> [error] missing or invalid dependency detected while loading class
> file 'MetricsSystem.class'.
>
> [error] Could not access term eclipse in package org,
>
> [error] because it (or its dependencies) are missing. Check your build
> definition for
>
> [error] missing or conflicting dependencies. (Re-run with
> `-Ylog-classpath` to see the problematic classpath.)
>
> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
> against an incompatible version of org.
>
> [error] missing or invalid dependency detected while loading class
> file 'MetricsSystem.class'.
>
> [error] Could not access term jetty in value org.eclipse,
>
> [error] because it (or its dependencies) are missing. Check your build
> definition for
>
> [error] missing or conflicting dependencies. (Re-run with
> `-Ylog-classpath` to see the problematic classpath.)
>
> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
> against an incompatible version of org.eclipse
>
> On Sun, Jul 15, 2018 at 3:09 AM Saisai Shao 
> wrote:
>
 Please vote on releasing the following candidate as Apache Spark
>> version 2.3.2.
>>
>> The vote is open until July 20 PST and passes if a majority +1 PMC
>> votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.3.2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.3.2-rc3
>> (commit b3726dadcf2997f20231873ec6e057dba433ae64):
>> https://github.com/apache/spark/tree/v2.3.2-rc3
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1278/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-docs/
>>
>> The list of bug fixes going into 2.3.2 can be found at the following
>> URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>
>> Note. RC2 was cancelled because of one blocking issue SPARK-24781
>> during release preparation.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks; in Java/Scala you
>> can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.2?
>> ===
>>

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Denny Lee
+1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas 
wrote:

> I'll give that a try, but I'll still have to figure out what to do if none
> of the release builds work with hadoop-aws, since Flintrock deploys Spark
> release builds to set up a cluster. Building Spark is slow, so we only do
> it if the user specifically requests a Spark version by git hash. (This is
> basically how spark-ec2 did things, too.)
>
>
> On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin  wrote:
>
>> If you're building your own Spark, definitely try the hadoop-cloud
>> profile. Then you don't even need to pull anything at runtime,
>> everything is already packaged with Spark.
>>
>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>>  wrote:
>> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
>> > either (even building with -Phadoop-2.7). I guess I’ve been relying on
>> an
>> > unsupported pattern and will need to figure something else out going
>> forward
>> > in order to use s3a://.
>> >
>> >
>> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin 
>> wrote:
>> >>
>> >> I have personally never tried to include hadoop-aws that way. But at
>> >> the very least, I'd try to use the same version of Hadoop as the Spark
>> >> build (2.7.3 IIRC). I don't really expect a different version to work,
>> >> and if it did in the past it definitely was not by design.
>> >>
>> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>> >>  wrote:
>> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0
>> release,
>> >> > so
>> >> > it appears something has changed since then.
>> >> >
>> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >> >
>> >> > My goal here is simply to confirm that this release of Spark works
>> with
>> >> > hadoop-aws like past releases did, particularly for Flintrock users
>> who
>> >> > use
>> >> > Spark with S3A.
>> >> >
>> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop
>> builds
>> >> > with
>> >> > every Spark release. If the -hadoop2.7 release build won’t work with
>> >> > hadoop-aws anymore, are there plans to provide a new build type that
>> >> > will?
>> >> >
>> >> > Apologies if the question is poorly formed. I’m batting a bit
>> outside my
>> >> > league here. Again, my goal is simply to confirm that I/my users
>> still
>> >> > have
>> >> > a way to use s3a://. In the past, that way was simply to call pyspark
>> >> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very
>> similar.
>> >> > If
>> >> > that will no longer work, I’m trying to confirm that the change of
>> >> > behavior
>> >> > is intentional or acceptable (as a review for the Spark project) and
>> >> > figure
>> >> > out what I need to change (as due diligence for Flintrock’s users).
>> >> >
>> >> > Nick
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin 
>> >> > wrote:
>> >> >>
>> >> >> Using the hadoop-aws package is probably going to be a little more
>> >> >> complicated than that. The best bet is to use a custom build of
>> Spark
>> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> >> looking at some nasty dependency issues, especially if you end up
>> >> >> mixing different versions of Hadoop.
>> >> >>
>> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >> >>  wrote:
>> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1
>> RC4
>> >> >> > using
>> >> >> > Flintrock. However, trying to load the hadoop-aws package gave me
>> >> >> > some
>> >> >> > errors.
>> >> >> >
>> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >> >
>> >> >> > 
>> >> >> >
>> >> >> > :: problems summary ::
>> >> >> >  WARNINGS
>> >> >> > [NOT FOUND  ]
>> >> >> > com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> >> > [NOT FOUND  ]
>> >> >> > com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> >> > [NOT FOUND  ]
>> >> >> > org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> >> > [NOT FOUND  ]
>> >> >> > com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >> >
>> >> >> > I’d guess I’m 
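
Pulling the thread's suggestions together, a minimal sketch of the s3a smoke test
under discussion. It assumes either a -Phadoop-cloud build (hadoop-aws already
bundled) or launching spark-shell with --packages org.apache.hadoop:hadoop-aws:2.7.3
to match the Hadoop version Spark was built against, as suggested above; the bucket,
prefix, and credentials are placeholders:

  // Run inside spark-shell, where `spark` already exists.
  val hadoopConf = spark.sparkContext.hadoopConfiguration
  hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
  val df = spark.read.text("s3a://some-bucket/some-prefix/")  // placeholder bucket
  println(s"read ${df.count()} lines via s3a")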

Re: Revisiting Online serving of Spark models?

2018-05-30 Thread Denny Lee
I most likely will not be able to join SF next week but definitely up for a
session after Summit in Seattle to dive further into this, eh?!

On Wed, May 30, 2018 at 9:32 AM Felix Cheung 
wrote:

> Hi!
>
> Thank you! Let’s meet then
>
> June 6 4pm
>
> Moscone West Convention Center
> 800 Howard Street, San Francisco, CA 94103
> 
>
> Ground floor (outside of conference area - should be available for all) -
> we will meet and decide where to go
>
> (Would not send invite because that would be too much noise for dev@)
>
> To paraphrase Joseph, we will use this to kick off the discussion and
> post notes after and follow up online. As for Seattle, I would be very
> interested to meet in person later and discuss ;)
>
>
> _
> From: Saikat Kanjilal 
> Sent: Tuesday, May 29, 2018 11:46 AM
>
> Subject: Re: Revisiting Online serving of Spark models?
> To: Maximiliano Felice 
> Cc: Felix Cheung , Holden Karau <
> hol...@pigscanfly.ca>, Joseph Bradley , Leif Walsh
> , dev 
>
>
>
> Would love to join but am in Seattle, thoughts on how to make this work?
>
> Regards
>
> Sent from my iPhone
>
> On May 29, 2018, at 10:35 AM, Maximiliano Felice <
> maximilianofel...@gmail.com> wrote:
>
> Big +1 to a meeting with fresh air.
>
> Could anyone send the invites? I don't really know which is the place
> Holden is talking about.
>
> 2018-05-29 14:27 GMT-03:00 Felix Cheung :
>
>> You had me at blue bottle!
>>
>> _
>> From: Holden Karau 
>> Sent: Tuesday, May 29, 2018 9:47 AM
>> Subject: Re: Revisiting Online serving of Spark models?
>> To: Felix Cheung 
>> Cc: Saikat Kanjilal , Maximiliano Felice <
>> maximilianofel...@gmail.com>, Joseph Bradley ,
>> Leif Walsh , dev 
>>
>>
>>
>> I'm down for that; we could all go for a walk, maybe to the Mint Plaza
>> Blue Bottle, and grab coffee (if the weather holds, have our design meeting
>> outside :p)?
>>
>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung 
>> wrote:
>>
>>> Bump.
>>>
>>> --
>>> *From:* Felix Cheung 
>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>
>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>
>>> Hi! How about we meet the community and discuss on June 6 4pm at (near)
>>> the Summit?
>>>
>>> (I propose we meet at the venue entrance so we could accommodate people
>>> might not be in the conference)
>>>
>>> --
>>> *From:* Saikat Kanjilal 
>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>> *To:* Maximiliano Felice
>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>
>>> I’m in the same exact boat as Maximiliano and have use cases as well for
>>> model serving and would love to join this discussion.
>>>
>>> Sent from my iPhone
>>>
>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <
>>> maximilianofel...@gmail.com> wrote:
>>>
>>> Hi!
>>>
>>> I don't usually write a lot on this list, but I keep up to date with
>>> the discussions and I'm a heavy user of Spark. This topic caught my
>>> attention, as we're currently facing this issue at work. I'm attending
>>> the summit and was wondering if it would be possible for me to join that
>>> meeting. I might be able to share some helpful use cases and ideas.
>>>
>>> Thanks,
>>> Maximiliano Felice
>>>
 On Tue, May 22, 2018 at 9:14 AM, Leif Walsh 
 wrote:
>>>
 I’m with you on json being more readable than parquet, but we’ve had
 success using pyarrow’s parquet reader and have been quite happy with it so
 far. If your target is python (and probably if not now, then soon, R), you
 should look into it.

 On Mon, May 21, 2018 at 16:52 Joseph Bradley 
 wrote:

> Regarding model reading and writing, I'll give quick thoughts here:
> * Our approach was to use the same format but write JSON instead of
> Parquet.  It's easier to parse JSON without Spark, and using the same
> format simplifies architecture.  Plus, some people want to check files 
> into
> version control, and JSON is nice for that.
> * The reader/writer APIs could be extended to take format parameters
> (just like DataFrame reader/writers) to handle JSON (and maybe, 
> eventually,
> handle Parquet in the online serving setting).
>
> This would be a big project, so proposing a SPIP might be best.  If
> people are around at the Spark Summit, that could be a good time to meet 
> up
> & then post notes back to the dev list.
>
> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <
> felixcheun...@hotmail.com> wrote:
>
>> Specifically I’d like to bring part of the discussion to Model and
>> PipelineModel, and various ModelReader and SharedReadWrite 
>> implementations
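
To make the format-parameter idea above concrete, a short sketch of what it might
look like. The format(...) call on an ML writer is the proposed extension being
discussed here, not an API that existed at the time of this thread, and the paths
are placeholders:

  import org.apache.spark.ml.PipelineModel

  // Load an existing persisted pipeline model (placeholder path).
  val model = PipelineModel.load("/models/my-pipeline")
  // Hypothetical: choose JSON metadata output the way DataFrame writers choose formats.
  model.write.format("json").save("/models/my-pipeline-json")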
>> 

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-01 Thread Denny Lee
Awesome - congrats Zhenhua!

On Sun, Apr 1, 2018 at 10:33 PM 叶先进  wrote:

> Big congrats.
>
> > On Apr 2, 2018, at 1:28 PM, Wenchen Fan  wrote:
> >
> > Hi all,
> >
> > The Spark PMC recently added Zhenhua Wang as a committer on the project.
> Zhenhua is the major contributor of the CBO project, and has been
> contributing across several areas of Spark for a while, focusing especially
> on the analyzer and optimizer in Spark SQL. Please join me in welcoming Zhenhua!
> >
> > Wenchen
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Denny Lee
+1 (non-binding)

On Fri, Feb 23, 2018 at 07:08 Josh Goldsborough <
joshgoldsboroughs...@gmail.com> wrote:

> New to testing out Spark RCs for the community but I was able to run some
> of the basic unit tests without error so for what it's worth, I'm a +1.
>
> On Thu, Feb 22, 2018 at 4:23 PM, Sameer Agarwal 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.3.0. The vote is open until Tuesday February 27, 2018 at 8:00:00 am UTC
>> and passes if a majority of at least 3 PMC +1 votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Spark 2.3.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.3.0-rc5:
>> https://github.com/apache/spark/tree/v2.3.0-rc5
>> (992447fb30ee9ebb3cf794f2d06f4d63a2d792db)
>>
>> List of JIRA tickets resolved in this release can be found here:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-bin/
>>
>> Release artifacts are signed with the following key:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1266/
>>
>> The documentation corresponding to this release can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/index.html
>>
>>
>> FAQ
>>
>> ===
>> What are the unresolved issues targeted for 2.3.0?
>> ===
>>
>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>> currently no known release blockers.
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks; in Java/Scala you can
>> add the staging repository to your project's resolvers and test with the RC
>> (make sure to clean up the artifact cache before/after so you don't end up
>> building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.0?
>> ===
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>> appropriate.
>>
>> ===
>> Why is my bug not fixed?
>> ===
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.2.0. That being said, if
>> there is something which is a regression from 2.2.0 and has not been
>> correctly targeted please ping me or a committer to help target the issue
>> (you can see the open issues listed as impacting Spark 2.3.0 at
>> https://s.apache.org/WmoI).
>>
>
>


Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-18 Thread Denny Lee
+1 (non-binding)

Built and tested on macOS and Ubuntu.


On Sun, Feb 18, 2018 at 3:19 PM Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:

> +1 (non-binding)
>
> Built and tested on macOS 10.12.6 Java 8 (build 1.8.0_111). No
> regressions detected so far.
>
>
> On 18 February 2018 at 16:12, Sean Owen  wrote:
>
>> +1 from me as last time, same outcome.
>>
>> I saw one test fail, but it passed on a second run, so it just seems flaky.
>>
>> - subscribing topic by name from latest offsets (failOnDataLoss: true)
>> *** FAILED ***
>>   Error while stopping stream:
>>   query.exception() is not empty after clean stop:
>> org.apache.spark.sql.streaming.StreamingQueryException: Writing job failed.
>>   === Streaming Query ===
>>   Identifier: [id = cdd647ec-d7f0-437b-9950-ce9d79d691d1, runId =
>> 3a7cf7ec-670a-48b6-8185-8b6cd7e27f96]
>>   Current Committed Offsets: {KafkaSource[Subscribe[topic-4]]:
>> {"topic-4":{"2":1,"4":1,"1":0,"3":0,"0":2}}}
>>   Current Available Offsets: {}
>>
>>   Current State: TERMINATED
>>   Thread State: RUNNABLE
>>
>> On Sat, Feb 17, 2018 at 3:41 PM Sameer Agarwal 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00 am UTC
>>> and passes if a majority of at least 3 PMC +1 votes are cast.
>>>
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.0-rc4:
>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>
>>> List of JIRA tickets resolved in this release can be found here:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>
>>> The documentation corresponding to this release can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>
>>>
>>> FAQ
>>>
>>> ===
>>> What are the unresolved issues targeted for 2.3.0?
>>> ===
>>>
>>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>>> currently no known release blockers.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala you
>>> can add the staging repository to your project's resolvers and test with the
>>> RC (make sure to clean up the artifact cache before/after so you don't end
>>> up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.3.0?
>>> ===
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>>> appropriate.
>>>
>>> ===
>>> Why is my bug not fixed?
>>> ===
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.2.0. That being said, if
>>> there is something which is a regression from 2.2.0 and has not been
>>> correctly targeted please ping me or a committer to help target the issue
>>> (you can see the open issues listed as impacting Spark 2.3.0 at
>>> https://s.apache.org/WmoI).
>>>
>>
>


Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-05 Thread Denny Lee
+1 (non-binding)


On Wed, Oct 4, 2017 at 11:08 PM Holden Karau  wrote:

> Awesome, thanks for digging into the packaging on the R side in more
> detail. I'll look into how to update the keys file as well.
>
> On Wed, Oct 4, 2017 at 10:46 PM Felix Cheung 
> wrote:
>
>> +1
>>
>> Tested SparkR package manually on multiple platforms and checked
>> different Hadoop release jars.
>>
>> And previously tested the last RC on different R releases (see the last
>> RC vote thread)
>>
>> I found some differences in bin release jars created by the different
>> options when running the make-release script, and created this JIRA to track:
>> https://issues.apache.org/jira/browse/SPARK-22202
>>
>> I've checked to confirm these exist in the 2.1.1 release so this isn't a
>> regression, and hence my +1.
>>
>> btw, I think we need to update this file for the new keys used in signing
>> this release https://www.apache.org/dist/spark/KEYS
>>
>>
>> _
>> From: Liwei Lin 
>> Sent: Wednesday, October 4, 2017 6:51 PM
>>
>> Subject: Re: [VOTE] Spark 2.1.2 (RC4)
>> To: Spark dev list 
>>
>>
>> +1 (non-binding)
>>
>>
>> Cheers,
>> Liwei
>>
>> On Wed, Oct 4, 2017 at 4:03 PM, Nick Pentreath 
>> wrote:
>>
>>> Ah right! Was using a new cloud instance and didn't realize I was logged
>>> in as root! thanks
>>>
>>> On Tue, 3 Oct 2017 at 21:13 Marcelo Vanzin  wrote:
>>>
 Maybe you're running as root (or the admin account on your OS)?

 On Tue, Oct 3, 2017 at 12:12 PM, Nick Pentreath
  wrote:
 > Hmm I'm consistently getting this error in core tests:
 >
 > - SPARK-3697: ignore directories that cannot be read. *** FAILED ***
 >   2 was not equal to 1 (FsHistoryProviderSuite.scala:146)
 >
 >
 > Anyone else? Any insight? Perhaps it's my set up.
 >
 >>>
 >>>
 >>> On Tue, Oct 3, 2017 at 7:24 AM Holden Karau 
 wrote:
 
  Please vote on releasing the following candidate as Apache Spark
 version
  2.1.2. The vote is open until Saturday October 7th at 9:00 PST and
 passes if
  a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 2.1.2
  [ ] -1 Do not release this package because ...
 
 
  To learn more about Apache Spark, please see
 https://spark.apache.org/
 
  The tag to be voted on is v2.1.2-rc4
  (2abaea9e40fce81cd4626498e0f5c28a70917499)
 
  List of JIRA tickets resolved in this release can be found with
 this
  filter.
 
  The release files, including signatures, digests, etc. can be
 found at:
  https://home.apache.org/~holden/spark-2.1.2-rc4-bin/
 
  Release artifacts are signed with a key from:
  https://people.apache.org/~holden/holdens_keys.asc
 
  The staging repository for this release can be found at:
 
 https://repository.apache.org/content/repositories/orgapachespark-1252
 
  The documentation corresponding to this release can be found at:
  https://people.apache.org/~holden/spark-2.1.2-rc4-docs/
 
 
  FAQ
 
  How can I help test this release?
 
  If you are a Spark user, you can help us test this release by
 taking an
  existing Spark workload and running on this release candidate, then
  reporting any regressions.
 
  If you're working in PySpark you can set up a virtual env and install
  the current RC and see if anything important breaks; in Java/Scala you
  can add the staging repository to your project's resolvers and test
  with the RC (make sure to clean up the artifact cache before/after so
  you don't end up building with an out-of-date RC going forward).
 
  What should happen to JIRA tickets still targeting 2.1.2?
 
  Committers should look at those and triage. Extremely important bug
  fixes, documentation, and API tweaks that impact compatibility
 should be
  worked on immediately. Everything else please retarget to 2.1.3.
 
  But my bug isn't fixed!??!
 
  In order to make timely releases, we will typically not hold the
 release
  unless the bug in question is a regression from 2.1.1. That being
 said if
  there is something which is a regression from 2.1.1 that has not
 been
  correctly targeted please ping a committer to help target the
 issue (you can
  see the open issues listed as impacting Spark 2.1.1 & 2.1.2)
 
  What are the unresolved issues targeted for 2.1.2?
 
 

Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-27 Thread Denny Lee
+1 (non-binding)


On Wed, Sep 27, 2017 at 6:54 AM Sean Owen  wrote:

> +1
>
> I tested the source release.
> Hashes and signature (your signature) check out, project builds and tests
> pass with -Phadoop-2.7 -Pyarn -Phive -Pmesos on Debian 9.
> List of issues look good and there are no open issues at all for 2.1.2.
>
> Great work on improving the build process and docs.
>
>
> On Wed, Sep 27, 2017 at 5:47 AM Holden Karau  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.2. The vote is open until Wednesday October 4th at 23:59 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.2
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.1.2-rc2
>>  (
>> fabbb7f59e47590114366d14e15fbbff8c88593c)
>>
>> List of JIRA tickets resolved in this release can be found with this
>> filter.
>> 
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://home.apache.org/~holden/spark-2.1.2-rc2-bin/
>>
>> Release artifacts are signed with a key from:
>> https://people.apache.org/~holden/holdens_keys.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1251
>>
>> The documentation corresponding to this release can be found at:
>> https://people.apache.org/~holden/spark-2.1.2-rc2-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks; in Java/Scala you
>> can add the staging repository to your project's resolvers and test with the
>> RC (make sure to clean up the artifact cache before/after so you don't
>> end up building with an out-of-date RC going forward).
>>
>> *What should happen to JIRA tickets still targeting 2.1.2?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.1.3.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1. That being said
>> if there is something which is a regression from 2.1.1 that has not been
>> correctly targeted please ping a committer to help target the issue (you
>> can see the open issues listed as impacting Spark 2.1.1 & 2.1.2
>> 
>> )
>>
>> *What are the unresolved* issues targeted for 2.1.2
>> 
>> ?
>>
>> At this time there are no open unresolved issues.
>>
>> *Is there anything different about this release?*
>>
>> This is the first release in a while not built on the AMPLAB Jenkins. This
>> is good because it means future releases can more easily be built and
>> signed securely (and I've been updating the documentation in
>> https://github.com/apache/spark-website/pull/66 as I progress), however
>> the chances of a mistake are higher with any change like this. If there is
>> something you normally take for granted as correct when checking a release,
>> please double check this time :)
>>
>> *Should I be committing code to branch-2.1?*
>>
>> Thanks for asking! Please treat this stage in the RC process as "code
>> freeze" so bug fixes only. If you're uncertain if something should be back
>> ported please reach out. If you do commit to branch-2.1 please tag your
>> JIRA issue fix version for 2.1.3 and if we cut another RC I'll move the
>> 2.1.3 fixed into 2.1.2 as appropriate.
>>
>> *Why the longer voting window?*
>>
>> Since there is a large industry big data conference this week I figured
>> I'd add a little bit of extra buffer time just to make sure everyone has a
>> chance to take a look.
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>


Re: [VOTE][SPIP] SPARK-21866 Image support in Apache Spark

2017-09-21 Thread Denny Lee
+1

On Thu, Sep 21, 2017 at 11:15 Sean Owen  wrote:

> Am I right that this doesn't mean other packages would use this
> representation, but that they could?
>
> The representation looked fine to me w.r.t. what DL frameworks need.
>
> My previous comment was that this is actually quite lightweight. It's kind
> of like how I/O support is provided for CSV and JSON, so makes enough sense
> to add to Spark. It doesn't really preclude other solutions.
>
> For those reasons I think it's fine. +1
>
> On Thu, Sep 21, 2017 at 6:32 PM Tim Hunter 
> wrote:
>
>> Hello community,
>>
>> I would like to call for a vote on SPARK-21866. It is a short proposal
>> that has important applications for image processing and deep learning.
>> Joseph Bradley has offered to be the shepherd.
>>
>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866
>> PDF version: https://issues.apache.org/jira/secure/attachment/12884792/
>> SPIP%20-%20Image%20support%20for%20Apache%20Spark%20V1.1.pdf
>>
>> Background and motivation
>>
>> As Apache Spark is being used more and more in the industry, some new use
>> cases are emerging for different data formats beyond the traditional SQL
>> types or the numerical types (vectors and matrices). Deep Learning
>> applications commonly deal with image processing. A number of projects add
>> some Deep Learning capabilities to Spark (see list below), but they
>> struggle to communicate with each other or with MLlib pipelines because
>> there is no standard way to represent an image in Spark DataFrames. We
>> propose to federate efforts for representing images in Spark by defining a
>> representation that caters to the most common needs of users and library
>> developers.
>>
>> This SPIP proposes a specification to represent images in Spark
>> DataFrames and Datasets (based on existing industrial standards), and an
>> interface for loading sources of images. It is not meant to be a
>> full-fledged image processing library, but rather the core description that
>> other libraries and users can rely on. Several packages already offer
>> various processing facilities for transforming images or doing more complex
>> operations, and each has various design tradeoffs that make them better as
>> standalone solutions.
>>
>> This project is a joint collaboration between Microsoft and Databricks,
>> which have been testing this design in two open source packages: MMLSpark
>> and Deep Learning Pipelines.
>>
>> The proposed image format is an in-memory, decompressed representation
>> that targets low-level applications. It is significantly more liberal in
>> memory usage than compressed image representations such as JPEG, PNG, etc.,
>> but it allows easy communication with popular image processing libraries
>> and has no decoding overhead.
>> Targets users and personas:
>>
>> Data scientists, data engineers, library developers.
>> The following libraries define primitives for loading and representing
>> images, and will gain from a common interchange format (in alphabetical
>> order):
>>
>>- BigDL
>>- DeepLearning4J
>>- Deep Learning Pipelines
>>- MMLSpark
>>- TensorFlow (Spark connector)
>>- TensorFlowOnSpark
>>- TensorFrames
>>- Thunder
>>
>> Goals:
>>
>>- Simple representation of images in Spark DataFrames, based on
>>pre-existing industrial standards (OpenCV)
>>- This format should eventually allow the development of
>>high-performance integration points with image processing libraries such 
>> as
>>libOpenCV, Google TensorFlow, CNTK, and other C libraries.
>>- The reader should be able to read popular formats of images from
>>distributed sources.
>>
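>> (As a purely hypothetical illustration of the reader goal above, loading
>> images might eventually look like the sketch below; the "image" format
>> name and nested column names are assumptions for illustration only, not
>> part of this proposal.)
>>
>> # Hypothetical reader usage (sketch only; names are assumptions).
>> from pyspark.sql import SparkSession
>>
>> spark = SparkSession.builder.appName("image-reader-sketch").getOrCreate()
>> df = spark.read.format("image").load("hdfs:///path/to/images/")
>> df.select("image.mode", "image.height", "image.width").show()
>>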
>> Non-Goals:
>>
>> Images are a versatile medium and encompass a very wide range of formats
>> and representations. This SPIP explicitly aims at the most common use
>> case in the industry currently: multi-channel matrices of binary, int32,
>> int64, float or double data that can fit comfortably in the heap of the JVM:
>>
>>- the total size of an image should be restricted to less than 2GB
>>(roughly)
>>- the meaning of color channels is application-specific and is not
>>mandated by the standard (in line with the OpenCV standard)
>>- specialized formats used in meteorology, the medical field, etc.
>>are not supported
>>- this format is specialized to images and does not attempt to solve
>>the more general problem of representing n-dimensional tensors in Spark
>>
>> Proposed API changes
>>
>> We propose to add a new package in the package structure, under the MLlib
>> project:
>> org.apache.spark.image
>> Data format
>>
>> We propose to add the following structure:
>>
>> imageSchema = StructType([
>>
>>- StructField("mode", StringType(), False),
>>   - The exact representation of the data.
>>   - The values are described in the following OpenCV convention.
>>   Basically, the type has both "depth" and "number of channels"
>>   information (e.g. "CV_8UC3" denotes three channels of 8-bit
>>   unsigned integers).
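>>
>> A minimal PySpark sketch of how such a schema might be declared follows.
>> Only the "mode" field above comes from the proposal; the remaining field
>> names and types are illustrative assumptions, not the final specification:
>>
>> # Hypothetical full declaration of the proposed image schema (sketch only).
>> from pyspark.sql.types import (
>>     StructType, StructField, StringType, IntegerType, BinaryType)
>>
>> imageSchema = StructType([
>>     # From the proposal: OpenCV-style type string, e.g. "CV_8UC3".
>>     StructField("mode", StringType(), False),
>>     # Assumed fields below, for illustration only:
>>     StructField("origin", StringType(), True),       # source URI of the image
>>     StructField("height", IntegerType(), False),     # height in pixels
>>     StructField("width", IntegerType(), False),      # width in pixels
>>     StructField("nChannels", IntegerType(), False),  # number of color channels
>>     StructField("data", BinaryType(), False),        # decompressed pixel bytes
>> ])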

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-15 Thread Denny Lee
+1 (non-binding)

On Thu, Sep 14, 2017 at 10:57 PM Felix Cheung 
wrote:

> +1 tested SparkR package on Windows, r-hub, Ubuntu.
>
> _
> From: Sean Owen 
> Sent: Thursday, September 14, 2017 3:12 PM
> Subject: Re: [VOTE] Spark 2.1.2 (RC1)
> To: Holden Karau , 
>
>
>
> +1
> Very nice. The sigs and hashes look fine, it builds fine for me on Debian
> Stretch with Java 8, yarn/hive/hadoop-2.7 profiles, and passes tests.
>
> Yes as you say, no outstanding issues except for this which doesn't look
> critical, as it's not a regression.
>
> SPARK-21985 PySpark PairDeserializer is broken for double-zipped RDDs
>
>
> On Thu, Sep 14, 2017 at 7:47 PM Holden Karau  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.2. The vote is open until Friday September 22nd at 18:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.2
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.1.2-rc1
>> (6f470323a0363656999dd36cb33f528afe627c12)
>>
>> List of JIRA tickets resolved in this release can be found with this
>> filter.
>> 
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://home.apache.org/~pwendell/spark-releases/spark-2.1.2-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1248/
>>
>> The documentation corresponding to this release can be found at:
>> https://people.apache.org/~pwendell/spark-releases/spark-2.1.2-rc1-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks; in Java/Scala you can
>> add the staging repository to your project's resolvers and test with the RC
>> (make sure to clean up the artifact cache before/after so you don't end up
>> building with an out-of-date RC going forward).
>>
>> *What should happen to JIRA tickets still targeting 2.1.2?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.1.3.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1. That being said if
>> there is something which is a regression from 2.1.1 that has not been
>> correctly targeted please ping a committer to help target the issue (you
>> can see the open issues listed as impacting Spark 2.1.1 & 2.1.2
>> 
>> )
>>
>> *What are the unresolved* issues targeted for 2.1.2
>> 
>> ?
>>
>> At the time of writing, there is one in-progress major issue,
>> SPARK-21985; I
>> believe Andrew Ray & Hyukjin Kwon are looking into this one.
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>


Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Denny Lee
Congrats!!

On Mon, Aug 7, 2017 at 17:39 Yanbo Liang  wrote:

> Great.
> Congratulations, Hyukjin and Sameer!
>
> On Tue, Aug 8, 2017 at 7:53 AM, Holden Karau  wrote:
>
>> Congrats!
>>
>> On Mon, Aug 7, 2017 at 3:54 PM Bryan Cutler  wrote:
>>
>>> Great work Hyukjin and Sameer!
>>>
>>> On Mon, Aug 7, 2017 at 10:22 AM, Mridul Muralidharan 
>>> wrote:
>>>
 Congratulations Hyukjin, Sameer !

 Regards,
 Mridul

 On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia 
 wrote:
 > Hi everyone,
 >
 > The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal
 as committers. Join me in congratulating both of them and thanking them for
 their contributions to the project!
 >
 > Matei
 > -
 > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>
>


Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-03 Thread Denny Lee
+1 (non-binding)

On Mon, Jul 3, 2017 at 6:45 PM Liang-Chi Hsieh  wrote:

> +1
>
>
> Sameer Agarwal wrote
> > +1
> >
> > On Mon, Jul 3, 2017 at 6:08 AM, Wenchen Fan 
>
> > cloud0fan@
>
> >  wrote:
> >
> >> +1
> >>
> >> On 3 Jul 2017, at 8:22 PM, Nick Pentreath 
>
> > nick.pentreath@
>
> > 
> >> wrote:
> >>
> >> +1 (binding)
> >>
> >> On Mon, 3 Jul 2017 at 11:53 Yanbo Liang 
>
> > ybliang8@
>
> >  wrote:
> >>
> >>> +1
> >>>
> >>> On Mon, Jul 3, 2017 at 5:35 AM, Herman van Hövell tot Westerflier <
> >>>
>
> > hvanhovell@
>
> >> wrote:
> >>>
>  +1
> 
>  On Sun, Jul 2, 2017 at 11:32 PM, Ricardo Almeida <
> 
>
> > ricardo.almeida@
>
> >> wrote:
> 
> > +1 (non-binding)
> >
> > Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
> > -Phive -Phive-thriftserver -Pscala-2.11 on
> >
> >- macOS 10.12.5 Java 8 (build 1.8.0_131)
> >- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
> >
> >
> >
> >
> >
> > On 1 Jul 2017 02:45, "Michael Armbrust" 
>
> > michael@
>
> >  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark
> > version 2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00
> > PST and passes if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.2.0
> > [ ] -1 Do not release this package because ...
> >
> >
> > To learn more about Apache Spark, please see
> https://spark.apache.org/
> >
> > The tag to be voted on is v2.2.0-rc6
> > https://github.com/apache/spark/tree/v2.2.0-rc6
> > (a2c7b2133cfee7fa9abfaa2bfbfb637155466783)
> >
> > List of JIRA tickets resolved can be found with this filter:
> > https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0
> >
> > The release files, including signatures, digests, etc. can be found
> > at:
> >
> https://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> >
> https://repository.apache.org/content/repositories/orgapachespark-1245/
> >
> > The documentation corresponding to this release can be found at:
> > https://people.apache.org/~pwendell/spark-releases/spark-
> > 2.2.0-rc6-docs/
> >
> >
> > *FAQ*
> >
> > *How can I help test this release?*
> >
> > If you are a Spark user, you can help us test this release by taking
> > an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > *What should happen to JIRA tickets still targeting 2.2.0?*
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be
> > worked on immediately. Everything else please retarget to 2.3.0 or
> > 2.2.1.
> >
> > *But my bug isn't fixed!??!*
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from 2.1.1.
> >
> >
> >
> 
> 
> >>>
> >>
> >
> >
> > --
> > Sameer Agarwal
> > Software Engineer | Databricks Inc.
> > http://cs.berkeley.edu/~sameerag
>
>
>
>
>
> -
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Apache-Spark-2-2-0-RC6-tp21902p21914.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-08 Thread Denny Lee
+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML,
GraphFrames, Structured Streaming


On Wed, Jun 7, 2017 at 9:40 PM vaquar khan  wrote:

> +1 non-binding
>
> Regards,
> vaquar khan
>
> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
> wrote:
>
> +1 (non-binding)
>
> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive
> -Phive-thriftserver -Pscala-2.11 on
>
>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>
>
> On 5 June 2017 at 21:14, Michael Armbrust  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc4
>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>
>> List of JIRA tickets resolved can be found with this filter
>> 
>> .
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1.
>>
>
>
>


Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-28 Thread Denny Lee
+1

On Fri, Apr 28, 2017 at 9:17 AM Kazuaki Ishizaki 
wrote:

> +1 (non-binding)
>
> I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for
> core have passed.
>
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build
> 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
> $ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7
> package install
> $ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
> ...
> Total number of tests run: 1788
> Suites: completed 198, aborted 0
> Tests: succeeded 1788, failed 0, canceled 4, ignored 8, pending 0
> All tests passed.
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time: 16:30 min
> [INFO] Finished at: 2017-04-29T01:02:29+09:00
> [INFO] Final Memory: 54M/576M
> [INFO]
> 
>
> Regards,
> Kazuaki Ishizaki,
>
>
>
> From: Michael Armbrust 
> To: "dev@spark.apache.org" 
> Date: 2017/04/27 09:30
> Subject: [VOTE] Apache Spark 2.1.1 (RC4)
> --
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.1. The vote is open until Sat, April 29th, 2017 at 18:00 PST and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.1
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see *http://spark.apache.org/*
> 
>
> The tag to be voted on is *v2.1.1-rc4*
> (267aca5bd5042303a718d10635bc0d1a1596853f)
>
> List of JIRA tickets resolved can be found *with this filter*
> 
> .
>
> The release files, including signatures, digests, etc. can be found at:
> *http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-bin/*
> 
>
> Release artifacts are signed with the following key:
> *https://people.apache.org/keys/committer/pwendell.asc*
> 
>
> The staging repository for this release can be found at:
> *https://repository.apache.org/content/repositories/orgapachespark-1232/*
> 
>
> The documentation corresponding to this release can be found at:
> *http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-docs/*
> 
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.1?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.0.
>
> *What happened to RC1?*
>
> There were issues with the release packaging and as a result it was skipped.
>
>


Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Denny Lee
+1 (non-binding)


On Wed, Apr 19, 2017 at 9:23 PM Dong Joon Hyun 
wrote:

> +1
>
> I tested RC3 on CentOS 7.3.1611/OpenJDK 1.8.0_121/R 3.3.3
> with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Psparkr`
>
> At the end of the R tests, I saw `Had CRAN check errors; see logs.`,
> but the tests passed and the log file looks good.
>
> Bests,
> Dongjoon.
>
> From: Reynold Xin 
> Date: Wednesday, April 19, 2017 at 3:41 PM
> To: Marcelo Vanzin 
> Cc: Michael Armbrust , "dev@spark.apache.org" <
> dev@spark.apache.org>
> Subject: Re: [VOTE] Apache Spark 2.1.1 (RC3)
>
> +1
>
> On Wed, Apr 19, 2017 at 3:31 PM, Marcelo Vanzin 
> wrote:
>
>> +1 (non-binding).
>>
>> Ran the hadoop-2.6 binary against our internal tests and things look good.
>>
>> On Tue, Apr 18, 2017 at 11:59 AM, Michael Armbrust
>>  wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and
>> passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.1.1
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.1.1-rc3
>> > (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
>> >
>> > List of JIRA tickets resolved can be found with this filter.
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1230/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/
>> >
>> >
>> > FAQ
>> >
>> > How can I help test this release?
>> >
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > What should happen to JIRA tickets still targeting 2.1.1?
>> >
>> > Committers should look at those and triage. Extremely important bug
>> fixes,
>> > documentation, and API tweaks that impact compatibility should be
>> worked on
>> > immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>> >
>> > But my bug isn't fixed!??!
>> >
>> > In order to make timely releases, we will typically not hold the release
>> > unless the bug in question is a regression from 2.1.0.
>> >
>> > What happened to RC1?
>> >
>> > There were issues with the release packaging and as a result it was
>> skipped.
>>
>>
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Re: welcoming Burak and Holden as committers

2017-01-24 Thread Denny Lee
Awesome! Congrats Burak & Holden!!

On Tue, Jan 24, 2017 at 10:39 Joseph Bradley  wrote:

> Congratulations Burak & Holden!
>
> On Tue, Jan 24, 2017 at 10:33 AM, Dongjoon Hyun 
> wrote:
>
> Great! Congratulations, Burak and Holden.
>
> Bests,
> Dongjoon.
>
> On 2017-01-24 10:29 (-0800), Nicholas Chammas 
> wrote:
> >  
> >
> > Congratulations, Burak and Holden.
> >
> > On Tue, Jan 24, 2017 at 1:27 PM Russell Spitzer <
> russell.spit...@gmail.com>
> > wrote:
> >
> > > Great news! Congratulations!
> > >
> > > On Tue, Jan 24, 2017 at 10:25 AM Dean Wampler 
> > > wrote:
> > >
> > > Congratulations to both of you!
> > >
> > > dean
> > >
> > > *Dean Wampler, Ph.D.*
> > > Author: Programming Scala, 2nd Edition
> > > , Fast Data
> > > Architectures for Streaming Applications
> > > <
> http://www.oreilly.com/data/free/fast-data-architectures-for-streaming-applications.csp
> >,
> > > Functional Programming for Java Developers
> > > , and Programming
> Hive
> > >  (O'Reilly)
> > > Lightbend 
> > > @deanwampler 
> > > http://polyglotprogramming.com
> > > https://github.com/deanwampler
> > >
> > > On Tue, Jan 24, 2017 at 6:14 PM, Xiao Li  wrote:
> > >
> > > Congratulations! Burak and Holden!
> > >
> > > 2017-01-24 10:13 GMT-08:00 Reynold Xin :
> > >
> > > Hi all,
> > >
> > > Burak and Holden have recently been elected as Apache Spark committers.
> > >
> > > Burak has been very active in a large number of areas in Spark,
> including
> > > linear algebra, stats/maths functions in DataFrames, Python/R APIs for
> > > DataFrames, dstream, and most recently Structured Streaming.
> > >
> > > Holden has been a long time Spark contributor and evangelist. She has
> > > written a few books on Spark, as well as frequent contributions to the
> > > Python API to improve its usability and performance.
> > >
> > > Please join me in welcoming the two!
> > >
> > >
> > >
> > >
> > >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
> [image: http://databricks.com] 
>


Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-18 Thread Denny Lee
+1 (non-binding)


On Sat, Dec 17, 2016 at 11:45 PM Liwei Lin  wrote:

> +1
>
> Cheers,
> Liwei
>
> On Sat, Dec 17, 2016 at 10:29 AM, Yuming Wang  wrote:
>
> I hope https://github.com/apache/spark/pull/16252 can be fixed before
> the 2.1.0 release. It's a fix for the case where a broadcast cannot fit in memory.
>
> On Sat, Dec 17, 2016 at 10:23 AM, Joseph Bradley 
> wrote:
>
> +1
>
> On Fri, Dec 16, 2016 at 3:21 PM, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com> wrote:
>
> +1
>
> On Sat, Dec 17, 2016 at 12:14 AM, Xiao Li  wrote:
>
> +1
>
> Xiao Li
>
> 2016-12-16 12:19 GMT-08:00 Felix Cheung :
>
> For R we have a license field in the DESCRIPTION, and this is standard
> practice (and requirement) for R packages.
>
> https://cran.r-project.org/doc/manuals/R-exts.html#Licensing
>
> --
> *From:* Sean Owen 
> *Sent:* Friday, December 16, 2016 9:57:15 AM
> *To:* Reynold Xin; dev@spark.apache.org
> *Subject:* Re: [VOTE] Apache Spark 2.1.0 (RC5)
>
> (If you have a template for these emails, maybe update it to use https
> links. They work for apache.org domains. After all we are asking people
> to verify the integrity of release artifacts, so it might as well be
> secure.)
>
> (Also the new archives use .tar.gz instead of .tgz like the others. No big
> deal, my OCD eye just noticed it.)
>
> I don't see an Apache license / notice for the Pyspark or SparkR
> artifacts. It would be good practice to include this in a convenience
> binary. I'm not sure if it's strictly mandatory, but something to adjust in
> any event. I think that's all there is to do for SparkR. For Pyspark, which
> packages a bunch of dependencies, it does include the licenses (good) but I
> think it should include the NOTICE file.
>
> This is the first time I recall getting 0 test failures off the bat!
> I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.
>
> I think I'd +1 this therefore unless someone knows that the license issue
> above is real and a blocker.
>
> On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc5
> (cd0a08361e2526519e7c131c42116bf56fa62c76)
>
> List of JIRA tickets resolved are:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1223/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.0?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>
> *What happened to RC3/RC4?*
>
> They had issues with the release packaging and as a result were skipped.
>
>
>
>
>
> --
>
> Herman van Hövell
>
> Software Engineer
>
> Databricks Inc.
>
> hvanhov...@databricks.com
>
> +31 6 420 590 27
>
> databricks.com
>
> [image: http://databricks.com] 
>
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
> [image: http://databricks.com] 
>
>
>
>


Re: Handling questions in the mailing lists

2016-11-12 Thread Denny Lee
Hey Reynold,

Looks like we have all of the proposed changes in Proposed Community Mailing
Lists / StackOverflow Changes
<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>.
Anything else we can do to update the Spark Community page / welcome email?


Meanwhile, let's all start answering questions on SO, eh?! :)
Denny

On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> That's a good question, looking at
> http://stackoverflow.com/tags/apache-spark/topusers shows a few
> contributors who have already been active on SO including some committers
> and  PMC members with very high overall SO reputations for any
> administrative needs (as well as a number of other contributors besides
> just PMC/committers).
>
> On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <assaf.mendel...@rsa.com>
> wrote:
>
> I was just wondering, before we move on to SO.
>
> Do we have enough contributors with enough reputation to manage things in
> SO?
>
> We would need contributors with enough reputation to have relevant
> privileges.
>
> For example: creating tags (requires 1500 reputation), editing questions and
> answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000),
> access to moderator tools (10000, this is required to delete questions
> etc.), protecting questions (15000).
>
> All of these are important if we plan to have SO as a main resource.
>
> I know I originally suggested SO, however, if we do not have contributors
> with the required privileges and the willingness to help manage everything
> then I am not sure this is a good fit.
>
> Assaf.
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden
> email]]
> *Sent:* Wednesday, November 09, 2016 9:54 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> Agreed that by simply just moving the questions to SO will not solve
> anything but I think the call out about the meta-tags is that we need to
> abide by SO rules and if we were to just jump in and start creating
> meta-tags, we would be violating at minimum the spirit and at maximum the
> actual conventions around SO.
>
>
>
> Saying this, perhaps we could suggest tags that we place in the header of
> the question whether it be SO or the mailing lists that will help us sort
> through all of these questions faster just as you suggested.  The Proposed
> Community Mailing Lists / StackOverflow Changes
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>
>  has
> been updated to include suggested tags.  WDYT?
>
>
>
> On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]> wrote:
>
> I like the document and I think it is good but I still feel like we are
> missing an important part here.
>
>
>
> Look at SO today. There are:
>
> -   4658 unanswered questions under apache-spark tag.
>
> -  394 unanswered questions under spark-dataframe tag.
>
> -  639 unanswered questions under apache-spark-sql
>
> -  859 unanswered questions under pyspark
>
>
>
> Just moving people to ask there will not help. The whole issue is having
> people answer the questions.
>
>
>
> The problem is that many of these questions do not fit SO (but are already
> there so they are noise), are bad (i.e. unclear or hard to answer),
> orphaned etc. while some are simply harder than what people with some
> experience in spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list and this is true for SO.
>
>
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>
>
> My original thought was that if we had different tags then different
> people could watch open questions on these tags and therefore have a much
> lower noise. I thought that we would have a low tier (current one) of
> people just not following the documentation (which would remain as noise),
> then a beginner tier where we could have people downvoting bad questions
> but in most cases the community can answer the questions because they are
> common, then a “medium” tier which would mean harder questions but that can
> still be answered by advanced users and lastly an “advanced” tier to which
> committers can actually subscribed to (and adding sub tags for subsystems
> would improve this even more).
>
>
>
> I was not aware of SO policy for meta tags.

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-09 Thread Denny Lee
+1 (non binding)



On Tue, Nov 8, 2016 at 10:14 PM vaquar khan  wrote:

> *+1 (non binding)*
>
> On Tue, Nov 8, 2016 at 10:21 PM, Weiqing Yang 
> wrote:
>
>  +1 (non binding)
>
>
> Environment: CentOS Linux release 7.0.1406 (Core) / openjdk version
> "1.8.0_111"
>
>
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr -DskipTests clean package
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr test
>
>
>
> On Tue, Nov 8, 2016 at 7:38 PM, Liwei Lin  wrote:
>
> +1 (non-binding)
>
> Cheers,
> Liwei
>
> On Tue, Nov 8, 2016 at 9:50 PM, Ricardo Almeida <
> ricardo.alme...@actnowib.com> wrote:
>
> +1 (non-binding)
>
> over Ubuntu 16.10, Java 8 (OpenJDK 1.8.0_111) built with Hadoop 2.7.3,
> YARN, Hive
>
>
> On 8 November 2016 at 12:38, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com> wrote:
>
> +1
>
> On Tue, Nov 8, 2016 at 7:09 AM, Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.2. The vote is open until Thu, Nov 10, 2016 at 22:00 PDT and passes if
> a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.2
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.2-rc3
> (584354eaac02531c9584188b143367ba694b0c34)
>
> This release candidate resolves 84 issues:
> https://s.apache.org/spark-2.0.2-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1214/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc3-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 2.0.1.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series. Bugs already present
> in 2.0.1, missing features, or bugs related to new features will not
> necessarily block this release.
>
> Q: What fix version should I use for patches merging into branch-2.0 from
> now on?
> A: Please mark the fix version as 2.0.3, rather than 2.0.2. If a new RC
> (i.e. RC4) is cut, I will change the fix version of those patches to 2.0.2.
>
>
>
>
>
>
>
>
> --
> Regards,
> Vaquar Khan
> +1 -224-436-0783
>
> IT Architect / Lead Consultant
> Greater Chicago
>


Re: Handling questions in the mailing lists

2016-11-09 Thread Denny Lee
Hear, hear! :)  Completely agree with you - here are the latest updates
to Proposed
Community Mailing Lists / StackOverflow Changes
.
Keep them coming, though at this point I'd like to limit new verbiage to
prevent the doc from getting too long and hence not being read.  Modifications and
suggestions are absolutely welcome - just asking that we don't make it too
much longer.  Thanks!


On Wed, Nov 9, 2016 at 5:36 AM Gerard Maas  wrote:

> Great discussion. Glad to see it happening and lucky to have seen it on
> the mailing list due to its high volume.
>
> I had this same conversation with Patrick Wendell few Spark Summits ago.
> At the time, SO was not even listed as a resource and the idea was to make
> it the primary "go-to" place for questions.
>
> Having contributed to both the list (in its early days) and SO, the
> biggest hurdle IMO is how to deal with lazy people. These days, at SO, I
> spend more time leaving comments than answering in an attempt to moderate
> the requirement of "show some effort" and clarify unclear questions.
>
> It's my impression that the mailing list is much more friendly with "plz
> send me da code" folk and indeed would answer questions that would
> otherwise get down-voted or closed at SO. That also shows in the high email
> volume, which at the same time lowers its value for many of us who get
> overwhelmed. It's hard to separate authentic efforts in getting started,
> which deserve help and encouraging vs moderating "work dumpers" that abuse
> resources to get their thing done. Also, beginner questions always repeat
> and a mailing list has no features to help with that.
>
> The model I had imagined roughly follows the "Odersky scale":
>  - Users new with the technology and basic "how to" questions belong in
> Stack Overflow. => The search and de-duplication features should help in
> getting an answer if already present, reducing the load.
>  - Advanced discussions and troubleshooting belong in users@
>  - Library bugs, new features and improvements belong in dev@
>
> Of course, there's no hard line between these levels and it would require
> contributor discretion aided with some routing procedure:
>
> - Spark documentation should establish Stack Overflow as the main go-to
> resource.
> - Contributors on the list should friendly redirect "intro level
> questions" to Stack Overflow.
> - SO contributors should redirect potential bugs and questions deserving a
> deeper discussion to @users or @dev as needed
> - @users -> @dev as today
> - Cross-posting SO + @users should be discouraged. The idea is to create
> efficient channels.
>
> A good resource on how and where to ask questions would be a great routing
> channel between the levels above.
> I'm willing to help with moderation efforts on "Spark Overflow" :-) to get
> this going.
>
> The Spark community has always been very welcoming and that spirit should
> be preserved. We just need to channel the efforts in a more efficient way.
>
> my 2c,
>
> Gerard.
>
>
> On Mon, Nov 7, 2016 at 11:24 PM, Maciej Szymkiewicz <
> mszymkiew...@gmail.com> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...
>
>- If we are thinking about shifting focus towards SO all attempts of
>micromanaging should be discarded right in the beginning. Especially things
>like meta tags, which are discouraged and "burninated" (
>https://meta.stackoverflow.com/tags/burninate-request/info) , or
>thread bumping. Depending on a context these won't be manageable, go
>against community guidelines or simply obsolete.
>- Lack of expertise is unlikely an issue. Even now there is a number
>of advanced Spark users on SO. Of course the more the merrier.
>
> Things that can be easily improved:
>
>- Identifying, improving and promoting canonical questions and
>answers. It means closing duplicate, suggesting edits to improve existing
>answers, providing alternative solutions. This can be also used to identify
>gaps in the documentation.
>- Providing a set of clear posting guidelines to reduce effort
>required to identify the problem (think about
>http://stackoverflow.com/q/5963269 a.k.a How to make a great R
>reproducible example?)
>- Helping users decide if question is a good fit for SO (see below).
>API questions are great fit, debugging problems like "my cluster is slow"
>are not.
>- Actively cleaning (closing, deleting) off-topic and low quality
>questions. The less junk to sieve through the better chance of good
>questions being answered.
>- Repurposing and actively moderating SO docs (
>https://stackoverflow.com/documentation/apache-spark/topics). Right
>now most of the stuff that goes there is useless, duplicated or
>plagiarized, or border case SPAM.
>- Encouraging community to monitor featured (
>

Re: Handling questions in the mailing lists

2016-11-08 Thread Denny Lee
Agreed that by simply just moving the questions to SO will not solve
anything but I think the call out about the meta-tags is that we need to
abide by SO rules and if we were to just jump in and start creating
meta-tags, we would be violating at minimum the spirit and at maximum the
actual conventions around SO.

Saying this, perhaps we could suggest tags that we place in the header of
the question whether it be SO or the mailing lists that will help us sort
through all of these questions faster just as you suggested.  The Proposed
Community Mailing Lists / StackOverflow Changes
<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>
has
been updated to include suggested tags.  WDYT?

On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <assaf.mendel...@rsa.com>
wrote:

> I like the document and I think it is good but I still feel like we are
> missing an important part here.
>
>
>
> Look at SO today. There are:
>
> -   4658 unanswered questions under apache-spark tag.
>
> -  394 unanswered questions under spark-dataframe tag.
>
> -  639 unanswered questions under apache-spark-sql
>
> -  859 unanswered questions under pyspark
>
>
>
> Just moving people to ask there will not help. The whole issue is having
> people answer the questions.
>
>
>
> The problem is that many of these questions do not fit SO (but are already
> there so they are noise), are bad (i.e. unclear or hard to answer),
> orphaned etc. while some are simply harder than what people with some
> experience in spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list and this is true for SO.
>
>
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>
>
> My original thought was that if we had different tags then different
> people could watch open questions on these tags and therefore have a much
> lower noise. I thought that we would have a low tier (current one) of
> people just not following the documentation (which would remain as noise),
> then a beginner tier where we could have people downvoting bad questions
> but in most cases the community can answer the questions because they are
> common, then a “medium” tier which would mean harder questions but that can
> still be answered by advanced users and lastly an “advanced” tier to which
> committers can actually subscribed to (and adding sub tags for subsystems
> would improve this even more).
>
>
>
> I was not aware of SO policy for meta tags (the burnination link is about
> removing tags completely so I am not sure how it applies, I believe this
> link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more
> relevant).
>
> There was actually a discussion along the lines in SO (
> http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level
> ).
>
>
>
> The fact that SO did not solve this issue does not mean we shouldn’t
> either.
>
>
>
> The way I see it, some tags can easily be used even with the meta tags
> limitation. For example, a spark-internal-development tag can be used
> to ask questions about the development of spark. There are already tags for some
> spark subsystems (there is a apachae-spark-sql tag, a pyspark tag, a
> spark-streaming tag etc.). The main issue I see and the one we can’t seem
> to get around is dividing between simple questions that the community
> should answer and hard questions which only advanced users can answer.
>
>
>
> Maybe SO isn’t the correct platform for that but even within it we can try
> to find a non meta name for spark beginner questions vs. spark advanced
> questions.
>
> Assaf.
>
>
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden
> email]]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
>
>
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> To help track and get the verbiage for the Spark community page and
> welcome email jump started, here's a working document for us to work with:
> https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#
>
>
>
> Hope this will help us collaborate on this stuff a little faster.
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...

Re: Handling questions in the mailing lists

2016-11-07 Thread Denny Lee
To help track and get the verbiage for the Spark community page and welcome
email jump started, here's a working document for us to work with:
https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#

Hope this will help us collaborate on this stuff a little faster.

On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz 
wrote:

> Just a couple of random thoughts regarding Stack Overflow...
>
>- If we are thinking about shifting focus towards SO all attempts of
>micromanaging should be discarded right in the beginning. Especially things
>like meta tags, which are discouraged and "burninated" (
>https://meta.stackoverflow.com/tags/burninate-request/info) , or
>thread bumping. Depending on a context these won't be manageable, go
>against community guidelines or simply obsolete.
>- Lack of expertise is unlikely an issue. Even now there is a number
>of advanced Spark users on SO. Of course the more the merrier.
>
> Things that can be easily improved:
>
>- Identifying, improving and promoting canonical questions and
>answers. It means closing duplicate, suggesting edits to improve existing
>answers, providing alternative solutions. This can be also used to identify
>gaps in the documentation.
>- Providing a set of clear posting guidelines to reduce effort
>required to identify the problem (think about
>http://stackoverflow.com/q/5963269 a.k.a How to make a great R
>reproducible example?). A minimal Spark analogue is sketched after this list.
>- Helping users decide if question is a good fit for SO (see below).
>API questions are great fit, debugging problems like "my cluster is slow"
>are not.
>- Actively cleaning (closing, deleting) off-topic and low quality
>questions. The less junk to sieve through the better chance of good
>questions being answered.
>- Repurposing and actively moderating SO docs (
>https://stackoverflow.com/documentation/apache-spark/topics). Right
>now most of the stuff that goes there is useless, duplicated or
>plagiarized, or border case SPAM.
>- Encouraging community to monitor featured (
>https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>and active & upvoted & unanswered (
>https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
>- Implementing some procedure to identify questions which are likely
>to be bugs or a material for feature requests. Personally I am quite often
>tempted to simply send a link to dev list, but I don't think it is really
>acceptable.
>- Animating Spark related chat room. I tried this a couple of times
>but to no avail. Without a certain critical mass of users it just won't
>work.
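>
> (As a concrete sketch of the posting-guidelines point above: a minimal
> reproducible Spark question could boil down to something like the
> following, where the workload itself is just an illustrative assumption.)
>
> # Minimal, self-contained reproducible example for a Spark question.
> from pyspark.sql import Row, SparkSession
>
> spark = SparkSession.builder.appName("repro").getOrCreate()
>
> # 1. Small inline data instead of a pointer to a private dataset.
> df = spark.createDataFrame(
>     [Row(k="a", v=1), Row(k="a", v=2), Row(k="b", v=3)])
>
> # 2. The exact transformation that behaves unexpectedly.
> result = df.groupBy("k").sum("v").orderBy("k").collect()
>
> # 3. Observed output and expected output, stated explicitly.
> print(result)  # expected: [Row(k='a', sum(v)=3), Row(k='b', sum(v)=3)]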
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
> Sure, I'll be happy to help if I can.
>
>
>
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz  > wrote:
>
> Damn, I always thought that the mailing list is only for nice and welcoming
> people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO but it is not true in general. There are dozens
> of questions which are to broad, opinion based, ask for external resources
> and so on. If you want to direct users to SO you have to help them to
> decide if it is the right channel. Otherwise it will just create a really
> bad experience for both seeking help and active answerers. Former ones will
> be downvoted and bashed, latter ones will have to deal with handling all
> the junk and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz 
> wrote:
>
> You have to remember that Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
> Best,
> Maciej
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
> On Wed, Nov 2, 2016 

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-31 Thread Denny Lee
Oh, I forgot to note that when downloading and running the Spark
2.0.2 "without Hadoop" binaries, I got a JNI error due to an exception with
org/slf4j/Logger (i.e. the org.slf4j.Logger class is not found).


On Mon, Oct 31, 2016 at 4:35 PM Reynold Xin  wrote:

> OK I will cut a new RC tomorrow. Any other issues people have seen?
>
>
> On Fri, Oct 28, 2016 at 2:58 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
> -1.
>
> The history server is broken because of some refactoring work in
> Structured Streaming: https://issues.apache.org/jira/browse/SPARK-18143
>
> On Fri, Oct 28, 2016 at 12:58 PM, Weiqing Yang 
> wrote:
>
> +1 (non binding)
>
>
>
> Environment: CentOS Linux release 7.0.1406 / openjdk version "1.8.0_111"/
> R version 3.3.1
>
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr -DskipTests clean package
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr test
>
>
> Best,
>
> Weiqing
>
> On Fri, Oct 28, 2016 at 10:06 AM, Ryan Blue 
> wrote:
>
> +1 (non-binding)
>
> Checksums and build are fine. The tarball matches the release tag except
> that .gitignore is missing. It would be nice if the tarball were created
> using git archive so that the commit ref is present, but otherwise
> everything looks fine.
>
>
> On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin  wrote:
>
> Greetings from Spark Summit Europe at Brussels.
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes if
> a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.2
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.2-rc1
> (1c2908eeb8890fdc91413a3f5bad2bb3d114db6c)
>
> This release candidate resolves 75 issues:
> https://s.apache.org/spark-2.0.2-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1208/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 2.0.1.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series. Bugs already present
> in 2.0.1, missing features, or bugs related to new features will not
> necessarily block this release.
>
> Q: What fix version should I use for patches merging into branch-2.0 from
> now on?
> A: Please mark the fix version as 2.0.3, rather than 2.0.2. If a new RC
> (i.e. RC2) is cut, I will change the fix version of those patches to 2.0.2.
>
>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>
>
>


Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Denny Lee
+1 (non-binding)



On Thu, Oct 27, 2016 at 3:36 PM Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:

> +1 (non-binding)
>
> built and tested without regressions from 2.0.1.
>
>
>
> On 27 October 2016 at 19:07, vaquar khan  wrote:
>
> +1
>
>
>
> On Thu, Oct 27, 2016 at 11:56 AM, Davies Liu 
> wrote:
>
> +1
>
> On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin  wrote:
> > Greetings from Spark Summit Europe at Brussels.
> >
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes
> if a
> > majority of at least 3+1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.0.2
> > [ ] -1 Do not release this package because ...
> >
> >
> > The tag to be voted on is v2.0.2-rc1
> > (1c2908eeb8890fdc91413a3f5bad2bb3d114db6c)
> >
> > This release candidate resolves 75 issues:
> > https://s.apache.org/spark-2.0.2-jira
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1208/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-docs/
> >
> >
> > Q: How can I help test this release?
> > A: If you are a Spark user, you can help us test this release by taking
> an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 2.0.1.
> >
> > Q: What justifies a -1 vote for this release?
> > A: This is a maintenance release in the 2.0.x series. Bugs already
> present
> > in 2.0.1, missing features, or bugs related to new features will not
> > necessarily block this release.
> >
> > Q: What fix version should I use for patches merging into branch-2.0 from
> > now on?
> > A: Please mark the fix version as 2.0.3, rather than 2.0.2. If a new RC
> > (i.e. RC2) is cut, I will change the fix version of those patches to
> 2.0.2.
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>
>
> --
> Regards,
> Vaquar Khan
> +1 -224-436-0783
>
> IT Architect / Lead Consultant
> Greater Chicago
>
>
>


Re: welcoming Xiao Li as a committer

2016-10-04 Thread Denny Lee
Congrats, Xiao!
On Tue, Oct 4, 2016 at 00:00 Takeshi Yamamuro  wrote:

> congrats, xiao!
>
> On Tue, Oct 4, 2016 at 3:59 PM, Hyukjin Kwon  wrote:
>
> Congratulations!
>
> 2016-10-04 15:51 GMT+09:00 Dilip Biswal :
>
> Hi Xiao,
>
> Congratulations Xiao !!  This is indeed very well deserved !!
>
> Regards,
> Dilip Biswal
> Tel: 408-463-4980
> dbis...@us.ibm.com
>
>
>
> From: Reynold Xin 
> To: "dev@spark.apache.org" , Xiao Li <
> gatorsm...@gmail.com>
> Date: 10/03/2016 10:47 PM
> Subject: welcoming Xiao Li as a committer
> --
>
>
>
> Hi all,
>
> Xiao Li, aka gatorsmile, has recently been elected as an Apache Spark
> committer. Xiao has been a super active contributor to Spark SQL. Congrats
> and welcome, Xiao!
>
> - Reynold
>
>
>
>
>
>
> --
> ---
> Takeshi Yamamuro
>


Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Denny Lee
+1 (non-binding)

On Thu, Sep 29, 2016 at 9:43 PM Jeff Zhang  wrote:

> +1
>
> On Fri, Sep 30, 2016 at 9:27 AM, Burak Yavuz  wrote:
>
>> +1
>>
>> On Sep 29, 2016 4:33 PM, "Kyle Kelley"  wrote:
>>
>>> +1
>>>
>>> On Thu, Sep 29, 2016 at 4:27 PM, Yin Huai  wrote:
>>>
 +1

 On Thu, Sep 29, 2016 at 4:07 PM, Luciano Resende 
 wrote:

> +1 (non-binding)
>
> On Wed, Sep 28, 2016 at 7:14 PM, Reynold Xin 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark
>> version 2.0.1. The vote is open until Sat, Oct 1, 2016 at 20:00 PDT and
>> passes if a majority of at least 3+1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.0.1
>> [ ] -1 Do not release this package because ...
>>
>>
>> The tag to be voted on is v2.0.1-rc4
>> (933d2c1ea4e5f5c4ec8d375b5ccaa4577ba4be38)
>>
>> This release candidate resolves 301 issues:
>> https://s.apache.org/spark-2.0.1-jira
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1203/
>>
>> The documentation corresponding to this release can be found at:
>>
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc4-docs/
>>
>>
>> Q: How can I help test this release?
>> A: If you are a Spark user, you can help us test this release by
>> taking an existing Spark workload and running on this release candidate,
>> then reporting any regressions from 2.0.0.
>>
>> Q: What justifies a -1 vote for this release?
>> A: This is a maintenance release in the 2.0.x series.  Bugs already
>> present in 2.0.0, missing features, or bugs related to new features will
>> not necessarily block this release.
>>
>> Q: What fix version should I use for patches merging into branch-2.0
>> from now on?
>> A: Please mark the fix version as 2.0.2, rather than 2.0.1. If a new
>> RC (i.e. RC5) is cut, I will change the fix version of those patches to
>> 2.0.1.
>>
>>
>>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


>>>
>>>
>>> --
>>> Kyle Kelley (@rgbkrk ; lambdaops.com)
>>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Denny Lee
+1 on testing with Python2.

On Mon, Sep 26, 2016 at 3:13 PM Krishna Sankar <ksanka...@gmail.com> wrote:

> I do run both Python and Scala. But via iPython/Python2 with my own test
> code. Not running the tests from the distribution.
> Cheers
> 
>
> On Mon, Sep 26, 2016 at 11:59 AM, Holden Karau <hol...@pigscanfly.ca>
> wrote:
>
>> I'm seeing some test failures with Python 3 that could definitely be
>> environmental (going to rebuild my virtual env and double check), I'm just
>> wondering if other people are also running the Python tests on this release
>> or if everyone is focused on the Scala tests?
>>
>> On Mon, Sep 26, 2016 at 11:48 AM, Maciej Bryński <mac...@brynski.pl>
>> wrote:
>>
>>> +1
>>> At last :)
>>>
>>> 2016-09-26 19:56 GMT+02:00 Sameer Agarwal <sam...@databricks.com>:
>>>
>>>> +1 (non-binding)
>>>>
>>>> On Mon, Sep 26, 2016 at 9:54 AM, Davies Liu <dav...@databricks.com>
>>>> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> On Mon, Sep 26, 2016 at 9:36 AM, Joseph Bradley <jos...@databricks.com>
>>>>> wrote:
>>>>> > +1
>>>>> >
>>>>> > On Mon, Sep 26, 2016 at 7:47 AM, Denny Lee <denny.g@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> +1 (non-binding)
>>>>> >> On Sun, Sep 25, 2016 at 23:20 Jeff Zhang <zjf...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> +1
>>>>> >>>
>>>>> >>> On Mon, Sep 26, 2016 at 2:03 PM, Shixiong(Ryan) Zhu
>>>>> >>> <shixi...@databricks.com> wrote:
>>>>> >>>>
>>>>> >>>> +1
>>>>> >>>>
>>>>> >>>> On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee <petermax...@gmail.com
>>>>> >
>>>>> >>>> wrote:
>>>>> >>>>>
>>>>> >>>>> +1
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Sun, Sep 25, 2016 at 3:26 PM, Herman van Hövell tot
>>>>> Westerflier
>>>>> >>>>> <hvanhov...@databricks.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> +1 (non-binding)
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Sep 25, 2016 at 2:05 PM, Ricardo Almeida
>>>>> >>>>>> <ricardo.alme...@actnowib.com> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> +1 (non-binding)
>>>>> >>>>>>>
>>>>> >>>>>>> Built and tested on
>>>>> >>>>>>> - Ubuntu 16.04 / OpenJDK 1.8.0_91
>>>>> >>>>>>> - CentOS / Oracle Java 1.7.0_55
>>>>> >>>>>>> (-Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver
>>>>> >>>>>>> -Pyarn)
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> On 25 September 2016 at 22:35, Matei Zaharia
>>>>> >>>>>>> <matei.zaha...@gmail.com> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> +1
>>>>> >>>>>>>>
>>>>> >>>>>>>> Matei
>>>>> >>>>>>>>
>>>>> >>>>>>>> On Sep 25, 2016, at 1:25 PM, Josh Rosen <
>>>>> joshro...@databricks.com>
>>>>> >>>>>>>> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> +1
>>>>> >>>>>>>>
>>>>> >>>>>>>> On Sun, Sep 25, 2016 at 1:16 PM Yin Huai <
>>>>> yh...@databricks.com>
>>>>> >>>>>>>> wrote:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> +1
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> On Sun, Sep 25, 2016 at 11:40 AM, Dongjoon Hyun

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Denny Lee
+1 (non-binding)
On Sun, Sep 25, 2016 at 23:20 Jeff Zhang  wrote:

> +1
>
> On Mon, Sep 26, 2016 at 2:03 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> +1
>>
>> On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee  wrote:
>>
>>> +1
>>>
>>>
>>> On Sun, Sep 25, 2016 at 3:26 PM, Herman van Hövell tot Westerflier <
>>> hvanhov...@databricks.com> wrote:
>>>
 +1 (non-binding)

 On Sun, Sep 25, 2016 at 2:05 PM, Ricardo Almeida <
 ricardo.alme...@actnowib.com> wrote:

> +1 (non-binding)
>
> Built and tested on
> - Ubuntu 16.04 / OpenJDK 1.8.0_91
> - CentOS / Oracle Java 1.7.0_55
> (-Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver -Pyarn)
>
>
> On 25 September 2016 at 22:35, Matei Zaharia 
> wrote:
>
>> +1
>>
>> Matei
>>
>> On Sep 25, 2016, at 1:25 PM, Josh Rosen 
>> wrote:
>>
>> +1
>>
>> On Sun, Sep 25, 2016 at 1:16 PM Yin Huai 
>> wrote:
>>
>>> +1
>>>
>>> On Sun, Sep 25, 2016 at 11:40 AM, Dongjoon Hyun wrote:
>>>
 +1 (non binding)

 RC3 is compiled and tested on the following two systems, too. All
 tests passed.

 * CentOS 7.2 / Oracle JDK 1.8.0_77 / R 3.3.1
with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
 -Phive-thriftserver -Dsparkr
 * CentOS 7.2 / Open JDK 1.8.0_102
with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver

 Cheers,
 Dongjoon



 On Saturday, September 24, 2016, Reynold Xin 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.0.1. The vote is open until Tue, Sep 27, 2016 at 15:30 PDT 
> and
> passes if a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.1
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.1-rc3
> (9d28cc10357a8afcfb2fa2e6eecb5c2cc2730d17)
>
> This release candidate resolves 290 issues:
> https://s.apache.org/spark-2.0.1-jira
>
> The release files, including signatures, digests, etc. can be
> found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1201/
>
> The documentation corresponding to this release can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc3-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by
> taking an existing Spark workload and running on this release 
> candidate,
> then reporting any regressions from 2.0.0.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series.  Bugs
> already present in 2.0.0, missing features, or bugs related to new 
> features
> will not necessarily block this release.
>
> Q: What fix version should I use for patches merging into
> branch-2.0 from now on?
> A: Please mark the fix version as 2.0.2, rather than 2.0.1. If a
> new RC (i.e. RC4) is cut, I will change the fix version of those 
> patches to
> 2.0.1.
>
>
>
>>>
>>
>

>>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Welcoming Felix Cheung as a committer

2016-08-08 Thread Denny Lee
Awesome - congrats Felix!

On Mon, Aug 8, 2016 at 9:44 PM Felix Cheung 
wrote:

> Thank you!
> Looking forward to work with you all!
>
>
>
>
>
> On Mon, Aug 8, 2016 at 7:41 PM -0700, "Yanbo Liang" 
> wrote:
>
> Congrats Felix!
>
> 2016-08-08 18:21 GMT-07:00 Kai Jiang :
>
>> Congrats Felix!
>>
>> On Mon, Aug 8, 2016, 18:14 Jeff Zhang  wrote:
>>
>>> Congrats Felix!
>>>
>>> On Tue, Aug 9, 2016 at 8:49 AM, Hyukjin Kwon 
>>> wrote:
>>>
 Congratulations!

 2016-08-09 7:47 GMT+09:00 Xiao Li :

> Congrats Felix!
>
> 2016-08-08 15:04 GMT-07:00 Herman van Hövell tot Westerflier
> :
> > Congrats Felix!
> >
> > On Mon, Aug 8, 2016 at 11:57 PM, dhruve ashar 
> wrote:
> >>
> >> Congrats Felix!
> >>
> >> On Mon, Aug 8, 2016 at 2:28 PM, Tarun Kumar 
> wrote:
> >>>
> >>> Congrats Felix!
> >>>
> >>> Tarun
> >>>
> >>> On Tue, Aug 9, 2016 at 12:57 AM, Timothy Chen 
> wrote:
> 
>  Congrats Felix!
> 
>  Tim
> 
>  On Mon, Aug 8, 2016 at 11:15 AM, Matei Zaharia <
> matei.zaha...@gmail.com>
>  wrote:
>  > Hi all,
>  >
>  > The PMC recently voted to add Felix Cheung as a committer.
> Felix has
>  > been a major contributor to SparkR and we're excited to have
> him join
>  > officially. Congrats and welcome, Felix!
>  >
>  > Matei
>  >
> -
>  > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>  >
> 
> 
> -
>  To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> >>
> >>
> >>
> >> --
> >> -Dhruve Ashar
> >>
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>


Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Denny Lee
BTW, we recently had a webinar on GraphFrames at
http://go.databricks.com/graphframes-dataframe-based-graphs-for-apache-spark

On Thu, Apr 21, 2016 at 14:30 Dimitris Kouzis - Loukas 
wrote:

> This thread is good. Maybe it should make it to doc or the users group
>
> On Thu, Apr 21, 2016 at 9:25 PM, Zhan Zhang 
> wrote:
>
>>
>> You can take a look at this blog from Databricks about GraphFrames
>>
>> https://databricks.com/blog/2016/03/03/introducing-graphframes.html
>>
>> Thanks.
>>
>> Zhan Zhang
>>
>> On Apr 21, 2016, at 12:53 PM, Robin East  wrote:
>>
>> Hi
>>
>> Aside from LDA, which is implemented in MLLib, GraphX has the following
>> built-in algorithms:
>>
>>
>>- PageRank/Personalised PageRank
>>- Connected Components
>>- Strongly Connected Components
>>- Triangle Count
>>- Shortest Paths
>>- Label Propagation
>>
>>
>> It also implements a version of the Pregel framework, a form of
>> bulk-synchronous parallel processing that is the foundation of most of the
>> above algorithms. We cover other algorithms in our book and if you search
>> on google you will find a number of other examples.
>>
>>
>> ---
>> Robin East
>> *Spark GraphX in Action* Michael Malak and Robin East
>> Manning Publications Co.
>> http://www.manning.com/books/spark-graphx-in-action
>>
>>
>>
>>
>>
>> On 21 Apr 2016, at 19:47, tgensol  wrote:
>>
>> Hi there,
>>
>> I am working in a group at the University of Michigan, and we are trying
>> to implement (and first find) some distributed graph algorithms.
>>
>> I know Spark, and I found GraphX. I read the docs, but Latent Dirichlet
>> Allocation is the only algorithm I found working with GraphX, so I was
>> wondering why.
>>
>> Basically, the group wants to implement Minimum Spanning Tree, kNN, and
>> shortest path at first.
>>
>> So my questions are:
>> Is GraphX stable enough to develop these kinds of algorithms on it?
>> Do you know of algorithms like these working on top of GraphX? And if
>> not, why do you think nobody has tried to do it? Is it too hard? Or just
>> because nobody needs it?
>>
>> Maybe it is only my knowledge of GraphX that is weak, and it is not
>> possible to implement these algorithms with GraphX.
>>
>> Thanking you in advance,
>> Best regards,
>>
>> Thibaut
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/GRAPHX-Graph-Algorithms-and-Spark-tp17301.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com .
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>>
>>
>
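
To make the algorithm list above concrete, here is a minimal Scala sketch of
invoking one of the built-in GraphX algorithms named in the thread. It assumes
an existing SparkContext `sc` (as in spark-shell), and the edge-list path is
hypothetical:

import org.apache.spark.graphx.GraphLoader

// Assumes an existing SparkContext `sc`; the edge-list path is hypothetical.
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")

// PageRank is one of the built-in algorithms listed above; 0.0001 is the
// convergence tolerance.
val ranks = graph.pageRank(0.0001).vertices
ranks.take(5).foreach(println)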


Re: Welcoming two new committers

2016-02-08 Thread Denny Lee
Awesome - congratulations Herman and Wenchan!

On Mon, Feb 8, 2016 at 10:26 AM Dilip Biswal  wrote:

> Congratulations Wenchen and Herman !!
>
> Regards,
> Dilip Biswal
> Tel: 408-463-4980
> dbis...@us.ibm.com
>
>
>
> From: Xiao Li 
> To: Corey Nolet 
> Cc: Ted Yu , Matei Zaharia <
> matei.zaha...@gmail.com>
> Date: 02/08/2016 09:39 AM
> Subject: Re: Welcoming two new committers
> --
>
>
>
> Congratulations! Herman and Wenchen!  I am just so happy for you! You
> absolutely deserve it!
>
> 2016-02-08 9:35 GMT-08:00 Corey Nolet <cjno...@gmail.com>:
> Congrats guys!
>
> On Mon, Feb 8, 2016 at 12:23 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Congratulations, Herman and Wenchen.
>
> On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia <matei.zaha...@gmail.com>
> wrote:
> Hi all,
>
> The PMC has recently added two new Spark committers -- Herman van Hovell
> and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten,
> adding new features, optimizations and APIs. Please join me in welcoming
> Herman and Wenchen.
>
> Matei
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
>
>
>
>


Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Denny Lee
+1

On Tue, Dec 22, 2015 at 7:05 PM Aaron Davidson  wrote:

> +1
>
> On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen 
> wrote:
>
>> +1
>>
>> On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang  wrote:
>>
>>> +1
>>>
>>> On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra 
>>> wrote:
>>>
 +1

 On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <
 mich...@databricks.com> wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 1.6.0!
>
> The vote is open until Friday, December 25, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v1.6.0-rc4
> (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
> *
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1176/
>
> The test repository (versioned as v1.6.0-rc4) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1175/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1
> votes should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will 
> not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go
> into branch-1.6, since documentations will be published separately from 
> the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
> target version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC3
>
>   - SPARK-12404 - Fix serialization error for Datasets with
> Timestamps/Arrays/Decimal
>   - SPARK-12218 - Fix incorrect pushdown of filters to parquet
>   - SPARK-12395 - Fix join columns of outer join for DataFrame using
>   - SPARK-12413 - Fix mesos HA
>
> Notable changes since 1.6 RC2
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  
> correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>- SPARK-11787  
> Parquet
>Performance - Improve Parquet scan performance when using flat
>schemas.
>- SPARK-10810 
>Session Management - Isolated default database (i.e. USE mydb) even
>on shared clusters.
>- SPARK-   
> Dataset
>API - A type-safe API (similar to RDDs) that performs many
>
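
To make the Dataset API bullet above concrete, here is a minimal, hedged Scala
sketch of the new typed API on a 1.6.0 RC build. It assumes an existing
SQLContext named `sqlContext` (as in spark-shell); the Person class and the
values are illustrative:

// Typed Dataset API introduced in 1.6; assumes `sqlContext` exists.
import sqlContext.implicits._

case class Person(name: String, age: Long) // illustrative type

val ds = Seq(Person("Ann", 34), Person("Bob", 25)).toDS()
ds.filter(_.age > 30).show() // typed filter over objects, not Columns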

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Denny Lee
+1 (non-binding)

Ran a number of tests covering DataFrames, Datasets, and ML.


On Wed, Dec 16, 2015 at 1:32 PM Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v1.6.0-rc3
> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
> *
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentations will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC2
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
>- SPARK-11787  Parquet
>Performance - Improve Parquet scan performance when using flat schemas.
>- SPARK-10810 
>Session Management - Isolated default database (i.e. USE mydb) even on
>shared clusters.
>- SPARK-   Dataset
>API - A type-safe API (similar to RDDs) that performs many operations
>on serialized binary data and code generation (i.e. Project Tungsten).
>- SPARK-1  Unified
>Memory Management - Shared memory for execution and caching instead of
>exclusive division of the regions.
>- SPARK-11197  SQL
>Queries on Files - Concise syntax for running SQL queries over files
>of any supported format without registering a table.
>- SPARK-11745  Reading
>non-standard JSON files - Added options to read non-standard JSON
>files (e.g. single-quotes, unquoted attributes)
>- SPARK-10412  
> Per-operator
>Metrics for SQL Execution - Display statistics on a per-operator basis
>for memory usage and spilled data size.
>- SPARK-11329  Star
>(*) expansion for StructTypes - Makes it easier to nest and 

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-07 Thread Denny Lee
+1


On Sat, Nov 7, 2015 at 12:01 PM Mark Hamstra 
wrote:

> +1
>
> On Tue, Nov 3, 2015 at 3:22 PM, Reynold Xin  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.5.2. The vote is open until Sat Nov 7, 2015 at 00:00 UTC and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.5.2
>> [ ] -1 Do not release this package because ...
>>
>>
>> The release fixes 59 known issues in Spark 1.5.1, listed here:
>> http://s.apache.org/spark-1.5.2
>>
>> The tag to be voted on is v1.5.2-rc2:
>> https://github.com/apache/spark/releases/tag/v1.5.2-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.2-rc2-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> - as version 1.5.2-rc2:
>> https://repository.apache.org/content/repositories/orgapachespark-1153
>> - as version 1.5.2:
>> https://repository.apache.org/content/repositories/orgapachespark-1152
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.2-rc2-docs/
>>
>>
>> ===
>> How can I help test this release?
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> 
>> What justifies a -1 vote for this release?
>> 
>> -1 vote should occur for regressions from Spark 1.5.1. Bugs already
>> present in 1.5.1 will not block this release.
>>
>>
>>


Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-03 Thread Denny Lee
+1

Distinct count test is blazing fast - awesome!

On Thu, Sep 3, 2015 at 8:21 PM Krishna Sankar  wrote:

> +?
>
> 1. Compiled OSX 10.10 (Yosemite) OK Total time: 26:09 min
>  mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
> 2. Tested pyspark, mllib
> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
> 2.2. Linear/Ridge/Laso Regression OK
> 2.3. Decision Tree, Naive Bayes OK
> 2.4. KMeans OK
>Center And Scale OK
> 2.5. RDD operations OK
>   State of the Union Texts - MapReduce, Filter, sortByKey (word count)
> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>Model evaluation/optimization (rank, numIter, lambda) with
> itertools OK
> 3. Scala - MLlib
> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
> 3.2. LinearRegressionWithSGD OK
> 3.3. Decision Tree OK
> 3.4. KMeans OK
> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
> 3.6. saveAsParquetFile OK
> 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
> registerTempTable, sql OK
> 3.8. result = sqlContext.sql("SELECT
> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
> 4.0. Spark SQL from Python OK
> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
> 5.0. Packages
> 5.1. com.databricks.spark.csv - read/write OK
> (--packages com.databricks:spark-csv_2.11:1.2.0-s_2.11 didn’t work. But
> com.databricks:spark-csv_2.11:1.2.0 worked)
> 6.0. DataFrames
> 6.1. cast,dtypes OK
> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
> 6.3. All joins,sql,set operations,udf OK
>
> Two Problems:
>
> 1. The synthetic column names are lowercase ( i.e. now ‘sum(OrderPrice)’;
> previously ‘SUM(OrderPrice)’, now ‘avg(Total)’; previously 'AVG(Total)').
> So programs that depend on the case of the synthetic column names would
> fail.
> 2. orders_3.groupBy("Year","Month").sum('Total').show()
> fails with the error ‘java.io.IOException: Unable to acquire 4194304
> bytes of memory’
> orders_3.groupBy("CustomerID","Year").sum('Total').show() - fails with
> the same error
> Is this a known bug ?
> Cheers
> 
> P.S: Sorry for the spam, forgot Reply All
>
> On Tue, Sep 1, 2015 at 1:41 PM, Reynold Xin  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.5.0. The vote is open until Friday, Sep 4, 2015 at 21:00 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.5.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>>
>> The tag to be voted on is v1.5.0-rc3:
>>
>> https://github.com/apache/spark/commit/908e37bcc10132bb2aa7f80ae694a9df6e40f31a
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release (published as 1.5.0-rc3) can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1143/
>>
>> The staging repository for this release (published as 1.5.0) can be found
>> at:
>> https://repository.apache.org/content/repositories/orgapachespark-1142/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/
>>
>>
>> ===
>> How can I help test this release?
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>>
>> 
>> What justifies a -1 vote for this release?
>> 
>> This vote is happening towards the end of the 1.5 QA period, so -1 votes
>> should only occur for significant regressions from 1.4. Bugs already
>> present in 1.4, minor regressions, or bugs related to new features will not
>> block this release.
>>
>>
>> ===
>> What should happen to JIRA tickets still targeting 1.5.0?
>> ===
>> 1. It is OK for documentation patches to target 1.5.0 and still go into
>> branch-1.5, since documentations will be packaged separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.6+.
>> 3. Non-blocker bug fixes should target 1.5.1 or 1.6.0, or drop the target
>> version.
>>
>>
>> ==
>> Major changes to help you focus your testing
>> ==
>>
>> As of today, Spark 1.5 contains 

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-08 Thread Denny Lee
+1

On Mon, Jun 8, 2015 at 17:51 Wang, Daoyuan daoyuan.w...@intel.com wrote:

 +1

 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Wednesday, June 03, 2015 1:47 PM
 To: dev@spark.apache.org
 Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

 Hey all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The exact
 commit and all other information is correct. (thanks Shivaram who pointed
 this out).

 On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
 1.4.0!
 
  The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  22596c534a38cfdda91aef18aa9037ab101e4251
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.0]
  https://repository.apache.org/content/repositories/orgapachespark-
  /
  [published as version: 1.4.0-rc4]
  https://repository.apache.org/content/repositories/orgapachespark-1112
  /
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc4-docs
  /
 
  Please vote on releasing this package as Apache Spark 1.4.0!
 
  The vote is open until Saturday, June 06, at 05:00 UTC and passes if a
  majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see http://spark.apache.org/
 
  == What has changed since RC3 ==
  In addition to many smaller fixes, three blocker issues were fixed:
  4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
  metadataHive get constructed too early
  6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
  78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be
  singleton
 
  == How can I help test this release? ==
  If you are a Spark user, you can help us test this release by taking a
  Spark 1.3 workload and running on this release candidate, then reporting
  any regressions.
 
  == What justifies a -1 vote for this release? ==
  This vote is happening towards the end of the 1.4 QA period, so -1 votes
  should only occur for significant regressions from 1.3.1.
  Bugs already present in 1.3.X, minor regressions, or bugs related to
  new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional
 commands, e-mail: dev-h...@spark.apache.org




Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-11 Thread Denny Lee
+1 (non-binding)


On Sat, Apr 11, 2015 at 11:48 AM Krishna Sankar ksanka...@gmail.com wrote:

 +1. All tests OK (same as RC2)
 Cheers
 k/

 On Fri, Apr 10, 2015 at 11:05 PM, Patrick Wendell pwend...@gmail.com
 wrote:

  Please vote on releasing the following candidate as Apache Spark version
  1.3.1!
 
  The tag to be voted on is v1.3.1-rc2 (commit 3e83913):
 
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e8391327ba586eaf54447043bd526d919043a44
 
  The list of fixes present in this release can be found at:
  http://bit.ly/1C2nVPY
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-1.3.1-rc3/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1088/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-1.3.1-rc3-docs/
 
  The patches on top of RC2 are:
  [SPARK-6851] [SQL] Create new instance for each converted parquet
 relation
  [SPARK-5969] [PySpark] Fix descending pyspark.rdd.sortByKey.
  [SPARK-6343] Doc driver-worker network reqs
  [SPARK-6767] [SQL] Fixed Query DSL error in spark sql Readme
  [SPARK-6781] [SQL] use sqlContext in python shell
  [SPARK-6753] Clone SparkConf in ShuffleSuite tests
  [SPARK-6506] [PySpark] Do not try to retrieve SPARK_HOME when not
 needed...
 
  Please vote on releasing this package as Apache Spark 1.3.1!
 
  The vote is open until Tuesday, April 14, at 07:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.3.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 



Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Denny Lee
The RC2 bits are lacking Hadoop 2.4 and Hadoop 2.6 - was that intended
(they were included in RC1)?


On Wed, Apr 8, 2015 at 9:01 AM Tom Graves tgraves...@yahoo.com.invalid
wrote:

 +1. Tested spark on yarn against hadoop 2.6.
 Tom


  On Wednesday, April 8, 2015 6:15 AM, Sean Owen so...@cloudera.com
 wrote:


  Still a +1 from me; same result (except that now of course the
 UISeleniumSuite test does not fail)

 On Wed, Apr 8, 2015 at 1:46 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
 1.3.1!
 
  The tag to be voted on is v1.3.1-rc2 (commit 7c4473a):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 7c4473aa5a7f5de0323394aaedeefbf9738e8eb5
 
  The list of fixes present in this release can be found at:
  http://bit.ly/1C2nVPY
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-1.3.1-rc2/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1083/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-1.3.1-rc2-docs/
 
  The patches on top of RC1 are:
 
  [SPARK-6737] Fix memory leak in OutputCommitCoordinator
  https://github.com/apache/spark/pull/5397
 
  [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
  https://github.com/apache/spark/pull/5302
 
  [SPARK-6205] [CORE] UISeleniumSuite fails for Hadoop 2.x test with
  NoClassDefFoundError
  https://github.com/apache/spark/pull/4933
 
  Please vote on releasing this package as Apache Spark 1.3.1!
 
  The vote is open until Saturday, April 11, at 07:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.3.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org






Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Denny Lee
Oh, it appears the 2.4 bits without hive are there but not the 2.4 bits
with hive. Cool stuff on the 2.6.
On Wed, Apr 8, 2015 at 12:30 Patrick Wendell pwend...@gmail.com wrote:

 Hey Denny,

 I believe the 2.4 bits are there. The 2.6 bits I had done specially
 (we haven't merged that into our upstream build script). I'll do it
 again now for RC2.

 - Patrick

 On Wed, Apr 8, 2015 at 1:53 PM, Timothy Chen tnac...@gmail.com wrote:
  +1 Tested on 4 nodes Mesos cluster with fine-grain and coarse-grain mode.
 
  Tim
 
  On Wed, Apr 8, 2015 at 9:32 AM, Denny Lee denny.g@gmail.com wrote:
  The RC2 bits are lacking Hadoop 2.4 and Hadoop 2.6 - was that intended
  (they were included in RC1)?
 
 
  On Wed, Apr 8, 2015 at 9:01 AM Tom Graves tgraves...@yahoo.com.invalid
 
  wrote:
 
  +1. Tested spark on yarn against hadoop 2.6.
  Tom
 
 
   On Wednesday, April 8, 2015 6:15 AM, Sean Owen 
 so...@cloudera.com
  wrote:
 
 
   Still a +1 from me; same result (except that now of course the
  UISeleniumSuite test does not fail)
 
  On Wed, Apr 8, 2015 at 1:46 AM, Patrick Wendell pwend...@gmail.com
  wrote:
   Please vote on releasing the following candidate as Apache Spark
 version
  1.3.1!
  
   The tag to be voted on is v1.3.1-rc2 (commit 7c4473a):
   https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  7c4473aa5a7f5de0323394aaedeefbf9738e8eb5
  
   The list of fixes present in this release can be found at:
   http://bit.ly/1C2nVPY
  
   The release files, including signatures, digests, etc. can be found
 at:
   http://people.apache.org/~pwendell/spark-1.3.1-rc2/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/pwendell.asc
  
   The staging repository for this release can be found at:
   https://repository.apache.org/content/repositories/
 orgapachespark-1083/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~pwendell/spark-1.3.1-rc2-docs/
  
   The patches on top of RC1 are:
  
   [SPARK-6737] Fix memory leak in OutputCommitCoordinator
   https://github.com/apache/spark/pull/5397
  
   [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
   https://github.com/apache/spark/pull/5302
  
   [SPARK-6205] [CORE] UISeleniumSuite fails for Hadoop 2.x test with
   NoClassDefFoundError
   https://github.com/apache/spark/pull/4933
  
   Please vote on releasing this package as Apache Spark 1.3.1!
  
   The vote is open until Saturday, April 11, at 07:00 UTC and passes
   if a majority of at least 3 +1 PMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 1.3.1
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
  
   
 -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
 



Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Denny Lee
+1 (non-binding)

Tested Scala, SparkSQL, and MLLib on OSX against Hadoop 2.6

On Wed, Apr 8, 2015 at 5:35 PM Joseph Bradley jos...@databricks.com wrote:

 +1 tested ML-related items on Mac OS X

 On Wed, Apr 8, 2015 at 7:59 PM, Krishna Sankar ksanka...@gmail.com
 wrote:

  +1 (non-binding, of course)
 
  1. Compiled OSX 10.10 (Yosemite) OK Total time: 14:16 min
   mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
  -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
  2. Tested pyspark, mllib - running as well as comparing results with 1.3.0
 pyspark works well with the new iPython 3.0.0 release
  2.1. statistics (min,max,mean,Pearson,Spearman) OK
  2.2. Linear/Ridge/Laso Regression OK
  2.3. Decision Tree, Naive Bayes OK
  2.4. KMeans OK
 Center And Scale OK
  2.5. RDD operations OK
State of the Union Texts - MapReduce, Filter,sortByKey (word count)
  2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
 Model evaluation/optimization (rank, numIter, lambda) with
 itertools
  OK
  3. Scala - MLlib
  3.1. statistics (min,max,mean,Pearson,Spearman) OK
  3.2. LinearRegressionWithSGD OK
  3.3. Decision Tree OK
  3.4. KMeans OK
  3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
  4.0. Spark SQL from Python OK
  4.1. result = sqlContext.sql(SELECT * from people WHERE State = 'WA')
 OK
 
  On Tue, Apr 7, 2015 at 10:46 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   Please vote on releasing the following candidate as Apache Spark
 version
   1.3.1!
  
   The tag to be voted on is v1.3.1-rc2 (commit 7c4473a):
  
  
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 7c4473aa5a7f5de0323394aaedeefbf9738e8eb5
  
   The list of fixes present in this release can be found at:
   http://bit.ly/1C2nVPY
  
   The release files, including signatures, digests, etc. can be found at:
   http://people.apache.org/~pwendell/spark-1.3.1-rc2/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/pwendell.asc
  
   The staging repository for this release can be found at:
   https://repository.apache.org/content/repositories/
 orgapachespark-1083/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~pwendell/spark-1.3.1-rc2-docs/
  
   The patches on top of RC1 are:
  
   [SPARK-6737] Fix memory leak in OutputCommitCoordinator
   https://github.com/apache/spark/pull/5397
  
   [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
   https://github.com/apache/spark/pull/5302
  
   [SPARK-6205] [CORE] UISeleniumSuite fails for Hadoop 2.x test with
   NoClassDefFoundError
   https://github.com/apache/spark/pull/4933
  
   Please vote on releasing this package as Apache Spark 1.3.1!
  
   The vote is open until Saturday, April 11, at 07:00 UTC and passes
   if a majority of at least 3 +1 PMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 1.3.1
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
  
 



Re: [VOTE] Release Apache Spark 1.3.1

2015-04-05 Thread Denny Lee
+1 (non-binding)  Verified various DataFrame functions, Hive integration,
MLlib, etc. on OSX.

On Sun, Apr 5, 2015 at 9:16 PM Xiangrui Meng men...@gmail.com wrote:

 +1 Verified some MLlib bug fixes on OS X. -Xiangrui

 On Sun, Apr 5, 2015 at 1:24 AM, Sean Owen so...@cloudera.com wrote:
  Signatures and hashes are good.
  LICENSE, NOTICE still check out.
  Compiles for a Hadoop 2.6 + YARN + Hive profile.
 
  I still see the UISeleniumSuite test failure observed in 1.3.0, which
  is minor and already fixed. I don't know why I didn't back-port it:
  https://issues.apache.org/jira/browse/SPARK-6205
 
  If we roll another, let's get this easy fix in, but it is only an
  issue with tests.
 
 
  On JIRA, I checked open issues with Fix Version = 1.3.0 or 1.3.1 and
  all look legitimate (e.g. reopened or in progress)
 
 
  There is 1 open Blocker for 1.3.1 per Andrew:
  https://issues.apache.org/jira/browse/SPARK-6673 spark-shell.cmd can't
  start even when spark was built in Windows
 
  I believe this can be resolved quickly but as a matter of hygiene
  should be fixed or demoted before release.
 
 
  FYI there are 16 Critical issues marked for 1.3.0 / 1.3.1; worth
  examining before release to see how critical they are:
 
  SPARK-6701,Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python
  application,,Open,4/3/15
  SPARK-6484,Ganglia metrics xml reporter doesn't escape
  correctly,Josh Rosen,Open,3/24/15
  SPARK-6270,Standalone Master hangs when streaming job
 completes,,Open,3/11/15
  SPARK-6209,ExecutorClassLoader can leak connections after failing to
  load classes from the REPL class server,Josh Rosen,In Progress,4/2/15
  SPARK-5113,Audit and document use of hostnames and IP addresses in
  Spark,,Open,3/24/15
  SPARK-5098,Number of running tasks become negative after tasks
  lost,,Open,1/14/15
  SPARK-4925,Publish Spark SQL hive-thriftserver maven artifact,Patrick
  Wendell,Reopened,3/23/15
  SPARK-4922,Support dynamic allocation for coarse-grained
 Mesos,,Open,3/31/15
  SPARK-4888,Spark EC2 doesn't mount local disks for i2.8xlarge
  instances,,Open,1/27/15
  SPARK-4879,Missing output partitions after job completes with
  speculative execution,Josh Rosen,Open,3/5/15
  SPARK-4751,Support dynamic allocation for standalone mode,Andrew
  Or,Open,12/22/14
  SPARK-4454,Race condition in DAGScheduler,Josh Rosen,Reopened,2/18/15
  SPARK-4452,Shuffle data structures can starve others on the same
  thread for memory,Tianshuo Deng,Open,1/24/15
  SPARK-4352,Incorporate locality preferences in dynamic allocation
  requests,,Open,1/26/15
  SPARK-4227,Document external shuffle service,,Open,3/23/15
  SPARK-3650,Triangle Count handles reverse edges incorrectly,,Open,2/23/15
 
  On Sun, Apr 5, 2015 at 1:09 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark
 version 1.3.1!
 
  The tag to be voted on is v1.3.1-rc1 (commit 0dcb5d9f):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 0dcb5d9f31b713ed90bcec63ebc4e530cbb69851
 
  The list of fixes present in this release can be found at:
  http://bit.ly/1C2nVPY
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-1.3.1-rc1/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1080
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-1.3.1-rc1-docs/
 
  Please vote on releasing this package as Apache Spark 1.3.1!
 
  The vote is open until Wednesday, April 08, at 01:10 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.3.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  - Patrick
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Starting sparkthrift server

2015-03-23 Thread Denny Lee
When you say the job has access, do you mean that when you run spark-submit
or spark-shell (for example), it is able to write to the /tmp/spark-events
folder?


On Mon, Mar 23, 2015 at 1:02 PM Neil Dev neilk...@gmail.com wrote:

 we are running this right now as root user and the folder /tmp/spark-events
 was manually created and the Job has access to this folder

 On Mon, Mar 23, 2015 at 3:38 PM, Denny Lee denny.g@gmail.com wrote:

 It appears that you are running the thrift-server using the spark-events
 account but the /tmp/spark-events folder doesn't exist or the user running
 thrift-server does not have access to it.  Have you been able to run Hive
 using the spark-events user so that the /tmp/spark-events folder gets
 created?  If you need to reassign the scratch dir / log dir to another
 folder (instead of /tmp/spark-events), you could use --hiveconf to assign
 those to another folder.


 On Mon, Mar 23, 2015 at 8:39 AM Neil Dev neilk...@gmail.com wrote:

 Hi,

 I am having issues starting spark-thriftserver. I'm running spark 1.3.0
 with Hadoop 2.4.0. I would like to be able to change its port too, so I
 can have hive-thriftserver as well as spark-thriftserver running at the
 same time.

 Starting sparkthrift server:-
 sudo ./start-thriftserver.sh --master spark://ip-172-31-10-124:7077
 --executor-memory 2G

 Error:-
 I created the folder manually but still getting the following error
 Exception in thread "main" java.lang.IllegalArgumentException: Log
 directory /tmp/spark-events does not exist.


 I am getting the following error
 15/03/23 15:07:02 ERROR thrift.ThriftCLIService: Error:
 org.apache.thrift.transport.TTransportException: Could not create
 ServerSocket on address 0.0.0.0/0.0.0.0:1.
 at
 org.apache.thrift.transport.TServerSocket.init(TServerSocket.java:93)
 at
 org.apache.thrift.transport.TServerSocket.init(TServerSocket.java:79)
 at
 org.apache.hive.service.auth.HiveAuthFactory.getServerSocket(
 HiveAuthFactory.java:236)
 at
 org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.
 run(ThriftBinaryCLIService.java:69)
 at java.lang.Thread.run(Thread.java:745)

 Thanks
 Neil





Re: PySpark SPARK_CLASSPATH doesn't distribute jars to executors

2015-02-24 Thread Denny Lee
Can you try extraClassPath or driver-class-path and see if that helps with
the distribution?
On Tue, Feb 24, 2015 at 14:54 Michael Nazario mnaza...@palantir.com wrote:

 Has anyone experienced a problem with the SPARK_CLASSPATH not distributing
 jars for PySpark? I have a detailed description of what I tried in the
 ticket below, and this seems like a problem that is not a configuration
 problem. The only other case I can think of is that configuration changed
 between Spark 1.1.1 and Spark 1.2.1 about distributing jars for PySpark.

 https://issues.apache.org/jira/browse/SPARK-5977

 Thanks,
 Michael
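
For context on the suggestion above, here is a hedged Scala sketch of the full
configuration keys that `extraClassPath` and `--driver-class-path` refer to.
The jar path is hypothetical, and for a PySpark job the same keys can equally
be set via --conf on spark-submit or in spark-defaults.conf:

import org.apache.spark.SparkConf

// Full property names behind the shorthand in the reply above;
// "/opt/libs/extra.jar" is a hypothetical path.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/opt/libs/extra.jar")   // what --driver-class-path sets
  .set("spark.executor.extraClassPath", "/opt/libs/extra.jar") // prepended on each executor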



Spark 1.3 RC1 Generate schema based on string of schema

2015-02-20 Thread Denny Lee
In the Spark SQL 1.2 Programmers Guide, we can generate the schema based on
the string of schema via

val schema =
  StructType(
schemaString.split(" ").map(fieldName => StructField(fieldName,
StringType, true)))

But when running this on Spark 1.3.0 (RC1), I get the error:

val schema = StructType(schemaString.split(" ").map(fieldName =>
StructField(fieldName, StringType, true)))

<console>:26: error: not found: value StringType

   val schema = StructType(schemaString.split(" ").map(fieldName =>
StructField(fieldName, StringType, true)))

I'm looking through the various datatypes within
org.apache.spark.sql.types.DataType
but thought I'd ask to see if I was missing something obvious here.

Thanks!
Denny


Re: Spark 1.3 RC1 Generate schema based on string of schema

2015-02-20 Thread Denny Lee
Oh, I just realized that I never imported all of sql._ .  My bad!


On Fri Feb 20 2015 at 7:51:32 AM Denny Lee denny.g@gmail.com wrote:

 In the Spark SQL 1.2 Programmers Guide, we can generate the schema based
 on the string of schema via

 val schema =
   StructType(
  schemaString.split(" ").map(fieldName => StructField(fieldName,
 StringType, true)))

 But when running this on Spark 1.3.0 (RC1), I get the error:

 val schema = StructType(schemaString.split(" ").map(fieldName =>
 StructField(fieldName, StringType, true)))

 <console>:26: error: not found: value StringType

val schema = StructType(schemaString.split(" ").map(fieldName =>
 StructField(fieldName, StringType, true)))

 I'm looking through the various datatypes within 
 org.apache.spark.sql.types.DataType
 but thought I'd ask to see if I was missing something obvious here.

 Thanks!


 Denny
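
For completeness, a self-contained sketch of the fix described above. On Spark
1.3 the SQL type classes live in org.apache.spark.sql.types; the schemaString
value here is an illustrative placeholder:

// Import the SQL type classes before building the schema (Spark 1.3+).
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val schemaString = "name age city" // illustrative placeholder
val schema = StructType(
  schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))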



Re: Powered by Spark: Concur

2015-02-09 Thread Denny Lee
Thanks Matei - much appreciated!

On Mon Feb 09 2015 at 10:23:57 PM Matei Zaharia matei.zaha...@gmail.com
wrote:

 Thanks Denny; added you.

 Matei

  On Feb 9, 2015, at 10:11 PM, Denny Lee denny.g@gmail.com wrote:
 
  Forgot to add Concur to the Powered by Spark wiki:
 
  Concur
  https://www.concur.com
  Spark SQL, MLLib
  Using Spark for travel and expenses analytics and personalization
 
  Thanks!
  Denny




Re: Welcoming three new committers

2015-02-03 Thread Denny Lee
Awesome stuff - congratulations! :)

On Tue Feb 03 2015 at 5:34:06 PM Chao Chen crazy...@gmail.com wrote:

 Congratulations guys, well done!

 On 2015-2-4 at 9:26 AM, Nan Zhu wrote:
  Congratulations!
 
  --
  Nan Zhu
  http://codingcat.me
 
 
  On Tuesday, February 3, 2015 at 8:08 PM, Xuefeng Wu wrote:
 
  Congratulations! Well done.
 
  Yours respectfully, Xuefeng Wu (吴雪峰)
 
  On February 4, 2015, at 6:34 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
 
  Hi all,
 
  The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!
 
  Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Starting with Spark

2014-12-25 Thread Denny Lee
Thanks for the catch Naveen!

On Thu Dec 25 2014 at 10:47:18 AM Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Thanks for the pointer. This will be fixed in this PR
 https://github.com/apache/spark/pull/3802.

 On Thu Dec 25 2014 at 10:35:20 AM Naveen Madhire vmadh...@umail.iu.edu
 wrote:

  Thanks. I will work on this today and try to set it up.
 
  The bad link is present in the below GitHub README file,
 
  https://github.com/apache/spark
 
  Search for "Build Spark with Maven"
 
  On Thu, Dec 25, 2014 at 1:49 AM, Nicholas Chammas 
  nicholas.cham...@gmail.com wrote:
 
  The correct docs link is:
  https://spark.apache.org/docs/1.2.0/building-spark.html
 
  Where did you get that bad link from?
 
  Nick
 
 
 
  On Thu Dec 25 2014 at 12:00:53 AM Naveen Madhire vmadh...@umail.iu.edu
 
  wrote:
 
  Hi All,
 
  I am starting to use Spark. I am having trouble getting the latest code
  from git.
  I am using Intellij as suggested in the below link,
 
  https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-StarterTasks
 
 
  The below link isn't working as well,
 
  http://spark.apache.org/building-spark.html
 
 
  Does anyone know any useful links for getting Spark running on a local laptop?
 
  Please help.
 
 
  Thanks
 
 
 



Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-12 Thread Denny Lee
+1 Tested on OSX

Tested Scala 2.10.3, SparkSQL with Hive 0.12 / Hadoop 2.5, Thrift Server,
MLLib SVD


On Fri Dec 12 2014 at 8:57:16 PM Mark Hamstra m...@clearstorydata.com
wrote:

 +1

 On Fri, Dec 12, 2014 at 8:00 PM, Josh Rosen rosenvi...@gmail.com wrote:
 
  +1.  Tested using spark-perf and the Spark EC2 scripts.  I didn’t notice
  any performance regressions that could not be attributed to changes of
  default configurations.  To be more specific, when running Spark 1.2.0
 with
  the Spark 1.1.0 settings of spark.shuffle.manager=hash and
  spark.shuffle.blockTransferService=nio, there was no performance
 regression
  and, in fact, there were significant performance improvements for some
  workloads.
 
  In Spark 1.2.0, the new default settings are spark.shuffle.manager=sort
  and spark.shuffle.blockTransferService=netty.  With these new settings,
 I
  noticed a performance regression in the scala-sort-by-key-int spark-perf
  test.  However, Spark 1.1.0 and 1.1.1 exhibit a similar performance
  regression for that same test when run with spark.shuffle.manager=sort,
 so
  this regression seems explainable by the change of defaults.  Besides
 this,
  most of the other tests ran at the same speeds or faster with the new
 1.2.0
  defaults.  Also, keep in mind that this is a somewhat artificial micro
  benchmark; I have heard anecdotal reports from many users that their real
  workloads have run faster with 1.2.0.
 
  Based on these results, I’m comfortable giving a +1 on 1.2.0 RC2.
 
  - Josh
 
  On December 11, 2014 at 9:52:39 AM, Sandy Ryza (sandy.r...@cloudera.com)
  wrote:
 
  +1 (non-binding). Tested on Ubuntu against YARN.
 
  On Thu, Dec 11, 2014 at 9:38 AM, Reynold Xin r...@databricks.com
 wrote:
 
   +1
  
   Tested on OS X.
  
   On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com
   wrote:
  
Please vote on releasing the following candidate as Apache Spark
  version
1.2.0!
   
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
   
   
  
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 a428c446e23e628b746e0626cc02b7b3cadf588e
   
The release files, including signatures, digests, etc. can be found
 at:
http://people.apache.org/~pwendell/spark-1.2.0-rc2/
   
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
   
The staging repository for this release can be found at:
   
  https://repository.apache.org/content/repositories/orgapachespark-1055/
   
The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc2-docs/
   
Please vote on releasing this package as Apache Spark 1.2.0!
   
The vote is open until Saturday, December 13, at 21:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.
   
[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...
   
To learn more about Apache Spark, please see
http://spark.apache.org/
   
== What justifies a -1 vote for this release? ==
This vote is happening relatively late into the QA period, so
-1 votes should only occur for significant regressions from
1.0.2. Bugs already present in 1.1.X, minor
regressions, or bugs related to new features will not block this
release.
   
== What default changes should I be aware of? ==
1. The default value of spark.shuffle.blockTransferService has
 been
changed to netty
-- Old behavior can be restored by switching to nio
   
2. The default value of spark.shuffle.manager has been changed to
   sort.
-- Old behavior can be restored by setting spark.shuffle.manager
 to
hash.
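
 For testers who want to pin the old defaults across runs, the same keys can go
 in the properties file Spark reads at startup (a sketch; the path assumes a
 standard distribution layout):

     # Append the 1.1.0 defaults to conf/spark-defaults.conf.
     echo "spark.shuffle.manager hash" >> conf/spark-defaults.conf
     echo "spark.shuffle.blockTransferService nio" >> conf/spark-defaults.conf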
   
== How does this differ from RC1 ==
This has fixes for a handful of issues identified - some of the
notable fixes are:
   
[Core]
SPARK-4498: Standalone Master can fail to recognize completed/failed
applications
   
[SQL]
SPARK-4552: Query for empty parquet table in spark sql hive get
IllegalArgumentException
SPARK-4753: Parquet2 does not prune based on OR filters on partition
columns
SPARK-4761: With JDBC server, set Kryo as default serializer and
disable reference tracking
SPARK-4785: When called with arguments referring column fields, PMOD
throws NPE
   
- Patrick
   

 -
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
   
   
  
 



Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Denny Lee
+1 (non-binding)

Verified on OSX 10.10.2, built from source,
spark-shell / spark-submit jobs
ran various simple Spark / Scala queries
ran various SparkSQL queries (including HiveContext)
ran ThriftServer service and connected via beeline
ran SparkSVD


On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell pwend...@gmail.com
wrote:

 Hey All,

 Just an update. Josh, Andrew, and others are working to reproduce
 SPARK-4498 and fix it. Other than that issue no serious regressions
 have been reported so far. If we are able to get a fix in for that
 soon, we'll likely cut another RC with the patch.

 Continued testing of RC1 is definitely appreciated!

 I'll leave this vote open to allow folks to continue posting comments.
 It's fine to still give +1 from your own testing... i.e. you can
 assume at this point SPARK-4498 will be fixed before releasing.

 - Patrick

 On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  +0.9 from me. Tested it on Mac and Windows (someone has to do it) and
 while things work, I noticed a few recent scripts don't have Windows
 equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and
 https://issues.apache.org/jira/browse/SPARK-4684. The first one at least
 would be good to fix if we do another RC. Not blocking the release but
 useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685.
 
  Matei
 
 
  On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote:
 
  Hi everyone,
 
  There's an open bug report related to Spark standalone which could be a
 potential release-blocker (pending investigation / a bug fix):
 https://issues.apache.org/jira/browse/SPARK-4498.  This issue seems
 non-deterministc and only affects long-running Spark standalone
 deployments, so it may be hard to reproduce.  I'm going to work on a patch
 to add additional logging in order to help with debugging.
 
  I just wanted to give an early head's up about this issue and to get
 more eyes on it in case anyone else has run into it or wants to help with
 debugging.
 
  - Josh
 
  On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com)
 wrote:
 
  Please vote on releasing the following candidate as Apache Spark
 version 1.2.0!
 
  The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-1.2.0-rc1/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1048/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
 
  Please vote on releasing this package as Apache Spark 1.2.0!
 
  The vote is open until Tuesday, December 02, at 05:15 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.2.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What justifies a -1 vote for this release? ==
  This vote is happening very late into the QA period compared with
  previous votes, so -1 votes should only occur for significant
  regressions from 1.0.2. Bugs already present in 1.1.X, minor
  regressions, or bugs related to new features will not block this
  release.
 
  == What default changes should I be aware of? ==
  1. The default value of spark.shuffle.blockTransferService has been
  changed to netty
  -- Old behavior can be restored by switching to nio
 
  2. The default value of spark.shuffle.manager has been changed to
 sort.
  -- Old behavior can be restored by setting spark.shuffle.manager to
 hash.
 
  == Other notes ==
  Because this vote is occurring over a weekend, I will likely extend
  the vote if this RC survives until the end of the vote period.
 
  - Patrick
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Building and Running Spark on OS X

2014-10-20 Thread Denny Lee
+1 
huge fan of sbt with OSX


 On Oct 20, 2014, at 17:00, Reynold Xin r...@databricks.com wrote:
 
 I usually use SBT on Mac and that one doesn't require any setup ...
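
  The zero-setup sbt route looks roughly like this (a sketch, assuming a
  checked-out Spark 1.x source tree; the bundled launcher script downloads
  sbt itself on first use):

      # Build an assembly jar with the bundled launcher, then try the shell.
      sbt/sbt assembly
      ./bin/spark-shell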
 
 
 On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:
 
 If one were to put together a short but comprehensive guide to setting up
 Spark to run locally on OS X, would it look like this?
 
  # Install Maven. On OS X, we suggest using Homebrew.
  brew install maven
  # Set some important Java and Maven environment variables.
  export JAVA_HOME=$(/usr/libexec/java_home)
  export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
  # Go to where you downloaded the Spark source.
  cd ./spark
  # Build, configure slaves, and start up Spark.
  mvn -DskipTests clean package
  echo localhost > ./conf/slaves
  ./sbin/start-all.sh
  # Rock 'n' Roll.
  ./bin/pyspark
  # Cleanup when you're done.
  ./sbin/stop-all.sh
 
  Nick
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
I’m not sure if I’m completely answering your question here, but I’m currently 
working (on OSX) with Hadoop 2.5, and I have used the Spark 1.1 build for 
Hadoop 2.4 without any issues.


On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote:

I see the binary packages include hadoop 1, 2.3 and 2.4.  
Does Spark 1.1.0 support hadoop 2.5.0 at below address?  

http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available
  

-Original Message-  
From: Patrick Wendell [mailto:pwend...@gmail.com]  
Sent: Friday, September 12, 2014 8:13 AM  
To: dev@spark.apache.org; u...@spark.apache.org  
Subject: Announcing Spark 1.1.0!  

I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is  
the second release on the API-compatible 1.X line. It is Spark's  
largest release ever, with contributions from 171 developers!  

This release brings operational and performance improvements in Spark  
core including a new implementation of the Spark shuffle designed for  
very large scale workloads. Spark 1.1 adds significant extensions to  
the newest Spark modules, MLlib and Spark SQL. Spark SQL introduces a  
JDBC server, byte code generation for fast expression evaluation, a  
public types API, JSON support, and other features and optimizations.  
MLlib introduces a new statistics library along with several new  
algorithms and optimizations. Spark 1.1 also builds out Spark's Python  
support and adds new components to the Spark Streaming module.  

Visit the release notes [1] to read about the new features, or  
download [2] the release today.  

[1] http://spark.eu.apache.org/releases/spark-release-1-1-0.html  
[2] http://spark.eu.apache.org/downloads.html  

NOTE: SOME ASF DOWNLOAD MIRRORS WILL NOT CONTAIN THE RELEASE FOR SEVERAL HOURS. 
 

Please e-mail me directly for any type-o's in the release notes or name 
listing.  

Thanks, and congratulations!  
- Patrick  

-  
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org  
For additional commands, e-mail: user-h...@spark.apache.org  



RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
Please correct me if I’m wrong, but I was under the impression, as per the Maven 
repositories, that it was just to stay more in sync with the various versions of 
Hadoop.  Looking at the latest documentation 
(https://spark.apache.org/docs/latest/building-with-maven.html), there are 
multiple Hadoop versions called out.

As for the potential differences in Spark, this is more about ensuring that the 
various jars and library dependencies for the correct version of Hadoop are 
included so that Spark can connect properly to Hadoop, rather than about any 
differences in Spark itself.  Another good reference on this topic is the 
callout of Hadoop versions within the GitHub repo: https://github.com/apache/spark

HTH!


On September 11, 2014 at 18:39:10, Haopu Wang (hw...@qilinsoft.com) wrote:

Danny, thanks for the response.

 

I raise the question because in Spark 1.0.2, I saw one binary package for 
hadoop2, but in Spark 1.1.0, there are separate packages for hadoop 2.3 and 2.4.

That implies some difference in Spark depending on the Hadoop version.

 

From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 9:35 AM
To: u...@spark.apache.org; Haopu Wang; dev@spark.apache.org; Patrick Wendell
Subject: RE: Announcing Spark 1.1.0!

 

I’m not sure if I’m completely answering your question here, but I’m currently 
working (on OSX) with Hadoop 2.5, and I have used the Spark 1.1 build for 
Hadoop 2.4 without any issues.

 

 

On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote:

I see the binary packages include hadoop 1, 2.3 and 2.4.
Does Spark 1.1.0 support hadoop 2.5.0 at below address?

http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available

-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Friday, September 12, 2014 8:13 AM
To: dev@spark.apache.org; u...@spark.apache.org
Subject: Announcing Spark 1.1.0!

I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is
the second release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 171 developers!

This release brings operational and performance improvements in Spark
core including a new implementation of the Spark shuffle designed for
very large scale workloads. Spark 1.1 adds significant extensions to
the newest Spark modules, MLlib and Spark SQL. Spark SQL introduces a
JDBC server, byte code generation for fast expression evaluation, a
public types API, JSON support, and other features and optimizations.
MLlib introduces a new statistics library along with several new
algorithms and optimizations. Spark 1.1 also builds out Spark's Python
support and adds new components to the Spark Streaming module.

Visit the release notes [1] to read about the new features, or
download [2] the release today.

[1] http://spark.eu.apache.org/releases/spark-release-1-1-0.html
[2] http://spark.eu.apache.org/downloads.html

NOTE: SOME ASF DOWNLOAD MIRRORS WILL NOT CONTAIN THE RELEASE FOR SEVERAL HOURS.

Please e-mail me directly for any type-o's in the release notes or name listing.

Thanks, and congratulations!
- Patrick

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
Yes, at least for my query scenarios, I have been able to use the Spark 1.1 build 
for Hadoop 2.4 against Hadoop 2.5.  Note, Hadoop 2.5 is considered a relatively 
minor release 
(http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available),
 whereas Hadoop 2.4 and 2.3 were considered more significant releases.
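
A sketch of what building against an exact HDFS version looks like (the version
numbers are illustrative; the profile and flag follow the building-with-maven
page cited elsewhere in this thread):

    # Build Spark 1.1 against Hadoop/HDFS 2.5.0; the hadoop-2.4 profile
    # covers 2.4.x and later 2.x releases.
    mvn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package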



On September 11, 2014 at 19:22:05, Haopu Wang (hw...@qilinsoft.com) wrote:

From the web page 
(https://spark.apache.org/docs/latest/building-with-maven.html) that you 
pointed out, it says: “Because HDFS is not protocol-compatible across 
versions, if you want to read from HDFS, you’ll need to build Spark against the 
specific HDFS version in your environment.”

 

Did you try to read a hadoop 2.5.0 file using Spark 1.1 with hadoop 2.4?

 

Thanks!

 

From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 10:00 AM
To: Patrick Wendell; Haopu Wang; dev@spark.apache.org; u...@spark.apache.org
Subject: RE: Announcing Spark 1.1.0!

 

Please correct me if I’m wrong, but I was under the impression, as per the Maven 
repositories, that it was just to stay more in sync with the various versions of 
Hadoop.  Looking at the latest documentation 
(https://spark.apache.org/docs/latest/building-with-maven.html), there are 
multiple Hadoop versions called out.

 

As for the potential differences in Spark, this is more about ensuring that the 
various jars and library dependencies for the correct version of Hadoop are 
included so that Spark can connect properly to Hadoop, rather than about any 
differences in Spark itself.  Another good reference on this topic is the 
callout of Hadoop versions within the GitHub repo: https://github.com/apache/spark

 

HTH!

 

 

On September 11, 2014 at 18:39:10, Haopu Wang (hw...@qilinsoft.com) wrote:

Danny, thanks for the response.

 

I raise the question because in Spark 1.0.2, I saw one binary package for 
hadoop2, but in Spark 1.1.0, there are separate packages for hadoop 2.3 and 2.4.

That implies some difference in Spark depending on the Hadoop version.

 

From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 9:35 AM
To: u...@spark.apache.org; Haopu Wang; dev@spark.apache.org; Patrick Wendell
Subject: RE: Announcing Spark 1.1.0!

 

I’m not sure if I’m completely answering your question here, but I’m currently 
working (on OSX) with Hadoop 2.5, and I have used the Spark 1.1 build for 
Hadoop 2.4 without any issues.

 

 

On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote:

I see the binary packages include hadoop 1, 2.3 and 2.4.
Does Spark 1.1.0 support hadoop 2.5.0 at below address?

http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available

-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Friday, September 12, 2014 8:13 AM
To: dev@spark.apache.org; u...@spark.apache.org
Subject: Announcing Spark 1.1.0!

I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is
the second release on the API-compatible 1.X line. It is Spark's
largest release ever, with contributions from 171 developers!

This release brings operational and performance improvements in Spark
core including a new implementation of the Spark shuffle designed for
very large scale workloads. Spark 1.1 adds significant extensions to
the newest Spark modules, MLlib and Spark SQL. Spark SQL introduces a
JDBC server, byte code generation for fast expression evaluation, a
public types API, JSON support, and other features and optimizations.
MLlib introduces a new statistics library along with several new
algorithms and optimizations. Spark 1.1 also builds out Spark's Python
support and adds new components to the Spark Streaming module.

Visit the release notes [1] to read about the new features, or
download [2] the release today.

[1] http://spark.eu.apache.org/releases/spark-release-1-1-0.html
[2] http://spark.eu.apache.org/downloads.html

NOTE: SOME ASF DOWNLOAD MIRRORS WILL NOT CONTAIN THE RELEASE FOR SEVERAL HOURS.

Please e-mail me directly for any type-o's in the release notes or name listing.

Thanks, and congratulations!
- Patrick

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Denny Lee
+1

on OSX Yosemite, built with Hadoop 2.4.1 and Hive 0.12; tested SparkSQL, the
Thrift server, and a MySQL metastore
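
A minimal smoke test of that Thrift server setup looks roughly like this (a
sketch, assuming a distribution built with Hive support; port 10000 is the
server's default):

    # Start the JDBC/Thrift server, then connect with the bundled Beeline client.
    ./sbin/start-thriftserver.sh
    ./bin/beeline -u jdbc:hive2://localhost:10000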



On Wed, Sep 3, 2014 at 4:02 PM, Jeremy Freeman freeman.jer...@gmail.com
wrote:

 +1



 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC4-tp8219p8254.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-02 Thread Denny Lee
+1  Tested on Mac OSX, Thrift Server, SparkSQL


On September 2, 2014 at 17:29:29, Michael Armbrust (mich...@databricks.com) 
wrote:

+1  


On Tue, Sep 2, 2014 at 5:18 PM, Matei Zaharia matei.zaha...@gmail.com  
wrote:  

 +1  
  
 Tested on Mac OS X.  
  
 Matei  
  
 On September 2, 2014 at 5:03:19 PM, Kan Zhang (kzh...@apache.org) wrote:  
  
 +1  
  
 Verified PySpark InputFormat/OutputFormat examples.  
  
  
 On Tue, Sep 2, 2014 at 4:10 PM, Reynold Xin r...@databricks.com wrote:  
  
  +1  
   
   
  On Tue, Sep 2, 2014 at 3:08 PM, Cheng Lian lian.cs@gmail.com  
 wrote:  
   
   +1  

   - Tested Thrift server and SQL CLI locally on OSX 10.9.  
   - Checked datanucleus dependencies in distribution tarball built by  
   make-distribution.sh without SPARK_HIVE defined.  



   On Tue, Sep 2, 2014 at 2:30 PM, Will Benton wi...@redhat.com wrote:  

+1  
 
Tested Scala/MLlib apps on Fedora 20 (OpenJDK 7) and OS X 10.9  
 (Oracle  
   JDK  
8).  
 
 
best,  
wb  
 
 
- Original Message -  
 From: Patrick Wendell pwend...@gmail.com  
 To: dev@spark.apache.org  
 Sent: Saturday, August 30, 2014 5:07:52 PM  
 Subject: [VOTE] Release Apache Spark 1.1.0 (RC3)  
  
 Please vote on releasing the following candidate as Apache Spark  
   version  
 1.1.0!  
  
 The tag to be voted on is v1.1.0-rc3 (commit b2d0493b):  
  
 

   
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b2d0493b223c5f98a593bb6d7372706cc02bebad
   
  
 The release files, including signatures, digests, etc. can be found  
  at:  
 http://people.apache.org/~pwendell/spark-1.1.0-rc3/  
  
 Release artifacts are signed with the following key:  
 https://people.apache.org/keys/committer/pwendell.asc  
  
 The staging repository for this release can be found at:  
  

 https://repository.apache.org/content/repositories/orgapachespark-1030/  
  
 The documentation corresponding to this release can be found at:  
 http://people.apache.org/~pwendell/spark-1.1.0-rc3-docs/  
  
 Please vote on releasing this package as Apache Spark 1.1.0!  
  
 The vote is open until Tuesday, September 02, at 23:07 UTC and  
 passes  
   if  
 a majority of at least 3 +1 PMC votes are cast.  
  
 [ ] +1 Release this package as Apache Spark 1.1.0  
 [ ] -1 Do not release this package because ...  
  
 To learn more about Apache Spark, please see  
 http://spark.apache.org/  
  
 == Regressions fixed since RC1 ==  
 - Build issue for SQL support:  
 https://issues.apache.org/jira/browse/SPARK-3234  
 - EC2 script version bump to 1.1.0.  
  
 == What justifies a -1 vote for this release? ==  
 This vote is happening very late into the QA period compared with  
 previous votes, so -1 votes should only occur for significant  
 regressions from 1.0.2. Bugs already present in 1.0.X will not  
 block  
 this release.  
  
 == What default changes should I be aware of? ==  
 1. The default value of spark.io.compression.codec is now  
 snappy  
 -- Old behavior can be restored by switching to lzf  
  
 2. PySpark now performs external spilling during aggregations.  
 -- Old behavior can be restored by setting spark.shuffle.spill  
 to  
false.  
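
 A sketch of restoring both 1.0.x behaviors for comparison runs (the file path
 assumes a standard distribution layout; the same keys also work via
 spark-submit's --conf flag):

     # Append the old defaults to conf/spark-defaults.conf.
     echo "spark.io.compression.codec lzf" >> conf/spark-defaults.conf
     echo "spark.shuffle.spill false" >> conf/spark-defaults.conf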
  
  
 -  
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org  
 For additional commands, e-mail: dev-h...@spark.apache.org  
  
  
 
-  
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org  
For additional commands, e-mail: dev-h...@spark.apache.org