Re: My curation of pending structured streaming PRs to review

2019-08-18 Thread Jungtaek Lim
As a reminder, the list contains two correctness bugs: stream-stream outer
join, and multiple stateful operations with watermark.

Regarding the common theme: yes, that's something I'd rather avoid saying,
but honestly I feel there's a shortage of active committers on 'structured
streaming'.

Many of the committers I know to be relevant to the SS area haven't shown up
in the Spark community for around half a year (maybe even more), and
unfortunately even the active committers seem to have been short of time for
their own work (that's natural) and haven't found time to focus on reviewing
other PRs (they provide valuable comments, but don't shepherd PRs through to
being merged). I hoped this was a temporary issue caused by important events
like Spark+AI Summit, but it turned out it's not.

Spark has no replacement for SS, and DStream now gets even less attention
than SS. Does the Spark community not consider the streaming area important?
I would disagree, as there are reports from end users and patches proposed so
far by contributors. I'm not the right one to say how to solve the issue, but
I hope we can handle the main issue nicely and in a less painful way.


On Tue, Aug 13, 2019 at 10:42 PM Sean Owen  wrote:

> General tips:
>
> - dev@ is not usually the right place to discuss _specific_ changes
> except once in a while to call attention
> - Ping the authors of the code being changed directly
> - Tighten the change if possible
> - Tests, reproductions, docs, etc help prove the change
> - Bugs are more important than new marginal features
>
> If there has been some feedback that's just skeptical about the
> approach or value, that may be the answer: it won't be merged.
> If there is no feedback and it seems important (correctness bugs) it's
> OK to raise that here once in a while.
>
> One common theme here is 'structured streaming' -- who amongst the
> committers feels they are able to review these changes? I sense we
> have a shortage there.
>


-- 
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior


Re: Release Spark 2.3.4

2019-08-18 Thread Saisai Shao
+1

On Mon, Aug 19, 2019 at 10:28 AM Wenchen Fan wrote:

> +1
>
> On Sat, Aug 17, 2019 at 3:37 PM Hyukjin Kwon  wrote:
>
>> +1 too
>>
>> On Sat, Aug 17, 2019 at 3:06 PM Dilip Biswal wrote:
>>
>>> +1
>>>
>>> Regards,
>>> Dilip Biswal
>>> Tel: 408-463-4980
>>> dbis...@us.ibm.com
>>>
>>>
>>>
>>> - Original message -
>>> From: John Zhuge 
>>> To: Xiao Li 
>>> Cc: Takeshi Yamamuro , Spark dev list <
>>> dev@spark.apache.org>, Kazuaki Ishizaki 
>>> Subject: [EXTERNAL] Re: Release Spark 2.3.4
>>> Date: Fri, Aug 16, 2019 4:33 PM
>>>
>>> +1
>>>
>>> On Fri, Aug 16, 2019 at 4:25 PM Xiao Li  wrote:
>>>
>>> +1
>>>
>>> On Fri, Aug 16, 2019 at 4:11 PM Takeshi Yamamuro 
>>> wrote:
>>>
>>> +1, too
>>>
>>> Bests,
>>> Takeshi
>>>
>>> On Sat, Aug 17, 2019 at 7:25 AM Dongjoon Hyun 
>>> wrote:
>>>
>>> +1 for 2.3.4 release as the last release for `branch-2.3` EOL.
>>>
>>> Also, +1 for next week release.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Fri, Aug 16, 2019 at 8:19 AM Sean Owen  wrote:
>>>
>>> I think it's fine to do these in parallel, yes. Go ahead if you are
>>> willing.
>>>
>>> On Fri, Aug 16, 2019 at 9:48 AM Kazuaki Ishizaki 
>>> wrote:
>>> >
>>> > Hi, All.
>>> >
>>> > Spark 2.3.3 was released six months ago (15th February, 2019) at
>>> http://spark.apache.org/news/spark-2-3-3-released.html, and about 18
>>> months have passed since Spark 2.3.0 was released (28th February,
>>> 2018).
>>> > As of today (16th August), there are 103 commits (69 JIRAs) in
>>> `branch-2.3` since 2.3.3.
>>> >
>>> > It would be great if we could have Spark 2.3.4.
>>> > If it is OK, shall we start `2.3.4 RC1` concurrently with 2.4.4, or after
>>> 2.4.4 is released?
>>> >
>>> > An issue list in JIRA:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12344844
>>> > A commit list on GitHub since the last release:
>>> https://github.com/apache/spark/compare/66fd9c34bf406a4b5f86605d06c9607752bd637a...branch-2.3
>>> > The 8 correctness issues resolved in branch-2.3:
>>> >
>>> https://issues.apache.org/jira/browse/SPARK-26873?jql=project%20%3D%2012315420%20AND%20fixVersion%20%3D%2012344844%20AND%20labels%20in%20(%27correctness%27)%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
>>> >
>>> > Best Regards,
>>> > Kazuaki Ishizaki
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> John Zhuge
>>>
>>>
>>>
>>
>>


Re: Release Spark 2.3.4

2019-08-18 Thread Wenchen Fan
+1



RE: Release Spark 2.3.4

2019-08-18 Thread Kazuaki Ishizaki
Hi all,
Thank you. I will prepare the RC for 2.3.4 this week, in parallel with the
RC for 2.4.4 managed by Dongjoon.

Regards,
Kazuaki Ishizaki







Aggregate pushdown for data source

2019-08-18 Thread Arun Khetarpal
Hi Folks:

I have implemented a source against the Data Source V2 API for an internal
system. As a consequence of generating the data source, we have a bunch of
statistical information about the source, which I can potentially use, but
only if Spark pushes the aggregates down to the data source itself.
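
To make the idea concrete, here is a minimal sketch in plain Python of what
pushing an aggregate down to a statistics-backed source could look like. It is
only an illustration under stated assumptions: the names PartitionStats,
StatsBackedSource, push_aggregate and aggregate are all hypothetical, and none
of this is the Spark Data Source V2 API or the design discussed in the JIRA.
The source answers count/min/max/sum from statistics it already holds, and the
engine falls back to scanning rows for anything the source declines.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class PartitionStats:
    """Statistics a (hypothetical) source recorded when it wrote a partition."""
    row_count: int
    min_value: int
    max_value: int
    total: int


class StatsBackedSource:
    """Toy source: rows plus precomputed per-partition statistics.

    Assumes every partition is non-empty, to keep the sketch short.
    """

    def __init__(self, partitions: List[List[int]]) -> None:
        self.partitions = partitions
        self.rows_scanned = 0  # incremented only when rows are actually read
        self.stats = [
            PartitionStats(len(p), min(p), max(p), sum(p)) for p in partitions
        ]

    def scan(self) -> Iterator[int]:
        """Full scan: yields every row and records how many were read."""
        for partition in self.partitions:
            for value in partition:
                self.rows_scanned += 1
                yield value

    def push_aggregate(self, func: str) -> Optional[int]:
        """Answer the aggregate from statistics, or None if unsupported."""
        if func == "count":
            return sum(s.row_count for s in self.stats)
        if func == "min":
            return min(s.min_value for s in self.stats)
        if func == "max":
            return max(s.max_value for s in self.stats)
        if func == "sum":
            return sum(s.total for s in self.stats)
        return None  # the engine must compute this one itself


def aggregate(source: StatsBackedSource, func: str) -> float:
    """Engine side: try pushdown first, otherwise scan and aggregate."""
    pushed = source.push_aggregate(func)
    if pushed is not None:
        return pushed
    values = list(source.scan())
    fallbacks = {
        "count": len,
        "min": min,
        "max": max,
        "sum": sum,
        "avg": lambda v: sum(v) / len(v),
    }
    return fallbacks[func](values)


if __name__ == "__main__":
    src = StatsBackedSource([[3, 1, 4], [1, 5, 9, 2]])
    print(aggregate(src, "sum"), src.rows_scanned)  # answered from statistics
    print(aggregate(src, "avg"), src.rows_scanned)  # falls back to a full scan
```

In this sketch "avg" is deliberately left unsupported by the source so that the
fallback path is exercised; a real source could of course derive it from the
sum and count it already has.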

I see that there is already a comprehensive JIRA for this:
https://issues.apache.org/jira/browse/SPARK-22390. Are there any remaining
blockers for this JIRA? I'll be more than happy to contribute if no one
else has picked it up.

Regards,
Arun