Re: Recovering SparkR on CRAN?

2020-12-30 Thread Hyukjin Kwon
I just double checked. 3.0.1 is the latest one that has my fix.

2020년 12월 31일 (목) 오전 9:21, Hyukjin Kwon 님이 작성:

> Nice, yeah, 3.0.1 should have all fixes needed.
>
> 2020년 12월 31일 (목) 오전 5:23, Felix Cheung 님이 작성:
>
>> We could just submit the latest release with the fix again. I would not
>> recommend waiting; oftentimes there are external changes that are not
>> caught, and a fix then needs to go through a release vote.
>>
>> What is the latest release with your fix? 3.0.1? I can put it in but will
>> need to make sure we can get hold of Shivaram.
>>
>>
>> On Tue, Dec 29, 2020 at 11:05 PM Hyukjin Kwon 
>> wrote:
>>
>>> Let me try in this release - I will have to ask some questions to both
>>> of you. I will email you both offline or on the private mailing list.
>>> If I happen to be stuck for a difficult reason, I think we can consider
>>> dropping it as Dongjoon initially pointed out.
>>>
>>> 2020년 12월 30일 (수) 오후 1:59, Felix Cheung 님이 작성:
>>>
 Ah, I don’t recall actually - maybe it was just missed?

 The last message I had was in June, when it was broken by R 4.0.1,
 which was fixed.


 On Tue, Dec 29, 2020 at 7:21 PM Hyukjin Kwon 
 wrote:

> BTW, I remember I fixed all standing issues at
> https://issues.apache.org/jira/browse/SPARK-31918 and
> https://issues.apache.org/jira/browse/SPARK-32073.
> I wonder why other releases were not uploaded yet. Do you guys know
> any context or if there is a standing issue on this, @Felix Cheung
>  or @Shivaram Venkataraman
> ?
>
> 2020년 12월 23일 (수) 오전 11:21, Mridul Muralidharan 님이
> 작성:
>
>>
>> I agree. Is there something we can do to ensure the CRAN publishing goes
>> through consistently and predictably?
>> If possible, it would be good to continue supporting it.
>>
>> Regards,
>> Mridul
>>
>> On Tue, Dec 22, 2020 at 7:48 PM Felix Cheung 
>> wrote:
>>
>>> Ok - it took many years to get it first published, so it was hard to
>>> get there.
>>>
>>>
>>> On Tue, Dec 22, 2020 at 5:45 PM Hyukjin Kwon 
>>> wrote:
>>>
 Adding @Shivaram Venkataraman  and @Felix
 Cheung  FYI

 2020년 12월 23일 (수) 오전 9:22, Michael Heuer 님이 작성:

> Anecdotally, as a project downstream of Spark, we've been
> prevented from pushing to CRAN because of this
>
> https://github.com/bigdatagenomics/adam/issues/1851
>
> We've given up and marked as WontFix.
>
>michael
>
>
> On Dec 22, 2020, at 5:14 PM, Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
>
> Given the current circumstance, I'm thinking of dropping it
> officially from the community release scope.
>
> It's because
>
> - It turns out that our CRAN check is insufficient to
> guarantee the availability of SparkR on CRAN.
>   Apache Spark 3.1.0 may not be available on CRAN, either.
>
> - In daily CIs, CRAN check has been broken frequently due to both
> our side and CRAN side issues. Currently, branch-2.4 is broken.
>
> - It also has the side effect of delaying the official release
> announcement after an RC passes, because each release manager checks
> whether he/she can recover it for that release.
>
> If we are unable to support SparkR on CRAN in a sustainable way,
> what about dropping it officially instead?
>
> Then, it will alleviate the burden on release managers and improve
> daily CIs' stability by removing the CRAN check.
>
> Bests,
> Dongjoon.
>
>
> On Mon, Dec 21, 2020 at 7:09 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
>
>> Hi, All.
>>
>> The last `SparkR` package of Apache Spark in CRAN is `2.4.6`.
>>
>>
>> https://cran-archive.r-project.org/web/checks/2020/2020-07-10_check_results_SparkR.html
>>
>> The latest three Apache Spark distributions (2.4.7/3.0.0/3.0.1)
>> are not published to CRAN and the lack of SparkR on CRAN has been
>> considered a non-release blocker.
>>
>> I'm wondering if we are aiming to recover it in Apache Spark
>> 3.1.0.
>>
>> Bests,
>> Dongjoon.
>>
>
>

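The "availability on CRAN" question above can be checked mechanically. Below is a hypothetical Python sketch (not the project's actual CI code; `cran_has` and the sample index data are invented for illustration) that parses a CRAN-style PACKAGES index to decide whether a given SparkR version is currently published:

```python
# Hypothetical sketch, not Spark's actual CI code: decide whether a given
# package/version appears in a CRAN-style PACKAGES index. Entries in the real
# index (https://cran.r-project.org/src/contrib/PACKAGES) are DCF blocks
# separated by blank lines; the sample data below is invented.

def cran_has(packages_index: str, name: str, version: str) -> bool:
    for block in packages_index.strip().split("\n\n"):
        # Each block is "Field: value" lines; keep only well-formed ones.
        fields = dict(
            line.split(": ", 1) for line in block.splitlines() if ": " in line
        )
        if fields.get("Package") == name and fields.get("Version") == version:
            return True
    return False

# Invented sample index: SparkR stuck at 2.4.6, newer releases missing.
sample = """Package: SparkR
Version: 2.4.6

Package: dplyr
Version: 1.0.2
"""

print(cran_has(sample, "SparkR", "2.4.6"))  # True
print(cran_has(sample, "SparkR", "3.0.1"))  # False
```

A real check would fetch the live index over HTTP and run it per release branch; the parsing logic would stay the same.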


Re: [3.0.1] ExecutorMonitor.onJobStart and StageInfo.shuffleDepId that's never used?

2020-12-30 Thread Jacek Laskowski
Hi,

Sorry, a false alarm. I was mistaken: what IDEA calls "unused" may
not really be unused. The field is (re)assigned in StageInfo.fromStage for
a ShuffleMapStage [1] and then picked up in ExecutorMonitor [2] (since it's a
SparkListener).

[1]
https://github.com/apache/spark/blob/094563384478a402c36415edf04ee7b884a34fc9/core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala#L108
[2]
https://github.com/apache/spark/blob/78df2caec8c94c31e5c9ddc30ed8acb424084181/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179
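The pattern described above — a field the IDE flags as "never written" because it is assigned only in a factory method and read only by a registered listener — can be sketched illustratively as follows (in Python rather than Spark's Scala, with invented names; this is not Spark's actual code):

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative sketch only (invented names, not Spark's actual code): a field
# that static analysis may report as "never written" because it is assigned
# solely inside a factory method and consumed solely by a listener.

@dataclass
class StageInfo:
    stage_id: int
    shuffle_dep_id: Optional[int] = None  # None by default, like StageInfo.shuffleDepId

    @classmethod
    def from_stage(cls, stage_id: int, shuffle_dep_id: Optional[int]) -> "StageInfo":
        info = cls(stage_id)
        # (Re)assigned here for shuffle map stages, mirroring StageInfo.fromStage.
        info.shuffle_dep_id = shuffle_dep_id
        return info

class ExecutorMonitor:
    """Listener-style consumer, mirroring ExecutorMonitor.onJobStart."""

    def __init__(self) -> None:
        self.shuffle_ids: List[int] = []

    def on_job_start(self, stages: List[StageInfo]) -> None:
        for s in stages:
            if s.shuffle_dep_id is not None:  # set only for shuffle map stages
                self.shuffle_ids.append(s.shuffle_dep_id)

monitor = ExecutorMonitor()
monitor.on_job_start([StageInfo.from_stage(0, 7), StageInfo.from_stage(1, None)])
print(monitor.shuffle_ids)  # [7]
```

Because no write appears near the field's declaration site, an IDE can report it as unused even though it is live at runtime — which is exactly the false alarm above.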

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Wed, Dec 30, 2020 at 3:34 PM Jacek Laskowski  wrote:

> Hi,
>
> It's been a while. Glad to be back Sparkians!
>
> I've been exploring ExecutorMonitor.onJobStart in 3.0.1 and noticed that
> it uses StageInfo.shuffleDepId [1] that is None by default and moreover
> never "written to" according to IntelliJ IDEA.
>
> Is this the case and intentional?
>
> I'm wondering how much IDEA knows about codegen and that's where it's used
> (?)
>
> I've just stumbled upon it and before I spend more time on this I thought
> I'd ask (perhaps it's going to change in 3.1?). Help appreciated.
>
> [1]
> https://github.com/apache/spark/blob/78df2caec8c94c31e5c9ddc30ed8acb424084181/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books 
> Follow me on https://twitter.com/jaceklaskowski
>
> 
>

