Re: Please keep s3://spark-related-packages/ alive

2018-02-27 Thread Matei Zaharia
For Flintrock, have you considered using a Requester Pays bucket? That way 
you’d get the availability of S3 without having to foot the bill for bandwidth 
yourself (which was the bulk of the cost for the old bucket).
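As a rough sketch of what that would look like on the downloader's side (assuming boto3, and using a placeholder key layout rather than the bucket's real one):

import boto3

# With a Requester Pays bucket, the caller's AWS credentials are billed for
# the data transfer instead of the bucket owner.
s3 = boto3.client("s3")
s3.download_file(
    Bucket="spark-related-packages",             # assumed bucket name
    Key="spark-2.2.1-bin-hadoop2.7.tgz",         # assumed key layout
    Filename="spark-2.2.1-bin-hadoop2.7.tgz",
    ExtraArgs={"RequestPayer": "requester"},     # opt in to paying for bandwidth
)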

Matei

> On Feb 27, 2018, at 4:35 PM, Nicholas Chammas  
> wrote:
> 
> So is there no hope for this S3 bucket, or room to replace it with a bucket 
> owned by some organization other than AMPLab (which is technically now 
> defunct, I guess)? Sorry to persist, but I just have to ask.
> 
> On Tue, Feb 27, 2018 at 10:36 AM Michael Heuer  wrote:
> On Tue, Feb 27, 2018 at 8:17 AM, Sean Owen  wrote:
> See 
> http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-d3kbcqa49mib13-cloudfront-net-td22427.html
>  -- it was 'retired', yes.
> 
> Agree with all that, though they're intended for occasional individual use 
> and not a case where performance and uptime matter. For that, I think you'd 
> want to just host your own copy of the bits you need. 
> 
> The notional problem was that the S3 bucket wasn't obviously 
> controlled/blessed by the ASF and yet was a source of official bits. It was 
> another set of third-party credentials to hand around to release managers, 
> which was IIRC a little problematic.
> 
> Homebrew does host distributions of ASF projects, like Spark, FWIW. 
> 
> To clarify, the apache-spark.rb formula in Homebrew uses the Apache mirror 
> closer.lua script
> 
> https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-spark.rb#L4
> 
>michael
> 
>  
> On Mon, Feb 26, 2018 at 10:57 PM Nicholas Chammas 
>  wrote:
> If you go to the Downloads page and download Spark 2.2.1, you’ll get a link 
> to an Apache mirror. It didn’t use to be this way. As recently as Spark 
> 2.2.0, downloads were served via CloudFront, which was backed by an S3 bucket 
> named spark-related-packages.
> 
> It seems that we’ve stopped using CloudFront, and the S3 bucket behind it has 
> stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m guessing this 
> is part of an effort to use the Apache mirror network, like other Apache 
> projects do.
> 
> From a user perspective, the Apache mirror network is several steps down from 
> using a modern CDN. Let me summarize why:
> 
>   • Apache mirrors are often slow. Apache does not impose any performance 
> requirements on its mirrors. The difference between getting a good mirror and 
> a bad one means downloading Spark in less than a minute vs. 20 minutes. The 
> problem is so bad that I’ve thought about adding an Apache mirror blacklist 
> to Flintrock to avoid getting one of these dud mirrors.
>   • Apache mirrors are inconvenient to use. When you download something 
> from an Apache mirror, you get a link like this one. Instead of automatically 
> redirecting you to your download, though, you need to process the results you 
> get back to find your download target. And you need to handle the high 
> download failure rate, since sometimes the mirror you get doesn’t have the 
> file it claims to have.
>   • Apache mirrors are incomplete. Apache mirrors only keep around the 
> latest releases, save for a few “archive” mirrors, which are often slow. So 
> if you want to download anything but the latest version of Spark, you are out 
> of luck.
> Some of these problems can be mitigated by picking a specific mirror that 
> works well and hardcoding it in your scripts, but that defeats the purpose of 
> dynamically selecting a mirror and makes you a “bad” user of the mirror 
> network.
> 
> I raised some of these issues over on INFRA-10999. The ticket sat for a year 
> before I heard anything back, and the bottom line was that none of the above 
> problems have a solution on the horizon. It’s fine. I understand that Apache 
> is a volunteer organization and that the infrastructure team has a lot to 
> manage as it is. I still find it disappointing that an organization of 
> Apache’s stature doesn’t have a better solution for this in collaboration 
> with a third party. Python serves PyPI downloads using Fastly and Homebrew 
> serves packages using Bintray. They both work really, really well. Why don’t 
> we have something as good for Apache projects? Anyway, that’s a separate 
> discussion.
> 
> What I want to say is this:
> 
> Dear whoever owns the spark-related-packages S3 bucket,
> 
> Please keep the bucket up-to-date with the latest Spark releases, alongside 
> the past releases that are already on there. It’s a huge help to the 
> Flintrock project, and it’s an equally big help to those of us writing 
> infrastructure automation scripts that deploy Spark in other contexts.
> 
> I understand that hosting this stuff is not free, and that I am not paying 
> anything for this service. If it needs to go, so be it. But I wanted to take 
> this opportunity to lay out the benefits I’ve enjoyed thanks to having this 
> bucket around, and to make sure that if it did die, it didn’t die a quiet death.

Re: Please keep s3://spark-related-packages/ alive

2018-02-27 Thread Nicholas Chammas
So is there no hope for this S3 bucket, or room to replace it with a bucket
owned by some organization other than AMPLab (which is technically now
defunct, I guess)? Sorry to
persist, but I just have to ask.

On Tue, Feb 27, 2018 at 10:36 AM Michael Heuer  wrote:

> On Tue, Feb 27, 2018 at 8:17 AM, Sean Owen  wrote:
>
>> See
>> http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-d3kbcqa49mib13-cloudfront-net-td22427.html
>>  --
>> it was 'retired', yes.
>>
>> Agree with all that, though they're intended for occasional individual
>> use and not a case where performance and uptime matter. For that, I think
>> you'd want to just host your own copy of the bits you need.
>>
>> The notional problem was that the S3 bucket wasn't obviously
>> controlled/blessed by the ASF and yet was a source of official bits. It was
>> another set of third-party credentials to hand around to release managers,
>> which was IIRC a little problematic.
>>
>> Homebrew does host distributions of ASF projects, like Spark, FWIW.
>>
>
> To clarify, the apache-spark.rb formula in Homebrew uses the Apache
> mirror closer.lua script
>
>
> https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-spark.rb#L4
>
>michael
>
>
>
>> On Mon, Feb 26, 2018 at 10:57 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> If you go to the Downloads 
>>> page and download Spark 2.2.1, you’ll get a link to an Apache mirror. It
>>> didn’t use to be this way. As recently as Spark 2.2.0, downloads were
>>> served via CloudFront , which was
>>> backed by an S3 bucket named spark-related-packages.
>>>
>>> It seems that we’ve stopped using CloudFront, and the S3 bucket behind
>>> it has stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m
>>> guessing this is part of an effort to use the Apache mirror network, like
>>> other Apache projects do.
>>>
>>> From a user perspective, the Apache mirror network is several steps down
>>> from using a modern CDN. Let me summarize why:
>>>
>>>1. *Apache mirrors are often slow.* Apache does not impose any
>>>performance requirements on its mirrors
>>>
>>> .
>>>The difference between getting a good mirror and a bad one means
>>>downloading Spark in less than a minute vs. 20 minutes. The problem is so
>>>bad that I’ve thought about adding an Apache mirror blacklist
>>>
>>>to Flintrock to avoid getting one of these dud mirrors.
>>>2. *Apache mirrors are inconvenient to use.* When you download
>>>something from an Apache mirror, you get a link like this one
>>>
>>> .
>>>Instead of automatically redirecting you to your download, though, you 
>>> need
>>>to process the results you get back
>>>
>>> 
>>>to find your download target. And you need to handle the high download
>>>failure rate, since sometimes the mirror you get doesn’t have the file it
>>>claims to have.
>>>3. *Apache mirrors are incomplete.* Apache mirrors only keep around
>>>the latest releases, save for a few “archive” mirrors, which are often
>>>slow. So if you want to download anything but the latest version of 
>>> Spark,
>>>you are out of luck.
>>>
>>> Some of these problems can be mitigated by picking a specific mirror
>>> that works well and hardcoding it in your scripts, but that defeats the
>>> purpose of dynamically selecting a mirror and makes you a “bad” user of the
>>> mirror network.
>>>
>>> I raised some of these issues over on INFRA-10999
>>> . The ticket sat for
>>> a year before I heard anything back, and the bottom line was that none of
>>> the above problems have a solution on the horizon. It’s fine. I understand
>>> that Apache is a volunteer organization and that the infrastructure team
>>> has a lot to manage as it is. I still find it disappointing that an
>>> organization of Apache’s stature doesn’t have a better solution for this in
>>> collaboration with a third party. Python serves PyPI downloads using
>>> Fastly  and Homebrew serves packages using
>>> Bintray . They both work really, really well. Why
>>> don’t we have something as good for Apache projects? Anyway, that’s a
>>> separate discussion.
>>>
>>> What I want to say is this:
>>>
>>> Dear whoever owns the spark-related-packages S3 bucket
>>> 

Re: Help needed in R documentation generation

2018-02-27 Thread Marcelo Vanzin
Ok, it sounds like this was the intended behavior of the doc
changes... I'm not an R developer, so maybe the new docs make enough
sense, but the previous ones did look nicer.

On Tue, Feb 27, 2018 at 11:09 AM, Felix Cheung
 wrote:
> I had agreed it was a compromise when it was proposed back in May 2017.
>
> I don’t think I can capture the long reviews and many discussions that went
> in; for further discussion, please start from JIRA SPARK-20889.
>
>
>
> 
> From: Marcelo Vanzin 
> Sent: Tuesday, February 27, 2018 10:26:23 AM
> To: Felix Cheung
> Cc: Mihály Tóth; Mihály Tóth; dev@spark.apache.org
>
> Subject: Re: Help needed in R documentation generation
>
> I followed Misi's instructions:
> - click on
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html
> - click on "s" at the top
> - find "sin" and click on it
>
> And that does not give me the documentation for the "sin" function.
> That leads you to a really ugly list of functions that's basically
> unreadable. There's lots of things like this:
>
> ## S4 method for signature 'Column'
> abs(x)
>
> Which look to me like the docs weren't properly generated. So it
> doesn't look like it's a discoverability problem, it seems there's
> something odd going on with the new docs.
>
> On the previous version those same steps take me to a nicely formatted
> doc for the "sin" function.
>
>
>
> On Tue, Feb 27, 2018 at 10:14 AM, Felix Cheung
>  wrote:
>> I think what you are calling out is discoverability of names from index -
>> I
>> agree this should be improved.
>>
>> There are several reasons for this change, if I recall, some are:
>>
>> - we have too many doc pages and a very long index page because of the
>> atypically large number of functions - many R packages only have dozens (or
>> a
>> dozen) and we have hundreds; this also affects discoverability
>>
>> - a side effect of high number of functions is that we have hundreds of
>> pages of cross links between functions in the same and different
>> categories
>> that are very hard to read or find
>>
>> - many function examples are too simple or incomplete - it would be good
>> to
>> make them runnable, for instance
>>
>> There was a proposal for a search feature on the doc index at one point,
>> IMO
>> that would be very useful and would address the discoverability issue.
>>
>>
>> 
>> From: Mihály Tóth 
>> Sent: Tuesday, February 27, 2018 9:13:18 AM
>> To: Felix Cheung
>> Cc: Mihály Tóth; dev@spark.apache.org
>>
>> Subject: Re: Help needed in R documentation generation
>>
>> Hi,
>>
>> Earlier, at https://spark.apache.org/docs/latest/api/R/index.html I see
>>
>> sin as a title
>> description describes what sin does
>> usage, arguments, note, see also are specific to sin function
>>
>> When opening sin from
>>
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html:
>>
>> Title is 'Math functions for Column operations', not very specific to sin
>> Description is 'Math functions defined for Column.'
>> Usage contains a list of functions, scrolling down you can see sin as well
>> though ...
>>
>> To me that sounds like a problem. Do I overlook something here?
>>
>> Best Regards,
>>   Misi
>>
>>
>> 2018-02-27 16:15 GMT+00:00 Felix Cheung :
>>>
>>> The help content on sin is in
>>>
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/column_math_functions.html
>>>
>>> It’s a fairly long list but sin is in there. Is that not what you are
>>> seeing?
>>>
>>>
>>> 
>>> From: Mihály Tóth 
>>> Sent: Tuesday, February 27, 2018 8:03:34 AM
>>> To: dev@spark.apache.org
>>> Subject: Fwd: Help needed in R documentation generation
>>>
>>> Hi,
>>>
>>> Actually, when I open the link you provided and click on - for example -
>>> 'sin' the page does not seem to describe that function at all. Actually I
>>> get same effect that I get locally. I have attached a screenshot about
>>> that:
>>>
>>>
>>>
>>>
>>>
>>> I tried with Chrome and then with Safari too and got the same result.
>>>
>>> When I go to https://spark.apache.org/docs/latest/api/R/index.html (Spark
>>> 2.2.1) and select 'sin' I get a proper Description, Usage, Arguments,
>>> etc.
>>> sections.
>>>
>>> This sounds like a bug in the documentation of Spark R, doesn't it? Shall
>>> I file a Jira about it?
>>>
>>> Locally I ran SPARK_HOME/R/create-docs.sh and it returned successfully.
>>> Unfortunately with the result mentioned above.
>>>
>>> Best Regards,
>>>
>>>   Misi
>>>
>>>

 

 From: Felix Cheung 
 Date: 2018-02-26 20:42 GMT+00:00
 Subject: Re: Help needed in R documentation generation
 To: Mihály Tóth 
 Cc: "dev@spark.apache.org" 

Re: Help needed in R documentation generation

2018-02-27 Thread Felix Cheung
I had agreed it was a compromise when it was proposed back in May 2017.

I don’t think I can capture the long reviews and many discussions that went in;
for further discussion, please start from JIRA SPARK-20889.




From: Marcelo Vanzin 
Sent: Tuesday, February 27, 2018 10:26:23 AM
To: Felix Cheung
Cc: Mihály Tóth; Mihály Tóth; dev@spark.apache.org
Subject: Re: Help needed in R documentation generation

I followed Misi's instructions:
- click on 
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html
- click on "s" at the top
- find "sin" and click on it

And that does not give me the documentation for the "sin" function.
That leads you to a really ugly list of functions that's basically
unreadable. There's lots of things like this:

## S4 method for signature 'Column'
abs(x)

Which look to me like the docs weren't properly generated. So it
doesn't look like it's a discoverability problem, it seems there's
something odd going on with the new docs.

On the previous version those same steps take me to a nicely formatted
doc for the "sin" function.



On Tue, Feb 27, 2018 at 10:14 AM, Felix Cheung
 wrote:
> I think what you are calling out is discoverability of names from index - I
> agree this should be improved.
>
> There are several reasons for this change, if I recall, some are:
>
> - we have too many doc pages and a very long index page because of the
> atypically large number of functions - many R packages only have dozens (or a
> dozen) and we have hundreds; this also affects discoverability
>
> - a side effect of high number of functions is that we have hundreds of
> pages of cross links between functions in the same and different categories
> that are very hard to read or find
>
> - many function examples are too simple or incomplete - it would be good to
> make them runnable, for instance
>
> There was a proposal for a search feature on the doc index at one point, IMO
> that would be very useful and would address the discoverability issue.
>
>
> 
> From: Mihály Tóth 
> Sent: Tuesday, February 27, 2018 9:13:18 AM
> To: Felix Cheung
> Cc: Mihály Tóth; dev@spark.apache.org
>
> Subject: Re: Help needed in R documentation generation
>
> Hi,
>
> Earlier, at https://spark.apache.org/docs/latest/api/R/index.html I see
>
> sin as a title
> description describes what sin does
> usage, arguments, note, see also are specific to sin function
>
> When opening sin from
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html:
>
> Title is 'Math functions for Column operations', not very specific to sin
> Description is 'Math functions defined for Column.'
> Usage contains a list of functions, scrolling down you can see sin as well
> though ...
>
> To me that sounds like a problem. Do I overlook something here?
>
> Best Regards,
>   Misi
>
>
> 2018-02-27 16:15 GMT+00:00 Felix Cheung :
>>
>> The help content on sin is in
>>
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/column_math_functions.html
>>
>> It’s a fairly long list but sin is in there. Is that not what you are
>> seeing?
>>
>>
>> 
>> From: Mihály Tóth 
>> Sent: Tuesday, February 27, 2018 8:03:34 AM
>> To: dev@spark.apache.org
>> Subject: Fwd: Help needed in R documentation generation
>>
>> Hi,
>>
>> Actually, when I open the link you provided and click on - for example -
>> 'sin' the page does not seem to describe that function at all. Actually I
>> get same effect that I get locally. I have attached a screenshot about that:
>>
>>
>>
>>
>>
>> I tried with Chrome and then with Safari too and got the same result.
>>
>> When I go to https://spark.apache.org/docs/latest/api/R/index.html (Spark
>> 2.2.1) and select 'sin' I get a proper Description, Usage, Arguments, etc.
>> sections.
>>
>> This sounds like a bug in the documentation of Spark R, doesn't it? Shall
>> I file a Jira about it?
>>
>> Locally I ran SPARK_HOME/R/create-docs.sh and it returned successfully.
>> Unfortunately with the result mentioned above.
>>
>> Best Regards,
>>
>>   Misi
>>
>>
>>>
>>> 
>>>
>>> From: Felix Cheung 
>>> Date: 2018-02-26 20:42 GMT+00:00
>>> Subject: Re: Help needed in R documentation generation
>>> To: Mihály Tóth 
>>> Cc: "dev@spark.apache.org" 
>>>
>>>
>>> Could you tell me more about the steps you are taking? Which page you are
>>> clicking on?
>>>
>>> Could you try
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html
>>>
>>> 
>>> From: Mihály Tóth 
>>> Sent: Monday, February 26, 2018 8:06:59 AM
>>> To: Felix Cheung
>>> Cc: dev@spark.apache.org
>>> Subject: Re: Help needed in R documentation generation

Re: Help needed in R documentation generation

2018-02-27 Thread Marcelo Vanzin
I followed Misi's instructions:
- click on 
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html
- click on "s" at the top
- find "sin" and click on it

And that does not give me the documentation for the "sin" function.
That leads you to a really ugly list of functions that's basically
unreadable. There's lots of things like this:

## S4 method for signature 'Column'
abs(x)

Which look to me like the docs weren't properly generated. So it
doesn't look like it's a discoverability problem, it seems there's
something odd going on with the new docs.

On the previous version those same steps take me to a nicely formatted
doc for the "sin" function.



On Tue, Feb 27, 2018 at 10:14 AM, Felix Cheung
 wrote:
> I think what you are calling out is discoverability of names from index - I
> agree this should be improved.
>
> There are several reasons for this change, if I recall, some are:
>
> - we have too many doc pages and a very long index page because of the
> atypically large number of functions - many R packages only have dozens (or a
> dozen) and we have hundreds; this also affects discoverability
>
> - a side effect of high number of functions is that we have hundreds of
> pages of cross links between functions in the same and different categories
> that are very hard to read or find
>
> - many function examples are too simple or incomplete - it would be good to
> make them runnable, for instance
>
> There was a proposal for a search feature on the doc index at one point, IMO
> that would be very useful and would address the discoverability issue.
>
>
> 
> From: Mihály Tóth 
> Sent: Tuesday, February 27, 2018 9:13:18 AM
> To: Felix Cheung
> Cc: Mihály Tóth; dev@spark.apache.org
>
> Subject: Re: Help needed in R documentation generation
>
> Hi,
>
> Earlier, at https://spark.apache.org/docs/latest/api/R/index.html I see
>
> sin as a title
> description describes what sin does
> usage, arguments, note, see also are specific to sin function
>
> When opening sin from
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html:
>
> Title is 'Math functions for Column operations', not very specific to sin
> Description is 'Math functions defined for Column.'
> Usage contains a list of functions, scrolling down you can see sin as well
> though ...
>
> To me that sounds like a problem. Do I overlook something here?
>
> Best Regards,
>   Misi
>
>
> 2018-02-27 16:15 GMT+00:00 Felix Cheung :
>>
>> The help content on sin is in
>>
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/column_math_functions.html
>>
>> It’s a fairly long list but sin is in there. Is that not what you are
>> seeing?
>>
>>
>> 
>> From: Mihály Tóth 
>> Sent: Tuesday, February 27, 2018 8:03:34 AM
>> To: dev@spark.apache.org
>> Subject: Fwd: Help needed in R documentation generation
>>
>> Hi,
>>
>> Actually, when I open the link you provided and click on - for example -
>> 'sin' the page does not seem to describe that function at all. Actually I
>> get same effect that I get locally. I have attached a screenshot about that:
>>
>>
>>
>>
>>
>> I tried with Chrome and then with Safari too and got the same result.
>>
>> When I go to https://spark.apache.org/docs/latest/api/R/index.html (Spark
>> 2.2.1) and select 'sin' I get a proper Description, Usage, Arguments, etc.
>> sections.
>>
>> This sounds like a bug in the documentation of Spark R, doesn't it? Shall
>> I file a Jira about it?
>>
>> Locally I ran SPARK_HOME/R/create-docs.sh and it returned successfully.
>> Unfortunately with the result mentioned above.
>>
>> Best Regards,
>>
>>   Misi
>>
>>
>>>
>>> 
>>>
>>> From: Felix Cheung 
>>> Date: 2018-02-26 20:42 GMT+00:00
>>> Subject: Re: Help needed in R documentation generation
>>> To: Mihály Tóth 
>>> Cc: "dev@spark.apache.org" 
>>>
>>>
>>> Could you tell me more about the steps you are taking? Which page you are
>>> clicking on?
>>>
>>> Could you try
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html
>>>
>>> 
>>> From: Mihály Tóth 
>>> Sent: Monday, February 26, 2018 8:06:59 AM
>>> To: Felix Cheung
>>> Cc: dev@spark.apache.org
>>> Subject: Re: Help needed in R documentation generation
>>>
>>> I see.
>>>
>>> When I click on such a selected function, like 'sin' the page falls apart
>>> and does not tell anything about sin function. How is it supposed to work
>>> when all functions link to the same column_math_functions.html ?
>>>
>>> Thanks,
>>>
>>>   Misi
>>>
>>>
>>> On Sun, Feb 25, 2018, 22:53 Felix Cheung 
>>> wrote:

 This is a recent change. The html file column_math_functions.html should
 have the right help content.

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-27 Thread Sameer Agarwal
This vote passes! I'll follow up with a formal release announcement soon.

+1:
Wenchen Fan (binding)
Takuya Ueshin
Xingbo Jiang
Gengliang Wang
Weichen Xu
Sean Owen (binding)
Josh Goldsborough
Denny Lee
Nicholas Chammas
Marcelo Vanzin (binding)
Holden Karau (binding)
Cheng Lian (binding)
Bryan Cutler
Hyukjin Kwon
Ricardo Almeida
Xiao Li (binding)
Ryan Blue
Dongjoon Hyun
Michael Armbrust (binding)
Nan Zhu
Felix Cheung (binding)
Nick Pentreath (binding)

+0: None

-1: None

On 27 February 2018 at 00:21, Nick Pentreath 
wrote:

> +1 (binding)
>
> Built and ran Scala tests with "-Phadoop-2.6 -Pyarn -Phive", all passed.
>
> Python tests passed (also including pyspark-streaming w/kafka-0.8 and
> flume packages built)
>
>
> On Tue, 27 Feb 2018 at 10:09 Felix Cheung 
> wrote:
>
>> +1
>>
>> Tested R:
>>
>> install from package, CRAN tests, manual tests, help check, vignettes
>> check
>>
>> Filed this https://issues.apache.org/jira/browse/SPARK-23461
>> This is not a regression so not a blocker of the release.
>>
>> Tested this on win-builder and r-hub. On r-hub on multiple platforms
>> everything passed. For win-builder tests failed on x86 but passed x64 -
>> perhaps due to an intermittent download issue causing a gzip error,
>> re-testing now but won’t hold the release on this.
>>
>> --
>> *From:* Nan Zhu 
>> *Sent:* Monday, February 26, 2018 4:03:22 PM
>> *To:* Michael Armbrust
>> *Cc:* dev
>> *Subject:* Re: [VOTE] Spark 2.3.0 (RC5)
>>
>> +1  (non-binding), tested with internal workloads and benchmarks
>>
>> On Mon, Feb 26, 2018 at 12:09 PM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>>> +1 all our pipelines have been running the RC for several days now.
>>>
>>> On Mon, Feb 26, 2018 at 10:33 AM, Dongjoon Hyun >> > wrote:
>>>
 +1 (non-binding).

 Bests,
 Dongjoon.



 On Mon, Feb 26, 2018 at 9:14 AM, Ryan Blue 
 wrote:

> +1 (non-binding)
>
> On Sat, Feb 24, 2018 at 4:17 PM, Xiao Li  wrote:
>
>> +1 (binding) in Spark SQL, Core and PySpark.
>>
>> Xiao
>>
>> 2018-02-24 14:49 GMT-08:00 Ricardo Almeida <
>> ricardo.alme...@actnowib.com>:
>>
>>> +1 (non-binding)
>>>
>>> same as previous RC
>>>
>>> On 24 February 2018 at 11:10, Hyukjin Kwon 
>>> wrote:
>>>
 +1

 2018-02-24 16:57 GMT+09:00 Bryan Cutler :

> +1
> Tests passed and additionally ran Arrow related tests and did some
> perf checks with python 2.7.14
>
> On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau <
> hol...@pigscanfly.ca> wrote:
>
>> Note: given the state of Jenkins I'd love to see Bryan Cutler or
>> someone with Arrow experience sign off on this release.
>>
>> On Fri, Feb 23, 2018 at 6:13 PM, Cheng Lian <
>> lian.cs@gmail.com> wrote:
>>
>>> +1 (binding)
>>>
>>> Passed all the tests, looks good.
>>>
>>> Cheng
>>>
>>> On 2/23/18 15:00, Holden Karau wrote:
>>>
>>> +1 (binding)
>>> PySpark artifacts install in a fresh Py3 virtual env
>>>
>>> On Feb 23, 2018 7:55 AM, "Denny Lee" 
>>> wrote:
>>>
 +1 (non-binding)

 On Fri, Feb 23, 2018 at 07:08 Josh Goldsborough <
 joshgoldsboroughs...@gmail.com> wrote:

> New to testing out Spark RCs for the community but I was able
> to run some of the basic unit tests without error so for what 
> it's worth,
> I'm a +1.
>
> On Thu, Feb 22, 2018 at 4:23 PM, Sameer Agarwal <
> samee...@apache.org> wrote:
>
>> Please vote on releasing the following candidate as Apache
>> Spark version 2.3.0. The vote is open until Tuesday February 27, 
>> 2018 at
>> 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 
>> votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Spark 2.3.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>>
>> The tag to be voted on is v2.3.0-rc5:
>> https://github.com/apache/spark/tree/v2.3.0-rc5 (
>> 992447fb30ee9ebb3cf794f2d06f4d63a2d792db)
>>
>> List of JIRA tickets resolved in this release can be found
>> here: https://issues.apache.org/jira/projects/SPARK/versions/
>> 12339551

Re: Help needed in R documentation generation

2018-02-27 Thread Felix Cheung
I think what you are calling out is discoverability of names from index - I 
agree this should be improved.

There are several reasons for this change, if I recall, some are:

- we have too many doc pages and a very long index page because of the atypically
large number of functions - many R packages only have dozens (or a dozen) and 
we have hundreds; this also affects discoverability

- a side effect of high number of functions is that we have hundreds of pages 
of cross links between functions in the same and different categories that are 
very hard to read or find

- many function examples are too simple or incomplete - it would be good to 
make them runnable, for instance

There was a proposal for a search feature on the doc index at one point, IMO 
that would be very useful and would address the discoverability issue.
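
(As an aside, a throwaway script like the one below — which assumes the generated index is plain HTML with one link per function name — can show how many index entries now resolve to the same grouped page, which is the discoverability concern discussed here.)

from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

INDEX = ("https://dist.apache.org/repos/dist/dev/spark/"
         "v2.3.0-rc5-docs/_site/api/R/index.html")

class LinkCollector(HTMLParser):
    """Collect the href target of every link on the index page."""
    def __init__(self):
        super().__init__()
        self.targets = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.endswith(".html"):
                self.targets.append(href)

parser = LinkCollector()
parser.feed(urlopen(INDEX).read().decode("utf-8"))

# How many function entries point at each grouped help page?
for page, count in Counter(parser.targets).most_common(5):
    print(f"{count:4d} index entries -> {page}")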



From: Mihály Tóth 
Sent: Tuesday, February 27, 2018 9:13:18 AM
To: Felix Cheung
Cc: Mihály Tóth; dev@spark.apache.org
Subject: Re: Help needed in R documentation generation

Hi,

Earlier, at https://spark.apache.org/docs/latest/api/R/index.html I see

  1.  sin as a title
  2.  description describes what sin does
  3.  usage, arguments, note, see also are specific to sin function

When opening sin from 
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html:

  1.  Title is 'Math functions for Column operations', not very specific to sin
  2.  Description is 'Math functions defined for Column.'
  3.  Usage contains a list of functions, scrolling down you can see sin as 
well though ...

To me that sounds like a problem. Do I overlook something here?

Best Regards,
  Misi


2018-02-27 16:15 GMT+00:00 Felix Cheung 
>:
The help content on sin is in
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/column_math_functions.html

It’s a fairly long list but sin is in there. Is that not what you are seeing?



From: Mihály Tóth >
Sent: Tuesday, February 27, 2018 8:03:34 AM
To: dev@spark.apache.org
Subject: Fwd: Help needed in R documentation generation

Hi,

Actually, when I open the link you provided and click on - for example - 'sin' 
the page does not seem to describe that function at all. Actually I get same 
effect that I get locally. I have attached a screenshot about that:


[Inline image 1]


I tried with Chrome and then with Safari too and got the same result.

When I go to https://spark.apache.org/docs/latest/api/R/index.html (Spark 
2.2.1) and select 'sin' I get a proper Description, Usage, Arguments, etc. 
sections.

This sounds like a bug in the documentation of Spark R, doesn't it? Shall I 
file a Jira about it?

Locally I ran SPARK_HOME/R/create-docs.sh and it returned successfully. 
Unfortunately with the result mentioned above.

Best Regards,

  Misi





From: Felix Cheung >
Date: 2018-02-26 20:42 GMT+00:00
Subject: Re: Help needed in R documentation generation
To: Mihály Tóth >
Cc: "dev@spark.apache.org" 
>


Could you tell me more about the steps you are taking? Which page you are 
clicking on?

Could you try 
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html


From: Mihály Tóth >
Sent: Monday, February 26, 2018 8:06:59 AM
To: Felix Cheung
Cc: dev@spark.apache.org
Subject: Re: Help needed in R documentation generation

I see.

When I click on such a selected function, like 'sin' the page falls apart and 
does not tell anything about sin function. How is it supposed to work when all 
functions link to the same column_math_functions.html ?

Thanks,

  Misi


On Sun, Feb 25, 2018, 22:53 Felix Cheung 
> wrote:
This is a recent change. The html file column_math_functions.html should have the 
right help content.

What is the problem you are experiencing?


From: Mihály Tóth >
Sent: Sunday, February 25, 2018 10:42:50 PM
To: dev@spark.apache.org
Subject: Help needed in R documentation generation

Hi,

I am having difficulties generating R documentation.

In the R/pkg/html/index.html file, the individual function entries reference
column_math_functions.html instead of the function's own page, like:

<a href="column_math_functions.html">asin</a>

Have you met with such a problem?

Thanks,

  Misi








Re: Help needed in R documentation generation

2018-02-27 Thread Mihály Tóth
Hi,

Earlier, at https://spark.apache.org/docs/latest/api/R/index.html I see

   1. sin as a title
   2. description describes what sin does
   3. usage, arguments, note, see also are specific to sin function

When opening sin from
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html
:

   1. Title is 'Math functions for Column operations', not very specific to
   sin
   2. Description is 'Math functions defined for Column.'
   3. Usage contains a list of functions, scrolling down you can see sin as
   well though ...

To me that sounds like a problem. Do I overlook something here?

Best Regards,
  Misi


2018-02-27 16:15 GMT+00:00 Felix Cheung :

> The help content on sin is in
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-
> docs/_site/api/R/column_math_functions.html
>
> It’s a fairly long list but sin is in there. Is that not what you are
> seeing?
>
>
> --
> *From:* Mihály Tóth 
> *Sent:* Tuesday, February 27, 2018 8:03:34 AM
> *To:* dev@spark.apache.org
> *Subject:* Fwd: Help needed in R documentation generation
>
> Hi,
>
> Actually, when I open the link you provided and click on - for example -
> 'sin' the page does not seem to describe that function at all. Actually I
> get same effect that I get locally. I have attached a screenshot about that:
>
>
> [image: Inline image 1]
>
>
> I tried with Chrome and then with Safari too and got the same result.
>
> When I go to https://spark.apache.org/docs/latest/api/R/index.html (Spark
> 2.2.1) and select 'sin' I get a proper Description, Usage, Arguments, etc.
> sections.
>
> This sounds like a bug in the documentation of Spark R, doesn't it? Shall
> I file a Jira about it?
>
> Locally I ran SPARK_HOME/R/create-docs.sh and it returned successfully.
> Unfortunately with the result mentioned above.
>
> Best Regards,
>
>   Misi
>
>
>
>> 
>>
>> From: Felix Cheung 
>> Date: 2018-02-26 20:42 GMT+00:00
>> Subject: Re: Help needed in R documentation generation
>> To: Mihály Tóth 
>> Cc: "dev@spark.apache.org" 
>>
>>
>> Could you tell me more about the steps you are taking? Which page you are
>> clicking on?
>>
>> Could you try https://dist.apache.org/repos/
>> dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html
>>
>> --
>> *From:* Mihály Tóth 
>> *Sent:* Monday, February 26, 2018 8:06:59 AM
>> *To:* Felix Cheung
>> *Cc:* dev@spark.apache.org
>> *Subject:* Re: Help needed in R documentation generation
>>
>> I see.
>>
>> When I click on such a selected function, like 'sin' the page falls apart
>> and does not tell anything about sin function. How is it supposed to work
>> when all functions link to the same column_math_functions.html ?
>>
>> Thanks,
>>
>>   Misi
>>
>>
>> On Sun, Feb 25, 2018, 22:53 Felix Cheung 
>> wrote:
>>
>>> This is a recent change. The html file column_math_functions.html should
>>> have the right help content.
>>>
>>> What is the problem you are experiencing?
>>>
>>> --
>>> *From:* Mihály Tóth 
>>> *Sent:* Sunday, February 25, 2018 10:42:50 PM
>>> *To:* dev@spark.apache.org
>>> *Subject:* Help needed in R documentation generation
>>>
>>> Hi,
>>>
>>> I am having difficulties generating R documentation.
>>>
>>> In the R/pkg/html/index.html file, the individual function entries reference
>>> column_math_functions.html instead of the function's own page, like:
>>>
>>> asin
>>>
>>> Have you met with such a problem?
>>>
>>> Thanks,
>>>
>>>   Misi
>>>
>>>
>>>
>>
>
>


Re: Help needed in R documentation generation

2018-02-27 Thread Felix Cheung
The help content on sin is in
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/column_math_functions.html

It’s a fairly long list but sin is in there. Is that not what you are seeing?



From: Mihály Tóth 
Sent: Tuesday, February 27, 2018 8:03:34 AM
To: dev@spark.apache.org
Subject: Fwd: Help needed in R documentation generation

Hi,

Actually, when I open the link you provided and click on - for example - 'sin' 
the page does not seem to describe that function at all. Actually I get same 
effect that I get locally. I have attached a screenshot about that:


[Inline image 1]


I tried with Chrome and then with Safari too and got the same result.

When I go to https://spark.apache.org/docs/latest/api/R/index.html (Spark 
2.2.1) and select 'sin' I get a proper Description, Usage, Arguments, etc. 
sections.

This sounds like a bug in the documentation of Spark R, doesn't it? Shall I 
file a Jira about it?

Locally I ran SPARK_HOME/R/create-docs.sh and it returned successfully. 
Unfortunately with the result mentioned above.

Best Regards,

  Misi





From: Felix Cheung >
Date: 2018-02-26 20:42 GMT+00:00
Subject: Re: Help needed in R documentation generation
To: Mihály Tóth >
Cc: "dev@spark.apache.org" 
>


Could you tell me more about the steps you are taking? Which page you are 
clicking on?

Could you try 
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html


From: Mihály Tóth >
Sent: Monday, February 26, 2018 8:06:59 AM
To: Felix Cheung
Cc: dev@spark.apache.org
Subject: Re: Help needed in R documentation generation

I see.

When I click on such a selected function, like 'sin' the page falls apart and 
does not tell anything about sin function. How is it supposed to work when all 
functions link to the same column_math_functions.html ?

Thanks,

  Misi


On Sun, Feb 25, 2018, 22:53 Felix Cheung 
> wrote:
This is a recent change. The html file column_math_functions.html should have the 
right help content.

What is the problem you are experiencing?


From: Mihály Tóth >
Sent: Sunday, February 25, 2018 10:42:50 PM
To: dev@spark.apache.org
Subject: Help needed in R documentation generation

Hi,

I am having difficulties generating R documentation.

In the R/pkg/html/index.html file, the individual function entries reference
column_math_functions.html instead of the function's own page, like:

<a href="column_math_functions.html">asin</a>

Have you met with such a problem?

Thanks,

  Misi







Fwd: Help needed in R documentation generation

2018-02-27 Thread Mihály Tóth
Hi,

Actually, when I open the link you provided and click on - for example -
'sin' the page does not seem to describe that function at all. Actually I
get same effect that I get locally. I have attached a screenshot about that:


[image: Inline image 1]


I tried with Chrome and then with Safari too and got the same result.

When I go to https://spark.apache.org/docs/latest/api/R/index.html (Spark
2.2.1) and select 'sin' I get a proper Description, Usage, Arguments, etc.
sections.

This sounds like a bug in the documentation of Spark R, doesn't it? Shall I
file a Jira about it?

Locally I ran SPARK_HOME/R/create-docs.sh and it returned successfully.
Unfortunately with the result mentioned above.

Best Regards,

  Misi



> 
>
> From: Felix Cheung 
> Date: 2018-02-26 20:42 GMT+00:00
> Subject: Re: Help needed in R documentation generation
> To: Mihály Tóth 
> Cc: "dev@spark.apache.org" 
>
>
> Could you tell me more about the steps you are taking? Which page you are
> clicking on?
>
> Could you try https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs
> /_site/api/R/index.html
>
> --
> *From:* Mihály Tóth 
> *Sent:* Monday, February 26, 2018 8:06:59 AM
> *To:* Felix Cheung
> *Cc:* dev@spark.apache.org
> *Subject:* Re: Help needed in R documentation generation
>
> I see.
>
> When I click on such a selected function, like 'sin' the page falls apart
> and does not tell anything about sin function. How is it supposed to work
> when all functions link to the same column_math_functions.html ?
>
> Thanks,
>
>   Misi
>
>
> On Sun, Feb 25, 2018, 22:53 Felix Cheung 
> wrote:
>
>> This is a recent change. The html file column_math_functions.html should
>> have the right help content.
>>
>> What is the problem you are experiencing?
>>
>> --
>> *From:* Mihály Tóth 
>> *Sent:* Sunday, February 25, 2018 10:42:50 PM
>> *To:* dev@spark.apache.org
>> *Subject:* Help needed in R documentation generation
>>
>> Hi,
>>
>> I am having difficulties generating R documentation.
>>
>> In the R/pkg/html/index.html file, the individual function entries reference
>> column_math_functions.html instead of the function's own page, like:
>>
>> asin
>>
>> Have you met with such a problem?
>>
>> Thanks,
>>
>>   Misi
>>
>>
>>
>


Re: Please keep s3://spark-related-packages/ alive

2018-02-27 Thread Michael Heuer
On Tue, Feb 27, 2018 at 8:17 AM, Sean Owen  wrote:

> See http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-
> d3kbcqa49mib13-cloudfront-net-td22427.html -- it was 'retired', yes.
>
> Agree with all that, though they're intended for occasional individual use
> and not a case where performance and uptime matter. For that, I think you'd
> want to just host your own copy of the bits you need.
>
> The notional problem was that the S3 bucket wasn't obviously
> controlled/blessed by the ASF and yet was a source of official bits. It was
> another set of third-party credentials to hand around to release managers,
> which was IIRC a little problematic.
>
> Homebrew does host distributions of ASF projects, like Spark, FWIW.
>

To clarify, the apache-spark.rb formula in Homebrew uses the Apache mirror
closer.lua script

https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-spark.rb#L4
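
For scripts that want to resolve a mirror the same way outside of Homebrew, a rough sketch using closer.lua's JSON mode might look like the following (the "preferred" and "path_info" field names are assumptions to verify against the live response):

import requests

CLOSER = "https://www.apache.org/dyn/closer.lua"
path = "spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz"  # hypothetical path

# Ask the mirror system for a suggested mirror in JSON form.
resp = requests.get(CLOSER, params={"path": path, "as_json": 1}, timeout=10)
resp.raise_for_status()
suggestion = resp.json()

# The suggested mirror may still be slow or missing the file, so a real
# script also needs a fallback (e.g. archive.apache.org) and retries.
url = suggestion["preferred"].rstrip("/") + "/" + suggestion["path_info"]
print("Would download from:", url)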

   michael



> On Mon, Feb 26, 2018 at 10:57 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> If you go to the Downloads  page
>> and download Spark 2.2.1, you’ll get a link to an Apache mirror. It didn’t
>> use to be this way. As recently as Spark 2.2.0, downloads were served via
>> CloudFront , which was backed by an
>> S3 bucket named spark-related-packages.
>>
>> It seems that we’ve stopped using CloudFront, and the S3 bucket behind it
>> has stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m guessing
>> this is part of an effort to use the Apache mirror network, like other
>> Apache projects do.
>>
>> From a user perspective, the Apache mirror network is several steps down
>> from using a modern CDN. Let me summarize why:
>>
>>1. *Apache mirrors are often slow.* Apache does not impose any
>>performance requirements on its mirrors
>>
>> .
>>The difference between getting a good mirror and a bad one means
>>downloading Spark in less than a minute vs. 20 minutes. The problem is so
>>bad that I’ve thought about adding an Apache mirror blacklist
>>
>>to Flintrock to avoid getting one of these dud mirrors.
>>2. *Apache mirrors are inconvenient to use.* When you download
>>something from an Apache mirror, you get a link like this one
>>
>> .
>>Instead of automatically redirecting you to your download, though, you 
>> need
>>to process the results you get back
>>
>> 
>>to find your download target. And you need to handle the high download
>>failure rate, since sometimes the mirror you get doesn’t have the file it
>>claims to have.
>>3. *Apache mirrors are incomplete.* Apache mirrors only keep around
>>the latest releases, save for a few “archive” mirrors, which are often
>>slow. So if you want to download anything but the latest version of Spark,
>>you are out of luck.
>>
>> Some of these problems can be mitigated by picking a specific mirror that
>> works well and hardcoding it in your scripts, but that defeats the purpose
>> of dynamically selecting a mirror and makes you a “bad” user of the mirror
>> network.
>>
>> I raised some of these issues over on INFRA-10999
>> . The ticket sat for
>> a year before I heard anything back, and the bottom line was that none of
>> the above problems have a solution on the horizon. It’s fine. I understand
>> that Apache is a volunteer organization and that the infrastructure team
>> has a lot to manage as it is. I still find it disappointing that an
>> organization of Apache’s stature doesn’t have a better solution for this in
>> collaboration with a third party. Python serves PyPI downloads using
>> Fastly  and Homebrew serves packages using
>> Bintray . They both work really, really well. Why
>> don’t we have something as good for Apache projects? Anyway, that’s a
>> separate discussion.
>>
>> What I want to say is this:
>>
>> Dear whoever owns the spark-related-packages S3 bucket
>> ,
>>
>> Please keep the bucket up-to-date with the latest Spark releases,
>> alongside the past releases that are already on there. It’s a huge help to
>> the Flintrock  project, and it’s
>> an equally big help to those of us writing infrastructure automation
>> scripts that deploy Spark in other contexts.
>>
>> I understand that hosting this stuff is not free, and that I am not paying
>> anything for this service.

Re: Please keep s3://spark-related-packages/ alive

2018-02-27 Thread Sean Owen
See
http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-d3kbcqa49mib13-cloudfront-net-td22427.html
--
it was 'retired', yes.

Agree with all that, though they're intended for occasional individual use
and not a case where performance and uptime matter. For that, I think you'd
want to just host your own copy of the bits you need.

The notional problem was that the S3 bucket wasn't obviously
controlled/blessed by the ASF and yet was a source of official bits. It was
another set of third-party credentials to hand around to release managers,
which was IIRC a little problematic.

Homebrew does host distributions of ASF projects, like Spark, FWIW.

On Mon, Feb 26, 2018 at 10:57 PM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> If you go to the Downloads  page
> and download Spark 2.2.1, you’ll get a link to an Apache mirror. It didn’t
> use to be this way. As recently as Spark 2.2.0, downloads were served via
> CloudFront , which was backed by an
> S3 bucket named spark-related-packages.
>
> It seems that we’ve stopped using CloudFront, and the S3 bucket behind it
> has stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m guessing
> this is part of an effort to use the Apache mirror network, like other
> Apache projects do.
>
> From a user perspective, the Apache mirror network is several steps down
> from using a modern CDN. Let me summarize why:
>
>1. *Apache mirrors are often slow.* Apache does not impose any
>performance requirements on its mirrors
>
> .
>The difference between getting a good mirror and a bad one means
>downloading Spark in less than a minute vs. 20 minutes. The problem is so
>bad that I’ve thought about adding an Apache mirror blacklist
>
>to Flintrock to avoid getting one of these dud mirrors.
>2. *Apache mirrors are inconvenient to use.* When you download
>something from an Apache mirror, you get a link like this one
>
> .
>Instead of automatically redirecting you to your download, though, you need
>to process the results you get back
>
> 
>to find your download target. And you need to handle the high download
>failure rate, since sometimes the mirror you get doesn’t have the file it
>claims to have.
>3. *Apache mirrors are incomplete.* Apache mirrors only keep around
>the latest releases, save for a few “archive” mirrors, which are often
>slow. So if you want to download anything but the latest version of Spark,
>you are out of luck.
>
> Some of these problems can be mitigated by picking a specific mirror that
> works well and hardcoding it in your scripts, but that defeats the purpose
> of dynamically selecting a mirror and makes you a “bad” user of the mirror
> network.
>
> I raised some of these issues over on INFRA-10999
> . The ticket sat for a
> year before I heard anything back, and the bottom line was that none of the
> above problems have a solution on the horizon. It’s fine. I understand that
> Apache is a volunteer organization and that the infrastructure team has a
> lot to manage as it is. I still find it disappointing that an organization
> of Apache’s stature doesn’t have a better solution for this in
> collaboration with a third party. Python serves PyPI downloads using
> Fastly  and Homebrew serves packages using
> Bintray . They both work really, really well. Why
> don’t we have something as good for Apache projects? Anyway, that’s a
> separate discussion.
>
> What I want to say is this:
>
> Dear whoever owns the spark-related-packages S3 bucket
> ,
>
> Please keep the bucket up-to-date with the latest Spark releases,
> alongside the past releases that are already on there. It’s a huge help to
> the Flintrock  project, and it’s
> an equally big help to those of us writing infrastructure automation
> scripts that deploy Spark in other contexts.
>
> I understand that hosting this stuff is not free, and that I am not paying
> anything for this service. If it needs to go, so be it. But I wanted to
> take this opportunity to lay out the benefits I’ve enjoyed thanks to having
> this bucket around, and to make sure that if it did die, it didn’t die a
> quiet death.
>
> Sincerely,
> Nick
> ​
>


Re: Please keep s3://spark-related-packages/ alive

2018-02-27 Thread Reynold Xin
This was actually an AMPLab bucket.

On Feb 27, 2018, 6:04 PM +1300, Holden Karau wrote:
> Thanks Nick, we deprecated this during the roll over to the new release 
> managers. I assume this bucket was maintained by someone at databricks so 
> maybe they can chime in.
>
> > On Feb 26, 2018 8:57 PM, "Nicholas Chammas"  
> > wrote:
> > > If you go to the Downloads page and download Spark 2.2.1, you’ll get a 
> > > link to an Apache mirror. It didn’t use to be this way. As recently as 
> > > Spark 2.2.0, downloads were served via CloudFront, which was backed by an 
> > > S3 bucket named spark-related-packages.
> > > It seems that we’ve stopped using CloudFront, and the S3 bucket behind it 
> > > has stopped receiving updates (e.g. Spark 2.2.1 isn’t there). I’m 
> > > guessing this is part of an effort to use the Apache mirror network, like 
> > > other Apache projects do.
> > > From a user perspective, the Apache mirror network is several steps down 
> > > from using a modern CDN. Let me summarize why:
> > >
> > > 1. Apache mirrors are often slow. Apache does not impose any performance 
> > > requirements on its mirrors. The difference between getting a good mirror 
> > > and a bad one means downloading Spark in less than a minute vs. 20 
> > > minutes. The problem is so bad that I’ve thought about adding an Apache 
> > > mirror blacklist to Flintrock to avoid getting one of these dud mirrors.
> > > 2. Apache mirrors are inconvenient to use. When you download something 
> > > from an Apache mirror, you get a link like this one. Instead of 
> > > automatically redirecting you to your download, though, you need to 
> > > process the results you get back to find your download target. And you 
> > > need to handle the high download failure rate, since sometimes the mirror 
> > > you get doesn’t have the file it claims to have.
> > > 3. Apache mirrors are incomplete. Apache mirrors only keep around the 
> > > latest releases, save for a few “archive” mirrors, which are often slow. 
> > > So if you want to download anything but the latest version of Spark, you 
> > > are out of luck.
> > >
> > > Some of these problems can be mitigated by picking a specific mirror that 
> > > works well and hardcoding it in your scripts, but that defeats the 
> > > purpose of dynamically selecting a mirror and makes you a “bad” user of 
> > > the mirror network.
> > > I raised some of these issues over on INFRA-10999. The ticket sat for a 
> > > year before I heard anything back, and the bottom line was that none of 
> > > the above problems have a solution on the horizon. It’s fine. I 
> > > understand that Apache is a volunteer organization and that the 
> > > infrastructure team has a lot to manage as it is. I still find it 
> > > disappointing that an organization of Apache’s stature doesn’t have a 
> > > better solution for this in collaboration with a third party. Python 
> > > serves PyPI downloads using Fastly and Homebrew serves packages using 
> > > Bintray. They both work really, really well. Why don’t we have something 
> > > as good for Apache projects? Anyway, that’s a separate discussion.
> > > What I want to say is this:
> > > Dear whoever owns the spark-related-packages S3 bucket,
> > > Please keep the bucket up-to-date with the latest Spark releases, 
> > > alongside the past releases that are already on there. It’s a huge help 
> > > to the Flintrock project, and it’s an equally big help to those of us 
> > > writing infrastructure automation scripts that deploy Spark in other 
> > > contexts.
> > > I understand that hosting this stuff is not free, and that I am not 
> > > paying anything for this service. If it needs to go, so be it. But I 
> > > wanted to take this opportunity to lay out the benefits I’ve enjoyed 
> > > thanks to having this bucket around, and to make sure that if it did die, 
> > > it didn’t die a quiet death.
> > > Sincerely,
> > > Nick
> > >
>


Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-27 Thread Nick Pentreath
+1 (binding)

Built and ran Scala tests with "-Phadoop-2.6 -Pyarn -Phive", all passed.

Python tests passed (also including pyspark-streaming w/kafka-0.8 and flume
packages built)

On Tue, 27 Feb 2018 at 10:09 Felix Cheung  wrote:

> +1
>
> Tested R:
>
> install from package, CRAN tests, manual tests, help check, vignettes check
>
> Filed this https://issues.apache.org/jira/browse/SPARK-23461
> This is not a regression so not a blocker of the release.
>
> Tested this on win-builder and r-hub. On r-hub on multiple platforms
> everything passed. For win-builder tests failed on x86 but passed x64 -
> perhaps due to an intermittent download issue causing a gzip error,
> re-testing now but won’t hold the release on this.
>
> --
> *From:* Nan Zhu 
> *Sent:* Monday, February 26, 2018 4:03:22 PM
> *To:* Michael Armbrust
> *Cc:* dev
> *Subject:* Re: [VOTE] Spark 2.3.0 (RC5)
>
> +1  (non-binding), tested with internal workloads and benchmarks
>
> On Mon, Feb 26, 2018 at 12:09 PM, Michael Armbrust  > wrote:
>
>> +1 all our pipelines have been running the RC for several days now.
>>
>> On Mon, Feb 26, 2018 at 10:33 AM, Dongjoon Hyun 
>> wrote:
>>
>>> +1 (non-binding).
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>>
>>> On Mon, Feb 26, 2018 at 9:14 AM, Ryan Blue 
>>> wrote:
>>>
 +1 (non-binding)

 On Sat, Feb 24, 2018 at 4:17 PM, Xiao Li  wrote:

> +1 (binding) in Spark SQL, Core and PySpark.
>
> Xiao
>
> 2018-02-24 14:49 GMT-08:00 Ricardo Almeida <
> ricardo.alme...@actnowib.com>:
>
>> +1 (non-binding)
>>
>> same as previous RC
>>
>> On 24 February 2018 at 11:10, Hyukjin Kwon 
>> wrote:
>>
>>> +1
>>>
>>> 2018-02-24 16:57 GMT+09:00 Bryan Cutler :
>>>
 +1
 Tests passed and additionally ran Arrow related tests and did some
 perf checks with python 2.7.14

 On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau  wrote:

> Note: given the state of Jenkins I'd love to see Bryan Cutler or
> someone with Arrow experience sign off on this release.
>
> On Fri, Feb 23, 2018 at 6:13 PM, Cheng Lian  > wrote:
>
>> +1 (binding)
>>
>> Passed all the tests, looks good.
>>
>> Cheng
>>
>> On 2/23/18 15:00, Holden Karau wrote:
>>
>> +1 (binding)
>> PySpark artifacts install in a fresh Py3 virtual env
>>
>> On Feb 23, 2018 7:55 AM, "Denny Lee" 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Fri, Feb 23, 2018 at 07:08 Josh Goldsborough <
>>> joshgoldsboroughs...@gmail.com> wrote:
>>>
 New to testing out Spark RCs for the community but I was able
 to run some of the basic unit tests without error so for what it's 
 worth,
 I'm a +1.

 On Thu, Feb 22, 2018 at 4:23 PM, Sameer Agarwal <
 samee...@apache.org> wrote:

> Please vote on releasing the following candidate as Apache
> Spark version 2.3.0. The vote is open until Tuesday February 27, 
> 2018 at
> 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 
> votes are cast.
>
>
> [ ] +1 Release this package as Apache Spark 2.3.0
>
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see
> https://spark.apache.org/
>
> The tag to be voted on is v2.3.0-rc5:
> https://github.com/apache/spark/tree/v2.3.0-rc5
> (992447fb30ee9ebb3cf794f2d06f4d63a2d792db)
>
> List of JIRA tickets resolved in this release can be found
> here:
> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>
> The release files, including signatures, digests, etc. can be
> found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1266/
>
> The documentation corresponding to this release can be found
> at:
>
> 

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-27 Thread Felix Cheung
+1

Tested R:

install from package, CRAN tests, manual tests, help check, vignettes check

Filed this https://issues.apache.org/jira/browse/SPARK-23461
This is not a regression so not a blocker of the release.

Tested this on win-builder and r-hub. On r-hub on multiple platforms everything 
passed. For win-builder tests failed on x86 but passed x64 - perhaps due to an 
intermittent download issue causing a gzip error, re-testing now but won’t hold 
the release on this.


From: Nan Zhu 
Sent: Monday, February 26, 2018 4:03:22 PM
To: Michael Armbrust
Cc: dev
Subject: Re: [VOTE] Spark 2.3.0 (RC5)

+1  (non-binding), tested with internal workloads and benchmarks

On Mon, Feb 26, 2018 at 12:09 PM, Michael Armbrust 
> wrote:
+1 all our pipelines have been running the RC for several days now.

On Mon, Feb 26, 2018 at 10:33 AM, Dongjoon Hyun 
> wrote:
+1 (non-binding).

Bests,
Dongjoon.



On Mon, Feb 26, 2018 at 9:14 AM, Ryan Blue 
> wrote:
+1 (non-binding)

On Sat, Feb 24, 2018 at 4:17 PM, Xiao Li 
> wrote:
+1 (binding) in Spark SQL, Core and PySpark.

Xiao

2018-02-24 14:49 GMT-08:00 Ricardo Almeida 
>:
+1 (non-binding)

same as previous RC

On 24 February 2018 at 11:10, Hyukjin Kwon 
> wrote:
+1

2018-02-24 16:57 GMT+09:00 Bryan Cutler 
>:
+1
Tests passed and additionally ran Arrow related tests and did some perf checks 
with python 2.7.14

On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau 
> wrote:
Note: given the state of Jenkins I'd love to see Bryan Cutler or someone with 
Arrow experience sign off on this release.

On Fri, Feb 23, 2018 at 6:13 PM, Cheng Lian 
> wrote:

+1 (binding)

Passed all the tests, looks good.

Cheng

On 2/23/18 15:00, Holden Karau wrote:
+1 (binding)
PySpark artifacts install in a fresh Py3 virtual env

On Feb 23, 2018 7:55 AM, "Denny Lee" 
> wrote:
+1 (non-binding)

On Fri, Feb 23, 2018 at 07:08 Josh Goldsborough 
> wrote:
New to testing out Spark RCs for the community but I was able to run some of 
the basic unit tests without error so for what it's worth, I'm a +1.

On Thu, Feb 22, 2018 at 4:23 PM, Sameer Agarwal 
> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.3.0. 
The vote is open until Tuesday February 27, 2018 at 8:00:00 am UTC and passes 
if a majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc5: 
https://github.com/apache/spark/tree/v2.3.0-rc5 
(992447fb30ee9ebb3cf794f2d06f4d63a2d792db)

List of JIRA tickets resolved in this release can be found here: 
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1266/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/index.html


FAQ

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are 
currently no known release blockers.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running it on this release candidate, then 
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the 
current RC, and see if anything important breaks. In Java/Scala, you can add 
the staging repository to your project's resolvers and test with the RC (make 
sure to clean up the artifact cache before/after so you don't end up building 
with an out-of-date RC going forward).
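
For example, a throwaway smoke test against the RC (a sketch, assuming the RC's pyspark package is installed in that virtual env) might be:

from pyspark.sql import SparkSession

# Spin up a tiny local session against the candidate build.
spark = (SparkSession.builder
         .master("local[2]")
         .appName("rc-smoke-test")
         .getOrCreate())

df = spark.range(1000).selectExpr("id", "id % 7 AS bucket")
counts = df.groupBy("bucket").count().orderBy("bucket").collect()
assert sum(r["count"] for r in counts) == 1000, "unexpected row count"

print(spark.version)  # should report the RC's version, e.g. 2.3.0
spark.stop()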

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage.