Spark Streaming Custom Receiver Anomaly

2018-02-20 Thread Thakrar, Jayesh
Hi All,

I am trying to "test" a very simple custom receiver and am a little puzzled.

Using Spark 2.2.0 shell on my laptop, I am running the code below.
I was expecting the code to time out, since my timeout wait period is 1 ms and
the receiver sleeps for much longer than that (1200 ms) between records.

Is this normal? Or am I interpreting something incorrectly?

import org.apache.spark.streaming._

class CustomReceiver extends
    org.apache.spark.streaming.receiver.Receiver[String](
      org.apache.spark.storage.StorageLevel.MEMORY_ONLY) {
  // Start a background thread so that onStart() returns immediately.
  def onStart() {
    new Thread("CustomReceiver") {
      override def run() { receive() }
    }.start()
  }
  def onStop() {}
  // Store one record roughly every 1200 ms until the receiver is stopped.
  private def receive() {
    val hostname = java.net.InetAddress.getLocalHost()
    // Captured once, so every record carries the same timestamp.
    val time = java.util.Calendar.getInstance.getTime
    var counter = 0
    while (isStarted && !isStopped) {
      counter += 1
      store(s"host = ${hostname} time = ${time} counter = ${counter}")
      Thread.sleep(1200)
    }
  }
}

val ssc = new StreamingContext(sc, Seconds(1))
val words = ssc.receiverStream(new CustomReceiver())

words.print()
ssc.start()
ssc.awaitTerminationOrTimeout(1)
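
One possible explanation, offered as a sketch rather than a definitive answer: awaitTerminationOrTimeout(timeout) takes a timeout in milliseconds and returns a Boolean (true if the context stopped, false if the wait elapsed first). It does not throw just because the time ran out, and it does not stop the context. With a 1 ms timeout the call therefore returns false almost immediately while the streaming job keeps running in the background of the shell. The snippet below assumes a fresh Spark 2.x shell (where `sc` is predefined) in which the CustomReceiver class above has already been defined; the name `stopped` is introduced here purely for illustration.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumption: run in a fresh spark-shell session, after defining CustomReceiver.
val ssc = new StreamingContext(sc, Seconds(1))
ssc.receiverStream(new CustomReceiver()).print()
ssc.start()

// Waits at most 1 ms. Returns false if the wait elapsed before the context
// stopped; an exception is thrown only if the streaming job itself failed.
val stopped: Boolean = ssc.awaitTerminationOrTimeout(1)
println(s"stopped within timeout: $stopped")

// The context is still running at this point, so stop it explicitly,
// keeping the shell's SparkContext alive for further use.
if (!stopped) ssc.stop(stopSparkContext = false, stopGracefully = false)
```

If the intent is to end the job after a fixed period, checking the return value and calling stop() as above is one way to do it.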




Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Weichen Xu
+1

On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin 
wrote:

> Done, thanks!
>
> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal 
> wrote:
> > Sure, please feel free to backport.
> >
> > On 20 February 2018 at 18:02, Marcelo Vanzin 
> wrote:
> >>
> >> Hey Sameer,
> >>
> >> Mind including https://github.com/apache/spark/pull/20643
> >> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
> >> with older shuffle services, but it's pretty safe.
> >>
> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
> >> wrote:
> >> > This RC has failed due to
> >> > https://issues.apache.org/jira/browse/SPARK-23470.
> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow
> >> > up
> >> > with an RC5 soon.
> >> >
> >> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
> >> >>
> >> >> +1
> >> >>
> >> >> Build & tests look fine, checked signature and checksums for src
> >> >> tarball.
> >> >>
> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
> >> >>  wrote:
> >> >>>
> >> >>> I'm -1 because of the UI regression
> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs
> page
> >> >>> may be
> >> >>> too slow and cause "read timeout" when there are lots of jobs and
> >> >>> stages.
> >> >>> This is one of the most important pages because when it's broken,
> it's
> >> >>> pretty hard to use Spark Web UI.
> >> >>>
> >> >>>
> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <
> marcogaid...@gmail.com>
> >> >>> wrote:
> >> 
> >>  +1
> >> 
> >>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
> >> >
> >> > +1 too
> >> >
> >> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN  >:
> >> >>
> >> >> +1
> >> >>
> >> >>
> >> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
> >> >> 
> >> >> wrote:
> >> >>>
> >> >>> +1
> >> >>>
> >> >>>
> >> >>> Wenchen Fan 于2018年2月20日 周二下午1:09写道:
> >> 
> >>  +1
> >> 
> >>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
> >>  
> >>  wrote:
> >> >
> >> > +1
> >> >
> >> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
> >> > , wrote:
> >> >>
> >> >> this file shouldn't be included?
> >> >>
> >> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
> >> >
> >> >
> >> > I've now deleted this file
> >> >
> >> >> From: Sameer Agarwal 
> >> >> Sent: Saturday, February 17, 2018 1:43:39 PM
> >> >> To: Sameer Agarwal
> >> >> Cc: dev
> >> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
> >> >>
> >> >> I'll start with a +1 once again.
> >> >>
> >> >> All blockers reported against RC3 have been resolved and the
> >> >> builds are healthy.
> >> >>
> >> >> On 17 February 2018 at 13:41, Sameer Agarwal
> >> >> 
> >> >> wrote:
> >> >>>
> >> >>> Please vote on releasing the following candidate as Apache
> >> >>> Spark
> >> >>> version 2.3.0. The vote is open until Thursday February 22,
> >> >>> 2018 at 8:00:00
> >> >>> am UTC and passes if a majority of at least 3 PMC +1 votes
> are
> >> >>> cast.
> >> >>>
> >> >>>
> >> >>> [ ] +1 Release this package as Apache Spark 2.3.0
> >> >>>
> >> >>> [ ] -1 Do not release this package because ...
> >> >>>
> >> >>>
> >> >>> To learn more about Apache Spark, please see
> >> >>> https://spark.apache.org/
> >> >>>
> >> >>> The tag to be voted on is v2.3.0-rc4:
> >> >>> https://github.com/apache/spark/tree/v2.3.0-rc4
> >> >>> (44095cb65500739695b0324c177c19dfa1471472)
> >> >>>
> >> >>> List of JIRA tickets resolved in this release can be found
> >> >>> here:
> >> >>>
> >> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
> >> >>>
> >> >>> The release files, including signatures, digests, etc. can
> be
> >> >>> found at:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
> >> >>>
> >> >>> Release artifacts are signed with the following key:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >>>
> >> >>> The staging repository for this release can be found at:
> >> >>>
> >> >>>
> >> >>> https://repository.apache.org/content/repositories/orgapachespark-1265/
> >> >>>
> >> >>> The documentation corresponding to 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marcelo Vanzin
Done, thanks!

On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal  wrote:
> Sure, please feel free to backport.
>
> On 20 February 2018 at 18:02, Marcelo Vanzin  wrote:
>>
>> Hey Sameer,
>>
>> Mind including https://github.com/apache/spark/pull/20643
>> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
>> with older shuffle services, but it's pretty safe.
>>
>> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
>> wrote:
>> > This RC has failed due to
>> > https://issues.apache.org/jira/browse/SPARK-23470.
>> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow
>> > up
>> > with an RC5 soon.
>> >
>> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
>> >>
>> >> +1
>> >>
>> >> Build & tests look fine, checked signature and checksums for src
>> >> tarball.
>> >>
>> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>> >>  wrote:
>> >>>
>> >>> I'm -1 because of the UI regression
>> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page
>> >>> may be
>> >>> too slow and cause "read timeout" when there are lots of jobs and
>> >>> stages.
>> >>> This is one of the most important pages because when it's broken, it's
>> >>> pretty hard to use Spark Web UI.
>> >>>
>> >>>
>> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido 
>> >>> wrote:
>> 
>>  +1
>> 
>>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>> >
>> > +1 too
>> >
>> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN :
>> >>
>> >> +1
>> >>
>> >>
>> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
>> >> 
>> >> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>>
>> >>> Wenchen Fan 于2018年2月20日 周二下午1:09写道:
>> 
>>  +1
>> 
>>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
>>  
>>  wrote:
>> >
>> > +1
>> >
>> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
>> > , wrote:
>> >>
>> >> this file shouldn't be included?
>> >>
>> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>> >
>> >
>> > I've now deleted this file
>> >
>> >> From: Sameer Agarwal 
>> >> Sent: Saturday, February 17, 2018 1:43:39 PM
>> >> To: Sameer Agarwal
>> >> Cc: dev
>> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>> >>
>> >> I'll start with a +1 once again.
>> >>
>> >> All blockers reported against RC3 have been resolved and the
>> >> builds are healthy.
>> >>
>> >> On 17 February 2018 at 13:41, Sameer Agarwal
>> >> 
>> >> wrote:
>> >>>
>> >>> Please vote on releasing the following candidate as Apache
>> >>> Spark
>> >>> version 2.3.0. The vote is open until Thursday February 22,
>> >>> 2018 at 8:00:00
>> >>> am UTC and passes if a majority of at least 3 PMC +1 votes are
>> >>> cast.
>> >>>
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 2.3.0
>> >>>
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>>
>> >>> To learn more about Apache Spark, please see
>> >>> https://spark.apache.org/
>> >>>
>> >>> The tag to be voted on is v2.3.0-rc4:
>> >>> https://github.com/apache/spark/tree/v2.3.0-rc4
>> >>> (44095cb65500739695b0324c177c19dfa1471472)
>> >>>
>> >>> List of JIRA tickets resolved in this release can be found
>> >>> here:
>> >>>
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >>>
>> >>> The release files, including signatures, digests, etc. can be
>> >>> found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>> >>>
>> >>> Release artifacts are signed with the following key:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>>
>> >>>
>> >>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>> >>>
>> >>> The documentation corresponding to this release can be found
>> >>> at:
>> >>>
>> >>>
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>> >>>
>> >>>
>> >>> FAQ
>> >>>
>> >>> ===
>> >>> What are the unresolved issues targeted for 2.3.0?
>> >>> 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Sameer Agarwal
Sure, please feel free to backport.

On 20 February 2018 at 18:02, Marcelo Vanzin  wrote:

> Hey Sameer,
>
> Mind including https://github.com/apache/spark/pull/20643
> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
> with older shuffle services, but it's pretty safe.
>
> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal 
> wrote:
> > This RC has failed due to
> > https://issues.apache.org/jira/browse/SPARK-23470.
> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
> > with an RC5 soon.
> >
> > On 20 February 2018 at 16:49, Ryan Blue  wrote:
> >>
> >> +1
> >>
> >> Build & tests look fine, checked signature and checksums for src
> tarball.
> >>
> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
> >>  wrote:
> >>>
> >>> I'm -1 because of the UI regression
> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page
> may be
> >>> too slow and cause "read timeout" when there are lots of jobs and
> stages.
> >>> This is one of the most important pages because when it's broken, it's
> >>> pretty hard to use Spark Web UI.
> >>>
> >>>
> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido 
> >>> wrote:
> 
>  +1
> 
>  2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
> >
> > +1 too
> >
> > 2018-02-20 14:41 GMT+09:00 Takuya UESHIN :
> >>
> >> +1
> >>
> >>
> >> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang <
> jiangxb1...@gmail.com>
> >> wrote:
> >>>
> >>> +1
> >>>
> >>>
> >>> Wenchen Fan 于2018年2月20日 周二下午1:09写道:
> 
>  +1
> 
>  On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin <
> r...@databricks.com>
>  wrote:
> >
> > +1
> >
> > On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
> > , wrote:
> >>
> >> this file shouldn't be included?
> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
> >
> >
> > I've now deleted this file
> >
> >> From: Sameer Agarwal 
> >> Sent: Saturday, February 17, 2018 1:43:39 PM
> >> To: Sameer Agarwal
> >> Cc: dev
> >> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
> >>
> >> I'll start with a +1 once again.
> >>
> >> All blockers reported against RC3 have been resolved and the
> >> builds are healthy.
> >>
> >> On 17 February 2018 at 13:41, Sameer Agarwal <
> samee...@apache.org>
> >> wrote:
> >>>
> >>> Please vote on releasing the following candidate as Apache
> Spark
> >>> version 2.3.0. The vote is open until Thursday February 22,
> 2018 at 8:00:00
> >>> am UTC and passes if a majority of at least 3 PMC +1 votes are
> cast.
> >>>
> >>>
> >>> [ ] +1 Release this package as Apache Spark 2.3.0
> >>>
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>>
> >>> To learn more about Apache Spark, please see
> >>> https://spark.apache.org/
> >>>
> >>> The tag to be voted on is v2.3.0-rc4:
> >>> https://github.com/apache/spark/tree/v2.3.0-rc4
> >>> (44095cb65500739695b0324c177c19dfa1471472)
> >>>
> >>> List of JIRA tickets resolved in this release can be found
> here:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
> >>>
> >>> The release files, including signatures, digests, etc. can be
> >>> found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
> >>>
> >>> Release artifacts are signed with the following key:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> >>> https://repository.apache.org/content/repositories/orgapachespark-1265/
> >>>
> >>> The documentation corresponding to this release can be found
> at:
> >>>
> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
> >>>
> >>>
> >>> FAQ
> >>>
> >>> ===
> >>> What are the unresolved issues targeted for 2.3.0?
> >>> ===
> >>>
> >>> Please see https://s.apache.org/oXKi. At the time of writing,
> >>> there are currently no known release blockers.
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marcelo Vanzin
Hey Sameer,

Mind including https://github.com/apache/spark/pull/20643
(SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
with older shuffle services, but it's pretty safe.

On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal  wrote:
> This RC has failed due to https://issues.apache.org/jira/browse/SPARK-23470.
> Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
> with an RC5 soon.
>
> On 20 February 2018 at 16:49, Ryan Blue  wrote:
>>
>> +1
>>
>> Build & tests look fine, checked signature and checksums for src tarball.
>>
>> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>>  wrote:
>>>
>>> I'm -1 because of the UI regression
>>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page may be
>>> too slow and cause "read timeout" when there are lots of jobs and stages.
>>> This is one of the most important pages because when it's broken, it's
>>> pretty hard to use Spark Web UI.
>>>
>>>
>>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido 
>>> wrote:

 +1

 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>
> +1 too
>
> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN :
>>
>> +1
>>
>>
>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang 
>> wrote:
>>>
>>> +1
>>>
>>>
>>> Wenchen Fan 于2018年2月20日 周二下午1:09写道:

 +1

 On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin 
 wrote:
>
> +1
>
> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
> , wrote:
>>
>> this file shouldn't be included?
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>
>
> I've now deleted this file
>
>> From: Sameer Agarwal 
>> Sent: Saturday, February 17, 2018 1:43:39 PM
>> To: Sameer Agarwal
>> Cc: dev
>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>
>> I'll start with a +1 once again.
>>
>> All blockers reported against RC3 have been resolved and the
>> builds are healthy.
>>
>> On 17 February 2018 at 13:41, Sameer Agarwal 
>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.3.0. The vote is open until Thursday February 22, 2018 at 
>>> 8:00:00
>>> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>>>
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see
>>> https://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.0-rc4:
>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>
>>> List of JIRA tickets resolved in this release can be found here:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>
>>> The release files, including signatures, digests, etc. can be
>>> found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>>
>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>
>>> The documentation corresponding to this release can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>
>>>
>>> FAQ
>>>
>>> ===
>>> What are the unresolved issues targeted for 2.3.0?
>>> ===
>>>
>>> Please see https://s.apache.org/oXKi. At the time of writing,
>>> there are currently no known release blockers.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by
>>> taking an existing Spark workload and running on this release 
>>> candidate,
>>> then reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and
>>> install the current RC and see if anything important breaks, in the
>>> Java/Scala you can add the staging repository 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Sameer Agarwal
This RC has failed due to https://issues.apache.org/jira/browse/SPARK-23470.
Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
with an RC5 soon.

On 20 February 2018 at 16:49, Ryan Blue  wrote:

> +1
>
> Build & tests look fine, checked signature and checksums for src tarball.
>
> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> I'm -1 because of the UI regression
>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page may be too slow and
>> cause "read timeout" when there are lots of jobs and stages. This is one of
>> the most important pages because when it's broken, it's pretty hard to use
>> Spark Web UI.
>>
>>
>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido 
>> wrote:
>>
>>> +1
>>>
>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>>>
 +1 too

 2018-02-20 14:41 GMT+09:00 Takuya UESHIN :

> +1
>
>
> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang 
> wrote:
>
>> +1
>>
>>
>> Wenchen Fan 于2018年2月20日 周二下午1:09写道:
>>
>>> +1
>>>
>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin 
>>> wrote:
>>>
 +1

 On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal <
 sameer.a...@gmail.com>, wrote:

 this file shouldn't be included?
 https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>

 I've now deleted this file

 *From:* Sameer Agarwal 
> *Sent:* Saturday, February 17, 2018 1:43:39 PM
> *To:* Sameer Agarwal
> *Cc:* dev
> *Subject:* Re: [VOTE] Spark 2.3.0 (RC4)
>
> I'll start with a +1 once again.
>
> All blockers reported against RC3 have been resolved and the
> builds are healthy.
>
> On 17 February 2018 at 13:41, Sameer Agarwal 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark
>> version 2.3.0. The vote is open until Thursday February 22, 2018 at 
>> 8:00:00
>> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Spark 2.3.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>>
>> The tag to be voted on is v2.3.0-rc4:
>> https://github.com/apache/spark/tree/v2.3.0-rc4
>> (44095cb65500739695b0324c177c19dfa1471472)
>>
>> List of JIRA tickets resolved in this release can be found here:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>
>> The release files, including signatures, digests, etc. can be
>> found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>
>>
>> FAQ
>>
>> ===
>> What are the unresolved issues targeted for 2.3.0?
>> ===
>>
>> Please see https://s.apache.org/oXKi. At the time of writing,
>> there are currently no known release blockers.
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by
>> taking an existing Spark workload and running on this release 
>> candidate,
>> then reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and
>> install the current RC and see if anything important breaks, in the
>> Java/Scala you can add the staging repository to your projects 
>> resolvers
>> and test with the RC (make sure to clean up the artifact cache 
>> before/after
>> so you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.0?
>> 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Ryan Blue
+1

Build & tests look fine, checked signature and checksums for src tarball.

On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu <
shixi...@databricks.com> wrote:

> I'm -1 because of the UI regression
> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page may be too slow
> and cause "read timeout" when there are lots of jobs and stages. This is
> one of the most important pages because when it's broken, it's pretty hard
> to use Spark Web UI.
>
>
> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido 
> wrote:
>
>> +1
>>
>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>>
>>> +1 too
>>>
>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN :
>>>
 +1


 On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang 
 wrote:

> +1
>
>
> Wenchen Fan 于2018年2月20日 周二下午1:09写道:
>
>> +1
>>
>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin 
>> wrote:
>>
>>> +1
>>>
>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal <
>>> sameer.a...@gmail.com>, wrote:
>>>
>>> this file shouldn't be included?
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml

>>>
>>> I've now deleted this file
>>>
>>> *From:* Sameer Agarwal 
 *Sent:* Saturday, February 17, 2018 1:43:39 PM
 *To:* Sameer Agarwal
 *Cc:* dev
 *Subject:* Re: [VOTE] Spark 2.3.0 (RC4)

 I'll start with a +1 once again.

 All blockers reported against RC3 have been resolved and the builds
 are healthy.

 On 17 February 2018 at 13:41, Sameer Agarwal 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.3.0. The vote is open until Thursday February 22, 2018 at 
> 8:00:00
> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>
>
> [ ] +1 Release this package as Apache Spark 2.3.0
>
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see
> https://spark.apache.org/
>
> The tag to be voted on is v2.3.0-rc4:
> https://github.com/apache/spark/tree/v2.3.0-rc4
> (44095cb65500739695b0324c177c19dfa1471472)
>
> List of JIRA tickets resolved in this release can be found here:
> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>
> The release files, including signatures, digests, etc. can be
> found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1265/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>
>
> FAQ
>
> ===
> What are the unresolved issues targeted for 2.3.0?
> ===
>
> Please see https://s.apache.org/oXKi. At the time of writing,
> there are currently no known release blockers.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by
> taking an existing Spark workload and running on this release 
> candidate,
> then reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and
> install the current RC and see if anything important breaks, in the
> Java/Scala you can add the staging repository to your projects 
> resolvers
> and test with the RC (make sure to clean up the artifact cache 
> before/after
> so you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.0?
> ===
>
> Committers should look at those and triage. Extremely important
> bug fixes, documentation, and API tweaks that impact compatibility 
> should
> be worked on immediately. Everything else please retarget to 2.3.1 or 
> 2.4.0
> as appropriate.
>
> ===
> Why is my bug 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Shixiong(Ryan) Zhu
I'm -1 because of the UI regression
https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page may be
too slow and cause "read timeout" when there are lots of jobs and stages. This
is one of the most important pages because when it's broken, it's pretty hard
to use Spark Web UI.


On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido  wrote:

> +1
>
> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :
>
>> +1 too
>>
>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN :
>>
>>> +1
>>>
>>>
>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang 
>>> wrote:
>>>
 +1


 Wenchen Fan 于2018年2月20日 周二下午1:09写道:

> +1
>
> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin 
> wrote:
>
>> +1
>>
>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal ,
>> wrote:
>>
>> this file shouldn't be included?
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>>
>>
>> I've now deleted this file
>>
>> *From:* Sameer Agarwal 
>>> *Sent:* Saturday, February 17, 2018 1:43:39 PM
>>> *To:* Sameer Agarwal
>>> *Cc:* dev
>>> *Subject:* Re: [VOTE] Spark 2.3.0 (RC4)
>>>
>>> I'll start with a +1 once again.
>>>
>>> All blockers reported against RC3 have been resolved and the builds
>>> are healthy.
>>>
>>> On 17 February 2018 at 13:41, Sameer Agarwal 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.3.0. The vote is open until Thursday February 22, 2018 at 
 8:00:00
 am UTC and passes if a majority of at least 3 PMC +1 votes are cast.


 [ ] +1 Release this package as Apache Spark 2.3.0

 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see
 https://spark.apache.org/

 The tag to be voted on is v2.3.0-rc4:
 https://github.com/apache/spark/tree/v2.3.0-rc4
 (44095cb65500739695b0324c177c19dfa1471472)

 List of JIRA tickets resolved in this release can be found here:
 https://issues.apache.org/jira/projects/SPARK/versions/12339551

 The release files, including signatures, digests, etc. can be found
 at:
 https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/

 Release artifacts are signed with the following key:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1265/

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html


 FAQ

 ===
 What are the unresolved issues targeted for 2.3.0?
 ===

 Please see https://s.apache.org/oXKi. At the time of writing,
 there are currently no known release blockers.

 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by
 taking an existing Spark workload and running on this release 
 candidate,
 then reporting any regressions.

 If you're working in PySpark you can set up a virtual env and
 install the current RC and see if anything important breaks, in the
 Java/Scala you can add the staging repository to your projects 
 resolvers
 and test with the RC (make sure to clean up the artifact cache 
 before/after
 so you don't end up building with a out of date RC going forward).

 ===
 What should happen to JIRA tickets still targeting 2.3.0?
 ===

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should 
 be
 worked on immediately. Everything else please retarget to 2.3.1 or 
 2.4.0 as
 appropriate.

 ===
 Why is my bug not fixed?
 ===

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from 2.2.0. That 
 being
 said, if there is something which is a regression from 2.2.0 and has 
 not

Re: How to change the attributes order in Apache SparkSQL `Project` operator ?

2018-02-20 Thread parana
Resolved using information from this post
https://developer.ibm.com/code/2017/11/30/learn-extension-points-apache-spark-extend-spark-catalyst-optimizer/
 



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Save the date: ApacheCon North America, September 24-27 in Montréal

2018-02-20 Thread Rich Bowen

Dear Apache Enthusiast,

(You’re receiving this message because you’re subscribed to a user@ or 
dev@ list of one or more Apache Software Foundation projects.)


We’re pleased to announce the upcoming ApacheCon [1] in Montréal, 
September 24-27. This event is all about you — the Apache project community.


We’ll have four tracks of technical content this time, as well as lots 
of opportunities to connect with your project community, hack on the 
code, and learn about other related (and unrelated!) projects across the 
foundation.


The Call For Papers (CFP) [2] and registration are now open. Register 
early to take advantage of the early bird prices and secure your place 
at the event hotel.


Important dates
March 30: CFP closes
April 20: CFP notifications sent
August 24: Hotel room block closes (please do not wait until the last minute)


Follow @ApacheCon on Twitter to be the first to hear announcements about 
keynotes, the schedule, evening events, and everything you can expect to 
see at the event.


See you in Montréal!

Sincerely, Rich Bowen, V.P. Events,
on behalf of the entire ApacheCon team

[1] http://www.apachecon.com/acna18
[2] https://cfp.apachecon.com/conference.html?apachecon-north-america-2018




Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marco Gaido
+1

2018-02-20 12:30 GMT+01:00 Hyukjin Kwon :

> +1 too
>
> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN :
>
>> +1
>>
>>
>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang 
>> wrote:
>>
>>> +1
>>>
>>>
>>> Wenchen Fan 于2018年2月20日 周二下午1:09写道:
>>>
 +1

 On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin 
 wrote:

> +1
>
> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal ,
> wrote:
>
> this file shouldn't be included?
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>
>
> I've now deleted this file
>
> *From:* Sameer Agarwal 
>> *Sent:* Saturday, February 17, 2018 1:43:39 PM
>> *To:* Sameer Agarwal
>> *Cc:* dev
>> *Subject:* Re: [VOTE] Spark 2.3.0 (RC4)
>>
>> I'll start with a +1 once again.
>>
>> All blockers reported against RC3 have been resolved and the builds
>> are healthy.
>>
>> On 17 February 2018 at 13:41, Sameer Agarwal 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.3.0. The vote is open until Thursday February 22, 2018 at 
>>> 8:00:00
>>> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>>>
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see
>>> https://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.0-rc4:
>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>
>>> List of JIRA tickets resolved in this release can be found here:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>
>>> The release files, including signatures, digests, etc. can be found
>>> at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapache
>>> spark-1265/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs
>>> /_site/index.html
>>>
>>>
>>> FAQ
>>>
>>> ===
>>> What are the unresolved issues targeted for 2.3.0?
>>> ===
>>>
>>> Please see https://s.apache.org/oXKi. At the time of writing, there
>>> are currently no known release blockers.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and
>>> install the current RC and see if anything important breaks, in the
>>> Java/Scala you can add the staging repository to your projects resolvers
>>> and test with the RC (make sure to clean up the artifact cache 
>>> before/after
>>> so you don't end up building with a out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.3.0?
>>> ===
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.1 or 
>>> 2.4.0 as
>>> appropriate.
>>>
>>> ===
>>> Why is my bug not fixed?
>>> ===
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from 2.2.0. That 
>>> being
>>> said, if there is something which is a regression from 2.2.0 and has not
>>> been correctly targeted please ping me or a committer to help target the
>>> issue (you can see the open issues listed as impacting Spark 2.3.0 at
>>> https://s.apache.org/WmoI).
>>>
>>
>>
>>
>> --
>> Sameer Agarwal
>> Computer Science | UC Berkeley
>> http://cs.berkeley.edu/~sameerag
>>
>
>
>
> --
> Sameer Agarwal
> Computer Science | UC Berkeley
> http://cs.berkeley.edu/~sameerag
>
>

>>
>>
>> --
>> Takuya UESHIN
>> Tokyo, Japan
>>
>> 
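
The FAQ quoted in this thread suggests that Java/Scala users add the staging repository to their project's resolvers and test against the RC. A minimal sketch of what that could look like in an sbt build is below. The repository URL is the one from the vote email; the resolver label, the choice of the spark-sql module, the Scala patch version, and the assumption that the staged artifacts carry the final 2.3.0 version number are illustrative guesses, not details confirmed in the thread.

```scala
// build.sbt for a hypothetical throwaway test project

// Assumption: any Scala 2.11.x release; the RC file names in this thread use the _2.11 suffix.
scalaVersion := "2.11.8"

// Staging repository URL taken from the vote email.
val stagingRepo = "https://repository.apache.org/content/repositories/orgapachespark-1265/"

// The label is arbitrary; only the URL matters to the resolver.
resolvers += "Apache Spark 2.3.0 RC4 staging" at stagingRepo

// Assumption: staged RC artifacts are published as plain "2.3.0".
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
```

Because the RC and the eventual release would share a version number under that assumption, the locally cached org.apache.spark artifacts should be deleted before and after testing, as the FAQ advises, so that a later build does not silently reuse the RC jars.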

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Hyukjin Kwon
+1 too

2018-02-20 14:41 GMT+09:00 Takuya UESHIN:

> +1
>
>
> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang 
> wrote:
>
>> +1
>>
>>
>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote:
>>
>>> +1
>>>
>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin 
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal wrote:
>>>>
>>>> this file shouldn't be included?
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>>>
>>>> I've now deleted this file
>>>>
>>>> *From:* Sameer Agarwal
>>>>> *Sent:* Saturday, February 17, 2018 1:43:39 PM
>>>>> *To:* Sameer Agarwal
>>>>> *Cc:* dev
>>>>> *Subject:* Re: [VOTE] Spark 2.3.0 (RC4)
>>>>>
>>>>> I'll start with a +1 once again.
>>>>>
>>>>> All blockers reported against RC3 have been resolved and the builds
>>>>> are healthy.
>>>>>
>>>>> On 17 February 2018 at 13:41, Sameer Agarwal wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 2.3.0. The vote is open until Thursday, February 22, 2018 at
>>>>>> 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes
>>>>>> are cast.
>>>>>>
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>>>>
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>>
>>>>>> To learn more about Apache Spark, please see
>>>>>> https://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v2.3.0-rc4:
>>>>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>>>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>>>>
>>>>>> List of JIRA tickets resolved in this release can be found here:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>>>>
>>>>>> Release artifacts are signed with the following key:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>>>>
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> ===
>>>>>> What are the unresolved issues targeted for 2.3.0?
>>>>>> ===
>>>>>>
>>>>>> Please see https://s.apache.org/oXKi. At the time of writing, there
>>>>>> are no known release blockers.
>>>>>>
>>>>>> =
>>>>>> How can I help test this release?
>>>>>> =
>>>>>>
>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>> then reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>> you can add the staging repository to your project's resolvers and
>>>>>> test with the RC (make sure to clean up the artifact cache before and
>>>>>> after so you don't end up building with an out-of-date RC going
>>>>>> forward).
>>>>>>
>>>>>> ===
>>>>>> What should happen to JIRA tickets still targeting 2.3.0?
>>>>>> ===
>>>>>>
>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>> fixes, documentation, and API tweaks that impact compatibility should be
>>>>>> worked on immediately. Please retarget everything else to 2.3.1 or
>>>>>> 2.4.0 as appropriate.
>>>>>>
>>>>>> ===
>>>>>> Why is my bug not fixed?
>>>>>> ===
>>>>>>
>>>>>> In order to make timely releases, we will typically not hold the
>>>>>> release unless the bug in question is a regression from 2.2.0. That
>>>>>> being said, if there is something which is a regression from 2.2.0 and
>>>>>> has not been correctly targeted, please ping me or a committer to help
>>>>>> target the issue (you can see the open issues listed as impacting
>>>>>> Spark 2.3.0 at https://s.apache.org/WmoI).
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sameer Agarwal
>>>>> Computer Science | UC Berkeley
>>>>> http://cs.berkeley.edu/~sameerag
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sameer Agarwal
>>>> Computer Science | UC Berkeley
>>>> http://cs.berkeley.edu/~sameerag
>>>>
>>>>
>>>
>
>
> --
> Takuya UESHIN
> Tokyo, Japan
>
> http://twitter.com/ueshin
>