Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Felix Cheung
Thanks. This was with an external package and is unrelated.

  >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)

As for CentOS - would it be possible to test against R older than 3.4.0? This 
is the same error reported by Nick below.

_
From: Hyukjin Kwon
Sent: Tuesday, June 13, 2017 8:02 PM
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
To: dev
Cc: Sean Owen, Nick Pentreath, Felix Cheung


For the test failure on R, I checked:


Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed 
(https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning 
(https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)


Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced 
(https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)


Given my tests and observations, this appears to fail only on CentOS 7.2.1511 / 
R 3.4.0.

It also fails in Spark 2.1.1, so it does not appear to be a regression, 
although it is a bug that should be fixed (whether in Spark or R).



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Hyukjin Kwon
For the test failure on R, I checked:


Per https://github.com/apache/spark/tree/v2.2.0-rc4,

1. Windows Server 2012 R2 / R 3.3.1 - passed (
https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4
)
2. macOS Sierra 10.12.3 / R 3.4.0 - passed
3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
4. CentOS 7.2.1511 / R 3.4.0 - reproduced (
https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)


Per https://github.com/apache/spark/tree/v2.1.1,

1. CentOS 7.2.1511 / R 3.4.0 - reproduced (
https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)


Given my tests and observations, this appears to fail only on CentOS 7.2.1511 /
R 3.4.0.

It also fails in Spark 2.1.1, so it does not appear to be a regression,
although it is a bug that should be fixed (whether in Spark or R).



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Xiao Li
-1

Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or
earlier.

Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085

Will fix it soon.

Thanks,

Xiao Li





Re: [RISE Research] [build system] jenkins currently down due to campus-wide power failure

2017-06-13 Thread Anthony D. Joseph
Dodged a fire too!

On Tue, Jun 13, 2017 at 3:31 PM, shane knapp  wrote:

> ok, we're back up...  thankfully i didn't need to go to the colo!
>
> shane (who strongly feels he just dodged a bullet)
>
> On Tue, Jun 13, 2017 at 3:06 PM, shane knapp  wrote:
> > i'm able to VPN in, but not connect to the master or any slaves.  it's
> > looking like i'll need to head down from my building to the colo and
> > see what's up.
> >
> > shane
>


Re: [build system] jenkins currently down due to campus-wide power failure

2017-06-13 Thread shane knapp
ok, we're back up...  thankfully i didn't need to go to the colo!

shane (who strongly feels he just dodged a bullet)

On Tue, Jun 13, 2017 at 3:06 PM, shane knapp  wrote:
> i'm able to VPN in, but not connect to the master or any slaves.  it's
> looking like i'll need to head down from my building to the colo and
> see what's up.
>
> shane

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[build system] jenkins currently down due to campus-wide power failure

2017-06-13 Thread shane knapp
i'm able to VPN in, but not connect to the master or any slaves.  it's
looking like i'll need to head down from my building to the colo and
see what's up.

shane




Re: BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2017-06-13 Thread hazem
this can also happen if your PATH is not set properly (but JAVA_HOME is),
just in case anyone hits this in the future. This simple fix resolves it:

$ export PATH=$PATH:$JAVA_HOME/bin

and rerun the build.
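For anyone scripting this, a defensive sketch of the same fix (the default JDK
path below is an illustrative placeholder, not a recommendation):

```shell
# Ensure the JDK's bin directory is on PATH before building.
# /usr/lib/jvm/default-java is only a placeholder fallback.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"
export PATH="$PATH:$JAVA_HOME/bin"
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JDK bin dir on PATH" ;;
  *)                    echo "JDK bin dir missing from PATH" ;;
esac
```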



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BUILD-FAILURE-at-Spark-Project-Test-Tags-for-2-11-7-tp16055p21739.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: Can I use ChannelTrafficShapingHandler to control the network read/write speed in shuffle?

2017-06-13 Thread Shixiong(Ryan) Zhu
I took a look at ChannelTrafficShapingHandler. Looks like it's because it
doesn't support FileRegion. Spark's messages use this interface.
See org.apache.spark.network.protocol.MessageWithHeader.
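For readers unfamiliar with traffic shaping: handlers like
ChannelTrafficShapingHandler do token-bucket style accounting on each
message's byte size, which is why a message type whose size they cannot
measure (FileRegion) goes unthrottled. A language-neutral illustration of
the mechanism (plain Python, not Netty or Spark code; all names are invented
for the sketch):

```python
import time

class TokenBucket:
    """Token-bucket shaper: a rough, conceptual analogue of what a
    byte-counting traffic-shaping handler does. Illustrative only;
    messages whose size is never fed into delay_for() are unthrottled,
    which mirrors the FileRegion problem described above."""

    def __init__(self, rate_bytes_per_sec, capacity_bytes):
        self.rate = float(rate_bytes_per_sec)
        self.capacity = float(capacity_bytes)
        self.tokens = float(capacity_bytes)
        self.last = time.monotonic()

    def delay_for(self, nbytes):
        """Account for nbytes and return how long the caller should wait."""
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        return max(0.0, -self.tokens / self.rate)

shaper = TokenBucket(rate_bytes_per_sec=50 * 1024, capacity_bytes=50 * 1024)
first = shaper.delay_for(50 * 1024)   # initial burst fits in the bucket
second = shaper.delay_for(50 * 1024)  # bucket now empty: caller should wait ~1s
```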

On Tue, Jun 13, 2017 at 4:17 AM, Niu Zhaojie  wrote:

> Hi All:
>
> I am trying to control the network read/write speed with
> ChannelTrafficShapingHandler provided by Netty.
>
>
> In TransportContext.java
>
> I modify it as below:
>
> public TransportChannelHandler initializePipeline(
> SocketChannel channel,
> RpcHandler channelRpcHandler) {
>   try {
> // added by zhaojie
> logger.info("want to try control read bandwidth on host: " + host);
> final ChannelTrafficShapingHandler channelShaping = new 
> ChannelTrafficShapingHandler(50, 50, 1000);
>
> TransportChannelHandler channelHandler = createChannelHandler(channel, 
> channelRpcHandler);
>
> channel.pipeline()
> .addLast("encoder", ENCODER)
> .addLast(TransportFrameDecoder.HANDLER_NAME, 
> NettyUtils.createFrameDecoder())
> .addLast("decoder", DECODER)
> .addLast("channelTrafficShaping", channelShaping)
> .addLast("idleStateHandler", new IdleStateHandler(0, 0, 
> conf.connectionTimeoutMs() / 1000))
> // NOTE: Chunks are currently guaranteed to be returned in the 
> order of request, but this
> // would require more logic to guarantee if this were not part of 
> the same event loop.
> .addLast("handler", channelHandler);
>
>
> I create a ChannelTrafficShapingHandler and register it into the pipeline
> of the channel. I set the write and read speed as 50kb/sec in the
> constructor.
> Except for it, what else do I need to do?
>
> However, it does not work. Is this idea correct? Am I missing something?
> Is there any better way ?
>
> Thanks.
>
> --
> *Regards,*
> *Zhaojie*
>
>


Re: [build system] jenkins firewall reboot

2017-06-13 Thread shane knapp
...and we're back!

On Tue, Jun 13, 2017 at 10:17 AM, shane knapp  wrote:
> i have been seeing continuously slow network speeds, apparently being
> caused by the VPN...  it's currently rebooting and should be back up
> in ~5 mins.
>
> sorry for the interruption in service!




Re: Performance regression for partitioned parquet data

2017-06-13 Thread Michael Allman
Hi Bertrand,

I encourage you to create a ticket for this and submit a PR if you have time. 
Please add me as a listener, and I'll try to contribute/review.

Michael

> On Jun 6, 2017, at 5:18 AM, Bertrand Bossy  
> wrote:
> 
> Hi,
> 
> since moving to spark 2.1 from 2.0, we experience a performance regression 
> when reading a large, partitioned parquet dataset: 
> 
> We observe many (hundreds) very short jobs executing before the job that 
> reads the data is starting. I looked into this issue and pinned it down to 
> PartitioningAwareFileIndex: While recursively listing the directories, if a 
> directory contains more than 
> "spark.sql.sources.parallelPartitionDiscovery.threshold" (default: 32) paths, 
> the children are listed using a spark job. Because the tree is listed 
> serially, this can result in a lot of small spark jobs executed one after the 
> other and the overhead dominates. Performance can be improved by tuning 
> "spark.sql.sources.parallelPartitionDiscovery.threshold". However, this is 
> not a satisfactory solution.
> 
> I think that the current behaviour could be improved by walking the directory 
> tree in breadth first search order and only launching one spark job to list 
> files in parallel if the number of paths to be listed at some level exceeds 
> spark.sql.sources.parallelPartitionDiscovery.threshold .
> 
> Does this approach make sense? I have found "Regression in file listing 
> performance" ( https://issues.apache.org/jira/browse/SPARK-18679 
>  ) as the most closely 
> related ticket.
> 
> Unless there is a reason for the current behaviour, I will create a ticket on 
> this soon. I might have some time in the coming days to work on this.
> 
> Regards,
> Bertrand
> 
> -- 
> Bertrand Bossy | TERALYTICS
> 
> software engineer
> 
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland 
> www.teralytics.net 
> Company registration number: CH-020.3.037.709-7 | Trade register Canton Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann de 
> Vries
> This e-mail message contains confidential information which is for the sole 
> attention and use of the intended recipient. Please notify us at once if you 
> think that it may not be intended for you and delete it immediately.
> 
> 



Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Joseph Bradley
Re: the QA JIRAs:
Thanks for discussing them.  I still feel they are very helpful; I
particularly notice not having to spend a solid 2-3 weeks of time QAing
(unlike in earlier Spark releases).  One other point not mentioned above: I
think they serve as a very helpful reminder/training for the community for
rigor in development.  Since we instituted QA JIRAs, contributors have been
a lot better about adding in docs early, rather than waiting until the end
of the cycle (though I know this is drawing conclusions from correlations).

I would vote in favor of the RC...but I'll wait to see about the reported
failures.

On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen  wrote:

> Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but
> that's also reporting R test failures.
>
> I went back and tried to run the R tests and they passed, at least on
> Ubuntu 17 / R 3.3.
>
>
> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath 
> wrote:
>
>> All Scala, Python tests pass. ML QA and doc issues are resolved (as well
>> as R it seems).
>>
>> However, I'm seeing the following test failure on R consistently:
>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>
>>
>> On Thu, 8 Jun 2017 at 08:48 Denny Lee  wrote:
>>
>>> +1 non-binding
>>>
>>> Tested on macOS Sierra, Ubuntu 16.04
>>> test suite includes various test cases including Spark SQL, ML,
>>> GraphFrames, Structured Streaming
>>>
>>>
>>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan 
>>> wrote:
>>>
 +1 non-binding

 Regards,
 vaquar khan

 On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
 wrote:

 +1 (non-binding)

 Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
 -Phive -Phive-thriftserver -Pscala-2.11 on

- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
- macOS 10.12.5 Java 8 (build 1.8.0_131)


 On 5 June 2017 at 21:14, Michael Armbrust 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00
> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.2.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.0-rc4
> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>
> List of JIRA tickets resolved can be found with this filter.
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1241/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.2.0?*
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should be
> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from 2.1.1.
>





-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.



how to debug app with cluster mode please?

2017-06-13 Thread ??????????
Hi all,


I am learning the Spark 2.1 code.
I wrote an app with master "master[4]"; I can run and debug it, and it works well.

When I change the master to "master[2,2,1024]" and debug it as before, I get
the following error:

 java.lang.ClassNotFoundException: com.xxx.xxx$$anonfun$main$1

The class is my main class.
Could you help me, please?
Thanks,
Robin Shao

Can I use ChannelTrafficShapingHandler to control the network read/write speed in shuffle?

2017-06-13 Thread Niu Zhaojie
Hi All:

I am trying to control the network read/write speed with
ChannelTrafficShapingHandler provided by Netty.


In TransportContext.java

I modify it as below:

public TransportChannelHandler initializePipeline(
    SocketChannel channel,
    RpcHandler channelRpcHandler) {
  try {
    // added by zhaojie
    logger.info("want to try control read bandwidth on host: " + host);
    final ChannelTrafficShapingHandler channelShaping =
        new ChannelTrafficShapingHandler(50, 50, 1000);

    TransportChannelHandler channelHandler =
        createChannelHandler(channel, channelRpcHandler);

    channel.pipeline()
        .addLast("encoder", ENCODER)
        .addLast(TransportFrameDecoder.HANDLER_NAME, NettyUtils.createFrameDecoder())
        .addLast("decoder", DECODER)
        .addLast("channelTrafficShaping", channelShaping)
        .addLast("idleStateHandler", new IdleStateHandler(0, 0,
            conf.connectionTimeoutMs() / 1000))
        // NOTE: Chunks are currently guaranteed to be returned in the
        // order of request, but this would require more logic to guarantee
        // if this were not part of the same event loop.
        .addLast("handler", channelHandler);


I create a ChannelTrafficShapingHandler and register it into the pipeline
of the channel. I set the write and read speed as 50kb/sec in the
constructor.
Except for it, what else do I need to do?

However, it does not work. Is this idea correct? Am I missing something?
Is there any better way ?

Thanks.

-- 
*Regards,*
*Zhaojie*