Re: jenkins failing on Kinesis shard limits

2015-07-24 Thread Prabeesh K.
For me:

https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/97/console

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38417/console

On 25 July 2015 at 09:57, Patrick Wendell  wrote:

> I've disabled the test and filed a JIRA:
>
> https://issues.apache.org/jira/browse/SPARK-9335
>
> On Fri, Jul 24, 2015 at 4:05 PM, Steve Loughran 
> wrote:
> >
> > Looks like Jenkins is hitting some AWS limits
> >
> >
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38396/testReport/org.apache.spark.streaming.kinesis/KinesisBackedBlockRDDSuite/_It_is_not_a_test_/
>
>
>


Protocol for build breaks

2015-07-24 Thread Patrick Wendell
Hi All,

If there is a build break (i.e. a compile issue or consistently
failing test) that somehow makes it into master, the best protocol is:

1. Revert the offending patch.
2. File a JIRA and assign it to the committer of the offending patch.
The JIRA should contain links to broken builds.

It's not worth spending any time trying to figure out how to fix it,
or blocking on tracking down the commit author, because every hour
that the PRB is broken is a major cost in terms of developer
productivity.

- Patrick




Re: jenkins failing on Kinesis shard limits

2015-07-24 Thread Patrick Wendell
I've disabled the test and filed a JIRA:

https://issues.apache.org/jira/browse/SPARK-9335

On Fri, Jul 24, 2015 at 4:05 PM, Steve Loughran  wrote:
>
> Looks like Jenkins is hitting some AWS limits
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38396/testReport/org.apache.spark.streaming.kinesis/KinesisBackedBlockRDDSuite/_It_is_not_a_test_/




Jenkins HiveCompatibilitySuite Test Failures

2015-07-24 Thread Calvin Jia
Hi,

I've been seeing errors with
org.apache.spark.sql.hive.execution.HiveCompatibilitySuite
from the Jenkins tests in a PR I proposed as well as in PRs from other
members of the community. Is this test stable at the moment?

Thanks,
Calvin


jenkins failing on Kinesis shard limits

2015-07-24 Thread Steve Loughran

Looks like Jenkins is hitting some AWS limits

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38396/testReport/org.apache.spark.streaming.kinesis/KinesisBackedBlockRDDSuite/_It_is_not_a_test_/


Re: PySpark on PyPi

2015-07-24 Thread Jeremy Freeman
Hey all, great discussion. Just wanted to +1 that I see a lot of value in steps
that make it easier to use PySpark as an ordinary Python library.

You might want to check out findspark (https://github.com/minrk/findspark), started
by Jupyter project devs, which offers one way to facilitate this. I've also cc'ed them
here to join the conversation.
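
A minimal findspark-based setup looks roughly like this (the install path, master URL,
and app name below are placeholders):

import findspark
findspark.init("/path/to/spark")  # placeholder install location; findspark.init() with no argument uses SPARK_HOME

from pyspark import SparkContext
sc = SparkContext(master="local[2]", appName="findspark-example")
print(sc.parallelize([1, 2, 3]).sum())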

Also, @Jey, I can confirm that at least in some scenarios (I've done it on an EC2
cluster in standalone mode) it's possible to run PySpark jobs using just
`from pyspark import SparkContext; sc = SparkContext(master="X")`, so long as the
environment variables (PYTHONPATH and PYSPARK_PYTHON) are set correctly on *both*
the workers and the driver. That said, there's definitely additional
configuration / functionality that would require going through the proper
submit scripts.
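
A rough sketch of that standalone setup, assuming PYTHONPATH and PYSPARK_PYTHON are
already exported on every node (the master URL below is a placeholder):

from pyspark import SparkContext

# PYTHONPATH must include the distribution's python/ dir and the bundled Py4J zip,
# and PYSPARK_PYTHON must point at the same interpreter on the driver and the workers.
sc = SparkContext(master="spark://master-host:7077")  # placeholder standalone master URL
print(sc.parallelize(range(100)).count())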

> On Jul 22, 2015, at 7:41 PM, Punyashloka Biswal  
> wrote:
> 
> I agree with everything Justin just said. An additional advantage of 
> publishing PySpark's Python code in a standards-compliant way is the fact 
> that we'll be able to declare transitive dependencies (Pandas, Py4J) in a way 
> that pip can use. Contrast this with the current situation, where 
> df.toPandas() exists in the Spark API but doesn't actually work until you 
> install Pandas.
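
A concrete sketch of that failure mode (the data is a made-up placeholder, and an
existing SparkContext `sc` is assumed); the last line raises an ImportError unless
pandas is installed:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # assumes an existing SparkContext `sc`
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
pdf = df.toPandas()          # works only once pandas is actually installed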
> 
> Punya
> On Wed, Jul 22, 2015 at 12:49 PM Justin Uang wrote:
> // + Davies for his comments
> // + Punya for SA
> 
> For development and CI, like Olivier mentioned, I think it would be hugely 
> beneficial to publish pyspark (only code in the python/ dir) on PyPI. If 
> anyone wants to develop against PySpark APIs, they need to download the 
> distribution and do a lot of PYTHONPATH munging for all the tools (pylint, 
> pytest, IDE code completion). Right now that involves adding python/ and 
> python/lib/py4j-0.8.2.1-src.zip. In case pyspark ever wants to add more 
> dependencies, we would have to manually mirror all the PYTHONPATH munging in 
> the ./pyspark script. With a proper pyspark setup.py that declares its
> dependencies, and a published distribution, depending on pyspark would just be
> a matter of adding it to my setup.py dependencies.
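
For illustration, a minimal setup.py along these lines could declare those dependencies
(the name, version, and pins are placeholders, not the project's actual metadata):

from setuptools import setup, find_packages

setup(
    name="pyspark",               # placeholder package name
    version="1.4.1",              # placeholder; would have to match the Spark release exactly
    packages=find_packages(),
    install_requires=[
        "py4j==0.8.2.1",          # placeholder pin matching the Py4J zip bundled today
        "pandas>=0.13",           # placeholder; needed for DataFrame.toPandas()
    ],
)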
> 
> Of course, if we actually want to run the parts of pyspark that are backed by Py4J 
> calls, then we need the full spark distribution with either ./pyspark or 
> ./spark-submit, but for things like linting and development, the PYTHONPATH 
> munging is very annoying.
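
Concretely, the munging in question looks something like this in every tool's
environment (the path is a placeholder for wherever the distribution is unpacked):

import sys

SPARK_HOME = "/path/to/spark"  # placeholder: wherever the pre-built distribution lives
sys.path.insert(0, SPARK_HOME + "/python")
sys.path.insert(0, SPARK_HOME + "/python/lib/py4j-0.8.2.1-src.zip")

import pyspark  # now importable without any pip-installable package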
> 
> I don't think the version-mismatch issues are a compelling reason not to go
> ahead with PyPI publishing. At runtime, we should definitely enforce that the
> version has to match exactly, which means there is no backcompat nightmare as
> suggested by Davies in https://issues.apache.org/jira/browse/SPARK-1267. This
> would mean that even if the user's pip-installed pyspark somehow got loaded
> before the pyspark provided by the Spark distribution, the user would be
> alerted immediately.
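
The runtime check could be as simple as the following sketch (it assumes the installed
pyspark package exposes a version attribute to compare against; the expected version
string is a placeholder):

import pyspark

EXPECTED = "1.4.1"  # placeholder: the version of the Spark distribution actually in use
found = getattr(pyspark, "__version__", None)
if found != EXPECTED:
    raise ImportError("pip-installed pyspark %s does not match Spark distribution %s; "
                      "the versions must match exactly" % (found, EXPECTED))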
> 
> Davies, if you buy this, should I or someone on my team pick up
> https://issues.apache.org/jira/browse/SPARK-1267 and
> https://github.com/apache/spark/pull/464 ?
> 
> On Sat, Jun 6, 2015 at 12:48 AM Olivier Girardot  wrote:
> Ok, I get it. Now what can we do to improve the current situation? Because
> right now, if I want to set up a CI env for PySpark, I have to:
> 1- download a pre-built Spark distribution and unzip it somewhere on every
> agent
> 2- define the SPARK_HOME env variable
> 3- symlink this distribution's pyspark dir into the Python install's
> site-packages/ directory
> and if I rely on additional packages (like Databricks' spark-csv project), I
> have to (unless I'm mistaken):
> 4- compile/assemble spark-csv and deploy the jar to a specific directory on
> every agent
> 5- add this jar directory to the Spark distribution's extra classpath using
> the conf/spark-defaults.conf file
> 
> Then finally we can launch our unit/integration tests.
> Some issues are related to spark-packages, some to the lack of Python-based
> dependency management, and some to the way SparkContexts are launched when
> using pyspark.
> I think steps 1 and 2 are fair enough.
> Steps 4 and 5 may already have solutions; I didn't check, and considering that
> spark-shell already downloads such dependencies automatically, I expect it
> will be handled if it isn't already.
> 
> For step 3, maybe just adding a setup.py to the distribution would be enough.
> I'm not exactly advocating distributing a full 300 MB Spark distribution on
> PyPI; maybe there's a better compromise?
> 
> Regards, 
> 
> Olivier.
> 
> On Fri, Jun 5, 2015 at 10:12 PM, Jey Kottalam wrote:
> Couldn't we have a pip installable "pyspark" package that just serves as a 
> shim to an existing Spark inst

Policy around backporting bug fixes

2015-07-24 Thread Patrick Wendell
Hi All,

A few times I've been asked about backporting: when to backport fix
patches and when not to. Since I have managed this for many of the
past releases, I wanted to lay out the way I have been thinking
about it. If we have some consensus I can put it on the wiki.

The trade-off when backporting is that you get to deliver the fix to people
running older versions (great!), but you risk introducing new or even
worse bugs in maintenance releases (bad!). The decision point is when
you have a bug fix and it's not clear whether it is worth backporting.

I think the following facets are important to consider:
(a) Backports are an extremely valuable service to the community and
should be considered for any bug fix.
(b) Introducing a new bug in a maintenance release must be avoided at
all costs. Over time it would erode confidence in our release process.
(c) Distributions or advanced users can always backport risky patches
on their own, if they see fit.

For me, the consequence of these is that we should backport in the
following situations:
- Both the bug and the fix are well understood and isolated. Code
being modified is well tested.
- The bug being addressed is high priority to the community.
- The backported fix does not vary widely from the master branch fix.

We tend to avoid backports in the converse situations:
- The bug or fix is not well understood. For instance, it relates to
interactions between complex components or third party libraries (e.g.
Hadoop libraries). The code is not well tested outside of the
immediate bug being fixed.
- The bug is not clearly a high priority for the community.
- The backported fix is widely different from the master branch fix.

These are clearly subjective criteria, but ones worth considering. I
am always happy to help advise people on specific patches if they want
a sounding board to understand whether it makes sense to backport.

- Patrick




Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-24 Thread Reynold Xin
Jenkins only runs Scala 2.10. I'm actually not sure what the behavior is
with 2.11 for that patch.

iulian - can you take a look into it and see if it is working as expected?


On Fri, Jul 24, 2015 at 10:24 AM, Iulian Dragoș 
wrote:

> On Thu, Jul 23, 2015 at 6:08 AM, Reynold Xin  wrote:
>
> Hi all,
>>
>> FYI, we just merged a patch that fails the build if there is a Scala
>> compiler warning (if it is not a deprecation warning).
>>
> I’m a bit confused, since I see quite a lot of warnings in semi-legitimate
> code.
>
> For instance, @transient (plenty of instances like this in
> spark-streaming) might generate warnings like:
>
> abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : StreamingContext)
>   extends InputDStream[T](ssc_) {
>
> // and the warning is:
> no valid targets for annotation on value ssc_ - it is discarded unused. You 
> may specify targets with meta-annotations, e.g. @(transient @param)
>
> At least that’s what happens if I build with Scala 2.11, not sure if this
> setting is only for 2.10, or something really weird is happening on my
> machine that doesn’t happen on others.
>
> iulian
>
>
>> In the past, many compiler warnings have actually been caused by legitimate
>> bugs that we need to address. However, if we don't fail the build on
>> warnings, people don't pay attention to the warnings at all (it is also
>> tough to pay attention since there are a lot of deprecation warnings due to
>> unit tests exercising deprecated APIs and our reliance on deprecated Hadoop
>> APIs).
>>
>> Note that ideally we should be able to mark deprecation warnings as
>> errors as well. However, due to the lack of ability to suppress individual
>> warning messages in the Scala compiler, we cannot do that (since we do need
>> to access deprecated APIs in Hadoop).
>>
>>
> --
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
>
>


Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-24 Thread Iulian Dragoș
On Thu, Jul 23, 2015 at 6:08 AM, Reynold Xin  wrote:

Hi all,
>
> FYI, we just merged a patch that fails the build if there is a Scala
> compiler warning (if it is not a deprecation warning).
>
I’m a bit confused, since I see quite a lot of warnings in semi-legitimate
code.

For instance, @transient (plenty of instances like this in spark-streaming)
might generate warnings like:

abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : StreamingContext)
  extends InputDStream[T](ssc_) {

// and the warning is:
no valid targets for annotation on value ssc_ - it is discarded
unused. You may specify targets with meta-annotations, e.g.
@(transient @param)

At least that’s what happens if I build with Scala 2.11, not sure if this
setting is only for 2.10, or something really weird is happening on my
machine that doesn’t happen on others.

iulian


> In the past, many compiler warnings have actually been caused by legitimate bugs
> that we need to address. However, if we don't fail the build on warnings,
> people don't pay attention to the warnings at all (it is also tough to pay
> attention since there are a lot of deprecation warnings due to unit tests
> exercising deprecated APIs and our reliance on deprecated Hadoop APIs).
>
> Note that ideally we should be able to mark deprecation warnings as errors
> as well. However, due to the lack of ability to suppress individual warning
> messages in the Scala compiler, we cannot do that (since we do need to
> access deprecated APIs in Hadoop).
>
>
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-24 Thread Reynold Xin
You can give it a shot, but we will have to revert it for a project as soon
as that project uses a deprecated API somewhere.


On Fri, Jul 24, 2015 at 7:43 AM, Punyashloka Biswal 
wrote:

> Would it make sense to isolate deprecation warnings to a subset
> of projects? That way we could turn on more stringent checks for the other
> ones.
>
> Punya
>
> On Thu, Jul 23, 2015 at 12:08 AM Reynold Xin  wrote:
>
>> Hi all,
>>
>> FYI, we just merged a patch that fails the build if there is a Scala
>> compiler warning (if it is not a deprecation warning).
>>
>> In the past, many compiler warnings have actually been caused by legitimate
>> bugs that we need to address. However, if we don't fail the build on
>> warnings, people don't pay attention to the warnings at all (it is also
>> tough to pay attention since there are a lot of deprecation warnings due to
>> unit tests exercising deprecated APIs and our reliance on deprecated Hadoop
>> APIs).
>>
>> Note that ideally we should be able to mark deprecation warnings as
>> errors as well. However, due to the lack of ability to suppress individual
>> warning messages in the Scala compiler, we cannot do that (since we do need
>> to access deprecated APIs in Hadoop).
>>
>>
>>


Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-24 Thread Patrick Wendell
Hey Bharath,

There was actually an incompatible change to the build process that
broke several of the Jenkins builds. This should be patched up in the
next day or two and nightly builds will resume.

- Patrick

On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar
 wrote:
> I noticed the last (1.5) build has a timestamp of 16th July. Have nightly
> builds been discontinued since then?
>
> Thanks,
> Bharath
>
> On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell  wrote:
>>
>> Hi All,
>>
>> This week I got around to setting up nightly builds for Spark on
>> Jenkins. I'd like feedback on these and if it's going well I can merge
>> the relevant automation scripts into Spark mainline and document it on
>> the website. Right now I'm doing:
>>
>> 1. SNAPSHOTs of Spark master and release branches published to the ASF
>> Maven snapshot repo:
>>
>>
>> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
>>
>> These are usable by adding this repository in your build and using a
>> snapshot version (e.g. 1.3.2-SNAPSHOT).
>>
>> 2. Nightly binary package builds and doc builds of master and release
>> versions.
>>
>> http://people.apache.org/~pwendell/spark-nightly/
>>
>> These build 4 times per day and are tagged based on commits.
>>
>> If anyone has feedback on these please let me know.
>>
>> Thanks!
>> - Patrick
>>
>>
>




Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-24 Thread Punyashloka Biswal
Would it make sense to isolate deprecation warnings to a subset
of projects? That way we could turn on more stringent checks for the other
ones.

Punya

On Thu, Jul 23, 2015 at 12:08 AM Reynold Xin  wrote:

> Hi all,
>
> FYI, we just merged a patch that fails the build if there is a Scala
> compiler warning (if it is not a deprecation warning).
>
> In the past, many compiler warnings have actually been caused by legitimate bugs
> that we need to address. However, if we don't fail the build on warnings,
> people don't pay attention to the warnings at all (it is also tough to pay
> attention since there are a lot of deprecation warnings due to unit tests
> exercising deprecated APIs and our reliance on deprecated Hadoop APIs).
>
> Note that ideally we should be able to mark deprecation warnings as errors
> as well. However, due to the lack of ability to suppress individual warning
> messages in the Scala compiler, we cannot do that (since we do need to
> access deprecated APIs in Hadoop).
>
>
>


Re: review SPARK-8730

2015-07-24 Thread Eugen Cepoi
I just got those comments, and it hasn't merged cleanly for a while; the code
has evolved since I opened the PR.

2015-07-24 14:12 GMT+02:00 Sean Owen :

> It looks like you have a number of review comments on the PR that you
> have not replied to. The PR does not merge at the moment either.
>
> On Fri, Jul 24, 2015 at 12:03 PM, Eugen Cepoi 
> wrote:
> > Hey,
> >
> > I've opened a PR to fix a ser/de issue with primitive classes in the Java
> > serializer.
> > I have already encountered this problem in several scenarios, so I am
> > bringing it up.
> > Would be great if someone wants to have a look at it! :)
> >
> > https://issues.apache.org/jira/browse/SPARK-8730
> > https://github.com/apache/spark/pull/7122
> >
> >
> > Thanks,
> > Eugen
>


Re: review SPARK-8730

2015-07-24 Thread Sean Owen
It looks like you have a number of review comments on the PR that you
have not replied to. The PR does not merge at the moment either.

On Fri, Jul 24, 2015 at 12:03 PM, Eugen Cepoi  wrote:
> Hey,
>
> I've opened a PR to fix a ser/de issue with primitive classes in the Java
> serializer.
> I have already encountered this problem in several scenarios, so I am bringing
> it up.
> Would be great if someone wants to have a look at it! :)
>
> https://issues.apache.org/jira/browse/SPARK-8730
> https://github.com/apache/spark/pull/7122
>
>
> Thanks,
> Eugen




review SPARK-8730

2015-07-24 Thread Eugen Cepoi
Hey,

I've opened a PR to fix a ser/de issue with primitive classes in the Java
serializer.
I have already encountered this problem in several scenarios, so I am bringing
it up.
Would be great if someone wants to have a look at it! :)

https://issues.apache.org/jira/browse/SPARK-8730
https://github.com/apache/spark/pull/7122


Thanks,
Eugen


Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-24 Thread Bharath Ravi Kumar
I noticed the last (1.5) build has a timestamp of 16th July. Have nightly
builds been discontinued since then?

Thanks,
Bharath

On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell  wrote:

> Hi All,
>
> This week I got around to setting up nightly builds for Spark on
> Jenkins. I'd like feedback on these and if it's going well I can merge
> the relevant automation scripts into Spark mainline and document it on
> the website. Right now I'm doing:
>
> 1. SNAPSHOTs of Spark master and release branches published to the ASF
> Maven snapshot repo:
>
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
>
> These are usable by adding this repository in your build and using a
> snapshot version (e.g. 1.3.2-SNAPSHOT).
>
> 2. Nightly binary package builds and doc builds of master and release
> versions.
>
> http://people.apache.org/~pwendell/spark-nightly/
>
> These build 4 times per day and are tagged based on commits.
>
> If anyone has feedback on these please let me know.
>
> Thanks!
> - Patrick
>
>
>