[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-10-18 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-149055980
  
@JoshRosen not a problem and completely understand on getting this merged 
this weekend. Let me know if you need any help from here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-10-17 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-148949648
  
jenkins, retest this please





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-10-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-148841311
  
Hey guys, sorry I've been away from this recently; I've been busy with some 
other things. I'll get the merge conflicts fixed and a few other things 
done here shortly.





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-10-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-148843226
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-9057] [STREAMING] [WIP] Twitter example...

2015-08-31 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/8431#issuecomment-136423918
  
It looks like 
[SPARK-9057](https://issues.apache.org/jira/browse/SPARK-9057) requires an 
example of `DStream.transform` in all three primary Spark languages (Java, 
Scala, and Python). This is a great start, but it will need Python and Scala 
examples as well to be considered complete.





[GitHub] spark pull request: [SPARK-9057] [STREAMING] [WIP] Twitter example...

2015-08-31 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/8431#issuecomment-136424197
  
Is there really a need to commit the `twitter_sentiment_list.txt` file? I 
would think we could grab it remotely on execution rather than commit it into 
the source code base.





[GitHub] spark pull request: [SPARK-9607] [SPARK-9608] fix zinc-port handli...

2015-08-04 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7944#issuecomment-127798476
  
Totally missed that, awesome! Just wanted to make sure! LGTM.





[GitHub] spark pull request: [SPARK-9607] [SPARK-9608] fix zinc-port handli...

2015-08-04 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7944#issuecomment-127793887
  
Do we need to set a default zinc port or are we assuming that either:

1. the zinc port defaults if you pass in `-port `
2. the `ZINC_PORT` variable will always be filled

I don't know the default behavior so just want to make sure. Was thinking 
we could set the default port like `ZINC_PORT=${ZINC_PORT:-default-port}`. 
Necessary?
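
The default-expansion idea above can be sketched in Python, the language the build scripts in this thread were being ported to. This is only an illustration, not code from the PR: `resolve_zinc_port` is a hypothetical helper and the fallback value `3030` is an assumption to be checked against the zinc version in use. Like `${ZINC_PORT:-...}`, it falls back when the variable is unset or empty.

```python
import os


def resolve_zinc_port(env=None, default="3030"):
    """Return ZINC_PORT from the environment, or a default if unset or empty.

    Hypothetical helper; "3030" is an assumed default, not taken from the PR.
    """
    if env is None:
        env = os.environ
    # `or` treats an empty string like an unset variable,
    # mirroring the shell's ${ZINC_PORT:-default} expansion.
    return env.get("ZINC_PORT") or default


if __name__ == "__main__":
    print(resolve_zinc_port())
```

Passing the environment in as a mapping keeps the helper easy to exercise without touching the real process environment.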





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-28 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-125667034
  
jenkins retest this please





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-125255797
  
All, I was on vacation last week, sorry for the lack of updates. @JoshRosen is 
there anything else we need to complete for this to merge? I've reviewed the 
code and I can't see any other TODOs on my end.





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-125257277
  
Scratch that last comment, the `tar` command to grab all the unit logs 
needs a flag added. Making a fix and pushing now.





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-125267532
  
jenkins retest this please





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-125284019
  
jenkins retest this please





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-125287302
  
@shaneknapp do you know why this 
[PRB](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/120/console)
 is failing? Is there something new going on with Jenkins?





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-122031700
  
It's also worth noting that this patch will subsume and resolve 
[SPARK-6557](https://issues.apache.org/jira/browse/SPARK-6557).





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-15 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7401#issuecomment-121677522
  
@JoshRosen the only big thing you mentioned that I couldn't adopt was using 
`glob` over find. Per this 
[SO](http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python)
 answer and others, it seems it's better to run an `os.walk`, which is what I 
did. Let me know what you think!
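
For readers following along, a recursive search built on `os.walk` might look like the sketch below. It is not the code from the PR: `find_files` and its parameters are hypothetical names, and `fnmatch.filter` is used here to apply a glob-style pattern to the file names in each directory.

```python
import fnmatch
import os


def find_files(root, pattern):
    """Recursively collect paths under `root` whose file name matches `pattern`.

    Hypothetical helper for illustration; not taken from the PR.
    """
    matches = []
    # os.walk yields one (directory, subdirectories, files) tuple per directory.
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names matching the glob-style pattern.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

A call such as `find_files(".", "*.log")` would return every matching file at any depth below the current directory.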





[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...

2015-07-14 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7374#issuecomment-121329813
  
Bump. Any issues with this, guys? If so, let me know and I'll get the fixes in!





[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...

2015-07-14 Thread brennonyork
GitHub user brennonyork opened a pull request:

https://github.com/apache/spark/pull/7401

[WIP][SPARK-7018][Build]: Refactor dev/run-tests-jenkins into Python

First draft, and WIP, of the refactoring of the `run-tests-jenkins` script 
into Python. Currently a few things are left out that could, and I think 
should, be smaller JIRAs after this.

1. There are still a few areas where we use environment variables where we 
don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in 
lieu of everything else, but wanted to point that out.
2. The PR tests are still written in bash. I opted to not change those and 
just rewrite the runner into Python. This is a great follow-on JIRA IMO.
3. All of the linting scripts are still in bash as well; it would likely be 
best to handle those as follow-on JIRAs too.

Still a WIP now, but would love to get initial rounds of feedback as we 
iterate on this / test with Jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brennonyork/spark SPARK-7018

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7401


commit 31b51dea681534fca28b762b6eca01b81229215c
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-07-13T22:28:53Z

initial cut of refactored run-tests-jenkins script into python

commit f2a1dc6eaf6c316809cdf08c5340a8a81de504b3
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-07-14T18:31:36Z

fixed pep8 issues







[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...

2015-07-13 Thread brennonyork
GitHub user brennonyork opened a pull request:

https://github.com/apache/spark/pull/7374

[SPARK-8933][Build]: Provide a --force flag to build/mvn that always uses 
downloaded maven

added --force flag to manually download, if necessary, and use a built-in 
version of maven best for spark

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brennonyork/spark SPARK-8933

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7374


commit d673127e4a463ad97f5bd6e1668bdb5455f1013d
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-07-13T15:41:55Z

added --force flag to manually download, if necessary, and use a built-in 
version of maven best for spark







[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...

2015-07-13 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7374#issuecomment-120972882
  
/cc @pwendell 





[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...

2015-07-13 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/7374#issuecomment-121060220
  
@pwendell yeah, I inserted a few `echo forcing maven` statements into 
the `if` branch and afterwards (which is why I now dump the current `mvn` path 
too) and tested on my local machine. Everything worked fine, but feel free to 
give it a quick whirl! It also caches the downloaded `mvn`, so it never reaches 
back to the internet once a local copy has been downloaded (for instance, when 
running `build/mvn --force ...` a second time).





[GitHub] spark pull request: [SPARK-8693][Project Infra]: profiles and goal...

2015-06-29 Thread brennonyork
GitHub user brennonyork opened a pull request:

https://github.com/apache/spark/pull/7085

[SPARK-8693][Project Infra]: profiles and goals are not printed in a nice 
way

Hotfix to correct formatting errors of print statements within the dev and 
jenkins builds. Error looks like:

```
-Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these 
arguments:  -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using 
SBT with these arguments:  -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) 
using SBT with these arguments:  -Phive-thriftserver[info] Building Spark 
(w/Hive 0.13.1) using SBT with these arguments:  -Phive[info] Building Spark 
(w/Hive 0.13.1) using SBT with these arguments:  package[info] Building Spark 
(w/Hive 0.13.1) using SBT with these arguments:  assembly/assembly[info] 
Building Spark (w/Hive 0.13.1) using SBT with these arguments:  
streaming-kafka-assembly/assembly
```
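
The offending lines are not quoted in this message, so the following is only a guess at the mechanism. One cause consistent with the output above is Python's implicit concatenation of adjacent string literals: if the header literal and the `" "` separator end up next to each other without a comma, the whole header becomes the separator passed to `join`. The sketch below is written in Python 3 with shortened, hypothetical argument values.

```python
args = ["-Phadoop-1", "-Dhadoop.version=1.0.4", "package"]
header = "[info] Building Spark using SBT with these arguments: "

# Adjacent string literals are concatenated before `.join` is applied, so the
# whole header (plus one extra space) becomes the separator between arguments.
garbled = ("[info] Building Spark using SBT with these arguments: "
           " ".join(args))

# Keeping the header and the joined arguments as separate values avoids this.
fixed = header + " ".join(args)

print(garbled)
print(fixed)
```

In the garbled string the header appears between every pair of arguments, with two spaces after the colon, which matches the shape of the output above.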

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brennonyork/spark SPARK-8693

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7085.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7085


commit c5575f1276032e878c7d7e680ccbf9eb527c2f68
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-06-29T13:26:45Z

added commas to end of print statements for proper printing







[GitHub] spark pull request: [SPARK-7017][HOTFIX][Project Infra]: Refactor ...

2015-06-18 Thread brennonyork
Github user brennonyork closed the pull request at:

https://github.com/apache/spark/pull/6865





[GitHub] spark pull request: [SPARK-7017][HOTFIX][Project Infra]: Refactor ...

2015-06-18 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/6865#issuecomment-113054631
  
Closing in favor of #6866 





[GitHub] spark pull request: [SPARK-7017][HOTFIX][Project Infra]: Refactor ...

2015-06-17 Thread brennonyork
GitHub user brennonyork opened a pull request:

https://github.com/apache/spark/pull/6865

[SPARK-7017][HOTFIX][Project Infra]: Refactor dev/run-tests into Python

Fixed minor nits from the [previous 
PR](https://github.com/apache/spark/pull/5694) and removed unnecessary doc 
build code as docs will be built with 'jekyll' and not any calls through 'sbt' 
(i.e. the `get_build_profiles` function).

/cc @JoshRosen 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brennonyork/spark SPARK-7017-HOTFIX-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6865


commit 79845b12c837ffa2b1d2d8a439ffd558624ff999
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-06-17T23:12:10Z

fixed minor nits from previous PR and removed unnecessary doc build code as 
docs will be built with 'jekyll' and not any calls through 'sbt'







[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-17 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-112976517
  
Thanks for the PR merge @JoshRosen. I'll go ahead and make a hotfix branch 
to capture the last few nits you have above!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-17 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r32694170
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,536 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_HOME = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME = os.environ.get("HOME")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+        return dict(err_codes)
+
+
+ERROR_CODES = get_error_codes(os.path.join(SPARK_HOME, "dev/run-tests-codes.sh"))
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+    print "[error] running", cmd, "; received return code", retcode
+    sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command from the determined SPARK_HOME directory and, on failure, print
+    an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_call(cmd)
+    except subprocess.CalledProcessError as e:
+        exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program)
+
+    if fpath:
+        if is_exe(program):
+            return program
+    else:
+        for path in os.environ.get("PATH").split(os.pathsep):
+            path = path.strip('"')
+            exe_file = os.path.join(path, program)
+            if is_exe(exe_file):
+                return exe_file
+    return None
+
+
+def determine_java_executable():
+    """Will return the path of the java executable that will be used by Spark's
+    tests or `None`"""
+
+    # Any changes in the way that Spark's build detects java must be reflected
+    # here. Currently the build looks for $JAVA_HOME/bin/java then falls back to
+    # the `java` executable on the path
+
+    java_home = os.environ.get("JAVA_HOME")
+
+    # check if there is an executable at $JAVA_HOME/bin/java
+    java_exe = which(os.path.join(java_home, "bin", "java")) if java_home else None
+    # if the java_exe wasn't set, check for a `java` version on the $PATH
+    return java_exe if java_exe else which("java")
+
+
+JavaVersion = namedtuple('JavaVersion', ['major', 'minor', 'patch', 'update'])
+
+
+def determine_java_version(java_exe):
+    """Given a valid java executable will return its version in named tuple format
+    with accessors '.major', '.minor', '.patch', '.update'"""
+
+    raw_output = subprocess.check_output([java_exe, "-version"],
+                                         stderr=subprocess.STDOUT)
+    raw_version_str = raw_output.split('\n')[0]  # eg 'java version "1.8.0_25"'
+    version_str = raw_version_str.split()[-1].strip('"')  # eg '1.8.0_25

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-112482703
  
@JoshRosen just FYI, I forgot to commit the code that would actually 
**build** the documentation yesterday (the `jekyll build` call), so I'm 
retesting now. If this passes (and builds docs), I can revert the simple doc 
change and it should be ready!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-112509328
  
Sounds like a deal. I've got a separate thread with @shaneknapp on this one 
(he said the same thing re: the `jekyll` tool only being on 
`amplab-jenkins-worker-01`), so understood on the revert here. Let me get that 
in place...





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-112575903
  
@JoshRosen for the `JAVA_HOME` issue, are you asking if the code checks the 
regular `PATH` for a `java` executable after checking for `JAVA_HOME`? I 
believe what you're asking is already done 
[here](https://github.com/brennonyork/spark/blob/SPARK-7017/dev/run-tests.py#L112).





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-112578835
  
Roger. I see it now. Will have a fix up shortly.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-16 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-112599782
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-8316] Upgrade to Maven 3.3.3

2015-06-12 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/6770#issuecomment-111560478
  
We could also print a warning if a user already has `mvn` installed, 
but at a version < 3.3. That seems the least intrusive to the dev community without 
mandating they install the latest version. Just my 2c.
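
A rough sketch of what such a warning could look like, assuming the version is read from the first line of `mvn --version` output ("Apache Maven X.Y.Z ..."). The helper names and the message wording are hypothetical and not part of this PR.

```python
import re


def parse_maven_version(version_output):
    """Extract (major, minor) from `mvn --version` style output, or None."""
    match = re.search(r"Apache Maven (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def maven_version_warning(version_output, minimum=(3, 3)):
    """Return a warning string if the detected version is older than `minimum`.

    Returns None when the installed version is new enough.
    """
    detected = parse_maven_version(version_output)
    if detected is None:
        return "[warn] Could not determine the installed Maven version."
    if detected < minimum:
        # Tuples compare element-wise, so (3, 0) < (3, 3) is True.
        return "[warn] Found Maven %d.%d; %d.%d or newer is recommended." % (
            detected + minimum)
    return None
```

The build would carry on either way; the point is only to tell the user instead of forcing an upgrade.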





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-10 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-110782037
  
jenkins retest this please





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-10 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-110810019
  
jenkins retest this please





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-10 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-110817307
  
Yeah you nailed it. I was about to push up a bug fix that should fix that, 
but I like your idea better of just updating the example. Turns out moving the 
`os.environ["PATH"]` set to include `python3` in the `PATH` before *all* python 
checks was failing it.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-10 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-110809581
  
@shaneknapp I'm pretty sure you're talking about the check 
[here](https://github.com/brennonyork/spark/blob/7d2f5e28beb3cc20fe39d1d61443fcdd69fe632b/dev/run-tests.py#L469)
 which will test for `AMPLAB_JENKINS` being set in the environment. Let me know 
if I'm wrong here!
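
In case it helps the discussion, the kind of check meant here boils down to something like the sketch below. The function name is invented for the example, and treating an empty value as "not set" is an assumption that mirrors the old bash `[ -n ... ]` test.

```python
import os


def running_on_amplab_jenkins(environ=None):
    """True when AMPLAB_JENKINS is present and non-empty in the environment.

    Hypothetical helper; the real check lives at the line linked above.
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get("AMPLAB_JENKINS"))
```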





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-10 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-110826006
  
I would say that's a great idea since it's already set that Spark supports 
Python 3. As long as devs know that all python scripts will run under `python3` 
by default it would simplify this (and likely other bash-python scripts 
coming).





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-10 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-110934573
  
@pwendell @JoshRosen thoughts on the initial refactor? I've incorporated 
an, albeit minimal, additional set of test checks for MLlib, GraphX, and 
Streaming.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-04 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-108956716
  
@pwendell thanks for the review! You're certainly correct that I took a 
"just get it into Python and working first" approach. I was unsure whether we 
wanted more of what you laid out above or something that got it into Python 
and was then built upon incrementally, but I'm glad that clarification is there now. 
I'll move forward fixing the comments you and @JoshRosen supplied and get a new 
commit back soon!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-105971797
  
Bump on this thread.

/cc @pwendell 





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-21 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-104420452
  
Thanks @davies, anyone have any comments / concerns?





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-19 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r30630715
  
--- Diff: dev/run-tests ---
@@ -17,216 +17,11 @@
 # limitations under the License.
 #
 
-# Go to the Spark project root directory
 FWDIR=$(cd `dirname $0`/..; pwd)
 cd $FWDIR
 
-# Clean up work directory and caches
-rm -rf ./work
-rm -rf ~/.ivy2/local/org.apache.spark
-rm -rf ~/.ivy2/cache/org.apache.spark
-
-source $FWDIR/dev/run-tests-codes.sh
-
-CURRENT_BLOCK=$BLOCK_GENERAL
-
-function handle_error () {
-  echo "[error] Got a return code of $? on line $1 of the run-tests script."
-  exit $CURRENT_BLOCK
-}
-
-
-# Build against the right version of Hadoop.
-{
-  if [ -n "$AMPLAB_JENKINS_BUILD_PROFILE" ]; then
-    if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=1.0.4"
-    elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1"
-    elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.2" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2"
-    elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.3" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
-    fi
-  fi
-
-  if [ -z "$SBT_MAVEN_PROFILES_ARGS" ]; then
-    export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
-  fi
-}
-
-export SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Pkinesis-asl"
-
-# Determine Java path and version.
-{
-  if test -x "$JAVA_HOME/bin/java"; then
-      declare java_cmd="$JAVA_HOME/bin/java"
-  else
-      declare java_cmd=java
-  fi
-
-  # We can't use sed -r -e due to OS X / BSD compatibility; hence, all the parentheses.
-  JAVA_VERSION=$(
-    $java_cmd -version 2>&1 \
-    | grep -e "^java version" --max-count=1 \
-    | sed "s/java version \"\(.*\)\.\(.*\)\.\(.*\)\"/\1\2/"
-  )
-
-  if [ "$JAVA_VERSION" -lt 18 ]; then
-    echo "[warn] Java 8 tests will not run because JDK version is < 1.8."
-  fi
-}
-
-# Only run Hive tests if there are SQL changes.
-# Partial solution for SPARK-1455.
-if [ -n "$AMPLAB_JENKINS" ]; then
-  git fetch origin master:master
-
-  sql_diffs=$(
-    git diff --name-only master \
-    | grep -e "^sql/" -e "^bin/spark-sql" -e "^sbin/start-thriftserver.sh"
-  )
-
-  non_sql_diffs=$(
-    git diff --name-only master \
-    | grep -v -e "^sql/" -e "^bin/spark-sql" -e "^sbin/start-thriftserver.sh"
-  )
-
-  if [ -n "$sql_diffs" ]; then
-    echo "[info] Detected changes in SQL. Will run Hive test suite."
-    _RUN_SQL_TESTS=true
-
-    if [ -z "$non_sql_diffs" ]; then
-      echo "[info] Detected no changes except in SQL. Will only run SQL tests."
-      _SQL_TESTS_ONLY=true
-    fi
-  fi
-fi
-
-set -o pipefail
-trap 'handle_error $LINENO' ERR
-
-echo 
-echo 
=
-echo Running Apache RAT checks
-echo 
=
-
-CURRENT_BLOCK=$BLOCK_RAT
-
-./dev/check-license
-
-echo 
-echo 
=
-echo Running Scala style checks
-echo 
=
-
-CURRENT_BLOCK=$BLOCK_SCALA_STYLE
-
-./dev/lint-scala
-
-echo 
-echo 
=
-echo Running Python style checks
-echo 
=
-
-CURRENT_BLOCK=$BLOCK_PYTHON_STYLE
-
-./dev/lint-python
-
-echo 
-echo 
=
-echo Building Spark
-echo 
=
-
-CURRENT_BLOCK=$BLOCK_BUILD
-
-{
-  HIVE_BUILD_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-thriftserver"
-  echo "[info] Compile with Hive 0.13.1"
-  [ -d "lib_managed" ] && rm -rf lib_managed
-  echo "[info] Building Spark with these arguments: $HIVE_BUILD_ARGS"
-
-  if [ "${AMPLAB_JENKINS_BUILD_TOOL}" == "maven" ]; then
-    build/mvn $HIVE_BUILD_ARGS clean package -DskipTests
-  else
-    echo -e "q\n" \
-      | build/sbt $HIVE_BUILD_ARGS package assembly/assembly streaming-kafka-assembly/assembly \
-      | grep -v -e "info.*Resolving" -e "warn.*Merging" -e "info.*Including"
-  fi
-}
-
-echo 
-echo

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-19 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r30630897
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,418 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" +
+                               "^.*[warn].*Merging" + "|" +
+                               "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+    return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+    print "[error] running", cmd, "; received return code", retcode
--- End diff --

From @nchammas above:

> Python 2.6 is the oldest version of Python that Spark officially supports.
>
> We also added Python 3 support recently, so ideally this script should be 
> able to run on 2.6+ and 3.3+, but I think it's fine to start with just 2.6+ 
> since this is a developer script.

Given that, maybe we should explicitly call out `python2` at the top?
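
For reference, one way the flagged line could be made to behave the same on 2.6+ and 3.3+ is to build the message first and call `print` with a single parenthesized string. This is just a sketch: the function names are made up, and treating `cmd` as a list of arguments is an assumption.

```python
def format_command_failure(cmd, retcode):
    """Build the error line for a failed command.

    `cmd` is assumed to be a list of argument strings (hypothetical shape).
    """
    return "[error] running %s ; received return code %d" % (
        " ".join(cmd), retcode)


def report_command_failure(cmd, retcode):
    # A single parenthesized string prints identically under the Python 2
    # print statement and the Python 3 print function.
    print(format_command_failure(cmd, retcode))
```

The alternative would be `from __future__ import print_function` at the top of the script, which also works on 2.6+.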





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-19 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r30630935
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,418 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
--- End diff --

Roger, will fix!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-19 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r30634117
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,418 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
--- End diff --

This was something @pwendell was looking for and was included with 
[SPARK-3355](https://issues.apache.org/jira/browse/SPARK-3355)





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-19 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-103613947
  
@rxin @pwendell @srowen could I get a few more eyes on this? Getting tricky 
to keep fixing merge conflicts and backporting them into this script :)





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-05-14 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-102196732
  
jenkins retest this please





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-13 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-101720684
  
@shaneknapp honestly I hadn't thought about it, but since this should be 
capable of running on developers' boxes I would assume we should keep it 
agnostic. Is there a specific version of Python that Spark dictates is needed 
just for builds? If so we should match to that I would say.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-13 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-101762800
  
@nchammas you're correct on SPARK-6908. I hadn't pulled in #5955 as it 
wasn't already closed / merged so wasn't sure if that was something the 
committers wanted or not. Assumed it was, but figured I'd wait to see. If 
there's consensus on that though I'll be happy to add it in.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-12 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-101455226
  
Now that the `branch-1.4` was cut, could I get a few eyes on this one? :)





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-05-11 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-101035868
  
jenkins retest this please





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-05-05 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-99234851
  
@ankurdave when you get a chance can you review this? I know it's been a 
while, but I finally had a chance to get back to this and rework it given your 
above comments. The only issue I had was with the Iterator[(VertexId, A)] and, 
instead, assumed a structure of RDD[(VertexId, A)] with, possibly, duplicate 
keys.
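
To make the duplicate-key assumption concrete, here is a plain-Python sketch (not the GraphX/Scala code in this PR) of folding a collection of `(vertex_id, value)` pairs where the same id may appear more than once. The function and parameter names are invented for the example.

```python
def fold_by_vertex(messages, zero, fold):
    """Fold values that share a vertex id into one accumulator per id.

    `messages` is any iterable of (vertex_id, value) pairs and may contain
    duplicate ids; `zero` should be immutable since it seeds every id.
    """
    folded = {}
    for vertex_id, value in messages:
        folded[vertex_id] = fold(folded.get(vertex_id, zero), value)
    return folded
```

With a sum as the fold function, `[(1, 2), (2, 5), (1, 3)]` collapses to one entry per vertex id.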





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-05-05 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-99234568
  
jenkins retest this please





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-01 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-98175544
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-01 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-98228719
  
@pwendell @srowen @nchammas @rxin can I get a final review on this? It's 
LGTM and I've tested all the major error cases we need this script to report 
on. Recognize this is a highly critical piece of code when it comes to the 
whole Spark ecosystem though so I'd rather make sure we get more eyes on it / 
nits taken care of now before we look to merge into master.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-05-01 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-98230750
  
@shaneknapp forgot to add you into this as well (sorry)! Esp. since you're 
dealing with `stdout` and `stderr` issues right now I want to make sure this 
doesn't add any excess bloat to that (shouldn't...).





[GitHub] spark pull request: redir stderr better, remove unused code, bette...

2015-05-01 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5817#issuecomment-98239787
  
LGTM for whenever this can get reviewed!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97883074
  
jenkins, for the last time, retest this please





[GitHub] spark pull request: [SPARK-7214] Reserve space for unrolling even ...

2015-04-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5784#issuecomment-97910902
  
Thanks for clarifying, @shaneknapp. It looks like I don't have access to see
the Jenkins environment variables from the link you sent (unless it went
stale before I clicked), but I'll look for the review note and provide what I
can!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97852085
  
jenkins, retest this please.





[GitHub] spark pull request: [SPARK-7214] Reserve space for unrolling even ...

2015-04-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5784#issuecomment-97845313
  
@shaneknapp well, to start, I couldn't agree more that something looks very
fishy with `pr_public_classes`. That was a direct port from the previous code,
though, which makes it even more interesting :/

As for where the SHA1 hash is getting pulled from, as I'm sure you know, it is
[this code
here](https://github.com/jenkinsci/ghprb-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ghprb/GhprbTrigger.java#L170),
which, I'll admit, is interesting in and of itself in that it could produce
the SHA1 as an actual hash **or** what we see above (in the case that the patch
can merge without conflict into master).

That said, `pr_public_classes` only relies on `ghprbActualCommit` and not the
SHA1, so unless that were somehow empty (and it isn't, according to Jenkins)
I'm not immediately sure how this could be happening.

My only thought, and I'm hoping you could shed some light here, would be a 
possible race condition from some shared state on each Jenkins box such that 
the Bash calls (or environment variables) aren't atomic to the PR they're 
building. Thoughts on that? I'll continue to dig and see what I can find.
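
One way to tell the two forms of the SHA1 value apart while debugging is a small check like the sketch below. It is only an illustration: the function names are hypothetical, and the example ref `origin/pr/5784/merge` is an assumption modeled on the ref names Jenkins prints in its checkout logs, not a value taken from this build.

```python
import re

# A full SHA-1 commit id is exactly 40 lowercase hex characters.
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def looks_like_commit_hash(value):
    """Return True if `value` is a full 40-character hex commit id."""
    return bool(_SHA1_RE.fullmatch(value or ""))


def describe_sha1(value):
    """Classify the builder's SHA1 value as a commit hash or a ref name."""
    if looks_like_commit_hash(value):
        return "commit hash"
    return "ref name"
```

Logging `describe_sha1(os.environ.get("sha1"))` at the start of a build would show which form a given run received, assuming the variable is exported under that name.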





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97845733
  
Oh, we also showed a failed build scenario too. Going to finish with 
PySpark tests and SparkR tests.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97844015
  
@nchammas just wanted to get out a rolling count of what we've tested thus 
far:

1. Failed tests
2. Failed MiMa excludes
3. Failed Scala style checks
4. Failed Apache RAT checks

Will continue today to hopefully finish up the last bit! Let me know if I 
missed anything!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97514854
  
jenkins, retest this please.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97571451
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97514720
  
@nchammas good catch. It turns out there was a bug in which error code was
being returned, so `dev/run-tests-jenkins` wasn't reporting the correct
error message. Just pushed up a fix that should handle this.
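
For context, the diff in this PR builds its error-code table from the `readonly NAME=VALUE` lines of `run-tests-codes.sh`. A minimal sketch of that lookup, in both directions, might look like the following. The block names and values in `SAMPLE` are hypothetical placeholders, not the real contents of that file, and the function names are illustrative.

```python
def parse_error_codes(text):
    """Map NAME -> integer code from lines of the form `readonly NAME=VALUE`."""
    codes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("readonly "):
            continue
        name, _, value = line[len("readonly "):].partition("=")
        codes[name] = int(value)
    return codes


def name_for_exit_code(codes, exit_code):
    """Return the block name for `exit_code`, or None if it is unknown."""
    for name, code in codes.items():
        if code == exit_code:
            return name
    return None


# Hypothetical contents; the real file's names and values may differ.
SAMPLE = """\
readonly BLOCK_EXAMPLE_STYLE=11
readonly BLOCK_EXAMPLE_TESTS=12
"""
```

If the test runner exits with a code that is not in the table, the caller gets `None` back and can fall back to a generic failure message rather than reporting the wrong stage.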





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97540839
  
jenkins, retest this please.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97587252
  
:+1: jenkins, let's keep this going, retest this please.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97590384
  
jenkins, retest this please.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97580071
  
jenkins, retest this please.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-29 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97600636
  
jenkins, looking for a break in mima! retest this please!





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-28 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97110985
  
@nchammas any other thoughts? I think we've got a pretty solid start wrt 
the refactor here.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-28 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-97184852
  
jenkinbox, retest this please.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-96827964
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-27 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r29155851
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+
+spark_proj_root = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+user_home_dir = os.environ.get("HOME")
+
+sbt_maven_profile_args_env = "SBT_MAVEN_PROFILES_ARGS"
+amplab_jenkins_build_tool_env = "AMPLAB_JENKINS_BUILD_TOOL"
+amplab_jenkins_build_tool = os.environ.get(amplab_jenkins_build_tool_env)
+amplab_jenkins = os.environ.get("AMPLAB_JENKINS")
+
+resolving_re = "^.*[info].*Resolving"
+merging_re = "^.*[warn].*Merging"
+including_re = "^.*[info].*Including"
+sbt_output_filter = re.compile(resolving_re + "|" +
+                               merging_re + "|" +
+                               including_re)
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+        return dict(err_codes)
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command and, on failure, print an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_output(cmd)
+    except subprocess.CalledProcessError as e:
+        print "[error] running", e.cmd, "; received return code", e.returncode
+        sys.exit(e.returncode)
+
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for sbt_maven_profile_args_env which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0" : ["-Dhadoop.version=1.0.4"],
+        "hadoop2.0" : ["-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2" : ["-Pyarn", "-Phadoop-2.2", "-Dhadoop.version=2.2.0"],
+        "hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, [])
+                     + sbt_maven_profile_args_base)
+    else:
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+                     + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-96718084
  
@rxin Thanks and taken care of!
@nchammas First, thanks a ton for all the Python reviews (I know it can be
tedious)! Second, to your point about removing the Bash-isms, you're completely
right: I left them in for **this PR** so that we can improve the codebase
incrementally. Once I tackle SPARK-7018 (i.e. `dev/run-tests-jenkins`) I think
I'll be able to slowly move some of this old Bash necessity out. Does that
sound like the right path?





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-96726126
  
Roger that. Let me look into using `pipes.quote` for the `sbt` output. Do 
we know what's up with Jenkins right now? I saw a thread a while back talking 
about a power outage at Berkeley, but thought I saw a message from Shane saying 
everything was back to normal. Is that not the case?
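
For reference, `pipes.quote` is the Python 2 spelling; Python 3 exposes the same helper as `shlex.quote`. The sketch below shows what the quoting does to a list of arguments. The helper name `shell_join` is hypothetical, and exactly where it would be applied around the `sbt` invocation is an assumption, since the thread does not spell that out.

```python
import shlex


def shell_join(args):
    """Join arguments into one string that is safe to hand to a shell."""
    return " ".join(shlex.quote(arg) for arg in args)
```

Arguments made up only of safe characters, such as `-Dhadoop.version=2.3.0`, pass through unchanged, while anything containing spaces or quote characters is wrapped in single quotes.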





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-27 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r29163677
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,413 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" +
+                               "^.*[warn].*Merging" + "|" +
+                               "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+        return dict(err_codes)
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command and, on failure, print an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_call(cmd)
+    except subprocess.CalledProcessError as e:
+        print "[error] running", e.cmd, "; received return code", e.returncode
+        sys.exit(e.returncode)
+
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0" : ["-Dhadoop.version=1.0.4"],
+        "hadoop2.0" : ["-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2" : ["-Pyarn", "-Phadoop-2.2", "-Dhadoop.version=2.2.0"],
+        "hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, [])
+                     + sbt_maven_profile_args_base)
+    else:
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+                     + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program)
+
+    if fpath:
+        if is_exe(program):
+            return program
+    else:
+        for path in os.environ.get("PATH").split

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-25 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r29105800
  
--- Diff: dev/run-tests ---
@@ -17,239 +17,394 @@
 # limitations under the License.
 #
 
-# Go to the Spark project root directory
-FWDIR="$(cd "`dirname $0`"/..; pwd)"
-cd "$FWDIR"
+import os
+import re
+import shutil
+import subprocess as sp
+
+# Set the Spark project root directory
+spark_proj_root = os.path.abspath("..")
+# Set the user 'HOME' directory
+user_home_dir = os.environ.get("HOME")
+# Set the sbt maven profile arguments environment variable name
+sbt_maven_profile_args_env = "SBT_MAVEN_PROFILES_ARGS"
+# Set the amplab jenkins build tool environment variable name
+amplab_jenkins_build_tool_env = "AMPLAB_JENKINS_BUILD_TOOL"
+# Set the amplab jenkins build tool environment value
+amplab_jenkins_build_tool = os.environ.get(amplab_jenkins_build_tool_env)
+# Set whether we're on an Amplab Jenkins box by checking for a specific
+# environment variable
+amplab_jenkins = os.environ.get("AMPLAB_JENKINS")
+# Set the pattern for sbt output e.g. [info] Resolving ...
+resolving_re = "^.*[info].*Resolving"
+# Set the pattern for sbt output e.g. [warn] Merging ...
+merging_re = "^.*[warn].*Merging"
+# Set the pattern for sbt output e.g. [info] Including ...
+including_re = "^.*[info].*Including"
+# Compile the various regex patterns into a filter
+sbt_output_filter = re.compile(resolving_re + "|" +
+                               merging_re + "|" +
+                               including_re)
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+        return dict(err_codes)
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for sbt_maven_profile_args_env which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0" : ["-Dhadoop.version=1.0.4"],
+        "hadoop2.0" : ["-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2" : ["-Pyarn", "-Phadoop-2.2", "-Dhadoop.version=2.2.0"],
+        "hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, [])
+                     + sbt_maven_profile_args_base)
+    else:
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+                     + sbt_maven_profile_args_base)
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program)
+
+    if fpath:
+        if is_exe(program):
+            return program
+    else:
+        for path in os.environ.get("PATH").split(os.pathsep):
+            path = path.strip('"')
+            exe_file = os.path.join(path, program)
+            if is_exe(exe_file):
+                return exe_file
+    return None
+
+def determine_java_executable():
+    """Will return the *best* path possible for a 'java' executable or `None`"""
+
+    java_home = os.environ.get("JAVA_HOME")
+
+    # check if there is an executable at $JAVA_HOME/bin/java
+    java_exe = which(os.path.join(java_home, "bin/java"))
+    # if the java_exe

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-24 Thread brennonyork
GitHub user brennonyork opened a pull request:

https://github.com/apache/spark/pull/5694

[SPARK-7017][Build][Project Infra]: Refactor dev/run-tests into Python

All, this is a first attempt at refactoring `dev/run-tests` into Python.
Initially I merely converted all Bash calls over to Python, then moved to a
much more modular approach (more functions, moved the calls around, etc.). What
is here is the initial result and should provide a great base for various
downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.).
Would love comments / suggestions on this first step!

/cc @srowen @pwendell @nchammas

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brennonyork/spark SPARK-7017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5694.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5694


commit 6126c4f4d97db16b0ed6a95c60fae1fff44e2afe
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-04-24T21:27:54Z

refactored run-tests into python







[GitHub] spark pull request: [WIP][HOTFIX][SPARK-4123]: Fix bug in PR depen...

2015-04-13 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5443#issuecomment-92518053
  
@shaneknapp can you help me understand how Jenkins is doing the checkouts? 
I'm seeing the PR builder outputting:

```
Building remotely on amp-jenkins-worker-06 (centos) in workspace 
/home/jenkins/workspace/SparkPullRequestBuilder
  git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
  git config remote.origin.url https://github.com/apache/spark.git # 
timeout=10
Fetching upstream changes from https://github.com/apache/spark.git
  git --version # timeout=10
  git fetch --tags --progress https://github.com/apache/spark.git 
+refs/pull/5443/*:refs/remotes/origin/pr/5443/* # timeout=15
  git rev-parse origin/pr/5443/merge^{commit} # timeout=10
  git branch -a --contains c5916336e6aff94dd3abfc9a0a41a2528c765fce # 
timeout=10
  git rev-parse remotes/origin/pr/5443/merge^{commit} # timeout=10
Checking out Revision c5916336e6aff94dd3abfc9a0a41a2528c765fce 
(origin/pr/5443/merge)
  git config core.sparsecheckout # timeout=10
  git checkout -f c5916336e6aff94dd3abfc9a0a41a2528c765fce
```

but I'm a bit confused about which refs I should switch between if, say,
from a PR I want to check out the `master` branch, then switch back to the
given PR branch, then possibly back to `master`, and finally back to the PR
again.

I'm currently doing what I believe is correct
[here](https://github.com/apache/spark/blob/master/dev/tests/pr_new_dependencies.sh#L42),
although there are times when the checkout from `master` back to the current
PR fails, producing odd dependency reports. I've noticed that Jenkins is using
the `-f` flag, which I've added as well, but wanted to see if you had any
thoughts on the matter.

Further, I've added `echo` statements to dump the `ghprbActualCommit`, the
`sha1`, and the output of `git rev-parse HEAD`. Each is a different commit
hash, which makes me think even more that this is the cause of all the errors.
Again, any advice?
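
The comparison of those three values could also be scripted rather than read off the `echo` output. The following is a rough sketch, not the code in the PR: `current_head` and `commit_report` are hypothetical names, and it assumes the builder exports the variables as `ghprbActualCommit` and `sha1`.

```python
import subprocess


def current_head():
    """Return the commit id that `git rev-parse HEAD` reports."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"])
    return out.decode("utf-8").strip()


def commit_report(ids):
    """Return (all_match, lines) for a {label: commit id} mapping.

    `all_match` is True only when every label carries the same value;
    `lines` is a sorted, printable "label = value" listing.
    """
    distinct = set(ids.values())
    lines = ["%s = %s" % (label, ids[label]) for label in sorted(ids)]
    return len(distinct) <= 1, lines
```

A call such as `commit_report({"ghprbActualCommit": os.environ.get("ghprbActualCommit"), "sha1": os.environ.get("sha1"), "HEAD": current_head()})` would flag any disagreement. One caveat: when the builder checks out `origin/pr/5443/merge`, as in the log above, `HEAD` is most likely the test-merge commit rather than the PR's own head commit, so some difference between those two would be expected.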





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-04-08 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5096#issuecomment-90827465
  
We can certainly set the timeout to be something larger. Let me take a look 
at the previous builds and see if I can find a good timeout number and if there 
might be anything else we can do. 
@pwendell any other ideas?





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-04-08 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5096#issuecomment-91028275
  
@shivaram a few things after looking at the build code some more...

1. The timeout value comes from [this line in 
`dev/run-tests-jenkins`](https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L50). 
It's currently set to 120 minutes and **doesn't** include the time it takes for 
PRs to be tested against the master branch (i.e. for dependencies). We could 
certainly raise that value. However, since I'm assuming the `dev/run-tests` 
script on this PR runs all the new SparkR tests (plus any additional core Spark 
tests you've added), I'd ask that you run `dev/run-tests` locally and, for 
whatever additional time is needed, update the timeout in 
`dev/run-tests-jenkins` for this PR. The reason for running locally first is 
that I'd much rather get a baseline for how long all the new tests take and 
then add 15ish minutes of padding than throw a number into the wind.
2. Completely agree we should get some timing metrics for the various PR 
tests (thanks for the idea!). I'll create a JIRA for that and take a look 
soon. That said, just to reiterate, those tests **are not** holding up the 
actual Spark test suite from finishing unless Jenkins has some deeper timing 
hooks than I know about. I assume, though, that it's merely a result of the 
large corpus of tests that were likely added in this PR.
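
A minimal sketch of the baseline-plus-padding idea from item 1, assuming the local run time has already been measured in seconds. `suggest_timeout_minutes` is a hypothetical helper for illustration and is not part of `dev/run-tests-jenkins`.

```shell
# Hypothetical helper: turn a measured local run time into a CI timeout.
# $1 = measured run time in seconds, $2 = padding in minutes.
suggest_timeout_minutes() {
  measured_seconds=$1
  padding_minutes=$2
  # Round the measured time up to whole minutes, then add the padding.
  echo $(( (measured_seconds + 59) / 60 + padding_minutes ))
}

# Measuring the baseline locally could look like this (not executed here):
#   start=$(date +%s); ./dev/run-tests; end=$(date +%s)
#   suggest_timeout_minutes $(( end - start )) 15

# Example: a local run of 1h45m (6300 seconds) plus 15 minutes of padding.
suggest_timeout_minutes 6300 15   # prints 120
```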





[GitHub] spark pull request: [HOTFIX][SPARK-4123]: Updated to fix bug where...

2015-03-30 Thread brennonyork
GitHub user brennonyork opened a pull request:

https://github.com/apache/spark/pull/5269

[HOTFIX][SPARK-4123]: Updated to fix bug where multiple dependencies added 
breaks Github output

Currently there is a bug whereby if a new patch introduces more than one 
new dependency (or removes more than one) it breaks the Github post output (see 
[this build](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29399/consoleFull)). 
This hotfix replaces the `awk` `print` statements with `printf` so that a 
newline character is not added automatically. The newline is instead escaped 
and added directly at the end of the `awk` statement. This should take a failed 
build output such as:

```json
api_response: {
  message: Problems parsing JSON,
  documentation_url: https://developer.github.com/v3;
}
  data: {body:   [Test build #29400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29400/consoleFull)
 for   PR 5266 at commit 
[`2aa4be0`](https://github.com/apache/spark/commit/2aa4be0e1d7ce052f8c901c6d9462c611c3a920a).\n
 * This patch **passes all tests**.\n * This patch merges cleanly.\n * This 
patch adds the following public classes _(experimental)_:\n  * `class IDF 
extends Estimator[IDFModel] with IDFParams `\n  * `class Normalizer extends 
UnaryTransformer[Vector, Vector, Normalizer] `\n\n * This patch **adds the 
following new dependencies:**\n   * `avro-1.7.7.jar`
   * `breeze-macros_2.10-0.11.2.jar`
   * `breeze_2.10-0.11.2.jar`\n * This patch **removes the following 
dependencies:**\n   * `avro-1.7.6.jar`
   * `breeze-macros_2.10-0.11.1.jar`
   * `breeze_2.10-0.11.1.jar`}
```

and turn it into:

```json
api_response: {
  message: Problems parsing JSON,
  documentation_url: https://developer.github.com/v3;
}
  data: {body:   [Test build #29400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29400/consoleFull)
 for   PR 5266 at commit 
[`2aa4be0`](https://github.com/apache/spark/commit/2aa4be0e1d7ce052f8c901c6d9462c611c3a920a).\n
 * This patch **passes all tests**.\n * This patch merges cleanly.\n * This 
patch adds the following public classes _(experimental)_:\n  * `class IDF 
extends Estimator[IDFModel] with IDFParams `\n  * `class Normalizer extends 
UnaryTransformer[Vector, Vector, Normalizer] `\n\n * This patch **adds the 
following new dependencies:**\n   * `avro-1.7.7.jar`\n   * 
`breeze-macros_2.10-0.11.2.jar`\n   * `breeze_2.10-0.11.2.jar`\n * This patch 
**removes the following dependencies:**\n   * `avro-1.7.6.jar`\n   * 
`breeze-macros_2.10-0.11.1.jar`\n   * `breeze_2.10-0.11.1.jar`}
```

I've tested this locally and all worked.
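
The difference between `print` and `printf` described above can be sketched as follows. This is an illustration only: the dependency list is made up and the `awk` programs are not the ones from the patch.

```shell
# Hypothetical dependency list, one jar per line.
deps='avro-1.7.7.jar
breeze_2.10-0.11.2.jar'

# `print` appends a real newline after every record, which ends up as a raw
# line break inside the JSON string.
with_print=$(printf '%s\n' "$deps" | awk '{ print "   * `" $0 "`\\n" }')

# `printf` emits only what the format string says, so the escaped "\n"
# (a backslash followed by the letter n) is the only separator.
with_printf=$(printf '%s\n' "$deps" | awk '{ printf "   * `%s`\\n", $0 }')

printf '%s\n' "$with_printf"
```

With `printf` the whole list stays on a single line, which is what a JSON string value needs.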

/cc @srowen @pwendell @nchammas

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brennonyork/spark HOTFIX-SPARK-4123

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5269


commit a4410680b9cc1ed4616127f320f51198783c250c
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-03-30T15:37:01Z

Updated awk to use printf and to manually insert newlines so that the JSON 
github string when posted is corrected







[GitHub] spark pull request: [HOTFIX][SPARK-4123]: Updated to fix bug where...

2015-03-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5269#issuecomment-87766986
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-03-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5266#issuecomment-87827641
  
To test #5269 I'm going to rerun these Jenkins tests as this is a prime 
example of that bug.





[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer

2015-03-30 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5266#issuecomment-87827693
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-4123][Project Infra]: Show new dependen...

2015-03-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-87025972
  
Thanks for the update guys. Per the consensus I moved the "tests have 
started" message to before the PR tests run. Also, @srowen, I updated all items 
per your comments. Any additional thoughts, all?





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-26 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-86747670
  
/cc @pwendell @srowen @nchammas 

All complete. You can check out build 29250 and the few after it to see what 
the output would look like if a new dependency were added. One issue which I'd 
love to get some opinions on...

Right now the initial post message to Github (i.e. the "Test build started" 
plus "patch merges cleanly" message) will take up to 20 minutes to post **if** 
any `pom.xml` files were changed, because it will then build both the current 
PR and the master branch. This is purely because the "This patch merges 
cleanly" output comes from a `pr_test` and runs in the core test loop. The 
easiest option to reflect the original way things have been posted would be to 
move the `pr_merge_ability` test out of the main test loop and have it execute 
independently. The other option would be to post only the "Test build started 
at ..." message up front and move the "merges cleanly" portion into the 
post-test message.

I'll admit I'm more in favor of the latter option, as I think it keeps 
things clean and "merges cleanly" is slightly ambiguous given that Github 
reports on this as well. Thoughts? Whichever way we go I can get that final 
change up, and then I'd say this is ready for review into master.





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-03-25 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-86178368
  
@ankurdave, thanks for the clarification. Let me take a second stab at this 
given what you stated and I should have something much more in line with the 
original thought!





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-25 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5093#discussion_r27158731
  
--- Diff: dev/tests/pr_new_dependencies.sh ---
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#
+# This script follows the base format for testing pull requests against
+# another branch and returning results to be published. More details can be
+# found at dev/run-tests-jenkins.
+#
+# Arg1: The Github Pull Request Actual Commit
+#+ known as `ghprbActualCommit` in `run-tests-jenkins`
+# Arg2: The SHA1 hash
+#+ known as `sha1` in `run-tests-jenkins`
+#
+
+ghprbActualCommit=$1
+sha1=$2
+
+MVN_BIN=`pwd`/build/mvn
+CURR_CP_FILE=my-classpath.txt
+MASTER_CP_FILE=master-classpath.txt
+
+${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \
--- End diff --

Sounds like a plan. Once I get this working in a state I like I'll set a 
gate to check all pom.xml files for changes and, if any show changes, will go 
ahead and execute the code.
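
The gate mentioned above could look roughly like this. It is a sketch under assumptions: `pom_changed` is a hypothetical helper, and the `git diff` invocation in the comment is one possible source of the changed-file list, not necessarily what the final script will use.

```shell
# Succeeds if any path read from stdin is a pom.xml, either at the repository
# root or inside a module directory.
pom_changed() {
  grep -Eq '(^|/)pom\.xml$'
}

# In CI the list of changed files might come from git (not executed here):
#   if git diff --name-only master...HEAD | pom_changed; then
#     echo "pom.xml changed, running the dependency check"
#   fi
```

Matching on the full file name avoids false positives from paths that merely contain `pom.xml` as a prefix.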





[GitHub] spark pull request: [SPARK-6510][GraphX]: Add Graph#minus method t...

2015-03-24 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5175#issuecomment-85739639
  
/cc @maropu @ankurdave @rxin 





[GitHub] spark pull request: [SPARK-6510][GraphX]: Add Graph#minus method t...

2015-03-24 Thread brennonyork
GitHub user brennonyork opened a pull request:

https://github.com/apache/spark/pull/5175

[SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference

Adds a `Graph#minus` method that returns only the `VertexId`s unique to the 
calling `VertexRDD`, i.e. those not present in the other one. 

For example:

```
Set((0L,0),(1L,1)).minus(Set((1L,1),(2L,2)))
 Set((0L,0))
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brennonyork/spark SPARK-6510

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5175


commit 7227c0ffd8a2ea93a3dcb28440c912921ff14380
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-03-24T22:59:28Z

beginning work on minus functionality

commit aaa030b3ff04738f5ffd38b6fec3f92043359b3a
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-03-24T23:14:54Z

completed graph#minus functionality

commit 6575d927cd36076db7797a12d45d6bb98f1bf43e
Author: Brennon York brennon.y...@capitalone.com
Date:   2015-03-24T23:16:09Z

updated mima exclude







[GitHub] spark pull request: [SPARK-6510][GraphX]: Add Graph#minus method t...

2015-03-24 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5175#issuecomment-85788983
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-03-24 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-85789231
  
jenkins, retest this please





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-03-24 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5142#discussion_r27046758
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala ---
@@ -154,7 +154,30 @@ abstract class VertexRDD[VD](
* @return a VertexRDD containing the results of `f`
*/
   def leftZipJoin[VD2: ClassTag, VD3: ClassTag]
-  (other: VertexRDD[VD2])(f: (VertexId, VD, Option[VD2]) => VD3): VertexRDD[VD3]
+  (other: VertexRDD[VD2])
+  (f: (VertexId, VD, Option[VD2]) => VD3)
+: VertexRDD[VD3]
+
+  /**
+   * Left joins this RDD with another VertexRDD with the same index. This function will fail if
+   * both VertexRDDs do not share the same index. The resulting vertex set contains an entry for
--- End diff --

Very true. Rereading the docs, it looks like they haven't been updated with 
the verbiage from a few of the more recent bug fixes. Will add that.





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-03-24 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5142#discussion_r27045011
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala ---
@@ -136,6 +136,31 @@ private[graphx] abstract class VertexPartitionBaseOps
 leftJoin(createUsingIndex(other))(f)
   }
 
+  def leftJoinWithFold[VD2: ClassTag, VD3: ClassTag, A]
+  (other: Self[VD2], acc: A)
--- End diff --

I'm pretty sure @ankurdave was looking for the `*WithFold` style addition 
rather than changing the original method names, to keep backwards compatibility 
as best as possible. I agree it's a bit confusing, but I figure maintaining 
backwards compatibility as best we can is the better option.





[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...

2015-03-24 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5142#discussion_r27045035
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala ---
@@ -154,7 +154,30 @@ abstract class VertexRDD[VD](
* @return a VertexRDD containing the results of `f`
*/
   def leftZipJoin[VD2: ClassTag, VD3: ClassTag]
-  (other: VertexRDD[VD2])(f: (VertexId, VD, Option[VD2]) => VD3): VertexRDD[VD3]
+  (other: VertexRDD[VD2])
+  (f: (VertexId, VD, Option[VD2]) => VD3)
+: VertexRDD[VD3]
+
+  /**
+   * Left joins this RDD with another VertexRDD with the same index. This function will fail if
+   * both VertexRDDs do not share the same index. The resulting vertex set contains an entry for
+   * each vertex in `this`.
+   * If `other` is missing any vertex in this VertexRDD, `f` is passed `None`.
+   *
+   * @tparam VD2 the attribute type of the other VertexRDD
+   * @tparam VD3 the attribute type of the resulting VertexRDD
+   * @tparam A the type of the given starting value and accumulator
+   *
+   * @param other the other VertexRDD with which to join.
+   * @param acc the initial value for the accumulator
+   * @param f the function mapping a vertex id and its attributes in this and the other vertex set
+   * to a new vertex attribute.
+   * @return a VertexRDD containing the results of `f`
+   */
+  def leftZipJoinWithFold[VD2: ClassTag, VD3: ClassTag, A]
+  (other: VertexRDD[VD2], acc: A)
+  (f: (A, VertexId, VD, Option[VD2]) => VD3)
--- End diff --

Very good point. Will update that.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-23 Thread brennonyork
Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/5093#discussion_r26983769
  
--- Diff: dev/tests/pr_new_dependencies.sh ---
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#
+# This script follows the base format for testing pull requests against
+# another branch and returning results to be published. More details can be
+# found at dev/run-tests-jenkins.
+#
+# Arg1: The Github Pull Request Actual Commit
+#+ known as `ghprbActualCommit` in `run-tests-jenkins`
+# Arg2: The SHA1 hash
+#+ known as `sha1` in `run-tests-jenkins`
+#
+
+ghprbActualCommit=$1
+sha1=$2
+
+MVN_BIN=`pwd`/build/mvn
+CURR_CP_FILE=my-classpath.txt
+MASTER_CP_FILE=master-classpath.txt
+
+${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \
--- End diff --

Yeah, it's required :/ I've tested without it and it fails at building 
`spark-networking`. This adds around 4.5 minutes for each run (of which there 
are two), so 9 minutes added to the build time. I also looked at what `sbt` 
could output, but couldn't find anything. I further thought about making this a 
special-case test that grabs the output from the generic build of Spark that 
happens for each PR, but since it also has to build against the `master` 
branch, that didn't seem like a much better option.
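
Once both builds have written their classpath files, the comparison itself is cheap. The sketch below assumes each file holds a single colon-separated classpath, as `dependency:build-classpath` normally produces; `jar_names` and `new_jars` are hypothetical helpers, not the code in the diff.

```shell
# Print the jar file names from a colon-separated classpath file,
# one per line, sorted and de-duplicated.
jar_names() {
  tr ':' '\n' < "$1" | awk -F/ 'NF { print $NF }' | LC_ALL=C sort -u
}

# Print jars present in the second classpath file but not in the first.
# Swap the arguments to list removed jars instead.
new_jars() {
  old_list=$(mktemp)
  new_list=$(mktemp)
  jar_names "$1" > "$old_list"
  jar_names "$2" > "$new_list"
  # comm -13 keeps only the lines unique to the second (sorted) file.
  LC_ALL=C comm -13 "$old_list" "$new_list"
  rm -f "$old_list" "$new_list"
}

# Example (not executed here), using the file names from the script:
#   new_jars master-classpath.txt my-classpath.txt   # added dependencies
#   new_jars my-classpath.txt master-classpath.txt   # removed dependencies
```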




