[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-149055980 @JoshRosen not a problem and completely understand on getting this merged this weekend. Let me know if you need any help from here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-148949648

jenkins, retest this please
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-148841311

Hey guys, sorry I've been away from this recently, been busy with some other things. I'll get the merge conflicts fixed and take care of a few other things here shortly.
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-148843226

jenkins, retest this please
[GitHub] spark pull request: [SPARK-9057] [STREAMING] [WIP] Twitter example...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/8431#issuecomment-136423918

It looks like [SPARK-9057](https://issues.apache.org/jira/browse/SPARK-9057) requires an example of `DStream.transform` done in all three primary Spark languages (Java, Scala, and Python). This is a great start, but it will need a Python and Scala example as well to be considered complete.
[GitHub] spark pull request: [SPARK-9057] [STREAMING] [WIP] Twitter example...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/8431#issuecomment-136424197

Is there really a need to commit the `twitter_sentiment_list.txt` file? I would think we could grab it remotely on execution rather than commit it into the source code base.
[GitHub] spark pull request: [SPARK-9607] [SPARK-9608] fix zinc-port handli...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7944#issuecomment-127798476

Totally missed that, awesome! Just wanted to make sure! LGTM.
[GitHub] spark pull request: [SPARK-9607] [SPARK-9608] fix zinc-port handli...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7944#issuecomment-127793887

Do we need to set a default zinc port, or are we assuming that either:

1. the zinc port defaults if you pass in `-port `
2. the `ZINC_PORT` variable will always be filled

I don't know the default behavior so just want to make sure. I was thinking we could set the default port like `ZINC_PORT=${ZINC_PORT:-default-port}`. Necessary?
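For reference, the bash default-expansion idiom suggested above (`ZINC_PORT=${ZINC_PORT:-default-port}`) has a direct Python analogue. A minimal sketch for illustration only — the function name and the default of 3030 are hypothetical, not the actual zinc default:

```python
import os

def resolve_zinc_port(default=3030):
    # Mirrors the bash idiom `ZINC_PORT=${ZINC_PORT:-3030}`: use the
    # environment value when set and non-empty, else fall back.
    value = os.environ.get("ZINC_PORT")
    return int(value) if value else default
```

Like `${VAR:-default}`, an empty string also falls through to the default here, since an empty `value` is falsy.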
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-125667034

jenkins retest this please
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-125255797

All, I was on vacation last week, sorry for no updates. @JoshRosen is there anything else we need to complete for this to merge in? I've reviewed the code and I can't see any other TODOs on my end.
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-125257277

Scratch that last comment, the `tar` command to grab all the unit logs needs a flag added. Making a fix and pushing now.
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-125267532

jenkins retest this please
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-125284019

jenkins retest this please
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-125287302

@shaneknapp do you know why this [PRB](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/120/console) is failing? Is there something new going on with Jenkins?
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-122031700

It's worth noting that this patch will consume and resolve [SPARK-6557](https://issues.apache.org/jira/browse/SPARK-6557) as well.
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7401#issuecomment-121677522

@JoshRosen the only big thing you mentioned that I couldn't get was using `glob` over `find`. Per this [SO](http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python) and others, it seems it's better to run an `os.walk`, which is what I did. Let me know what you think!
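For context, the `os.walk` approach discussed above reads roughly like this — a sketch rather than the actual patch, with illustrative function and argument names (`glob.glob` only gained recursive `**` support in Python 3.5, hence walking manually):

```python
import fnmatch
import os

def find_files(root, pattern):
    # Recursively collect file paths under `root` whose basename matches
    # `pattern`, the os.walk substitute for a recursive glob.
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return matches
```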
[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7374#issuecomment-121329813

Bump. Any issues with this guys? If so let me know and I'll get them in!
[GitHub] spark pull request: [WIP][SPARK-7018][Build]: Refactor dev/run-tes...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/7401

[WIP][SPARK-7018][Build]: Refactor dev/run-tests-jenkins into Python

First draft, and WIP, of the refactoring of the `run-tests-jenkins` script into Python. Currently a few things are left out that could, and I think should, become smaller JIRAs after this:

1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point that out.
2. The PR tests are still written in bash. I opted not to change those and just rewrite the runner into Python. This is a great follow-on JIRA IMO.
3. All of the linting scripts are still in bash as well and would likely be best added as follow-on JIRAs too.

Still a WIP for now, but would love to get initial rounds of feedback as we iterate on this / test with Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark SPARK-7018

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7401.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7401

commit 31b51dea681534fca28b762b6eca01b81229215c
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-07-13T22:28:53Z

    initial cut of refactored run-tests-jenkins script into python

commit f2a1dc6eaf6c316809cdf08c5340a8a81de504b3
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-07-14T18:31:36Z

    fixed pep8 issues
[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/7374

[SPARK-8933][Build]: Provide a --force flag to build/mvn that always uses downloaded maven

Added a `--force` flag to manually download, if necessary, and use a built-in version of Maven best suited for Spark.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark SPARK-8933

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7374.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7374

commit d673127e4a463ad97f5bd6e1668bdb5455f1013d
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-07-13T15:41:55Z

    added --force flag to manually download, if necessary, and use a built-in version of maven best for spark
[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7374#issuecomment-120972882

/cc @pwendell
[GitHub] spark pull request: [SPARK-8933][Build]: Provide a --force flag to...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/7374#issuecomment-121060220

@pwendell yeah, I had inserted a few `echo forcing maven` statements into the `if` branch and afterwards (hence why I dump the current `mvn` path now too) and tested on my local machine. All worked out just fine, but feel free to give it a quick whirl! It caches the downloaded `mvn` as well, so it never reaches back to the internet if it has already downloaded a local copy (in the instance of running `build/mvn --force ...` and then a second time).
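The caching behavior described above amounts to a check-before-download. A hypothetical sketch of that shape — the function name, directory layout, and `download` callable are all illustrative, not the actual `build/mvn` logic:

```python
import os

def ensure_maven(install_dir, version, download):
    # Only reach out to the network when no cached copy exists, so a
    # second `build/mvn --force` run reuses the first download.
    target = os.path.join(install_dir, "apache-maven-" + version)
    if not os.path.isdir(target):
        download(target)  # hypothetical fetch-and-unpack callable
    return os.path.join(target, "bin", "mvn")
```

Because the check is on the unpacked directory, an interrupted download would ideally unpack to a temporary path first and only rename on success, so a half-finished copy is never mistaken for a cached one.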
[GitHub] spark pull request: [SPARK-8693][Project Infra]: profiles and goal...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/7085

[SPARK-8693][Project Infra]: profiles and goals are not printed in a nice way

Hotfix to correct formatting errors of print statements within the dev and jenkins builds. The error looks like:

```
-Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Dhadoop.version=1.0.4
[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Pkinesis-asl
[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive-thriftserver
[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive
[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: package
[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: assembly/assembly
[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: streaming-kafka-assembly/assembly
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark SPARK-8693

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7085.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7085

commit c5575f1276032e878c7d7e680ccbf9eb527c2f68
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-06-29T13:26:45Z

    added commas to end of print statements for proper printing
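For context on the fix: a Python 2 `print` statement appends a newline unless it ends with a trailing comma, which is why each build argument landed on its own `[info]` line. A small Python 3 illustration of the same idea (the flag list is taken from the error output above; the function name is illustrative), where `end=" "` plays the role of Python 2's trailing comma:

```python
import io

def render_sbt_args(flags):
    # Without end="", each flag would be followed by a newline, producing
    # one "[info] ..." fragment per argument; end=" " keeps them on one line.
    buf = io.StringIO()
    for flag in flags:
        print(flag, end=" ", file=buf)
    return buf.getvalue().strip()
```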
[GitHub] spark pull request: [SPARK-7017][HOTFIX][Project Infra]: Refactor ...
Github user brennonyork closed the pull request at: https://github.com/apache/spark/pull/6865
[GitHub] spark pull request: [SPARK-7017][HOTFIX][Project Infra]: Refactor ...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/6865#issuecomment-113054631

Closing in favor of #6866
[GitHub] spark pull request: [SPARK-7017][HOTFIX][Project Infra]: Refactor ...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/6865

[SPARK-7017][HOTFIX][Project Infra]: Refactor dev/run-tests into Python

Fixed minor nits from the [previous PR](https://github.com/apache/spark/pull/5694) and removed unnecessary doc build code, as docs will be built with 'jekyll' and not any calls through 'sbt' (i.e. the `get_build_profiles` function).

/cc @JoshRosen

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark SPARK-7017-HOTFIX-1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6865.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6865

commit 79845b12c837ffa2b1d2d8a439ffd558624ff999
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-06-17T23:12:10Z

    fixed minor nits from previous PR and removed unnecessary doc build code as docs will be built with 'jekyll' and not any calls through 'sbt'
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-112976517

Thanks for the PR merge @JoshRosen. I'll go ahead and make a hotfix branch to capture the last few nits you have above!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r32694170

--- Diff: dev/run-tests.py ---
@@ -0,0 +1,536 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_HOME = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME = os.environ.get("HOME")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+        return dict(err_codes)
+
+
+ERROR_CODES = get_error_codes(os.path.join(SPARK_HOME, "dev/run-tests-codes.sh"))
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+    print "[error] running", cmd, "; received return code", retcode
+    sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command from the determined SPARK_HOME directory and, on failure, print
+    an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_call(cmd)
+    except subprocess.CalledProcessError as e:
+        exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program)
+
+    if fpath:
+        if is_exe(program):
+            return program
+    else:
+        for path in os.environ.get("PATH").split(os.pathsep):
+            path = path.strip('"')
+            exe_file = os.path.join(path, program)
+            if is_exe(exe_file):
+                return exe_file
+    return None
+
+
+def determine_java_executable():
+    """Will return the path of the java executable that will be used by Spark's
+    tests or `None`"""
+
+    # Any changes in the way that Spark's build detects java must be reflected
+    # here. Currently the build looks for $JAVA_HOME/bin/java then falls back to
+    # the `java` executable on the path
+
+    java_home = os.environ.get("JAVA_HOME")
+
+    # check if there is an executable at $JAVA_HOME/bin/java
+    java_exe = which(os.path.join(java_home, "bin", "java")) if java_home else None
+    # if the java_exe wasn't set, check for a `java` version on the $PATH
+    return java_exe if java_exe else which("java")
+
+
+JavaVersion = namedtuple('JavaVersion', ['major', 'minor', 'patch', 'update'])
+
+
+def determine_java_version(java_exe):
+    """Given a valid java executable will return its version in named tuple format
+    with accessors '.major', '.minor', '.patch', '.update'"""
+
+    raw_output = subprocess.check_output([java_exe, "-version"],
+                                         stderr=subprocess.STDOUT)
+    raw_version_str = raw_output.split('\n')[0]  # eg 'java version "1.8.0_25"'
+    version_str = raw_version_str.split()[-1].strip('"')  # eg '1.8.0_25'
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-112482703

@JoshRosen just FYI, I forgot to commit the code that would actually **build** the documentation yesterday (the `jekyll build` call), so retesting now; but if this passes (and builds docs) then I can revert the simple doc change and it should be ready!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-112509328

Sounds like a deal. I've got a separate thread with @shaneknapp on this one (he said the same thing re: the `jekyll` tool only on `amplab-jenkins-worker-01`) so understand on the revert here. Let me get that in place...
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-112575903

@JoshRosen for the `JAVA_HOME` issue, are you asking if the code checks the regular `PATH` for a `java` executable after checking for `JAVA_HOME`? I believe what you're asking is already done [here](https://github.com/brennonyork/spark/blob/SPARK-7017/dev/run-tests.py#L112).
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-112578835

Roger. I see it now. Will have a fix up shortly.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-112599782

jenkins, retest this please
[GitHub] spark pull request: [SPARK-8316] Upgrade to Maven 3.3.3
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/6770#issuecomment-111560478 We could also set a warning to print if a user already has `mvn` installed, but at a version older than 3.3.3. That seems the least intrusive option for the dev community without mandating they install the latest version. Just my 2c.
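A sketch of what such a version-gated warning might look like; the function names, message text, and version-tuple comparison are mine, not from the PR:

```python
def parse_version(version):
    # "3.2.5" -> (3, 2, 5) so versions compare numerically, not lexically.
    return tuple(int(part) for part in version.split("."))

def maven_version_warning(installed, required="3.3.3"):
    """Return a warning string when the user's existing mvn is older than
    the version the build expects; otherwise None."""
    if parse_version(installed) < parse_version(required):
        return ("[warn] installed Maven %s is older than the recommended %s"
                % (installed, required))
    return None
```

Tuple comparison matters here: a plain string comparison would wrongly rank "3.10.0" below "3.3.3".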
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-110782037 jenkins retest this please
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-110810019 jenkins retest this please
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-110817307 Yeah, you nailed it. I was about to push up a bug fix that should fix that, but I like your idea of just updating the example better. Turns out that moving the `os.environ["PATH"]` assignment (which puts `python3` on the `PATH`) ahead of *all* the Python checks was what made it fail.
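The `PATH` manipulation in question boils down to something like this hedged sketch (the real script mutates `os.environ` directly; the `env` parameter here is just for illustration):

```python
import os

def prepend_to_path(directory, env):
    """Put `directory` at the front of the PATH entry in `env` so that
    executables found there (e.g. python3) shadow any others on the PATH."""
    env["PATH"] = directory + os.pathsep + env.get("PATH", "")
    return env["PATH"]
```

Because lookup order is front-to-back, doing this *before* the other Python checks changes which interpreter those checks see, which is exactly the failure described above.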
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-110809581 @shaneknapp I'm pretty sure you're talking about the check [here](https://github.com/brennonyork/spark/blob/7d2f5e28beb3cc20fe39d1d61443fcdd69fe632b/dev/run-tests.py#L469) which will test for `AMPLAB_JENKINS` being set in the environment. Let me know if I'm wrong here!
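The gate being referenced amounts to a one-line environment check; a hypothetical sketch (the function name is mine, not the script's):

```python
def is_amplab_jenkins(env):
    """Jenkins-only steps should run only when AMPLAB_JENKINS is set
    in the environment. (Illustrative sketch of the check linked above.)"""
    return env.get("AMPLAB_JENKINS") is not None
```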
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-110826006 I would say that's a great idea since it's already settled that Spark supports Python 3. As long as devs know that all Python scripts will run under `python3` by default, it would simplify this (and likely other Bash-to-Python scripts to come).
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-110934573 @pwendell @JoshRosen thoughts on the initial refactor? I've incorporated an additional, albeit minimal, set of test checks for MLlib, GraphX, and Streaming.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-108956716 @pwendell thanks for the review! You're certainly correct in that I took a "just get it into Python and working first" approach. I was unsure whether we wanted more of what you laid out above or something that got it into Python and was then incrementally built upon, but I'm glad that clarification is there now. Will move forward fixing the comments you and @JoshRosen supplied and get a new commit back soon!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-105971797 Bump on this thread. /cc @pwendell
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-104420452 Thanks @davies, anyone have any comments / concerns?
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r30630715

--- Diff: dev/run-tests ---
@@ -17,216 +17,11 @@
 # limitations under the License.
 #
 
-# Go to the Spark project root directory
 FWDIR="$(cd "`dirname $0`"/..; pwd)"
 cd "$FWDIR"
 
-# Clean up work directory and caches
-rm -rf ./work
-rm -rf ~/.ivy2/local/org.apache.spark
-rm -rf ~/.ivy2/cache/org.apache.spark
-
-source "$FWDIR/dev/run-tests-codes.sh"
-
-CURRENT_BLOCK=$BLOCK_GENERAL
-
-function handle_error () {
-  echo "[error] Got a return code of $? on line $1 of the run-tests script."
-  exit $CURRENT_BLOCK
-}
-
-
-# Build against the right version of Hadoop.
-{
-  if [ -n "$AMPLAB_JENKINS_BUILD_PROFILE" ]; then
-    if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=1.0.4"
-    elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1"
-    elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.2" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2"
-    elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.3" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
-    fi
-  fi
-
-  if [ -z "$SBT_MAVEN_PROFILES_ARGS" ]; then
-    export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
-  fi
-}
-
-export SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Pkinesis-asl"
-
-# Determine Java path and version.
-{
-  if test -x "$JAVA_HOME/bin/java"; then
-    declare java_cmd="$JAVA_HOME/bin/java"
-  else
-    declare java_cmd=java
-  fi
-
-  # We can't use sed -r -e due to OS X / BSD compatibility; hence, all the parentheses.
-  JAVA_VERSION=$(
-    $java_cmd -version 2>&1 \
-    | grep -e "^java version" --max-count=1 \
-    | sed "s/java version \"\(.*\)\.\(.*\)\.\(.*\)\"/\1\2/"
-  )
-
-  if [ "$JAVA_VERSION" -lt 18 ]; then
-    echo "[warn] Java 8 tests will not run because JDK version is < 1.8."
-  fi
-}
-
-# Only run Hive tests if there are SQL changes.
-# Partial solution for SPARK-1455.
-if [ -n "$AMPLAB_JENKINS" ]; then
-  git fetch origin master:master
-
-  sql_diffs=$(
-    git diff --name-only master \
-    | grep -e "^sql/" -e "^bin/spark-sql" -e "^sbin/start-thriftserver.sh"
-  )
-
-  non_sql_diffs=$(
-    git diff --name-only master \
-    | grep -v -e "^sql/" -e "^bin/spark-sql" -e "^sbin/start-thriftserver.sh"
-  )
-
-  if [ -n "$sql_diffs" ]; then
-    echo "[info] Detected changes in SQL. Will run Hive test suite."
-    _RUN_SQL_TESTS=true
-
-    if [ -z "$non_sql_diffs" ]; then
-      echo "[info] Detected no changes except in SQL. Will only run SQL tests."
-      _SQL_TESTS_ONLY=true
-    fi
-  fi
-fi
-
-set -o pipefail
-trap 'handle_error $LINENO' ERR
-
-echo ""
-echo "========================================================================="
-echo "Running Apache RAT checks"
-echo "========================================================================="
-
-CURRENT_BLOCK=$BLOCK_RAT
-
-./dev/check-license
-
-echo ""
-echo "========================================================================="
-echo "Running Scala style checks"
-echo "========================================================================="
-
-CURRENT_BLOCK=$BLOCK_SCALA_STYLE
-
-./dev/lint-scala
-
-echo ""
-echo "========================================================================="
-echo "Running Python style checks"
-echo "========================================================================="

-CURRENT_BLOCK=$BLOCK_PYTHON_STYLE
-
-./dev/lint-python
-
-echo ""
-echo "========================================================================="
-echo "Building Spark"
-echo "========================================================================="
-
-CURRENT_BLOCK=$BLOCK_BUILD
-
-{
-  HIVE_BUILD_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-thriftserver"
-  echo "[info] Compile with Hive 0.13.1"
-  [ -d "lib_managed" ] && rm -rf lib_managed
-  echo "[info] Building Spark with these arguments: $HIVE_BUILD_ARGS"
-
-  if [ "${AMPLAB_JENKINS_BUILD_TOOL}" == "maven" ]; then
-    build/mvn $HIVE_BUILD_ARGS clean package -DskipTests
-  else
-    echo -e "q\n" \
-      | build/sbt $HIVE_BUILD_ARGS package assembly/assembly streaming-kafka-assembly/assembly \
-      | grep -v -e "info.*Resolving" -e "warn.*Merging" -e "info.*Including"
-  fi
-}
-
-echo ""
-echo
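For context, the Hadoop-profile selection being removed above maps cleanly onto a dict once ported to Python. A hedged sketch under that assumption (the function name is mine, not the PR's; the profile strings are copied from the shell logic):

```python
# Profiles and sbt/maven arguments copied from the shell logic above.
HADOOP_PROFILES = {
    "hadoop1.0": "-Phadoop-1 -Dhadoop.version=1.0.4",
    "hadoop2.0": "-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1",
    "hadoop2.2": "-Pyarn -Phadoop-2.2",
    "hadoop2.3": "-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0",
}

def sbt_maven_profile_args(env):
    """Pick build-profile args from AMPLAB_JENKINS_BUILD_PROFILE, falling
    back to any preset SBT_MAVEN_PROFILES_ARGS, then to the hadoop2.3
    default, always appending -Pkinesis-asl (as the shell script does)."""
    profile = env.get("AMPLAB_JENKINS_BUILD_PROFILE")
    args = (HADOOP_PROFILES.get(profile)
            or env.get("SBT_MAVEN_PROFILES_ARGS")
            or HADOOP_PROFILES["hadoop2.3"])
    return args + " -Pkinesis-asl"
```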
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r30630897

--- Diff: dev/run-tests.py ---
@@ -0,0 +1,418 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" +
+                               "^.*[warn].*Merging" + "|" +
+                               "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script
+    """
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+    return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+    print "[error] running", cmd, "; received return code", retcode
--- End diff --

From @nchammas above:

> Python 2.6 is the oldest version of Python that Spark officially supports. We also added Python 3 support recently, so ideally this script should be able to run on 2.6+ and 3.3+, but I think it's fine to start with just 2.6+ since this is a developer script.

Given that, maybe we should explicitly call out `python2` at the top?
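For anyone wanting to poke at the error-code parsing shown in the diff, the same logic can be exercised standalone against a file of `readonly NAME=VALUE` lines (a hedged re-sketch of the helper, not the PR's exact code; the sample contents are hypothetical):

```python
def get_error_codes(err_code_file):
    """Collect `readonly NAME=VALUE` entries from a run-tests-codes.sh-style
    file into a dict, mirroring the helper in the diff above."""
    with open(err_code_file) as f:
        return dict(line.split()[1].strip().split("=")
                    for line in f if line.startswith("readonly"))
```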
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r30630935

--- Diff: dev/run-tests.py ---
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
--- End diff --

Roger, will fix!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r30634117

--- Diff: dev/run-tests.py ---
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
--- End diff --

This was something @pwendell was looking for and was included with [SPARK-3355](https://issues.apache.org/jira/browse/SPARK-3355).
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-103613947 @rxin @pwendell @srowen could I get a few more eyes on this? Getting tricky to keep fixing merge conflicts and backporting them into this script :)
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-102196732 jenkins retest this please
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-101720684 @shaneknapp honestly I hadn't thought about it, but since this should be capable of running on developers' boxes I would assume we should keep it agnostic. Is there a specific version of Python that Spark dictates is needed just for builds? If so we should match to that I would say.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-101762800 @nchammas you're correct on SPARK-6908. I hadn't pulled in #5955 since it wasn't yet closed / merged, so I wasn't sure whether that was something the committers wanted. I assumed it was, but figured I'd wait and see. If there's consensus on it, though, I'll be happy to add it in.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-101455226 Now that `branch-1.4` was cut, could I get a few eyes on this one? :)
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-101035868 jenkins retest this please
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-99234851 @ankurdave when you get a chance can you review this? I know it's been a while, but I finally had a chance to get back to this and rework it given your comments above. The only issue I had was with `Iterator[(VertexId, A)]`; instead, I assumed a structure of `RDD[(VertexId, A)]` with possibly duplicate keys.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-99234568 jenkins retest this please
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-98175544 jenkins, retest this please
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-98228719 @pwendell @srowen @nchammas @rxin can I get a final review on this? It looks good to me, and I've tested all the major error cases we need this script to report on. I recognize this is a highly critical piece of code for the whole Spark ecosystem, though, so I'd rather get more eyes on it and the nits taken care of now before we look to merge into master.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-98230750 @shaneknapp forgot to add you into this as well (sorry)! Especially since you're dealing with `stdout` and `stderr` issues right now, I want to make sure this doesn't add any excess bloat to that (it shouldn't...).
[GitHub] spark pull request: redir stderr better, remove unused code, bette...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5817#issuecomment-98239787 LGTM for whenever this can get reviewed!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97883074 jenkins, for the last time, retest this please
[GitHub] spark pull request: [SPARK-7214] Reserve space for unrolling even ...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5784#issuecomment-97910902 Thanks for that clarity, @shaneknapp. Looks like I don't have access to see the Jenkins environment variables from the link you sent (unless it went stale before I clicked), but I'll look for the review note and provide what I can!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97852085 jenkins, retest this please.
[GitHub] spark pull request: [SPARK-7214] Reserve space for unrolling even ...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5784#issuecomment-97845313 @shaneknapp well, to start, I couldn't agree more that something looks very fishy with `pr_public_classes`. That was a direct port from the previous code, though, which makes it even more interesting :/ To address the SHA1 hash: it's getting pulled from, as I'm sure you know, [this code here](https://github.com/jenkinsci/ghprb-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ghprb/GhprbTrigger.java#L170) which, I'll admit, is interesting in and of itself in that it could produce the SHA1 as an actual hash **or** what we see above (in the case that the patch can merge without conflict into master). That said, `pr_public_classes` only relies on `ghprbActualCommit` and not the SHA1, so unless that were somehow empty I'm not immediately sure how this could be happening (and it isn't empty according to Jenkins). My only thought, and I'm hoping you can shed some light here, would be a possible race condition from shared state on each Jenkins box such that the Bash calls (or environment variables) aren't atomic to the PR they're building. Thoughts on that? I'll continue to dig and see what I can find.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97845733 Oh, we also covered a failed-build scenario. Going to finish up with PySpark tests and SparkR tests.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97844015 @nchammas just wanted to get out a rolling count of what we've tested thus far:

1. Failed tests
2. Failed MiMa excludes
3. Failed scala style checks
4. Failed Apache RAT checks

Will continue today to hopefully finish up the last bit! Let me know if I missed anything!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97514854 jenkins, retest this please.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97571451 jenkins, retest this please
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97514720 @nchammas good catch; it turns out there was a bug in which error code was being returned, which kept `dev/run-tests-jenkins` from reporting the correct error message. Just pushed up a fix that should handle this.
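For reference, the propagation pattern behind this kind of fix: let the wrapper capture the child's exit status and pass it through unchanged, instead of collapsing every failure to a generic code. This is a hedged sketch, not the actual `dev/run-tests` code:

```python
import subprocess
import sys

def run_and_propagate(cmd):
    """Run a command and return its exit code instead of swallowing it,
    so a calling script (e.g. a Jenkins wrapper) can report the right error."""
    try:
        subprocess.check_call(cmd)
        return 0
    except subprocess.CalledProcessError as e:
        # e.returncode is the child's exit status; pass it through unchanged
        return e.returncode

code = run_and_propagate([sys.executable, "-c", "import sys; sys.exit(17)"])
print(code)  # 17
# A wrapper would then call sys.exit(code) to hand the same status
# to its own parent shell.
```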
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97540839 jenkins, retest this please.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97587252 :+1: jenkins, let's keep this going, retest this please.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97590384 jenkins, retest this please.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97580071 jenkins, retest this please.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97600636 jenkins, looking for a break in mima! retest this please!
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97110985 @nchammas any other thoughts? I think we've got a pretty solid start wrt the refactor here.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-97184852 jenkins, retest this please.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-96827964 jenkins, retest this please
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29155851

--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+
+spark_proj_root = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+user_home_dir = os.environ.get("HOME")
+
+sbt_maven_profile_args_env = "SBT_MAVEN_PROFILES_ARGS"
+amplab_jenkins_build_tool_env = "AMPLAB_JENKINS_BUILD_TOOL"
+amplab_jenkins_build_tool = os.environ.get(amplab_jenkins_build_tool_env)
+amplab_jenkins = os.environ.get("AMPLAB_JENKINS")
+
+resolving_re = "^.*[info].*Resolving"
+merging_re = "^.*[warn].*Merging"
+including_re = "^.*[info].*Including"
+sbt_output_filter = re.compile(resolving_re + "|" +
+                               merging_re + "|" +
+                               including_re)
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+    return dict(err_codes)
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command and, on failure, print an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_output(cmd)
+    except subprocess.CalledProcessError as e:
+        print "[error] running", e.cmd, "; received return code", e.returncode
+        sys.exit(e.returncode)
+
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for sbt_maven_profile_args_env which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0": ["-Dhadoop.version=1.0.4"],
+        "hadoop2.0": ["-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2": ["-Pyarn", "-Phadoop-2.2", "-Dhadoop.version=2.2.0"],
+        "hadoop2.3": ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, []) +
+                     sbt_maven_profile_args_base)
+    else:
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) +
+                     sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program
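The `get_error_codes` helper in the diff above keys off `readonly NAME=VALUE` lines in `run-tests-codes.sh`. A minimal sketch of how that parsing behaves, using a made-up codes file (the block names and values below are illustrative, not necessarily the real contents of the script):

```python
import os
import tempfile

def get_error_codes(err_code_file):
    # Same parsing idea as the diff above: keep only `readonly NAME=VALUE`
    # lines and split each into a (NAME, VALUE) pair for a dict.
    with open(err_code_file, 'r') as f:
        err_codes = [e.split()[1].strip().split('=')
                     for e in f if e.startswith("readonly")]
    return dict(err_codes)

# Hypothetical run-tests-codes.sh contents (names invented for illustration):
sample = """#!/usr/bin/env bash
readonly BLOCK_GENERAL=10
readonly BLOCK_RAT=11
readonly BLOCK_SCALA_STYLE=12
"""
path = os.path.join(tempfile.mkdtemp(), "run-tests-codes.sh")
with open(path, "w") as f:
    f.write(sample)
print(get_error_codes(path))
# {'BLOCK_GENERAL': '10', 'BLOCK_RAT': '11', 'BLOCK_SCALA_STYLE': '12'}
```

Note the values come back as strings, which is fine for the backwards-compatibility use case (they are re-emitted into a shell environment anyway).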
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-96718084 @rxin Thanks and taken care of! @nchammas First, thanks a ton for all the Python reviews (I know it can be tedious)! Second, to your point about removing the Bash-isms, you're completely right in that I left them in for **this PR** such that we can get incremental improvement to the codebase. Once I tackle SPARK-7018 (e.g. `dev/run-tests-jenkins`) I think I'll be able to slowly move some of this old Bash necessity out. Feedback on that being the right path?
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-96726126 Roger that. Let me look into using `pipes.quote` for the `sbt` output. Do we know what's up with Jenkins right now? I saw a thread a while back talking about a power outage at Berkeley, but thought I saw a message from Shane saying everything was back to normal. Is that not the case?
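For context on `pipes.quote`: it shell-escapes a single argument so embedded spaces or metacharacters survive a trip through `sh -c`; in Python 3 the same function lives at `shlex.quote`. A small sketch (the sample arguments are invented for illustration):

```python
try:
    from shlex import quote  # Python 3 location
except ImportError:
    from pipes import quote  # Python 2 location mentioned in the comment

# Quote each sbt/maven argument so spaces or shell metacharacters in a
# profile string can't be re-split or interpreted by the shell.
args = ["-Pyarn", "-Dhadoop.version=2.3.0", "-Dtest.filter=a b; rm -rf /"]
cmd = "./build/sbt " + " ".join(quote(a) for a in args)
print(cmd)
```

Safe arguments pass through untouched, while anything containing whitespace or shell syntax gets wrapped in single quotes.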
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29163677

--- Diff: dev/run-tests.py ---
@@ -0,0 +1,413 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+
+SPARK_PROJ_ROOT = \
+    os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" +
+                               "^.*[warn].*Merging" + "|" +
+                               "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+    return dict(err_codes)
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command and, on failure, print an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_call(cmd)
+    except subprocess.CalledProcessError as e:
+        print "[error] running", e.cmd, "; received return code", e.returncode
+        sys.exit(e.returncode)
+
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0": ["-Dhadoop.version=1.0.4"],
+        "hadoop2.0": ["-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2": ["-Pyarn", "-Phadoop-2.2", "-Dhadoop.version=2.2.0"],
+        "hadoop2.3": ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, []) +
+                     sbt_maven_profile_args_base)
+    else:
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) +
+                     sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program)
+
+    if fpath:
+        if is_exe(program):
+            return program
+    else:
+        for path in os.environ.get("PATH").split
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29105800

--- Diff: dev/run-tests ---
@@ -17,239 +17,394 @@
 # limitations under the License.
 #
 
-# Go to the Spark project root directory
-FWDIR="$(cd `dirname $0`/..; pwd)"
-cd "$FWDIR"
+import os
+import re
+import shutil
+import subprocess as sp
+
+# Set the Spark project root directory
+spark_proj_root = os.path.abspath("..")
+# Set the user 'HOME' directory
+user_home_dir = os.environ.get("HOME")
+# Set the sbt maven profile arguments environment variable name
+sbt_maven_profile_args_env = "SBT_MAVEN_PROFILES_ARGS"
+# Set the amplab jenkins build tool environment variable name
+amplab_jenkins_build_tool_env = "AMPLAB_JENKINS_BUILD_TOOL"
+# Set the amplab jenkins build tool environment value
+amplab_jenkins_build_tool = os.environ.get(amplab_jenkins_build_tool_env)
+# Set whether we're on an Amplab Jenkins box by checking for a specific
+# environment variable
+amplab_jenkins = os.environ.get("AMPLAB_JENKINS")
+# Set the pattern for sbt output e.g. [info] Resolving ...
+resolving_re = "^.*[info].*Resolving"
+# Set the pattern for sbt output e.g. [warn] Merging ...
+merging_re = "^.*[warn].*Merging"
+# Set the pattern for sbt output e.g. [info] Including ...
+including_re = "^.*[info].*Including"
+# Compile the various regex patterns into a filter
+sbt_output_filter = re.compile(resolving_re + "|" +
+                               merging_re + "|" +
+                               including_re)
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+    return dict(err_codes)
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for sbt_maven_profile_args_env which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0": ["-Dhadoop.version=1.0.4"],
+        "hadoop2.0": ["-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2": ["-Pyarn", "-Phadoop-2.2", "-Dhadoop.version=2.2.0"],
+        "hadoop2.3": ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, []) +
+                     sbt_maven_profile_args_base)
+    else:
+        os.environ[sbt_maven_profile_args_env] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) +
+                     sbt_maven_profile_args_base)
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+def which(program):
+    """Find and return the given program by its absolute path or 'None'
+    - from: http://stackoverflow.com/a/377028"""
+
+    fpath, fname = os.path.split(program)
+
+    if fpath:
+        if is_exe(program):
+            return program
+    else:
+        for path in os.environ.get("PATH").split(os.pathsep):
+            path = path.strip('"')
+            exe_file = os.path.join(path, program)
+            if is_exe(exe_file):
+                return exe_file
+    return None
+
+def determine_java_executable():
+    """Will return the *best* path possible for a 'java' executable or `None`"""
+
+    java_home = os.environ.get("JAVA_HOME")
+
+    # check if there is an executable at $JAVA_HOME/bin/java
+    java_exe = which(os.path.join(java_home, "bin/java"))
+    # if the java_exe
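The `which`/`is_exe` helpers shown in the diff above can be exercised end-to-end with a throwaway executable dropped onto `PATH`. This sketch assumes a POSIX filesystem; the `fake-sbt` name is invented for illustration:

```python
import os
import stat
import tempfile

def is_exe(path):
    # An executable file is a regular file with an execute bit we can use.
    return os.path.isfile(path) and os.access(path, os.X_OK)

def which(program):
    # Same lookup as the diff above: an explicit path wins outright,
    # otherwise scan each PATH entry for an executable file.
    fpath, _ = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        for path in os.environ.get("PATH", "").split(os.pathsep):
            exe_file = os.path.join(path.strip('"'), program)
            if is_exe(exe_file):
                return exe_file
    return None

# Drop a fake executable into a temp dir and put that dir on PATH:
d = tempfile.mkdtemp()
tool = os.path.join(d, "fake-sbt")
open(tool, "w").close()
os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)
os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
print(which("fake-sbt") == tool)  # True
```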
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/5694 [SPARK-7017][Build][Project Infra]: Refactor dev/run-tests into Python

All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base for various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions on this initial first step! /cc @srowen @pwendell @nchammas

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark SPARK-7017

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5694.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5694

commit 6126c4f4d97db16b0ed6a95c60fae1fff44e2afe
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-04-24T21:27:54Z

    refactored run-tests into python
[GitHub] spark pull request: [WIP][HOTFIX][SPARK-4123]: Fix bug in PR depen...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5443#issuecomment-92518053 @shaneknapp can you help me understand how Jenkins is doing the checkouts? I'm seeing the PR builder output:

```
Building remotely on amp-jenkins-worker-06 (centos) in workspace /home/jenkins/workspace/SparkPullRequestBuilder
git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
git config remote.origin.url https://github.com/apache/spark.git # timeout=10
Fetching upstream changes from https://github.com/apache/spark.git
git --version # timeout=10
git fetch --tags --progress https://github.com/apache/spark.git +refs/pull/5443/*:refs/remotes/origin/pr/5443/* # timeout=15
git rev-parse origin/pr/5443/merge^{commit} # timeout=10
git branch -a --contains c5916336e6aff94dd3abfc9a0a41a2528c765fce # timeout=10
git rev-parse remotes/origin/pr/5443/merge^{commit} # timeout=10
Checking out Revision c5916336e6aff94dd3abfc9a0a41a2528c765fce (origin/pr/5443/merge)
git config core.sparsecheckout # timeout=10
git checkout -f c5916336e6aff94dd3abfc9a0a41a2528c765fce
```

although I'm a bit confused about which checkout I should switch between if, say, from a PR, I want to check out the `master` branch, then switch back to the given PR branch, then possibly back to `master`, and finally back to the PR again. I'm currently doing what I believe is correct [here](https://github.com/apache/spark/blob/master/dev/tests/pr_new_dependencies.sh#L42), although there are times when the checkout from `master` back to the current PR fails, producing odd dependency reports. I've noticed that Jenkins uses the `-f` flag, which I've added, but wanted to see if you had any thoughts on the matter. Further, I've added `echo` statements to dump `ghprbActualCommit`, `sha1`, and the output of `git rev-parse HEAD`. Each is a different commit hash, which makes me further think this is the cause of all the errors. Again, any advice?
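To make that three-way mismatch easier to spot in build logs, one option is a tiny pure helper that gathers the identifiers mentioned above in one place. This is only an illustrative sketch, not part of the Spark scripts; `commit_id_report` and its argument shape are invented here (the variable names follow the ghprb plugin's):

```python
def commit_id_report(env, head_sha):
    """Collect the three commit identifiers discussed above so a build log
    can show at a glance whether they agree. `env` is an os.environ-like
    mapping; `head_sha` is the output of `git rev-parse HEAD`."""
    ids = {
        "ghprbActualCommit": env.get("ghprbActualCommit", "<unset>"),
        "sha1": env.get("sha1", "<unset>"),
        "HEAD": head_sha,
    }
    # Flag the case where the three identifiers disagree, which is the
    # symptom described in the comment above.
    distinct = {ids["ghprbActualCommit"], ids["sha1"], ids["HEAD"]}
    ids["consistent"] = len(distinct) == 1
    return ids

# Example with made-up values mirroring the reported situation:
print(commit_id_report({"ghprbActualCommit": "abc123",
                        "sha1": "origin/pr/5443/merge"}, "def456"))
```

Keeping the helper a pure function of its inputs makes it trivial to unit-test, and in a real script it would be fed `os.environ` plus a `git rev-parse HEAD` call.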
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5096#issuecomment-90827465 We can certainly set the timeout to be something larger. Let me take a look at the previous builds and see if I can find a good timeout number and if there might be anything else we can do. @pwendell any other ideas?
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5096#issuecomment-91028275 @shivaram a few things after looking at the build code some more...

1. The timeout value comes from the line [here in `dev/run-tests-jenkins`](https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L50). It's currently set at 120 minutes and **doesn't** include the time it takes for PRs to be tested against the master branch (i.e. for dependencies). We could certainly raise that value, but since, I'm assuming, the `dev/run-tests` script on this PR runs all the new SparkR tests (plus any additions you've made for core Spark), I'd ask that you run `dev/run-tests` locally and update the timeout in `dev/run-tests-jenkins` for this PR by whatever additional time is needed. The impetus for running locally first is that I'd much rather get a baseline for how long all the new tests take and then add 15-ish minutes for fluff rather than throw a number into the wind.
2. Completely agree we should get some timing metrics for the various PR tests (thanks for the idea!). I'll create a JIRA for that and take a look soon. That said, just to reiterate, those tests **are not** holding up the actual Spark test suite from finishing unless Jenkins has some deeper timing hooks than I know about. I assume it's merely a factor of the large corpus of tests that were likely added in this PR.
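The wall-clock cap described in point 1 can be approximated locally with `subprocess`'s `timeout` support (Python 3). This sketch is illustrative only, not the actual mechanism `dev/run-tests-jenkins` uses:

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_secs):
    """Kill the command and report a distinct status if it exceeds the
    wall-clock cap, roughly analogous to the Jenkins-side 120-minute
    timeout described above (illustrative, not the real script)."""
    try:
        subprocess.run(cmd, timeout=timeout_secs, check=True)
        return "passed"
    except subprocess.TimeoutExpired:
        return "timed out"
    except subprocess.CalledProcessError:
        return "failed"

quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(quick, 5))   # passed
print(run_with_timeout(slow, 0.5))  # timed out
```

Distinguishing "timed out" from "failed" matters for exactly the reason discussed: a timeout says nothing about whether the tests themselves were healthy.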
[GitHub] spark pull request: [HOTFIX][SPARK-4123]: Updated to fix bug where...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/5269

[HOTFIX][SPARK-4123]: Updated to fix bug where multiple dependencies added breaks Github output

Currently there is a bug whereby, if a new patch introduces (or removes) more than one dependency, the GitHub post output breaks (see [this build](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29399/consoleFull)). This hotfix replaces the `awk` `print` statements with `printf` so that a newline character is not appended automatically; instead, an escaped newline is added explicitly at the end of the `awk` statement. This should take a failed build output such as:

```json
api_response: { message: Problems parsing JSON, documentation_url: https://developer.github.com/v3; } data: {body: [Test build #29400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29400/consoleFull) for PR 5266 at commit [`2aa4be0`](https://github.com/apache/spark/commit/2aa4be0e1d7ce052f8c901c6d9462c611c3a920a).\n * This patch **passes all tests**.\n * This patch merges cleanly.\n * This patch adds the following public classes _(experimental)_:\n * `class IDF extends Estimator[IDFModel] with IDFParams `\n * `class Normalizer extends UnaryTransformer[Vector, Vector, Normalizer] `\n\n * This patch **adds the following new dependencies:**\n * `avro-1.7.7.jar` * `breeze-macros_2.10-0.11.2.jar` * `breeze_2.10-0.11.2.jar`\n * This patch **removes the following dependencies:**\n * `avro-1.7.6.jar` * `breeze-macros_2.10-0.11.1.jar` * `breeze_2.10-0.11.1.jar`}
```

and turn it into:

```json
api_response: { message: Problems parsing JSON, documentation_url: https://developer.github.com/v3; } data: {body: [Test build #29400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29400/consoleFull) for PR 5266 at commit [`2aa4be0`](https://github.com/apache/spark/commit/2aa4be0e1d7ce052f8c901c6d9462c611c3a920a).\n * This patch **passes all tests**.\n * This patch merges cleanly.\n * This patch adds the following public classes _(experimental)_:\n * `class IDF extends Estimator[IDFModel] with IDFParams `\n * `class Normalizer extends UnaryTransformer[Vector, Vector, Normalizer] `\n\n * This patch **adds the following new dependencies:**\n * `avro-1.7.7.jar`\n * `breeze-macros_2.10-0.11.2.jar`\n * `breeze_2.10-0.11.2.jar`\n * This patch **removes the following dependencies:**\n * `avro-1.7.6.jar`\n * `breeze-macros_2.10-0.11.1.jar`\n * `breeze_2.10-0.11.1.jar`}
```

I've tested this locally and everything worked. /cc @srowen @pwendell @nchammas

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark HOTFIX-SPARK-4123

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5269.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5269

commit a4410680b9cc1ed4616127f320f51198783c250c
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-03-30T15:37:01Z

    Updated awk to use printf and to manually insert newlines so that the JSON github string when posted is corrected

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
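The fix above hinges on `awk`'s `print` appending a real newline after every record, while `printf` emits the text verbatim so a literal `\n` escape can be appended instead. A minimal sketch of the difference, with made-up jar names rather than the actual `run-tests-jenkins` code:

```shell
# Two dependencies, one per input line, as a classpath diff might produce.
deps='avro-1.7.7.jar
breeze_2.10-0.11.2.jar'

# print appends a literal newline after each record, so the JSON body
# posted to GitHub is split across physical lines and fails to parse.
broken=$(printf '%s\n' "$deps" | awk '{print " * `" $1 "`"}')

# printf emits the record verbatim; the escaped \n survives as the two
# characters backslash-n, keeping the whole body on a single line.
fixed=$(printf '%s\n' "$deps" | awk '{printf " * `%s`\\n", $1}')

printf '%s\n' "$fixed"
```

Running this prints one physical line containing literal `\n` separators, which is exactly the shape a single-line JSON string field needs.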
[GitHub] spark pull request: [HOTFIX][SPARK-4123]: Updated to fix bug where...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5269#issuecomment-87766986 jenkins, retest this please
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5266#issuecomment-87827641 To test #5269 I'm going to rerun these Jenkins tests, as this is a prime example of that bug.
[GitHub] spark pull request: [SPARK-6528][ML] Add IDF transformer
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5266#issuecomment-87827693 jenkins, retest this please
[GitHub] spark pull request: [SPARK-4123][Project Infra]: Show new dependen...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-87025972 Thanks for the update, guys. Per the consensus I moved the "tests have started" message to before the PR tests run. Also, @srowen, updated all items per your comments. Any additional thoughts, all?
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-86747670 /cc @pwendell @srowen @nchammas All complete. You can check out build 29250, a few builds up, to see what the output would look like if a new dependency were added. One issue I'd love to get some opinion on... Right now the initial post message to Github (i.e. the "Test build started" plus "patch merges cleanly" lines) will take up to 20 minutes to post **if** any `pom.xml` files were changed, because it will then run and build both the current PR and the master branch. This is purely because the "This patch merges cleanly" output comes from a `pr_test` and runs in the core test loop. The easiest option that reflects the original way things have been posted would be to move the `pr_merge_ability` test out of the main test loop and have it execute independently. The other option would be to merely post the "Test build started at ..." message and move the "merges cleanly" portion into the post-test message. I'll admit I'm more in favor of the latter option: I think it keeps things clean, and "merges cleanly" is slightly ambiguous given that Github reports on this as well. Thoughts? Whichever way we go, I can get that final change up, and then I'd say this is ready for review into master.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-86178368 @ankurdave, thanks for the clarification. Let me take a second stab at this given what you stated and I should have something much more in line with the original thought!
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r27158731

```diff
--- Diff: dev/tests/pr_new_dependencies.sh ---
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#
+# This script follows the base format for testing pull requests against
+# another branch and returning results to be published. More details can be
+# found at dev/run-tests-jenkins.
+#
+# Arg1: The Github Pull Request Actual Commit
+#+ known as `ghprbActualCommit` in `run-tests-jenkins`
+# Arg2: The SHA1 hash
+#+ known as `sha1` in `run-tests-jenkins`
+#
+
+ghprbActualCommit=$1
+sha1=$2
+
+MVN_BIN=`pwd`/build/mvn
+CURR_CP_FILE=my-classpath.txt
+MASTER_CP_FILE=master-classpath.txt
+
+${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \
```

End diff --

Sounds like a plan. Once I get this working in a state I like, I'll set a gate to check all `pom.xml` files for changes and, if any show changes, go ahead and execute the code.
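The gate described here (only run the expensive dual-build dependency check when a `pom.xml` changed) reduces to a filename filter; a sketch with an illustrative file list rather than the real Jenkins plumbing:

```shell
# Files the PR touches; in a real script this would come from something
# like `git diff --name-only` against the target branch.
changed_files='graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
core/pom.xml'

# Run the dependency check only when at least one pom.xml was modified,
# sparing every other PR the extra ~9 minutes of Maven builds.
if printf '%s\n' "$changed_files" | grep -q 'pom\.xml$'; then
  run_dep_check=yes
else
  run_dep_check=no
fi

echo "$run_dep_check"
```

With the list above the check fires because `core/pom.xml` matches; a PR touching only Scala sources would skip it.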
[GitHub] spark pull request: [SPARK-6510][GraphX]: Add Graph#minus method t...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5175#issuecomment-85739639 /cc @maropu @ankurdave @rxin
[GitHub] spark pull request: [SPARK-6510][GraphX]: Add Graph#minus method t...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/5175

[SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference

Adds a `Graph#minus` method which will return only the `VertexId`s unique to the calling `VertexRDD`. For example:

```
Set((0L,0),(1L,1)).minus(Set((1L,1),(2L,2)))
// returns Set((0L,0))
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark SPARK-6510

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5175.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5175

commit 7227c0ffd8a2ea93a3dcb28440c912921ff14380
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-03-24T22:59:28Z

    beginning work on minus functionality

commit aaa030b3ff04738f5ffd38b6fec3f92043359b3a
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-03-24T23:14:54Z

    completed graph#minus functionality

commit 6575d927cd36076db7797a12d45d6bb98f1bf43e
Author: Brennon York brennon.y...@capitalone.com
Date: 2015-03-24T23:16:09Z

    updated mima exclude
[GitHub] spark pull request: [SPARK-6510][GraphX]: Add Graph#minus method t...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5175#issuecomment-85788983 jenkins, retest this please
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85789231 jenkins, retest this please
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5142#discussion_r27046758

```diff
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala ---
@@ -154,7 +154,30 @@ abstract class VertexRDD[VD](
    * @return a VertexRDD containing the results of `f`
    */
   def leftZipJoin[VD2: ClassTag, VD3: ClassTag]
-    (other: VertexRDD[VD2])(f: (VertexId, VD, Option[VD2]) => VD3): VertexRDD[VD3]
+    (other: VertexRDD[VD2])
+    (f: (VertexId, VD, Option[VD2]) => VD3)
+    : VertexRDD[VD3]
+
+  /**
+   * Left joins this RDD with another VertexRDD with the same index. This function will fail if
+   * both VertexRDDs do not share the same index. The resulting vertex set contains an entry for
```

End diff --

Very true. Rereading the docs, it looks like they haven't been updated with the verbiage from a few of the more recent bug fixes. Will add that.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5142#discussion_r27045011

```diff
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala ---
@@ -136,6 +136,31 @@ private[graphx] abstract class VertexPartitionBaseOps
     leftJoin(createUsingIndex(other))(f)
   }
 
+  def leftJoinWithFold[VD2: ClassTag, VD3: ClassTag, A]
+      (other: Self[VD2], acc: A)
```

End diff --

I'm pretty sure @ankurdave was looking at the `*WithFold`-style addition rather than changing the original method names, to keep backwards compatibility as best as possible. I agree it's a bit confusing, but I figure maintaining backwards compatibility as best as possible is the better option.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5142#discussion_r27045035

```diff
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala ---
@@ -154,7 +154,30 @@ abstract class VertexRDD[VD](
    * @return a VertexRDD containing the results of `f`
    */
   def leftZipJoin[VD2: ClassTag, VD3: ClassTag]
-    (other: VertexRDD[VD2])(f: (VertexId, VD, Option[VD2]) => VD3): VertexRDD[VD3]
+    (other: VertexRDD[VD2])
+    (f: (VertexId, VD, Option[VD2]) => VD3)
+    : VertexRDD[VD3]
+
+  /**
+   * Left joins this RDD with another VertexRDD with the same index. This function will fail if
+   * both VertexRDDs do not share the same index. The resulting vertex set contains an entry for
+   * each vertex in `this`.
+   * If `other` is missing any vertex in this VertexRDD, `f` is passed `None`.
+   *
+   * @tparam VD2 the attribute type of the other VertexRDD
+   * @tparam VD3 the attribute type of the resulting VertexRDD
+   * @tparam A the type of the given starting value and accumulator
+   *
+   * @param other the other VertexRDD with which to join.
+   * @param acc the initial value for the accumulator
+   * @param f the function mapping a vertex id and its attributes in this and the other vertex set
+   *          to a new vertex attribute.
+   * @return a VertexRDD containing the results of `f`
+   */
+  def leftZipJoinWithFold[VD2: ClassTag, VD3: ClassTag, A]
+      (other: VertexRDD[VD2], acc: A)
+      (f: (A, VertexId, VD, Option[VD2]) => VD3)
```

End diff --

Very good point. Will update that.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r26983769

```diff
--- Diff: dev/tests/pr_new_dependencies.sh ---
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#
+# This script follows the base format for testing pull requests against
+# another branch and returning results to be published. More details can be
+# found at dev/run-tests-jenkins.
+#
+# Arg1: The Github Pull Request Actual Commit
+#+ known as `ghprbActualCommit` in `run-tests-jenkins`
+# Arg2: The SHA1 hash
+#+ known as `sha1` in `run-tests-jenkins`
+#
+
+ghprbActualCommit=$1
+sha1=$2
+
+MVN_BIN=`pwd`/build/mvn
+CURR_CP_FILE=my-classpath.txt
+MASTER_CP_FILE=master-classpath.txt
+
+${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \
```

End diff --

Yeah, it's required :/ I've tested without it and it fails at building `spark-networking`. This adds around 4.5 minutes per run (of which there are two), so roughly 9 minutes added to the build time. I also looked at what `sbt` could output, but couldn't find anything. I also thought about treating this as a special-case test and grabbing the output from the generic Spark build that happens for each PR, but since we'd still have to build against the `master` branch as well, that didn't seem like a much better option.
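Once the two classpath dumps exist (`my-classpath.txt` for the PR branch, `master-classpath.txt` for master), the added/removed dependency report reduces to a sorted-file comparison. A sketch with stand-in jar lists instead of real `mvn dependency:build-classpath` output:

```shell
# Stand-in classpath dumps; the real files come from running
# `build/mvn dependency:build-classpath` on each branch. comm(1)
# requires its inputs to be sorted.
printf '%s\n' avro-1.7.6.jar breeze_2.10-0.11.1.jar | sort > master-classpath.txt
printf '%s\n' avro-1.7.7.jar breeze_2.10-0.11.1.jar | sort > my-classpath.txt

# comm -13: lines only in the PR classpath  -> newly added dependencies
added=$(comm -13 master-classpath.txt my-classpath.txt)
# comm -23: lines only in the master classpath -> removed dependencies
removed=$(comm -23 master-classpath.txt my-classpath.txt)

echo "added: $added"
echo "removed: $removed"
rm -f master-classpath.txt my-classpath.txt
```

Here the shared `breeze` jar drops out, leaving `avro-1.7.7.jar` as added and `avro-1.7.6.jar` as removed, which is the per-jar list the Jenkins comment then formats for GitHub.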