Re: Scalastyle improvements / large code reformatting

2014-10-13 Thread Nicholas Chammas
On Mon, Oct 13, 2014 at 11:57 AM, Patrick Wendell pwend...@gmail.com wrote: That would even work for imports as well, you'd just have a thing where if anyone modified some imports they would have to fix all the imports in that file. It's at least worth a try. OK, that sounds like a fair

Re: new jenkins update + tentative release date

2014-10-13 Thread Nicholas Chammas
Thanks for doing this work Shane. So is Jenkins in the new datacenter now? Do you know if the problems with checking out patches from GitHub should be resolved now? Here's an example from the past hour https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21702/console . Nick On

Re: new jenkins update + tentative release date

2014-10-13 Thread Nicholas Chammas
Ah, that sucks. Thank you for looking into this. On Mon, Oct 13, 2014 at 5:43 PM, shane knapp skn...@berkeley.edu wrote: On Mon, Oct 13, 2014 at 2:28 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for doing this work Shane. So is Jenkins in the new datacenter now? Do you

Re: new jenkins update + tentative release date

2014-10-13 Thread Nicholas Chammas
this to 20 minutes... let's see if that helps. On Mon, Oct 13, 2014 at 2:48 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Ah, that sucks. Thank you for looking into this. On Mon, Oct 13, 2014 at 5:43 PM, shane knapp skn...@berkeley.edu wrote: On Mon, Oct 13, 2014 at 2:28 PM

Re: Trouble running tests

2014-10-10 Thread Nicholas Chammas
. The hive/test part takes the longest, so I usually leave that out until just before submitting unless my changes are hive specific. On Thu, Oct 9, 2014 at 11:40 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: _RUN_SQL_TESTS needs

Re: Trouble running tests

2014-10-09 Thread Nicholas Chammas
_RUN_SQL_TESTS needs to be true as well. Those two _... variables get set correctly when tests are run on Jenkins. They’re not meant to be manipulated directly by testers. Did you want to run SQL tests only locally? You can try faking being Jenkins by setting AMPLAB_JENKINS=true before calling

spark-prs and mesos/spark-ec2

2014-10-09 Thread Nicholas Chammas
Does it make sense to point the Spark PR review board to read from mesos/spark-ec2 as well? PRs submitted against that repo may reference Spark JIRAs and need review just like any other Spark PR. Nick

Re: Unneeded branches/tags

2014-10-08 Thread Nicholas Chammas
. On Tue, Oct 7, 2014 at 6:27 PM, Reynold Xin r...@databricks.com wrote: Those branches are no longer active. However, I don't think we can delete branches from github due to the way ASF mirroring works. I might be wrong there. On Tue, Oct 7, 2014 at 6:25 PM, Nicholas Chammas

Re: Extending Scala style checks

2014-10-08 Thread Nicholas Chammas
: Disallow trailing spaces https://issues.apache.org/jira/browse/SPARK-3850. Nick On Tue, Oct 7, 2014 at 4:45 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: For starters, do we have a list of all the Scala style rules that are currently not enforced automatically but are likely well

Re: Extending Scala style checks

2014-10-07 Thread Nicholas Chammas
@gmail.com wrote: Since we can easily catch the list of all changed files in a PR, I think we can start with adding the no trailing space check for newly changed files only? On 10/2/14 9:24 AM, Nicholas Chammas wrote: Yeah, I remember that hell when I added PEP 8 to the build checks
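The idea of checking only the files touched by a PR can be sketched roughly as follows. This is a minimal illustration, not the project's actual build check: the function name `find_trailing_whitespace` and the path-to-text mapping it takes are hypothetical stand-ins for whatever the build would collect from a PR's diff.

```python
import re

# Matches one or more spaces or tabs at the very end of a line.
TRAILING_WS = re.compile(r"[ \t]+$")


def find_trailing_whitespace(changed_files):
    """Return (path, line_number) pairs for lines ending in spaces or tabs.

    `changed_files` is a hypothetical mapping of file path -> file text,
    limited to the files a PR actually modified.
    """
    violations = []
    for path, text in changed_files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if TRAILING_WS.search(line):
                violations.append((path, number))
    return violations


if __name__ == "__main__":
    sample = {"Foo.scala": "val x = 1 \nval y = 2\n"}
    print(find_trailing_whitespace(sample))
```

Restricting the input to changed files means existing violations elsewhere in the code base do not fail the build, which is the point made in the message above.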

Unneeded branches/tags

2014-10-07 Thread Nicholas Chammas
Just curious: Are there branches and/or tags on the repo that we don’t need anymore? What are the scala-2.9 and streaming branches for, for example? And do we still need branches for older versions of Spark that we are not backporting stuff to, like branch-0.5? Nick

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-06 Thread Nicholas Chammas
-cli-commands.html On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for posting that script, Patrick. It looks like a good place to start. Regarding Docker vs. Packer, as I understand it you can use Packer to create Docker containers at the same time

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-04 Thread Nicholas Chammas
tests. I'm not sure if the long term place for this would be inside the spark codebase or a community library or what. But it would definitely be very valuable to have if someone wanted to take it on. - Patrick On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-03 Thread Nicholas Chammas
template. That's very cool. I'll be looking into this. Nick On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for the update, Nate. I'm looking forward to seeing how these projects turn out. David, Packer looks very, very interesting. I'm gonna look

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-02 Thread Nicholas Chammas
Nate -Original Message- From: David Rowe [mailto:davidr...@gmail.com] Sent: Thursday, October 02, 2014 4:44 PM To: Nicholas Chammas Cc: dev; Shivaram Venkataraman Subject: Re: EC2 clusters ready in launch time + 30 seconds I think this is exactly what packer is for. See e.g. http

Re: amplab jenkins is down

2014-10-01 Thread Nicholas Chammas
On Thu, Sep 4, 2014 at 4:19 PM, shane knapp skn...@berkeley.edu wrote: on a side note, this incident will be accelerating our plan to move the entire jenkins infrastructure in to a managed datacenter environment. this will be our major push over the next couple of weeks. more details about

Re: do MIMA checking before all test cases start?

2014-10-01 Thread Nicholas Chammas
move it first. Wouldn't hurt. On Thu, Sep 25, 2014 at 6:39 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It might still make sense to make this change if MIMA checks are always relatively quick, for the same reason we do style checks first. On Thu, Sep 25, 2014 at 12:25 AM, Nan

Re: Extending Scala style checks

2014-10-01 Thread Nicholas Chammas
, Oct 1, 2014 at 2:01 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As discussed here https://github.com/apache/spark/pull/2619, it would be good to extend our Scala style checks to programmatically enforce as many of our style rules as possible. Does anyone know if it's

Re: Extending Scala style checks

2014-10-01 Thread Nicholas Chammas
at 6:13 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Ah, since there appears to be a built-in rule for end-of-line whitespace, Michael and Cheng, y'all should be able to add this in pretty easily. Nick On Wed, Oct 1, 2014 at 6:37 PM, Patrick Wendell pwend...@gmail.com wrote

Re: Extending Scala style checks

2014-10-01 Thread Nicholas Chammas
Does anyone know if Scala has something equivalent to autopep8 https://pypi.python.org/pypi/autopep8? It would help patch up the existing code base a lot quicker as we add in new style rules. On Wed, Oct 1, 2014 at 9:24 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, I remember

thank you for reviewing our patches

2014-09-26 Thread Nicholas Chammas
I recently came across this mailing list post by Linus Torvalds https://lkml.org/lkml/2004/12/20/255 about the value of reviewing even “trivial” patches. The following passages stood out to me: I think that much more important than the patch is the fact that people get used to the notion that

Re: Spark SQL use of alias in where clause

2014-09-25 Thread Nicholas Chammas
That is correct. Aliases in the SELECT clause can only be referenced in the ORDER BY and HAVING clauses. Otherwise, you'll have to just repeat the statement, like concat() in this case. A more elegant alternative, which is probably not available in Spark SQL yet, is to use Common Table
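The two options mentioned above (repeating the expression, or defining the alias in a common table expression and filtering on it afterwards) might look like the sketch below. It runs against Python's built-in sqlite3 module purely as a stand-in, not against Spark SQL. The table and column names are made up for illustration, the `||` operator is used in place of concat(), and SQLite is more lenient about aliases than the rule described above, so this shows only the shape of the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

# Option 1: repeat the expression in WHERE instead of using its alias.
repeated = conn.execute(
    "SELECT first_name || ' ' || last_name AS full_name "
    "FROM people "
    "WHERE first_name || ' ' || last_name = 'Ada Lovelace'"
).fetchall()

# Option 2: define the alias in a CTE, then filter on it in the outer query.
with_cte = conn.execute(
    "WITH named AS ("
    "  SELECT first_name || ' ' || last_name AS full_name FROM people"
    ") "
    "SELECT full_name FROM named WHERE full_name = 'Ada Lovelace'"
).fetchall()

print(repeated, with_cte)
```

Both queries select the same row; the CTE version states the expression only once.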

Re: Tests and Test Infrastructure

2014-09-14 Thread Nicholas Chammas
I fully support this. A smoothly running test infrastructure helps everybody’s work just flow better. The Jenkins Pull Request Builder is mostly functioning again. However, we are working on a simpler technical pipeline for testing patches, as this plug-in has been a constant source of downtime

don't trigger tests when only .md files are changed

2014-09-12 Thread Nicholas Chammas
Would it make sense to have Jenkins *not* trigger tests when the only files that have changed are .md files (example https://github.com/apache/spark/pull/2367)? Those don’t even need RAT checks, right? I can make this change if it makes sense. Nick
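A check along these lines could be as small as the sketch below. It is an illustration only: the function name `only_markdown_changed` is hypothetical, and how the list of changed paths is obtained from the PR is left out.

```python
def only_markdown_changed(changed_paths):
    """Return True when every changed path is a Markdown (.md) file.

    An empty change list returns False so that a missing or failed diff
    never silently skips the test run.
    """
    paths = list(changed_paths)
    return bool(paths) and all(p.lower().endswith(".md") for p in paths)


if __name__ == "__main__":
    print(only_markdown_changed(["README.md", "docs/index.md"]))
    print(only_markdown_changed(["README.md", "core/Foo.scala"]))
```

Treating an empty list as "run the tests" is a deliberately conservative choice in this sketch.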

Re: don't trigger tests when only .md files are changed

2014-09-12 Thread Nicholas Chammas
We could still have Jenkins post a message to the effect of “this patch only modifies .md files; no tests will be run”. On Fri, Sep 12, 2014 at 3:48 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Would it make sense to have Jenkins *not* trigger tests when the only files that have

Re: Announcing Spark 1.1.0!

2014-09-11 Thread Nicholas Chammas
Nice work everybody! I'm looking forward to trying out this release! On Thu, Sep 11, 2014 at 8:12 PM, Patrick Wendell pwend...@gmail.com wrote: I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark's largest

Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-10 Thread Nicholas Chammas
I'm looking forward to this. :) Looks like Jenkins is having trouble triggering builds for new commits or after user requests (e.g. https://github.com/apache/spark/pull/2339#issuecomment-55165937). Hopefully that will be resolved tomorrow. Nick On Tue, Sep 9, 2014 at 5:00 PM, shane knapp

Re: jenkins failed all tests?

2014-09-07 Thread Nicholas Chammas
Yeah, it feels like Jenkins has become a lot more flaky recently. Or maybe it’s just our tests. Here are some more examples: - https://github.com/apache/spark/pull/2310#issuecomment-54741169 - https://github.com/apache/spark/pull/2313#issuecomment-54752766 Nick On Sun, Sep 7, 2014 at

trimming unnecessary test output

2014-09-06 Thread Nicholas Chammas
Continuing the discussion started here https://github.com/apache/spark/pull/2279, I’m wondering if people already know that certain test output is unnecessary and should be trimmed. For example https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19917/consoleFull, I see a bunch of

Scala's Jenkins setup looks neat

2014-09-06 Thread Nicholas Chammas
After reading Erik's email, I found this Scala PR https://github.com/scala/scala/pull/3963 and immediately noticed a few cool things: - Jenkins is hooked directly into GitHub somehow, so you get the All is well message in the merge status window, presumably based on the last test status

Re: Scala's Jenkins setup looks neat

2014-09-06 Thread Nicholas Chammas
no for security reasons. On Saturday, September 6, 2014, Nicholas Chammas nicholas.cham...@gmail.com wrote: After reading Erik's email, I found this Scala PR https://github.com/scala/scala/pull/3963 and immediately noticed a few cool things: - Jenkins is hooked directly into GitHub somehow

Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas
. that's exactly the behavior i saw earlier, and will be figuring out first thing tomorrow morning. i bet it's an environment issue on the slaves. On Thu, Sep 4, 2014 at 7:10 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Looks like during the last build https://amplab.cs.berkeley.edu

Re: amplab jenkins is down

2014-09-05 Thread Nicholas Chammas
are triggering builds. On Fri, Sep 5, 2014 at 1:23 PM, shane knapp skn...@berkeley.edu wrote: it's looking like everything except the pull request builders are working. i'm going to be working on getting this resolved today. On Fri, Sep 5, 2014 at 8:18 AM, Nicholas Chammas nicholas.cham

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-04 Thread Nicholas Chammas
On Thu, Sep 4, 2014 at 1:50 PM, Gurvinder Singh gurvinder.si...@uninett.no wrote: There is a regression when using pyspark to read data from HDFS. Could you open a JIRA http://issues.apache.org/jira/ with a brief repro? We'll look into it. (You could also provide a repro in a separate

Re: amplab jenkins is down

2014-09-04 Thread Nicholas Chammas
Woohoo! Thanks Shane. Do you know if queued PR builds will automatically be picked up? Or do we have to ping the Jenkinmensch manually from each PR? Nick On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu wrote: AND WE'RE UP! sorry that this took so long... i'll send out a

Re: amplab jenkins is down

2014-09-04 Thread Nicholas Chammas
, Nicholas Chammas nicholas.cham...@gmail.com wrote: Woohoo! Thanks Shane. Do you know if queued PR builds will automatically be picked up? Or do we have to ping the Jenkinmensch manually from each PR? Nick On Thu, Sep 4, 2014 at 5:37 PM, shane knapp skn...@berkeley.edu wrote

Re: amplab jenkins is down

2014-09-04 Thread Nicholas Chammas
and see if that fixes things. On Thu, Sep 4, 2014 at 4:56 PM, shane knapp skn...@berkeley.edu wrote: looking On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It appears that our main man is having trouble https://amplab.cs.berkeley.edu/jenkins/view/Pull

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Nicholas Chammas
On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell pwend...@gmail.com wrote: == What default changes should I be aware of? == 1. The default value of spark.io.compression.codec is now snappy -- Old behavior can be restored by switching to lzf 2. PySpark now performs external spilling during

spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Nicholas Chammas
Spawned by this discussion https://github.com/apache/spark/pull/1120#issuecomment-54305831. See these 2 lines in spark_ec2.py: - spark_ec2 L42 https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L42 - spark_ec2 L566

Re: hey spark developers! intro from shane knapp, devops engineer @ AMPLab

2014-09-02 Thread Nicholas Chammas
Hi Shane! Thank you for doing the Jenkins upgrade last week. It's nice to know that infrastructure is gonna get some dedicated TLC going forward. Welcome aboard! Nick On Tue, Sep 2, 2014 at 1:35 PM, shane knapp skn...@berkeley.edu wrote: so, i had a meeting w/the databricks guys on friday

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-02 Thread Nicholas Chammas
In light of the discussion on SPARK-, I'll revoke my -1 vote. The issue does not appear to be serious. On Sun, Aug 31, 2014 at 5:14 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: -1: I believe I've found a regression from 1.0.2. The report is captured in SPARK- https

Run the Big Data Benchmark for new releases

2014-09-01 Thread Nicholas Chammas
What do people think of running the Big Data Benchmark https://amplab.cs.berkeley.edu/benchmark/ (repo https://github.com/amplab/benchmark) as part of preparing every new release of Spark? We'd run it just for Spark and effectively use it as another type of test to track any performance progress

Re: Run the Big Data Benchmark for new releases

2014-09-01 Thread Nicholas Chammas
wrote: Hi Nicholas, At Databricks we already run https://github.com/databricks/spark-perf for each release, which is a more comprehensive performance test suite. Matei On September 1, 2014 at 8:22:05 PM, Nicholas Chammas ( nicholas.cham...@gmail.com) wrote: What do people think of running

Re: Run the Big Data Benchmark for new releases

2014-09-01 Thread Nicholas Chammas
PM, Nicholas Chammas ( nicholas.cham...@gmail.com) wrote: Oh, that's sweet. So, a related question then. Did those tests pick up the performance issue reported in SPARK-? Does it make sense to add a new test to cover that case? On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread Nicholas Chammas
-1: I believe I've found a regression from 1.0.2. The report is captured in SPARK- https://issues.apache.org/jira/browse/SPARK-. On Sat, Aug 30, 2014 at 6:07 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.1.0!

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread Nicholas Chammas
On Sun, Aug 31, 2014 at 6:38 PM, chutium teng@gmail.com wrote: has anyone tried to build it on hadoop.version=2.0.0-mr1-cdh4.3.0 or hadoop.version=1.0.3-mapr-3.0.3 ? Is the behavior you're seeing a regression from 1.0.2, or does 1.0.2 have this same problem? Nick

Re: Handling stale PRs

2014-08-30 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote: it's actually procedurally difficult for us to close pull requests Just an FYI: Seems like the GitHub-sanctioned work-around to having issues-only permissions is to have a second, issues-only repository

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Nicholas Chammas
There were several formatting and typographical errors in the SQL docs that I've fixed in this PR https://github.com/apache/spark/pull/2201. Dunno if we want to roll that into the release. On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com wrote: Okay I'll plan to add cdh4

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Nicholas Chammas
, Patrick Wendell pwend...@gmail.com wrote: Hey Nicholas, Thanks for this, we can merge in doc changes outside of the actual release timeline, so we'll make sure to loop those changes in before we publish the final 1.1 docs. - Patrick On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas

Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote: Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png). BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA issue,

Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
features. The source is at https://github.com/databricks/spark-pr-dashboard (pull requests and issues welcome!) On August 27, 2014 at 2:11:41 PM, Nicholas Chammas ( nicholas.cham...@gmail.com) wrote: On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote: Last weekend, I

Re: jenkins maintenance/downtime, aug 28th, 730am-9am PDT

2014-08-27 Thread Nicholas Chammas
Looks like we're currently at 1.568 so we should be getting a nice slew of UI tweaks and bug fixes. Neat! On Wed, Aug 27, 2014 at 7:13 PM, shane knapp skn...@berkeley.edu wrote: tomorrow morning i will be upgrading jenkins to the latest/greatest (1.577). at 730am, i will put jenkins in to a

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote: I'd prefer if we took the approach of politely explaining why in the current form the patch isn't acceptable and closing it (potentially w/ tips on how to improve it or narrow the scope). Amen to this. Aiming for such

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
the app so folks can contribute to it. - Josh On August 26, 2014 at 8:16:46 AM, Nicholas Chammas ( nicholas.cham...@gmail.com) wrote: On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote: I'd prefer if we took the approach of politely explaining why in the current form

spark-ec2 1.0.2 creates EC2 cluster at wrong version

2014-08-26 Thread Nicholas Chammas
I downloaded the source code release for 1.0.2 from here http://spark.apache.org/downloads.html and launched an EC2 cluster using spark-ec2. After the cluster finishes launching, I fire up the shell and check the version: scala> sc.version res1: String = 1.0.1 The startup banner also shows the

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
with Spark QA’s credentials in order to allow it to post comments on issues, etc. - Josh On August 26, 2014 at 11:38:08 AM, Nicholas Chammas ( nicholas.cham...@gmail.com) wrote: OK, that sounds pretty cool. Josh, Do you see this app as encompassing or supplanting the functionality I

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Nicholas Chammas
=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109078 Might be a small win to push this work to a bot ASF manages if we can get access to it (and if we have no concerns about depending on an another external service). Nick On Mon, Aug 11, 2014 at 4:10 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks

Handling stale PRs

2014-08-25 Thread Nicholas Chammas
Check this out: https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc We're hitting close to 300 open PRs. Those are the least recently updated ones. I think having a low number of stale (i.e. not recently updated) PRs is a good thing to shoot for. It doesn't leave

Re: Spark Contribution

2014-08-23 Thread Nicholas Chammas
with a pointer in README.md? or keep it both places? On Sat, Aug 23, 2014 at 1:08 AM, Reynold Xin r...@databricks.com wrote: Great idea. Added the link https://github.com/apache/spark/blob/master/README.md On Thu, Aug 21, 2014 at 4:06 PM, Nicholas Chammas nicholas.cham

Re: Spark Contribution

2014-08-21 Thread Nicholas Chammas
We should add this link to the readme on GitHub btw. On Thursday, August 21, 2014, Henry Saputra henry.sapu...@gmail.com wrote: The Apache Spark wiki on how to contribute should be a great place to start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark - Henry On Thu, Aug 21,

Re: -1s on pull requests?

2014-08-15 Thread Nicholas Chammas
On Sun, Aug 3, 2014 at 4:35 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Include the commit hash in the tests have started/completed messages, so that it's clear what code exactly is/has been tested for each test cycle. This is now captured in this JIRA issue https

Re: Tests failing

2014-08-15 Thread Nicholas Chammas
Shivaram, Can you point us to an example of that happening? The Jenkins console output, that is. Nick On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Also I think Jenkins doesn't post build timeouts to github. Is there any way we can fix that? On

Re: Tests failing

2014-08-15 Thread Nicholas Chammas
in its own timeout. It might be possible to just use this: http://linux.die.net/man/1/timeout - Patrick On Fri, Aug 15, 2014 at 1:31 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: OK, I've captured this in SPARK-3076 https://issues.apache.org/jira/browse/SPARK-3076. Patrick
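The thread above suggests wrapping the test run in the coreutils `timeout` command. As a rough illustration of the same idea in a different tool, the sketch below uses the `timeout` parameter of Python's `subprocess.run` instead; the helper name `run_with_timeout` is hypothetical, and returning `None` on timeout is just one way a caller could tell a timeout apart from an ordinary failure so it can be reported.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run `command`; return its exit code, or None if it exceeded `seconds`."""
    try:
        completed = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return completed.returncode


if __name__ == "__main__":
    # A child process that sleeps longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 0.5))
```

Because the timeout is detected by the wrapper rather than by the job being killed from outside, the wrapper is in a position to post a "timed out" message before exiting.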

Re: A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms

2014-08-13 Thread Nicholas Chammas
On a related note, I recently heard about Distributed R https://github.com/vertica/DistributedR, which is coming out of HP/Vertica and seems to be their proposition for machine learning at scale. It would be interesting to see some kind of comparison between that and MLlib (and perhaps also

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-11 Thread Nicholas Chammas
are very slow. - Patrick On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It looks like this script doesn't catch PRs that are opened and *then* have the JIRA issue ID added to the name. Would it be easy to somehow have the script trigger on PR name

Unit tests in 5 minutes

2014-08-08 Thread Nicholas Chammas
Howdy, Do we think it's both feasible and worthwhile to invest in getting our unit tests to finish in under 5 minutes (or something similarly brief) when run by Jenkins? Unit tests currently seem to take anywhere from 30 min to 2 hours. As people add more tests, I imagine this time will only

Re: -1s on pull requests?

2014-08-05 Thread Nicholas Chammas
1. Include the commit hash in the tests have started/completed FYI: Looks like Xiangrui's already got a JIRA issue for this. SPARK-2622: Add Jenkins build numbers to SparkQA messages https://issues.apache.org/jira/browse/SPARK-2622 2. Pin a message to the start or end of the PR Should new

Re: -1s on pull requests?

2014-08-03 Thread Nicholas Chammas
On Mon, Jul 21, 2014 at 4:44 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote: This also happens when something accidentally gets merged after the tests have started but before tests have passed. Some improvements to SparkQA https://github.com/SparkQA could help with this. May I suggest:

Re: -1s on pull requests?

2014-08-03 Thread Nicholas Chammas
On Sun, Aug 3, 2014 at 11:29 PM, Patrick Wendell pwend...@gmail.com wrote: Nick - Any interest in doing these? this is all doable from within the spark repo itself because our QA harness scripts are in there: https://github.com/apache/spark/blob/master/dev/run-tests-jenkins If not, could you

Re: ASF JIRA is down for maintenance

2014-08-02 Thread Nicholas Chammas
Seems to be back up now. On Sat, Aug 2, 2014 at 2:06 AM, Patrick Wendell pwend...@gmail.com wrote: Please don't let this prevent you from merging patches, just keep a list and we can update the JIRA later. - Patrick

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-29 Thread Nicholas Chammas
- spun up an EC2 cluster successfully using spark-ec2 - tested S3 file access from that cluster successfully +1 On Tue, Jul 29, 2014 at 1:46 AM, Henry Saputra henry.sapu...@gmail.com wrote: NOTICE and LICENSE files look good Hashes and sigs look good No executable in the source

Re: JIRA content request

2014-07-29 Thread Nicholas Chammas
+1 on using JIRA workflows to manage the backlog, and +9000 on having decent descriptions for all JIRA issues. On Tue, Jul 29, 2014 at 7:48 PM, Sean Owen so...@cloudera.com wrote: How about using a JIRA status like Documentation Required to mean burden's on you to elaborate with a motivation

Re: Fraud management system implementation

2014-07-28 Thread Nicholas Chammas
This sounds more like a user list https://spark.apache.org/community.html question. This is the dev list, where people discuss things related to contributing code and such to Spark. On Mon, Jul 28, 2014 at 10:15 AM, jitendra shelar jitendra.shelar...@gmail.com wrote: Hi, I am new to spark.

Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-23 Thread Nicholas Chammas
://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929 - Patrick On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request

Contributing to Spark needs PySpark build/test instructions

2014-07-21 Thread Nicholas Chammas
Contributing to Spark https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark needs a line or two about building and testing PySpark. A call out of run-tests, for example, would be helpful for new contributors to PySpark. Nick

Re: Contributing to Spark needs PySpark build/test instructions

2014-07-21 Thread Nicholas Chammas
For the record, the triggering discussion is here https://github.com/apache/spark/pull/1505#issuecomment-49671550. I assumed that sbt/sbt test covers all the tests required before submitting a patch, and it appears that it doesn’t. On Mon, Jul 21, 2014 at 6:42 PM, Nicholas Chammas

Re: Contributing to Spark needs PySpark build/test instructions

2014-07-21 Thread Nicholas Chammas
, Reynold Xin r...@databricks.com wrote: I added an automated testing section: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting Can you take a look to see if it is what you had in mind? On Mon, Jul 21, 2014 at 3:54 PM, Nicholas Chammas

Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-20 Thread Nicholas Chammas
That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests
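Picking an issue ID such as SPARK-1234 out of a pull request title could be done with a pattern like the one sketched below. This is only an illustration of the matching step: the actual tool's rules are not shown in this thread, so the pattern and the function name `find_jira_ids` are assumptions.

```python
import re

# Assumed ID shape: the literal project key "SPARK", a hyphen, then digits.
JIRA_ID = re.compile(r"\bSPARK-\d+\b")


def find_jira_ids(pr_title):
    """Return every SPARK-NNNN style ID found in a pull request title, in order."""
    return JIRA_ID.findall(pr_title)


if __name__ == "__main__":
    print(find_jira_ids("[SPARK-1234] Fix foo"))
```

A title without any ID yields an empty list, which a linking script could treat as "nothing to mirror yet".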

Re: small (yet major) change going in: broadcasting RDD to reduce task size

2014-07-17 Thread Nicholas Chammas
On Thu, Jul 17, 2014 at 1:23 AM, Stephen Haberman stephen.haber...@gmail.com wrote: I'd be ecstatic if more major changes were this well/succinctly explained Ditto on that. The summary of user impact was very nice. It would be good to repeat that on the user list or release notes when this

ec2 clusters launched at 9fe693b5b6 are broken (?)

2014-07-14 Thread Nicholas Chammas
Just launched an EC2 cluster from git hash 9fe693b5b6ed6af34ee1e800ab89c8a11991ea38. Calling take() on an RDD accessing data in S3 yields the following error output. I understand that NoClassDefFoundError errors may mean something in the deployment was messed up. Is that correct? When I launch a

Re: ec2 clusters launched at 9fe693b5b6 are broken (?)

2014-07-14 Thread Nicholas Chammas
this is related to the issues you posted in a separate thread. On Mon, Jul 14, 2014 at 6:43 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Just launched an EC2 cluster from git hash 9fe693b5b6ed6af34ee1e800ab89c8a11991ea38. Calling take() on an RDD accessing data in S3

Re: EC2 clusters ready in launch time + 30 seconds

2014-07-12 Thread Nicholas Chammas
On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico n...@reactor8.com wrote: Starting to work through some automation/config stuff for spark stack on EC2 with a project, will be focusing the work through the apache bigtop effort to start, can then share with spark community directly as things

EC2 clusters ready in launch time + 30 seconds

2014-07-10 Thread Nicholas Chammas
Hi devs! Right now it takes a non-trivial amount of time to launch EC2 clusters. Part of this time is spent starting the EC2 instances, which is out of our control. Another part of this time is spent installing stuff on and configuring the instances. This, we can control. I’d like to explore
