[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2018-09-17 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/13440 @srowen @thunterdb this PR passes all tests and merges cleanly -- can you take another look? It's been open for quite a while now.

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-10-18 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/13440 I agree with @felixcheung -- @srowen or @thunterdb, can you take a look at this? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-08-23 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/13440 @HyukjinKwon @thunterdb can you all take a look at this? It's been under review for quite a long time! --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-07-21 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/13440 @thunterdb can you take a look at this now that 2.2 is out?

[GitHub] spark pull request #15150: [SPARK-17595] [MLLib] Use a bounded priority queu...

2016-09-20 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15150#discussion_r79641840 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -580,10 +581,14 @@ class Word2VecModel private[spark] ( ind

[GitHub] spark pull request #15150: [SPARK-17595] [MLLib] Use a bounded priority queu...

2016-09-20 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15150#discussion_r79637489 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -580,10 +581,14 @@ class Word2VecModel private[spark] ( ind

[GitHub] spark pull request #15150: [SPARK-17595] [MLLib] Use a bounded priority queu...

2016-09-20 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15150#discussion_r79600882 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -580,7 +581,11 @@ class Word2VecModel private[spark] ( ind += 1

[GitHub] spark issue #15150: [SPARK-17595] [MLLib] Use a bounded priority queue to fi...

2016-09-19 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/15150 Thanks for the feedback, @srowen! I've made the changes.

[GitHub] spark pull request #15150: Use a bounded priority queue to find synonyms in ...

2016-09-19 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/15150 Use a bounded priority queue to find synonyms in Word2VecModel ## What changes were proposed in this pull request? The code in `Word2VecModel.findSynonyms` to choose the vocabulary elements
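The idea named in the PR title can be sketched as a top-k selection over scored candidates using a size-bounded heap, so the whole vocabulary never has to be sorted. This is an illustrative stand-in (a generic `topK` helper, not Spark's internal `BoundedPriorityQueue`), assuming only that candidates arrive as (item, score) pairs:

```scala
import scala.collection.mutable

// Keep only the k best-scored items seen so far while scanning a stream of
// candidates. A min-heap (reversed ordering) makes the weakest of the current
// top-k cheap to evict when a better candidate arrives.
def topK[A](items: Iterator[(A, Double)], k: Int): Seq[(A, Double)] = {
  val heap = mutable.PriorityQueue.empty[(A, Double)](
    Ordering.by[(A, Double), Double](_._2).reverse)
  items.foreach { case (item, score) =>
    if (heap.size < k) {
      heap.enqueue((item, score))
    } else if (score > heap.head._2) {
      heap.dequeue()            // evict the current weakest entry
      heap.enqueue((item, score))
    }
  }
  heap.dequeueAll.reverse       // ascending dequeue order, so reverse for best-first
}
```

This keeps memory at O(k) and work at O(n log k) instead of the O(n log n) cost of sorting every vocabulary entry.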

[GitHub] spark issue #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms no long...

2016-09-16 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/15105 Thanks for the feedback @srowen! I think 18e6bfe addresses everything from a code perspective, but it missed removing the comment about assuming that distinct words have distinct vector

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-16 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15105#discussion_r79157601 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -227,7 +227,7 @@ class Word2VecModel private[ml] ( */ @Since

[GitHub] spark issue #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms no long...

2016-09-15 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/15105 @srowen Yes. But if we're querying for words similar to a given vector, then there's no word to filter out (and, indeed, no way to know which word we might want to filter out if multiple words might
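The distinction being made above can be sketched in a small, hypothetical helper (the names and signature here are illustrative, not Spark's actual API): when querying by word, the query word itself is filtered out of the results; when querying by a raw vector, there is no query word, so nothing should be dropped.

```scala
// Hypothetical sketch: pick the top `num` synonyms from scored candidates.
// A word query excludes the query word itself; a vector query keeps everything,
// since there is no word to exclude.
def findSynonyms(scored: Seq[(String, Double)],
                 num: Int,
                 queryWord: Option[String]): Seq[(String, Double)] = {
  val filtered = queryWord match {
    case Some(w) => scored.filterNot(_._1 == w)  // drop the query word itself
    case None    => scored                       // vector query: keep all matches
  }
  filtered.sortBy(-_._2).take(num)
}
```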

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-15 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15105#discussion_r79005872 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -566,8 +582,8 @@ class Word2VecModel private[spark

[GitHub] spark issue #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms no long...

2016-09-15 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/15105 Can we assume in general that distinct words will not have identical (viz., by `.equals`) vector representations? I ask because the behavior in the current documentation (in that pyspark test I

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-15 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15105#discussion_r78995496 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -566,8 +582,8 @@ class Word2VecModel private[spark

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-15 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15105#discussion_r78983451 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -566,8 +582,8 @@ class Word2VecModel private[spark

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-15 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15105#discussion_r78981374 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala --- @@ -68,6 +69,21 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-15 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15105#discussion_r78981066 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -566,8 +582,8 @@ class Word2VecModel private[spark

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-15 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/15105#discussion_r78973934 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -566,8 +582,8 @@ class Word2VecModel private[spark

[GitHub] spark issue #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms no long...

2016-09-14 Thread willb
Github user willb commented on the issue: https://github.com/apache/spark/pull/15105 Thanks, @hhbyyh. This is what I get for running `sbt "testOnly ..."`. I'll push a fix.

[GitHub] spark pull request #15105: [SPARK-17548] [MLlib] Word2VecModel.findSynonyms ...

2016-09-14 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/15105 [SPARK-17548] [MLlib] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector ## What changes were proposed in this pull request? This pull request

[GitHub] spark pull request: SPARK-2863: [SQL] Add facilities for function-...

2014-10-29 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2768#issuecomment-60934954 I've rebased this to fix merge conflicts.

[GitHub] spark pull request: SPARK-2863: [SQL] Add facilities for function-...

2014-10-13 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2768#issuecomment-58893902 Two open questions, and the latter is more relevant: is requiring that actuals are cast to the types of formals too restrictive? Is it likely to lead to type-coercion

[GitHub] spark pull request: SPARK-2863: [SQL] Add facilities for function-...

2014-10-11 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/2768 SPARK-2863: [SQL] Add facilities for function-argument coercion This commit adds the `SignedFunction` trait and modifies the `Sqrt` expression class to use it for coercing its argument

[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...

2014-09-26 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/2547 SPARK-3699: SQL and Hive console tasks now clean up appropriately The sbt tasks sql/console and hive/console will now `stop()` the `SparkContext` upon exit. Previously, they left an ugly stack
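One common way to get this behavior in an sbt build is the `cleanupCommands` setting, which runs code when the REPL session exits. This fragment is a hedged sketch of the idea, not the PR's actual build change, and assumes a `sparkContext` value is in scope in the console session:

```scala
// Hypothetical sbt fragment (sbt 0.13-style syntax): stop the SparkContext
// when the console REPL exits, avoiding a stack trace on shutdown.
cleanupCommands in console := "sparkContext.stop()"
```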

[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...

2014-09-26 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2547#issuecomment-57006526 Looks like Jenkins and GitHub are having a minor disagreement: FATAL: Failed to fetch from https://github.com/apache/spark.git

[GitHub] spark pull request: SPARK-3699: SQL and Hive console tasks now cle...

2014-09-26 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2547#issuecomment-57032615 This patch only changes the sbt build file; it adds no classes and should not have caused any tests to fail.

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56121260 @andrewor14 This bothers me too, but in a slightly different way: calling the parameter “overhead” when it really refers to how to scale requested memory

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-17 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55902476 Here's a somewhat-related concern: it seems like JVM overhead is unlikely to scale linearly with requested heap size, so maybe a straight-up 15% isn't a great default
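The concern above suggests a floor-plus-fraction model rather than a flat percentage: small heaps get a fixed minimum overhead while large heaps scale proportionally. The function below is illustrative only; the constants are made up for the example and are not the values the PR settled on.

```scala
// Illustrative overhead model: never less than a fixed floor, otherwise a
// fraction of the requested memory. Mitigates a flat percentage being too
// small for tiny heaps and arguably too large for huge ones.
def overheadMb(requestedMb: Int, fraction: Double = 0.15, floorMb: Int = 384): Int =
  math.max(floorMb, (requestedMb * fraction).toInt)
```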

[GitHub] spark pull request: Fix postfixOps warnings in the test suite

2014-09-11 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-55345856 Hi @jkbradley; thanks for taking a look. Here are the warnings as I see them when compiling tests on the immediate ancestor of my branch, which is 56e009d (I'm running

[GitHub] spark pull request: Fix postfixOps warnings in the test suite

2014-09-11 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-55351655 Hey @andrewor14, thanks for the reply. First off, I absolutely agree with @srowen's comment on #1330 that `import`s (not compiler flags) are the right way to handle

[GitHub] spark pull request: Fix postfixOps warnings in the test suite

2014-09-11 Thread willb
Github user willb closed the pull request at: https://github.com/apache/spark/pull/1323

[GitHub] spark pull request: SPARK-3423: [SQL] Implement BETWEEN for SQLPar...

2014-09-05 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/2295 SPARK-3423: [SQL] Implement BETWEEN for SQLParser You can merge this pull request into a Git repository by running: $ git pull https://github.com/willb/spark sql-between Alternatively you can

[GitHub] spark pull request: SPARK-3329: [SQL] Don't depend on Hive SET pai...

2014-09-03 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2220#issuecomment-54303443 This failure (in `SparkSubmitSuite`) appears unrelated to my patch.

[GitHub] spark pull request: SPARK-3329: [SQL] Don't depend on Hive SET pai...

2014-09-02 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2220#issuecomment-54165469 @concretevitamin I cherry-picked @aarondav's fix (and added a very simple fix to handle cases that it didn't).

[GitHub] spark pull request: SPARK-3329: [SQL] Don't depend on Hive SET pai...

2014-08-31 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/2220 SPARK-3329: [SQL] Don't depend on Hive SET pair ordering. This fixes some possible spurious test failures in `HiveQuerySuite` by comparing sets of key-value pairs as sets, rather than as lists.
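The fix's core idea is tiny: compare SET output as an unordered collection of key=value pairs instead of an ordered list of lines. A minimal illustration (not the PR's actual test code):

```scala
// Order-insensitive comparison of SET command output: two outputs match if
// they contain the same key=value pairs, regardless of line order.
def sameSetOutput(a: Seq[String], b: Seq[String]): Boolean =
  a.toSet == b.toSet
```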

[GitHub] spark pull request: SPARK-3329: [SQL] Don't depend on Hive SET pai...

2014-08-31 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2220#issuecomment-53996991 Thanks, @concretevitamin! I'll close this one then.

[GitHub] spark pull request: spark-729: predictable closure capture

2014-08-29 Thread willb
Github user willb closed the pull request at: https://github.com/apache/spark/pull/1322

[GitHub] spark pull request: spark-729: predictable closure capture

2014-08-29 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1322#issuecomment-53903366 @mateiz sure; I've tracked down the problem but am a bit stumped by how to fix it. I'll reopen when I have a solution.

[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-28 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1750#issuecomment-53753423 @marmbrus I've rebased this atop the current master and added support for casting string-valued SQRT arguments to double and a test case in SQLQuerySuite for the same

[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-27 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1750#discussion_r16815625 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -935,6 +936,7 @@ private[hive] object HiveQl { case Token(DIV(), left

[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-06 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1750#discussion_r15894307 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -935,6 +936,7 @@ private[hive] object HiveQl { case Token(DIV(), left

[GitHub] spark pull request: Fix postfixOps warnings in the test suite

2014-08-06 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-51378794 Can someone take a look at this again?

[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-04 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1750#discussion_r15791823 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -935,6 +936,7 @@ private[hive] object HiveQl { case Token(DIV(), left

[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-04 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1750#issuecomment-51143752 @marmbrus I'll file a JIRA for that and am happy to put it at the front of my plate; sounds like a fun problem!

[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-02 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1750 SPARK-2813: [SQL] Implement SQRT() directly in Catalyst This PR adds a native implementation for SQL SQRT() and thus avoids delegating this function to Hive.
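"Implement natively" here means evaluating the function inside the engine's own expression tree rather than handing it off to Hive. The toy expression hierarchy below is not Catalyst's actual classes, just a sketch of the shape of such an expression, including the SQL convention that a null input yields a null output:

```scala
// Toy expression tree, illustrating a natively evaluated SQRT.
sealed trait Expr { def eval(): Any }
case class Literal(v: Double) extends Expr { def eval(): Any = v }
case object NullLit extends Expr { def eval(): Any = null }
case class Sqrt(child: Expr) extends Expr {
  def eval(): Any = child.eval() match {
    case null      => null            // SQL semantics: null in, null out
    case d: Double => math.sqrt(d)    // evaluate in-engine, no Hive round trip
  }
}
```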

[GitHub] spark pull request: SPARK-2226: [SQL] transform HAVING clauses wit...

2014-07-23 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49876099 Thanks, @marmbrus; this works now and I understand where things were going wrong (and also, hopefully, how to use Catalyst more idiomatically).

[GitHub] spark pull request: SPARK-2226: [SQL] transform HAVING clauses wit...

2014-07-23 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49912686 @marmbrus, thanks again; sorry to have missed those earlier!

[GitHub] spark pull request: spark-729: predictable closure capture

2014-07-23 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1322#issuecomment-49913436 @mateiz, I think there's a memory blowup somewhere in this patch as it is and am trying to track it down. Coincidentally, it's what I was switching context back to when I

[GitHub] spark pull request: spark-729: predictable closure capture

2014-07-23 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1322#issuecomment-49926246 Thanks, @rxin

[GitHub] spark pull request: SPARK-2226: [SQL] transform HAVING clauses wit...

2014-07-22 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49726353 @pwendell Sure, and I'll do this in the future.

[GitHub] spark pull request: SPARK-2226: [SQL] transform HAVING clauses wit...

2014-07-22 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49726591 @marmbrus I've made the changes you suggested and moved the rule to the Resolution batch but now the newly-inserted attribute references remain unresolved after analysis

[GitHub] spark pull request: SPARK-2226: [SQL] transform HAVING clauses wit...

2014-07-22 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49783600 @marmbrus I've pushed it to https://github.com/willb/spark/tree/spark-2226-broken -- if I split out the `UnresolvedHavingClauseAttributes` to a separate batch and add

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-21 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15172123 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,34 @@ class Analyzer(catalog: Catalog, registry

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1497#issuecomment-49542704 Thanks, @rxin! I've made these changes.

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with unre...

2014-07-19 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1497 SPARK-2226: transform HAVING clauses with unresolvable attributes This commit adds an analyzer rule to 1. find expressions in `HAVING` clause filters that depend on unresolved attributes

[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()

2014-07-16 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49159849 That was my thought as well, @egraldlo. Thanks for submitting this, @chutium!

[GitHub] spark pull request: SPARK-2407: Added Parser of SQL SUBSTR()

2014-07-16 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1442#issuecomment-49160982 @egraldlo, couldn't it be `(SUBSTR | SUBSTRING) ~ // ... ` in that case?

[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...

2014-07-15 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1413#issuecomment-49023679 @aarondav @pwendell Yes, with this patch I'm able to enable the YourKit features that were causing crashes before. I'll submit an update to fix the bracket style and cc

[GitHub] spark pull request: Reformat multi-line closure argument.

2014-07-15 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1419 Reformat multi-line closure argument.

[GitHub] spark pull request: Reformat multi-line closure argument.

2014-07-15 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1419#issuecomment-49024982 (See discussion on #1413; cc @aarondav and @pwendell.)

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-14 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-48971559 Thanks, Michael. I've fixed the conflict.

[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...

2014-07-14 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1413 SPARK-2486: Utils.getCallSite is now resilient to bogus frames When running Spark under certain instrumenting profilers, Utils.getCallSite could crash with an NPE. This commit makes it more
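The defensive idea described above can be sketched as skipping stack frames with missing metadata instead of dereferencing them blindly. This is a hedged illustration (the helper name, the package prefix, and the null checks are mine, not the PR's actual code), assuming profiler-injected frames may carry null fields:

```scala
// Walk a stack trace and find the first frame outside Spark's own packages,
// tolerating bogus frames (e.g. injected by instrumenting profilers) whose
// fields may be null, rather than crashing with an NPE.
def firstUserFrame(stack: Seq[StackTraceElement]): Option[StackTraceElement] =
  stack.find { frame =>
    frame != null && frame.getClassName != null &&
      !frame.getClassName.startsWith("org.apache.spark")
  }
```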

[GitHub] spark pull request: SPARK-2486: Utils.getCallSite is now resilient...

2014-07-14 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1413#discussion_r14916362 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -809,7 +809,11 @@ private[spark] object Utils extends Logging { */ def

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-10 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1359#discussion_r14782197 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -207,3 +210,64 @@ case class StartsWith(left

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-10 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1359#discussion_r14782431 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -860,6 +860,7 @@ private[hive] object HiveQl { val BETWEEN = (?i

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-10 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-48643276 Thanks for the comments, @concretevitamin. I've made the changes from your comments and will think about reducing branch overhead before pushing an update.

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-10 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1359#discussion_r14783468 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -207,3 +210,64 @@ case class StartsWith(left

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-10 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1359#discussion_r14783626 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -207,3 +210,64 @@ case class StartsWith(left

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-10 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1359#issuecomment-48648670 @concretevitamin Inlining `Substring.slice` seems to make a nontrivial difference (~11.5% speedup) on a very simple `Substring.eval()` microbenchmark.

[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-10 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/1359#discussion_r14802394 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -207,3 +207,71 @@ case class StartsWith(left

[GitHub] spark pull request: Fix (some of the) warnings in the test suite

2014-07-08 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-48326403 @rxin Done; I also updated the comment to reflect the narrower focus after eliminating overlap with #1153. Thanks!

[GitHub] spark pull request: spark-729: predictable closure capture

2014-07-07 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1322 spark-729: predictable closure capture SPARK-729 concerns when free variables in closure arguments to transformations are captured. Currently, it is possible for closures to get the environment

[GitHub] spark pull request: Fix (some of the) warnings in the test suite

2014-07-07 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1323 Fix (some of the) warnings in the test suite This PR fixes three classes of compiler and deprecation warnings in the test suite: * `expectResult` is currently deprecated and has been

[GitHub] spark pull request: Fix (some of the) warnings in the test suite

2014-07-07 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-48243538 @srowen sorry, I hadn't noticed the other PR! I think the `postfixOps` change in mine is disjoint, though.

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-06-29 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/143#discussion_r14329726 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -153,6 +153,18 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-06-29 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-47459912 Sorry, I missed FailureSuite. I have a fix but ran out of battery before I could push.

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-06-23 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-46860741 Can someone take another look at this PR?

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1136#issuecomment-46677366 Thanks for the quick review and patch, @rxin!

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1136#issuecomment-46724443 @rxin, re: the former, seems like most implementations signal this as an error.

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-20 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1136#issuecomment-46725581 OK, I wasn't sure if strict Hive compatibility was the goal. I'm happy to take these tickets. Thanks again!

[GitHub] spark pull request: [SPARK-2225] Turn HAVING without GROUP BY into...

2014-06-20 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1161#issuecomment-46727408 LGTM; this is basically exactly what I did (https://github.com/willb/spark/commit/b272f6be925ba50741e0a5093244926ea4a7a9a8)

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/1136 SPARK-2180: support HAVING clauses in Hive queries This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1136#issuecomment-46615691 @rxin, I'm not 100% sure but I think it's a problem with local map/reduce (the stack trace isn't too informative, but it's the same as the one for tests

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1136#issuecomment-46635597 Thanks for the catch, @rxin! I'll make the change and add tests for it.

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1136#issuecomment-46640150 So I've added a cast in cases in which non-boolean expressions are supplied to having expressions. It appears that `Cast(_, BooleanType)` isn't idempotent, though

[GitHub] spark pull request: SPARK-2180: support HAVING clauses in Hive que...

2014-06-19 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/1136#issuecomment-46642661 Thanks! I'm happy to put together a preliminary patch as well, but probably won't be able to take a look until tomorrow morning.

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-06-18 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-46498345 I just rebased this branch atop master so it could be tested again. I see that it failed under Jenkins. However, I am unable to reproduce the local metrics failure in my

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-06-18 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-46500775 Thanks @rxin!

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-05-15 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-42712135 I'm not able to reproduce the above failure locally (either on OS X or Linux).

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-05-15 Thread willb
GitHub user willb reopened a pull request: https://github.com/apache/spark/pull/143 SPARK-897: preemptively serialize closures These commits cause `ClosureCleaner.clean` to attempt to serialize the cleaned closure with the default closure serializer and throw a `SparkException`
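The idea is to fail fast: serialize the closure eagerly at clean time, so an unserializable capture is reported where the closure is defined rather than later, when a task is shipped to an executor. Spark's actual change lives in Scala's `ClosureCleaner` and uses Java serialization; as a loose analogy only, the same fail-fast check can be sketched in Python with `pickle` (the helper name is invented):

```python
import pickle

def check_serializable(fn):
    # Eagerly attempt serialization so a failure surfaces at the definition
    # site instead of at task-dispatch time. A sketch of the idea only;
    # not Spark's implementation.
    try:
        pickle.dumps(fn)
    except Exception as e:
        raise RuntimeError(f"closure is not serializable: {e}") from e
    return fn

def double(x):  # a module-level function pickles fine (by reference)
    return 2 * x

check_serializable(double)

failed = False
try:
    check_serializable(lambda x: x + 1)  # plain pickle cannot serialize lambdas
except RuntimeError:
    failed = True
print(failed)  # True
```

The lambda fails immediately with a descriptive error, which is exactly the behavior the PR wants: the stack trace points at the offending closure, not at the scheduler.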

[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...

2014-05-13 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/717#issuecomment-43009698 Thanks, @rxin!

[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...

2014-05-12 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/717#issuecomment-42911366 @rxin, yes! (I was firing up a debugger in case you wanted to know exactly where it was returning to.)

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-05-11 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-42703799 I'd like to reopen this PR, since #189 had to be reverted and is still in flight, but these fixes are independently useful. I've rebased the branch atop the current master

[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...

2014-05-10 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/717 SPARK-571: forbid return statements in cleaned closures This patch checks top-level closure arguments to `ClosureCleaner.clean` for `return` statements and raises an exception if it finds any

[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...

2014-05-10 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/717#issuecomment-42730126 Thanks for the quick feedback, @andrewor14! I'll make those changes. I'll also look in to see if I can figure out why that repartition test is failing; oddly, it's

[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...

2014-05-10 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/717#issuecomment-42760025 Thanks for the clarification, @pwendell. @andrewor14, I've grouped my imports and collapsed that function. Thanks again.

[GitHub] spark pull request: SPARK-1501: Ensure assertions in Graph.apply a...

2014-04-15 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/415 SPARK-1501: Ensure assertions in Graph.apply are asserted. The Graph.apply test in GraphSuite had some assertions in a closure in a graph transformation. As a consequence, these assertions never
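The underlying trap is lazy evaluation: an `assert` placed inside a transformation's closure does not run until the transformation is actually forced, so a test can pass without the assertions ever executing. A minimal Python analogue of the same trap, using a lazy generator (names invented for illustration):

```python
def tagged(values):
    # The assert lives inside a closure applied lazily -- the same shape as
    # asserting inside a GraphX/Spark transformation. Nothing runs yet.
    def check(v):
        assert v >= 0, "negative value"
        return v
    return (check(v) for v in values)

lazy = tagged([1, 2, -3])  # no AssertionError here: generator is unevaluated

failed = False
try:
    list(lazy)             # forcing the computation finally runs the asserts
except AssertionError:
    failed = True
print(failed)  # True
```

A test that builds `lazy` but never materializes it would silently pass; the fix in the PR is the same in spirit, ensuring the transformed data is actually forced so the assertions fire.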

[GitHub] spark pull request: SPARK-729: Closures not always serialized at c...

2014-04-10 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-40070735 @pwendell That's odd, since I didn't see any failures locally either. I haven't rebased this branch in a while, though, so I can check again. Can you send me a link

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-04-09 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-40035997 This is subsumed by #189.
