[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20980 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89005/ Test PASSed.
[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20980 **[Test build #89005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89005/testReport)** for PR 20980 at commit [`8783b2b`](https://github.com/apache/spark/commit/8783b2b76d6e2b2848d874676d68e76c5f360e8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20925 **[Test build #89002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89002/testReport)** for PR 20925 at commit [`d208e33`](https://github.com/apache/spark/commit/d208e33e57683e60c72f6a81bc65086faf6595e9). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89002/ Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Build finished. Test PASSed.
[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2056/ Test PASSed.
[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20980 Merged build finished. Test PASSed.
[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20980 **[Test build #89005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89005/testReport)** for PR 20980 at commit [`8783b2b`](https://github.com/apache/spark/commit/8783b2b76d6e2b2848d874676d68e76c5f360e8b).
[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20980 retest this please
[GitHub] spark pull request #18998: [SPARK-21748][ML] Migrate the implementation of H...
Github user facaiy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18998#discussion_r179903481

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ---
    @@ -93,11 +97,21 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         val outputSchema = transformSchema(dataset.schema)
    -    val hashingTF = new feature.HashingTF($(numFeatures)).setBinary($(binary))
    -    // TODO: Make the hashingTF.transform natively in ml framework to avoid extra conversion.
    -    val t = udf { terms: Seq[_] => hashingTF.transform(terms).asML }
    +    val hashUDF = udf { (terms: Seq[_]) =>
    +      val ids = terms.map { term =>
    --- End diff --

    @sethah Hi, thank you all for your reviews and comments. However, since there has been no activity for quite a long time, would it be a good idea to close the PR?
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2017/
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Merged build finished. Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89004/ Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 **[Test build #89004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89004/testReport)** for PR 20989 at commit [`3d8858a`](https://github.com/apache/spark/commit/3d8858ae6b60fb7453eb501c54d8f3f1e6612880). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class SchemaColumnConvertNotSupportedException extends RuntimeException ` * `class QueryExecutionException(message: String, cause: Throwable = null)`
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2017/
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Merged build finished. Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2055/ Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 **[Test build #89004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89004/testReport)** for PR 20989 at commit [`3d8858a`](https://github.com/apache/spark/commit/3d8858ae6b60fb7453eb501c54d8f3f1e6612880).
[GitHub] spark issue #20995: [SPARK-23882][Core] UTF8StringSuite.writeToOutputStreamU...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20995 @ueshin, sorry for my mistake again. I will fix this in #20871.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20858 Merged build finished. Test FAILed.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89001/ Test FAILed.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20858 **[Test build #89001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89001/testReport)** for PR 20858 at commit [`090929f`](https://github.com/apache/spark/commit/090929f5e35e1f8aec3e83484cc8227a0436e5d7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Concat(children: Seq[Expression]) extends Expression `
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2016/
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Merged build finished. Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89003/ Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Merged build finished. Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2016/
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 **[Test build #89003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89003/testReport)** for PR 20989 at commit [`d9f46d3`](https://github.com/apache/spark/commit/d9f46d35ba8aa4ae730fe63d81e18b2452d55d05). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2054/ Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 **[Test build #89003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89003/testReport)** for PR 20989 at commit [`d9f46d3`](https://github.com/apache/spark/commit/d9f46d35ba8aa4ae730fe63d81e18b2452d55d05).
[GitHub] spark issue #20825: add impurity stats in tree leaf node debug string
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20825 I actually would prefer not to merge this change since it could blow up the size of the strings printed for some classification tasks with large numbers of labels. If people want to debug, they could trace through the tree manually. Alternatively, I'd be OK with adding an optional argument which tells toDebugString to include the stats.
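The optional-argument alternative suggested in this comment could look roughly like the sketch below. It is written in plain Java with hypothetical names (`DebugStringSketch`, `Node`, `includeStats`) and is not Spark's tree API or the change proposed in the PR; it only illustrates keeping the default output compact while letting callers opt in to per-leaf statistics.

```java
import java.util.Arrays;

public final class DebugStringSketch {

    /** Hypothetical tree node used only for this illustration. */
    static final class Node {
        final int feature;       // split feature index (unused for leaves)
        final double threshold;  // split threshold (unused for leaves)
        final double prediction; // predicted label (used for leaves)
        final double[] stats;    // per-label counts at a leaf, null for splits
        final Node left;
        final Node right;

        private Node(int feature, double threshold, double prediction,
                     double[] stats, Node left, Node right) {
            this.feature = feature;
            this.threshold = threshold;
            this.prediction = prediction;
            this.stats = stats;
            this.left = left;
            this.right = right;
        }

        static Node leaf(double prediction, double[] stats) {
            return new Node(-1, 0.0, prediction, stats, null, null);
        }

        static Node split(int feature, double threshold, Node left, Node right) {
            return new Node(feature, threshold, 0.0, null, left, right);
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Default form: stats are omitted, so the output stays small. */
    static String toDebugString(Node root) {
        return toDebugString(root, false);
    }

    /** Opt-in form: leaves also print their stats when includeStats is true. */
    static String toDebugString(Node root, boolean includeStats) {
        StringBuilder sb = new StringBuilder();
        append(root, 0, includeStats, sb);
        return sb.toString();
    }

    private static void append(Node node, int depth, boolean includeStats, StringBuilder sb) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append(' ');
        }
        if (node.isLeaf()) {
            sb.append(indent).append("Predict: ").append(node.prediction);
            if (includeStats) {
                sb.append(" stats=").append(Arrays.toString(node.stats));
            }
            sb.append('\n');
        } else {
            sb.append(indent).append("If (feature ").append(node.feature)
              .append(" <= ").append(node.threshold).append(")\n");
            append(node.left, depth + 1, includeStats, sb);
            sb.append(indent).append("Else\n");
            append(node.right, depth + 1, includeStats, sb);
        }
    }

    public static void main(String[] args) {
        Node tree = Node.split(0, 0.5,
            Node.leaf(1.0, new double[] {2.0, 8.0}),
            Node.leaf(0.0, new double[] {9.0, 1.0}));
        System.out.print(toDebugString(tree));       // compact output
        System.out.print(toDebugString(tree, true)); // output with leaf stats
    }
}
```

With many labels the `stats` array grows with the label count at every leaf, which is the size concern raised above; keeping the flag off by default avoids that cost for callers who do not ask for it.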
[GitHub] spark pull request #20986: [SPARK-23864][SQL] Add unsafe object writing to U...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20986#discussion_r179897021

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
    @@ -103,42 +106,27 @@ protected final void zeroOutPaddingBytes(int numBytes) {
       public abstract void write(int ordinal, Decimal input, int precision, int scale);

       public final void write(int ordinal, UTF8String input) {
    -    final int numBytes = input.numBytes();
    -    final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes);
    -
    -    // grow the global buffer before writing data.
    -    grow(roundedSize);
    -
    -    zeroOutPaddingBytes(numBytes);
    -
    -    // Write the bytes to the variable length portion.
    -    input.writeToMemory(getBuffer(), cursor());
    -
    -    setOffsetAndSize(ordinal, numBytes);
    -
    -    // move the cursor forward.
    -    increaseCursor(roundedSize);
    +    writeUnalignedBytes(ordinal, input.getBaseObject(), input.getBaseOffset(), input.numBytes());
       }

       public final void write(int ordinal, byte[] input) {
         write(ordinal, input, 0, input.length);
       }

       public final void write(int ordinal, byte[] input, int offset, int numBytes) {
    -    final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(input.length);
    --- End diff --

    Good catch!
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Build finished. Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2053/ Test PASSed.
[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20925 **[Test build #89002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89002/testReport)** for PR 20925 at commit [`d208e33`](https://github.com/apache/spark/commit/d208e33e57683e60c72f6a81bc65086faf6595e9).
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20925#discussion_r179892764

    --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
    @@ -99,17 +100,27 @@
        */
       private boolean allowsMixedArguments;

    +  /**
    +   * This constructor is used when creating a user-configurable launcher. It allows the
    +   * spark-submit argument list to be modified after creation.
    +   */
       SparkSubmitCommandBuilder() {
    -    this.sparkArgs = new ArrayList<>();
         this.isAppResourceReq = true;
         this.isExample = false;
    +    this.parsedArgs = new ArrayList<>();
    +    this.userArgs = new ArrayList<>();
       }

    +  /**
    +   * This constructor is used when invoking spark-submit; it parses and validates arguments
    +   * provided by the user on the command line.
    +   */
       SparkSubmitCommandBuilder(List args) {
         this.allowsMixedArguments = false;
    -    this.sparkArgs = new ArrayList<>();
    +    this.parsedArgs = new ArrayList<>();
         boolean isExample = false;
         List submitArgs = args;
    +    this.userArgs = null;
    --- End diff --

    If you want to take a stab at refactoring... I'm not so sure you'd be able to make things much better though, since the parameters just control shared logic that is applied later.
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20925#discussion_r179892170

    --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
    @@ -88,7 +88,8 @@
             SparkLauncher.NO_RESOURCE);
       }

    -  final List sparkArgs;
    +  final List userArgs;
    --- End diff --

    That's overkill for final fields. Even more if those fields are package-private.
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20925#discussion_r179892080

    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -499,20 +497,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
       }

       private def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit = {
    --- End diff --

    The intent is to "exit" the submission process (even if there's no "exit" in some cases). The different name would also feel weird given the "exitCode" parameter. So even if not optimal I prefer the current name.
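To illustrate the distinction drawn in this comment between exiting the submission process and exiting the JVM, here is a minimal Java sketch. All names in it (`SubmitExitSketch`, `SubmitExit`, `runSubmit`, the `--bogus` flag and the usage text) are hypothetical, and the exception-based mechanism is an assumption made for illustration; the source does not show how the PR actually implements this.

```java
public final class SubmitExitSketch {

    /** Hypothetical exception that carries an exit code instead of killing the JVM. */
    static final class SubmitExit extends RuntimeException {
        final int exitCode;

        SubmitExit(int exitCode, String message) {
            super(message);
            this.exitCode = exitCode;
        }
    }

    /**
     * Prints usage and aborts the submission. The name still says "exit" and still
     * takes an exit code, but the caller decides whether the JVM really terminates.
     */
    static void printUsageAndExit(int exitCode, String unknownParam) {
        if (unknownParam != null) {
            System.err.println("Unknown/unsupported param " + unknownParam);
        }
        System.err.println("Usage: submit [options] <app> [app arguments]");
        throw new SubmitExit(exitCode, "submission aborted");
    }

    /** Embedded use: the failure becomes a return value and the JVM keeps running. */
    static int runSubmit(String[] args) {
        try {
            for (String arg : args) {
                if (arg.startsWith("--bogus")) {
                    printUsageAndExit(1, arg);
                }
            }
            return 0;
        } catch (SubmitExit e) {
            return e.exitCode;
        }
    }

    public static void main(String[] args) {
        // A command-line entry point could still turn the code into a real JVM exit.
        int code = runSubmit(args);
        if (code != 0) {
            System.exit(code);
        }
    }
}
```

In this arrangement only `main` calls `System.exit`, so a caller that embeds `runSubmit` in a long-lived process sees a failed submission as a non-zero return value.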
[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on the issue: https://github.com/apache/spark/pull/20327 Closing this PR since the bind bug is fixed; the rest is achievable through configuration.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20858 **[Test build #89001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89001/testReport)** for PR 20858 at commit [`090929f`](https://github.com/apache/spark/commit/090929f5e35e1f8aec3e83484cc8227a0436e5d7).
[GitHub] spark pull request #20994: [SPARK-21898][ML][FOLLOWUP] Fix Scala 2.12 build.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20994
[GitHub] spark issue #20994: [SPARK-21898][ML][FOLLOWUP] Fix Scala 2.12 build.
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20994 Thanks for reviewing! Merging to master.
[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20828 Merged build finished. Test PASSed.
[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20828 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88999/ Test PASSed.
[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20828 **[Test build #88999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88999/testReport)** for PR 20828 at commit [`6d424ff`](https://github.com/apache/spark/commit/6d424ff67f22581ebbf240ac54089d1dee8e82b0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerModel shou...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/20968 @BryanCutler Thank you very much for your help!
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/20816 @gatorsmile Do I need to sync this branch and let the tests run again?
[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20999 Thanks @gatorsmile, I missed them. I see that #19691 is still open and waiting for review. I should probably close this one so we can continue on that PR. But I have seen no activity on it for a while; is there any reason? Thanks.
[GitHub] spark pull request #20986: [SPARK-23864][SQL] Add unsafe object writing to U...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20986#discussion_r179867664

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
    @@ -103,42 +106,27 @@ protected final void zeroOutPaddingBytes(int numBytes) {
       public abstract void write(int ordinal, Decimal input, int precision, int scale);

       public final void write(int ordinal, UTF8String input) {
    -    final int numBytes = input.numBytes();
    -    final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes);
    -
    -    // grow the global buffer before writing data.
    -    grow(roundedSize);
    -
    -    zeroOutPaddingBytes(numBytes);
    -
    -    // Write the bytes to the variable length portion.
    -    input.writeToMemory(getBuffer(), cursor());
    -
    -    setOffsetAndSize(ordinal, numBytes);
    -
    -    // move the cursor forward.
    -    increaseCursor(roundedSize);
    +    writeUnalignedBytes(ordinal, input.getBaseObject(), input.getBaseOffset(), input.numBytes());
       }

       public final void write(int ordinal, byte[] input) {
         write(ordinal, input, 0, input.length);
       }

       public final void write(int ordinal, byte[] input, int offset, int numBytes) {
    -    final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(input.length);
    --- End diff --

    I am accidentally fixing a bug here :) cc @kiszk
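The comment does not spell out the bug, but the removed line rounds `input.length` rather than `numBytes`, which would over-reserve space whenever only a slice of the array is written. The standalone Java sketch below illustrates that reading. `WordRounding` and `roundToWord` are hypothetical stand-ins that round up to a multiple of 8 bytes, which is what `ByteArrayMethods.roundNumberOfBytesToNearestWord` is assumed to do; no Spark classes are used.

```java
public final class WordRounding {

    /** Rounds a byte count up to the next multiple of 8 (one 64-bit word). */
    static int roundToWord(int numBytes) {
        int remainder = numBytes & 0x07;
        return remainder == 0 ? numBytes : numBytes + (8 - remainder);
    }

    public static void main(String[] args) {
        byte[] input = new byte[100]; // backing array
        int numBytes = 5;             // only a 5-byte slice is being written

        // Size derived from the whole array, as on the removed line.
        int fromArrayLength = roundToWord(input.length);
        // Size derived from the slice actually written.
        int fromNumBytes = roundToWord(numBytes);

        System.out.println("rounded from input.length: " + fromArrayLength); // 104
        System.out.println("rounded from numBytes:     " + fromNumBytes);    // 8
    }
}
```

For a full-array write the two values coincide, which may be why the discrepancy went unnoticed until the refactoring.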
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2014/
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Merged build finished. Test FAILed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 **[Test build #89000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89000/testReport)** for PR 20989 at commit [`cb789ff`](https://github.com/apache/spark/commit/cb789ff821dc78b589f2ae806c963b2e1a8c2cff). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89000/ Test FAILed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2014/
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Merged build finished. Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2052/ Test PASSed.
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20989 **[Test build #89000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89000/testReport)** for PR 20989 at commit [`cb789ff`](https://github.com/apache/spark/commit/cb789ff821dc78b589f2ae806c963b2e1a8c2cff).
[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/19193 Let me check other databases and come up with a summary.
[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20992 Merged build finished. Test FAILed.
[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20992 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88995/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20992 **[Test build #88995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88995/testReport)** for PR 20992 at commit [`64c5d23`](https://github.com/apache/spark/commit/64c5d23c269885a4d90346ef5e1efcfcd0748511). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88997/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #88997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88997/testReport)** for PR 20937 at commit [`3b30ce0`](https://github.com/apache/spark/commit/3b30ce036fbd2a8d6b9b2cf40a418624ecccda25). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20999 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20999 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88996/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20999 **[Test build #88996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88996/testReport)** for PR 20999 at commit [`b57a5d1`](https://github.com/apache/spark/commit/b57a5d1797dbe206aeb0a4d2a24ccd0c73845dc8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20629 @holdenk I am not sure I got 100% what you meant, so I'll try to answer but let me know if I missed something. The problem of doing 2 passes is related to cluster centers. The API of `ClusteringEvaluator` (as with any `Evaluator`) is very simple: it has a method which gets a `Dataset` and returns a value. So, unlike the method here - which is part of the `KMeansModel` and can get the cluster centers from it -, there is no clue about the cluster centers: computing them is easy but it requires a pass on the dataset (this is the extra pass I mentioned). An alternative to this is adding a `setClusterCenters` method on the `ClusteringEvaluator`, but I am not sure whether this is worth it since they are needed only for this metric, while for the others so far (the Silhouette measure) they are useless. Moreover, this metric was introduced explicitly as a temp fix because we were missing any other (better) evaluation metric and it was supposed to be dismissed once a better evaluation metric had been introduced (please see the related JIRA and PR). So I am not sure that introducing a new method specifically for this metric is a good idea. What do you think? Were you suggesting this second option? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20874 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20874 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88993/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerModel shou...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20968 merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20874 **[Test build #88993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88993/testReport)** for PR 20874 at commit [`0112d03`](https://github.com/apache/spark/commit/0112d03a88edca49117946c221c4ef86ca1f7221). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerMod...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20968 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20280 Hey @BryanCutler is this still on your radar? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20908: [WIP][SPARK-23672][PYTHON] Document support for n...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/20908#discussion_r179843510 --- Diff: python/pyspark/sql/tests.py --- @@ -3966,6 +3967,15 @@ def random_udf(v): random_udf = random_udf.asNondeterministic() return random_udf +def test_pandas_udf_tokenize(self): +from pyspark.sql.functions import pandas_udf +tokenize = pandas_udf(lambda s: s.apply(lambda str: str.split(' ')), --- End diff -- @HyukjinKwon It doesn't, but given that the old documentation implied that the tokenization use case wouldn't work, I thought it would be good to illustrate in a test that it does. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20701 ping @sethah - do you think this needs a separate training summary trait? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20945 @susanxhuynh @vanzin It seems to me that if SPARK-20982 is fixed then from what I see all secret stores I searched provide an http API: https://github.com/kubernetes/kubernetes/blob/09f321c80bfc9bca63a5530b56d7a1a3ba80ba9b/pkg/kubectl/cmd/util/factory_client_access.go#L473 https://www.vaultproject.io/api/index.html https://docs.openshift.org/latest/rest_api/api/v1.Secret.html https://v1-9.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#secret-v1-core https://docs.mesosphere.com/1.8/administration/secrets/secrets-api/ So generating DTs at the first spark submit and then using an http API should be good enough, although all envs like k8s or DC/OS usually have a cli utility to do the job. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20629 So when you say "second pass over the data" - from looking at this it seems like it could do this with just a second map to look up the predictions in the already computed cluster centers, not a stage boundary, so that probably wouldn't be all that expensive given how Spark does pipelining, unless I'm missing something. This would mean that we'd have to have people set the cluster centers from their model when they wanted to do that evaluation type, but given that the evaluator wouldn't be able to recover the cluster centers from a test set that differed from the training set, I think that would be reasonable. That being said, it's been a while since I've looked at the evaluator code so I could be coming out of left field. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
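[Editor's sketch] The trade-off discussed in this thread can be illustrated outside Spark. The following is a minimal pure-Python sketch, not Spark's `ClusteringEvaluator` or `KMeansModel` code: the function names `compute_centers` and `compute_cost` are hypothetical, and plain lists stand in for a `Dataset`. It shows that when the centers are already known, the cost is a single map-and-sum over the points, whereas recovering the centers from the predictions needs its own pass first.

```python
from math import fsum


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return fsum((a - b) ** 2 for a, b in zip(p, q))


def compute_centers(points, predictions):
    """The 'extra pass': recover each center as the mean of its cluster's points."""
    sums, counts = {}, {}
    for point, cluster in zip(points, predictions):
        acc = sums.setdefault(cluster, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[cluster] = counts.get(cluster, 0) + 1
    return [[s / counts[k] for s in sums[k]] for k in sorted(sums)]


def compute_cost(points, centers):
    """Single pass: sum of squared distances from each point to its nearest center."""
    return fsum(min(squared_distance(p, c) for c in centers) for p in points)


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
predictions = [0, 0, 1, 1]

# Without centers supplied by the model, an evaluator needs two passes.
centers = compute_centers(points, predictions)
cost = compute_cost(points, centers)
```

If the centers were handed over from the model (the `setClusterCenters` idea above), only `compute_cost` would be needed. Note that centers recomputed from a test set are generally not the model's centers, which is the point raised about evaluating on data that differs from the training set.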
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20925#discussion_r179816905 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -499,20 +497,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S } private def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit = { --- End diff -- Consider renaming the method. What about printUsageAndThrowException? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20925#discussion_r179832847 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java --- @@ -99,17 +100,27 @@ */ private boolean allowsMixedArguments; + /** + * This constructor is used when creating a user-configurable launcher. It allows the + * spark-submit argument list to be modified after creation. + */ SparkSubmitCommandBuilder() { -this.sparkArgs = new ArrayList<>(); this.isAppResourceReq = true; this.isExample = false; +this.parsedArgs = new ArrayList<>(); +this.userArgs = new ArrayList<>(); } + /** + * This constructor is used when invoking spark-submit; it parses and validates arguments + * provided by the user on the command line. + */ SparkSubmitCommandBuilder(List args) { this.allowsMixedArguments = false; -this.sparkArgs = new ArrayList<>(); +this.parsedArgs = new ArrayList<>(); boolean isExample = false; List submitArgs = args; +this.userArgs = null; --- End diff -- Consider Collections.emptyList(). I see these two constructors cover two different use cases. An abstract base class with two derived classes could express these two use cases better, but I know it is out of scope for now. Does it make sense to create a Jira ticket for refactoring this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
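[Editor's sketch] The refactoring suggested above (one abstract base, two derived classes, and an empty collection instead of `null`) might look roughly like the following. This is written in Python for brevity rather than Java, and every class and method name here is hypothetical; it is not the layout of `SparkSubmitCommandBuilder`, only an illustration of the shape of the idea.

```python
from abc import ABC, abstractmethod


class CommandBuilderBase(ABC):
    """Shared state for both use cases (hypothetical name)."""

    def __init__(self):
        self.parsed_args = []

    @abstractmethod
    def user_args(self):
        """Arguments supplied through the launcher API, never None."""


class ConfigurableBuilder(CommandBuilderBase):
    """Use case 1: a launcher whose argument list can change after creation."""

    def __init__(self):
        super().__init__()
        self._user_args = []

    def add_user_arg(self, arg):
        self._user_args.append(arg)

    def user_args(self):
        # Return a read-only view so callers go through add_user_arg().
        return tuple(self._user_args)


class CommandLineBuilder(CommandBuilderBase):
    """Use case 2: arguments come from the command line and are fixed."""

    def __init__(self, args):
        super().__init__()
        self.parsed_args = list(args)

    def user_args(self):
        # An empty immutable value instead of null, in the spirit of
        # Collections.emptyList(): callers can iterate without a null check.
        return ()


launcher = ConfigurableBuilder()
launcher.add_user_arg("--verbose")
cli = CommandLineBuilder(["--master", "local"])
```

With this shape, code that consumes `user_args()` never has to distinguish "no user arguments" from "field not initialised", which is the concern behind avoiding `null` in the second constructor.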
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20925#discussion_r179814761 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -289,27 +288,26 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S } --- End diff -- This might be a good candidate to use your new error method instead of throwing the Exception directly. There might be a client catching both Exception and SparkException and doing very different things for each, but I guess that is a very unlikely case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20925#discussion_r179825806 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java --- @@ -88,7 +88,8 @@ SparkLauncher.NO_RESOURCE); } - final List sparkArgs; + final List userArgs; --- End diff -- Consider making it private and accessing via methods. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20925#discussion_r179834098 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java --- @@ -400,6 +419,11 @@ private boolean isThriftServer(String mainClass) { private class OptionParser extends SparkSubmitOptionParser { boolean isAppResourceReq = true; +boolean errorOnUnknownArgs; --- End diff -- private --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88994/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20816 **[Test build #88994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88994/testReport)** for PR 20816 at commit [`7fe9329`](https://github.com/apache/spark/commit/7fe93295df5627f2fc4e712b71aa9ce75383d410). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20319 @smurakozi Thanks for the PR! I have bandwidth to review this now. Do you have time to rebase this to fix the merge conflicts? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20987: [SPARK-23816][CORE] Killed tasks should ignore FetchFail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20987 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20828 **[Test build #88999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88999/testReport)** for PR 20828 at commit [`6d424ff`](https://github.com/apache/spark/commit/6d424ff67f22581ebbf240ac54089d1dee8e82b0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179824556 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -102,10 +102,11 @@ object KolmogorovSmirnovTest { */ @Since("2.4.0") @varargs - def test(dataset: DataFrame, sampleCol: String, distName: String, params: Double*): DataFrame = { + def test(dataset: Dataset[_], sampleCol: String, distName: String, params: Double*) --- End diff -- nit: This doesn't fit scala style; please get familiar with the style we use for multi-line function headers! Just check out other parts of MLlib for examples. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179831482 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +:param dataset: + a dataset or a dataframe containing the sample of data to test. +:param sampleCol: + Name of sample column in dataset, of any numerical type. +:param distName: + a `string` name for a theoretical distribution, currently only support "norm". +:param params: + a list of `Double` values specifying the parameters to be used for the theoretical + distribution +:return: + A dataframe that contains the Kolmogorov-Smirnov test result for the input sampled data. + This DataFrame will contain a single Row with the following fields: + - `pValue: Double` + - `statistic: Double` + +>>> from pyspark.ml.stat import KolmogorovSmirnovTest +>>> dataset = [[-1.0], [0.0], [1.0]] +>>> dataset = spark.createDataFrame(dataset, ['sample']) +>>> ksResult = KolmogorovSmirnovTest.test(dataset, 'sample', 'norm', 0.0, 1.0).collect()[0] --- End diff -- nit: use first() instead of collect()[0] --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179833156 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +:param dataset: + a dataset or a dataframe containing the sample of data to test. +:param sampleCol: + Name of sample column in dataset, of any numerical type. +:param distName: + a `string` name for a theoretical distribution, currently only support "norm". +:param params: + a list of `Double` values specifying the parameters to be used for the theoretical --- End diff -- I realized we should list what the parameters are, both here and in the Scala docs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20987: [SPARK-23816][CORE] Killed tasks should ignore FetchFail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20987 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88991/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179830986 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +:param dataset: + a dataset or a dataframe containing the sample of data to test. --- End diff -- nit: dataset -> Dataset, dataframe -> DataFrame (It's nice to write class names the way they are defined.) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179832593 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -81,7 +81,7 @@ object KolmogorovSmirnovTest { * Java-friendly version of `test(dataset: DataFrame, sampleCol: String, cdf: Double => Double)` */ @Since("2.4.0") - def test(dataset: DataFrame, sampleCol: String, + def test(dataset: Dataset[_], sampleCol: String, cdf: Function[java.lang.Double, java.lang.Double]): DataFrame = { --- End diff -- I guess I missed this before. Would you mind fixing the scala style here too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179824228 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -59,7 +59,7 @@ object KolmogorovSmirnovTest { * distribution of the sample data and the theoretical distribution we can provide a test for the * the null hypothesis that the sample data comes from that theoretical distribution. * - * @param dataset a `DataFrame` containing the sample of data to test + * @param dataset A dataset or a dataframe containing the sample of data to test --- End diff -- nit: It's nicer to keep single back quotes ``` `DataFrame` ``` to make these show up as code in docs for clarity. No need to get rid of that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20904#discussion_r179832114 --- Diff: python/pyspark/ml/stat.py --- @@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"): return _java2py(sc, javaCorrObj.corr(*args)) +class KolmogorovSmirnovTest(object): +""" +.. note:: Experimental + +Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled from a continuous +distribution. + +By comparing the largest difference between the empirical cumulative +distribution of the sample data and the theoretical distribution we can provide a test for the +the null hypothesis that the sample data comes from that theoretical distribution. + +:param dataset: --- End diff -- I see you're following the example of ChiSquareTest, but this Param documentation belongs with the test method, not the class. Could you please shift it? (Feel free to correct it for ChiSquareTest here or in another PR.) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20987: [SPARK-23816][CORE] Killed tasks should ignore FetchFail...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20987 **[Test build #88991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/testReport)** for PR 20987 at commit [`b387552`](https://github.com/apache/spark/commit/b387552f7c2a546ac7290be6da007678875814d7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20828 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88998/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org