[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20980
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89005/
Test PASSed.


---




[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20980
  
**[Test build #89005 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89005/testReport)**
 for PR 20980 at commit 
[`8783b2b`](https://github.com/apache/spark/commit/8783b2b76d6e2b2848d874676d68e76c5f360e8b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20925
  
**[Test build #89002 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89002/testReport)**
 for PR 20925 at commit 
[`d208e33`](https://github.com/apache/spark/commit/d208e33e57683e60c72f6a81bc65086faf6595e9).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20925
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89002/
Test PASSed.


---




[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20925
  
Build finished. Test PASSed.


---




[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2056/
Test PASSed.


---




[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20980
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20980
  
**[Test build #89005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89005/testReport)**
 for PR 20980 at commit 
[`8783b2b`](https://github.com/apache/spark/commit/8783b2b76d6e2b2848d874676d68e76c5f360e8b).


---




[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...

2018-04-06 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20980
  
retest this please


---




[GitHub] spark pull request #18998: [SPARK-21748][ML] Migrate the implementation of H...

2018-04-06 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18998#discussion_r179903481
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ---
@@ -93,11 +97,21 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 val outputSchema = transformSchema(dataset.schema)
-val hashingTF = new feature.HashingTF($(numFeatures)).setBinary($(binary))
-// TODO: Make the hashingTF.transform natively in ml framework to avoid extra conversion.
-val t = udf { terms: Seq[_] => hashingTF.transform(terms).asML }
+val hashUDF = udf { (terms: Seq[_]) =>
+  val ids = terms.map { term =>
--- End diff --

@sethah Hi, thank you all for your reviews and comments. However, since there 
has been no activity for quite a long time, would it be a good idea to close the PR?


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2017/



---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89004/
Test PASSed.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
**[Test build #89004 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89004/testReport)**
 for PR 20989 at commit 
[`3d8858a`](https://github.com/apache/spark/commit/3d8858ae6b60fb7453eb501c54d8f3f1e6612880).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class SchemaColumnConvertNotSupportedException extends 
RuntimeException `
  * `class QueryExecutionException(message: String, cause: Throwable = 
null)`


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2017/



---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2055/
Test PASSed.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
**[Test build #89004 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89004/testReport)**
 for PR 20989 at commit 
[`3d8858a`](https://github.com/apache/spark/commit/3d8858ae6b60fb7453eb501c54d8f3f1e6612880).


---




[GitHub] spark issue #20995: [SPARK-23882][Core] UTF8StringSuite.writeToOutputStreamU...

2018-04-06 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20995
  
@ueshin, sorry for my mistake again. I will fix this in #20871.



---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89001/
Test FAILed.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20858
  
**[Test build #89001 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89001/testReport)**
 for PR 20858 at commit 
[`090929f`](https://github.com/apache/spark/commit/090929f5e35e1f8aec3e83484cc8227a0436e5d7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Concat(children: Seq[Expression]) extends Expression `


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2016/



---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89003/
Test PASSed.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2016/



---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
**[Test build #89003 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89003/testReport)**
 for PR 20989 at commit 
[`d9f46d3`](https://github.com/apache/spark/commit/d9f46d35ba8aa4ae730fe63d81e18b2452d55d05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2054/
Test PASSed.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
**[Test build #89003 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89003/testReport)**
 for PR 20989 at commit 
[`d9f46d3`](https://github.com/apache/spark/commit/d9f46d35ba8aa4ae730fe63d81e18b2452d55d05).


---




[GitHub] spark issue #20825: add impurity stats in tree leaf node debug string

2018-04-06 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20825
  
I actually would prefer not to merge this change since it could blow up the 
size of the strings printed for some classification tasks with large numbers of 
labels.  If people want to debug, they could trace through the tree manually.

Alternatively, I'd be OK with adding an optional argument which tells 
toDebugString to include the stats.
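
The optional-argument alternative could look roughly like the sketch below. This is a standalone illustration, not Spark's tree model code: the `Leaf` class, its fields, the output format, and the `includeStats` flag are all hypothetical names chosen for the example. The no-argument overload keeps the current compact output, so the larger strings only appear when a caller asks for them.

```java
import java.util.Arrays;

// Hypothetical sketch of an opt-in stats flag; none of these names come from Spark.
final class DebugStringSketch {

    static final class Leaf {
        final double prediction;
        final double[] impurityStats; // e.g. per-label counts for classification

        Leaf(double prediction, double[] impurityStats) {
            this.prediction = prediction;
            this.impurityStats = impurityStats;
        }

        // Default stays compact, so existing callers see no change in size.
        String toDebugString() {
            return toDebugString(false);
        }

        // Stats are appended only when explicitly requested.
        String toDebugString(boolean includeStats) {
            StringBuilder sb = new StringBuilder("Predict: ").append(prediction);
            if (includeStats) {
                sb.append(" stats=").append(Arrays.toString(impurityStats));
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Leaf leaf = new Leaf(1.0, new double[] {3.0, 7.0});
        System.out.println(leaf.toDebugString());     // compact form
        System.out.println(leaf.toDebugString(true)); // with stats appended
    }
}
```

With many labels the stats array grows with the label count, which is the size concern raised above; keeping the flag off by default avoids that cost for everyone who does not opt in.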


---




[GitHub] spark pull request #20986: [SPARK-23864][SQL] Add unsafe object writing to U...

2018-04-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20986#discussion_r179897021
  
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
@@ -103,42 +106,27 @@ protected final void zeroOutPaddingBytes(int numBytes) {
   public abstract void write(int ordinal, Decimal input, int precision, int scale);
 
   public final void write(int ordinal, UTF8String input) {
-final int numBytes = input.numBytes();
-final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes);
-
-// grow the global buffer before writing data.
-grow(roundedSize);
-
-zeroOutPaddingBytes(numBytes);
-
-// Write the bytes to the variable length portion.
-input.writeToMemory(getBuffer(), cursor());
-
-setOffsetAndSize(ordinal, numBytes);
-
-// move the cursor forward.
-increaseCursor(roundedSize);
+writeUnalignedBytes(ordinal, input.getBaseObject(), input.getBaseOffset(), input.numBytes());
   }
 
   public final void write(int ordinal, byte[] input) {
 write(ordinal, input, 0, input.length);
   }
 
   public final void write(int ordinal, byte[] input, int offset, int numBytes) {
-final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(input.length);
--- End diff --

Good catch!


---




[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20925
  
Build finished. Test PASSed.


---




[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20925
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2053/
Test PASSed.


---




[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20925
  
**[Test build #89002 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89002/testReport)**
 for PR 20925 at commit 
[`d208e33`](https://github.com/apache/spark/commit/d208e33e57683e60c72f6a81bc65086faf6595e9).


---




[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179892764
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
@@ -99,17 +100,27 @@
*/
   private boolean allowsMixedArguments;
 
+  /**
+   * This constructor is used when creating a user-configurable launcher. It allows the
+   * spark-submit argument list to be modified after creation.
+   */
   SparkSubmitCommandBuilder() {
-this.sparkArgs = new ArrayList<>();
 this.isAppResourceReq = true;
 this.isExample = false;
+this.parsedArgs = new ArrayList<>();
+this.userArgs = new ArrayList<>();
   }
 
+  /**
+   * This constructor is used when invoking spark-submit; it parses and validates arguments
+   * provided by the user on the command line.
+   */
   SparkSubmitCommandBuilder(List args) {
 this.allowsMixedArguments = false;
-this.sparkArgs = new ArrayList<>();
+this.parsedArgs = new ArrayList<>();
 boolean isExample = false;
 List submitArgs = args;
+this.userArgs = null;
--- End diff --

If you want to take a stab at refactoring... I'm not so sure you'd be able 
to make things much better though, since the parameters just control shared 
logic that is applied later.


---




[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179892170
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
@@ -88,7 +88,8 @@
   SparkLauncher.NO_RESOURCE);
   }
 
-  final List sparkArgs;
+  final List userArgs;
--- End diff --

That's overkill for final fields. Even more so if those fields are 
package-private.


---




[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179892080
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -499,20 +497,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   }
 
   private def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit = {
--- End diff --

The intent is to "exit" the submission process (even if there's no "exit" 
in some cases). The different name would also feel weird given the "exitCode" 
parameter. So even if not optimal I prefer the current name.


---




[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-04-06 Thread gerashegalov
Github user gerashegalov commented on the issue:

https://github.com/apache/spark/pull/20327
  
Closing this PR since the bind bug is fixed; the rest is achievable through 
configuration.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20858
  
**[Test build #89001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89001/testReport)**
 for PR 20858 at commit 
[`090929f`](https://github.com/apache/spark/commit/090929f5e35e1f8aec3e83484cc8227a0436e5d7).


---




[GitHub] spark pull request #20994: [SPARK-21898][ML][FOLLOWUP] Fix Scala 2.12 build.

2018-04-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20994


---




[GitHub] spark issue #20994: [SPARK-21898][ML][FOLLOWUP] Fix Scala 2.12 build.

2018-04-06 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20994
  
Thanks for reviewing! Merging to master.


---




[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20828
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20828
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88999/
Test PASSed.


---




[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20828
  
**[Test build #88999 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88999/testReport)**
 for PR 20828 at commit 
[`6d424ff`](https://github.com/apache/spark/commit/6d424ff67f22581ebbf240ac54089d1dee8e82b0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerModel shou...

2018-04-06 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/20968
  
@BryanCutler Thank you very much for your help!


---




[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-04-06 Thread maryannxue
Github user maryannxue commented on the issue:

https://github.com/apache/spark/pull/20816
  
@gatorsmile Do I need to sync this branch and let the tests run again?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...

2018-04-06 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20999
  
Thanks @gatorsmile, I missed them. I see that #19691 is still open and 
waiting for review. I should probably close this one so we can continue on that 
PR. But I have seen no activity on it for a while; is there any reason?

Thanks.


---




[GitHub] spark pull request #20986: [SPARK-23864][SQL] Add unsafe object writing to U...

2018-04-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20986#discussion_r179867664
  
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
@@ -103,42 +106,27 @@ protected final void zeroOutPaddingBytes(int numBytes) {
   public abstract void write(int ordinal, Decimal input, int precision, int scale);
 
   public final void write(int ordinal, UTF8String input) {
-final int numBytes = input.numBytes();
-final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes);
-
-// grow the global buffer before writing data.
-grow(roundedSize);
-
-zeroOutPaddingBytes(numBytes);
-
-// Write the bytes to the variable length portion.
-input.writeToMemory(getBuffer(), cursor());
-
-setOffsetAndSize(ordinal, numBytes);
-
-// move the cursor forward.
-increaseCursor(roundedSize);
+writeUnalignedBytes(ordinal, input.getBaseObject(), input.getBaseOffset(), input.numBytes());
   }
 
   public final void write(int ordinal, byte[] input) {
 write(ordinal, input, 0, input.length);
   }
 
   public final void write(int ordinal, byte[] input, int offset, int numBytes) {
-final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(input.length);
--- End diff --

I am accidentally fixing a bug here :)

cc @kiszk 
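
The removed line computes the rounded size from `input.length` even though the method takes an explicit `offset` and `numBytes`. Assuming that mismatch is the bug meant here (the comment does not spell it out), the sketch below shows how the two choices diverge when only a slice of the array is written. It is a standalone illustration: `roundToWord` is a local helper written for this example that rounds up to a multiple of 8 bytes, and nothing in it is the actual `UnsafeWriter` code.

```java
// Standalone sketch; the class and method names are hypothetical, not Spark's.
final class WordRoundingSketch {

    // Rounds a byte count up to the next multiple of 8 (one 64-bit word).
    static int roundToWord(int numBytes) {
        int remainder = numBytes & 0x07;
        return remainder == 0 ? numBytes : numBytes + (8 - remainder);
    }

    // Size derived from the whole backing array, ignoring the slice bounds.
    static int reservedFromArrayLength(byte[] input, int offset, int numBytes) {
        return roundToWord(input.length);
    }

    // Size derived from the number of bytes actually being written.
    static int reservedFromSlice(byte[] input, int offset, int numBytes) {
        return roundToWord(numBytes);
    }

    public static void main(String[] args) {
        byte[] input = new byte[100];
        // Writing a 5-byte slice that starts at offset 10 of a 100-byte array.
        System.out.println(reservedFromArrayLength(input, 10, 5)); // 104
        System.out.println(reservedFromSlice(input, 10, 5));       // 8
    }
}
```

The two results only agree when the slice covers the whole array, which is the case for the `write(ordinal, input, 0, input.length)` overload shown in the diff.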


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2014/



---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
**[Test build #89000 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89000/testReport)**
 for PR 20989 at commit 
[`cb789ff`](https://github.com/apache/spark/commit/cb789ff821dc78b589f2ae806c963b2e1a8c2cff).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89000/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2014/






[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2052/
Test PASSed.





[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20989
  
**[Test build #89000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89000/testReport)**
 for PR 20989 at commit 
[`cb789ff`](https://github.com/apache/spark/commit/cb789ff821dc78b589f2ae806c963b2e1a8c2cff).





[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2018-04-06 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/19193
  
Let me check other databases and come up with a summary.





[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20992
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20992
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88995/
Test FAILed.





[GitHub] spark issue #20992: [SPARK-23779][SQL] TaskMemoryManager and UnsafeSorter re...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20992
  
**[Test build #88995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88995/testReport)**
 for PR 20992 at commit 
[`64c5d23`](https://github.com/apache/spark/commit/64c5d23c269885a4d90346ef5e1efcfcd0748511).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88997/
Test FAILed.





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20937
  
**[Test build #88997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88997/testReport)**
 for PR 20937 at commit 
[`3b30ce0`](https://github.com/apache/spark/commit/3b30ce036fbd2a8d6b9b2cf40a418624ecccda25).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88996/
Test FAILed.





[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20999
  
**[Test build #88996 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88996/testReport)**
 for PR 20999 at commit 
[`b57a5d1`](https://github.com/apache/spark/commit/b57a5d1797dbe206aeb0a4d2a24ccd0c73845dc8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-04-06 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20629
  
@holdenk I am not sure I got 100% what you meant, so I'll try to answer but 
let me know if I missed something.

The problem of doing 2 passes is related to cluster centers. The API of 
`ClusteringEvaluator` (as with any `Evaluator`) is very simple: it has a 
method which takes a `Dataset` and returns a value. So, unlike the method here, 
which is part of the `KMeansModel` and can get the cluster centers from it, 
the evaluator has no knowledge of the cluster centers: computing them is easy but it 
requires a pass on the dataset (this is the extra pass I mentioned).
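
To make the extra pass concrete, here is a minimal plain-Python sketch (not Spark code). Given only rows of (features, prediction), the centers have to be rebuilt by averaging the points assigned to each cluster, which touches every row once before any cost can be computed. The name `recover_centers` and the row layout are assumptions for illustration, and the averages need not equal the model's centers on data other than the training set.

```python
from collections import defaultdict


def recover_centers(rows):
    """Rebuild one centroid per cluster from (features, prediction) pairs.

    Hypothetical helper: it stands in for the extra pass an evaluator
    would need when it is handed predictions but no cluster centers.
    """
    sums = {}
    counts = defaultdict(int)
    for features, prediction in rows:
        if prediction not in sums:
            sums[prediction] = [0.0] * len(features)
        acc = sums[prediction]
        # Accumulate a per-dimension sum for this cluster.
        for i, value in enumerate(features):
            acc[i] += value
        counts[prediction] += 1
    # Divide each sum by the cluster size to obtain the mean.
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}


rows = [([0.0, 0.0], 0), ([2.0, 0.0], 0), ([10.0, 10.0], 1)]
centers = recover_centers(rows)
```

Only after this aggregation finishes can the per-row distances be computed, so the cost needs a second traversal of the same rows.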

An alternative to this is adding a `setClusterCenters` method on the 
`ClusteringEvaluator`, but I am not sure whether this is worth it, since they are 
needed only for this metric, while for the others so far (the Silhouette 
measure) they are useless. Moreover, this metric was introduced explicitly as a 
temporary fix because we were missing any other (better) evaluation metric, and it 
was supposed to be removed once a better evaluation metric had been 
introduced (please see the related JIRA and PR). So I am not sure that 
introducing a new method specifically for this metric is a good idea.

What do you think? Were you suggesting this second option?






[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20874
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20874
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88993/
Test FAILed.





[GitHub] spark issue #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerModel shou...

2018-04-06 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20968
  
merged to master





[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20874
  
**[Test build #88993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88993/testReport)**
 for PR 20874 at commit 
[`0112d03`](https://github.com/apache/spark/commit/0112d03a88edca49117946c221c4ef86ca1f7221).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #20968: [SPARK-23828][ML][PYTHON]PySpark StringIndexerMod...

2018-04-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20968





[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/20280
  
Hey @BryanCutler is this still on your radar?





[GitHub] spark pull request #20908: [WIP][SPARK-23672][PYTHON] Document support for n...

2018-04-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20908#discussion_r179843510
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3966,6 +3967,15 @@ def random_udf(v):
 random_udf = random_udf.asNondeterministic()
 return random_udf
 
+def test_pandas_udf_tokenize(self):
+from pyspark.sql.functions import pandas_udf
+tokenize = pandas_udf(lambda s: s.apply(lambda str: str.split(' 
')),
--- End diff --

@HyukjinKwon It doesn't, but given that the old documentation implied that 
the tokenization use case wouldn't work I thought it would be good to illustrate 
that it does in a test.





[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-04-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/20701
  
ping @sethah - what do you think about whether this needs a separate training 
summary trait?





[GitHub] spark issue #20945: [SPARK-23790][Mesos] fix metastore connection issue

2018-04-06 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/20945
  
@susanxhuynh @vanzin 
It seems to me that if SPARK-20982 is fixed, then from what I see all the secret 
stores I searched provide an HTTP API:

https://github.com/kubernetes/kubernetes/blob/09f321c80bfc9bca63a5530b56d7a1a3ba80ba9b/pkg/kubectl/cmd/util/factory_client_access.go#L473
https://www.vaultproject.io/api/index.html
https://docs.openshift.org/latest/rest_api/api/v1.Secret.html

https://v1-9.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#secret-v1-core
https://docs.mesosphere.com/1.8/administration/secrets/secrets-api/

So generating DTs at the first spark submit and then using an HTTP API 
should be good enough, although all envs like k8s or DC/OS usually have a CLI 
utility to do the job.






[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-04-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/20629
  
So when you say "second pass over the data" - from looking at this it seems 
like it could do this with just a second map to look up the predictions 
in the already computed cluster centers, not a stage boundary, so that probably 
wouldn't be all that expensive given how Spark does pipelining, unless I'm 
missing something.

This would mean that we'd have to have people set the cluster centers from 
their model when they wanted to do that evaluation type, but given that the 
evaluator wouldn't be able to recover the cluster centers from a test set that 
differed from the training set I think that would be reasonable.

That being said, it's been a while since I've looked at the evaluator code so 
I could be coming out of left field.
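
A rough plain-Python sketch of what I mean (not Spark code): when the centers are supplied up front, the cost is a single map over the rows plus a sum, with no earlier aggregation to wait for. The name `cost_with_centers` and the (features, prediction) row layout are assumptions for illustration only.

```python
def cost_with_centers(rows, centers):
    """Sum of squared distances from each point to its assigned center.

    Hypothetical helper: `centers` maps a prediction index to a centroid,
    as if the user had copied the centers over from a fitted model.
    """
    total = 0.0
    for features, prediction in rows:
        center = centers[prediction]
        # Squared Euclidean distance between the point and its center.
        total += sum((x - c) ** 2 for x, c in zip(features, center))
    return total


centers = {0: [1.0, 0.0], 1: [10.0, 10.0]}
rows = [([0.0, 0.0], 0), ([2.0, 0.0], 0), ([10.0, 12.0], 1)]
cost = cost_with_centers(rows, centers)
```

Each row is handled independently here, which is why this step would pipeline with whatever produced the predictions.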






[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179816905
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -499,20 +497,18 @@ private[deploy] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, S
   }
 
   private def printUsageAndExit(exitCode: Int, unknownParam: Any = null): 
Unit = {
--- End diff --

Consider renaming the method.  What about printUsageAndThrowException?





[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179832847
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -99,17 +100,27 @@
*/
   private boolean allowsMixedArguments;
 
+  /**
+   * This constructor is used when creating a user-configurable launcher. 
It allows the
+   * spark-submit argument list to be modified after creation.
+   */
   SparkSubmitCommandBuilder() {
-this.sparkArgs = new ArrayList<>();
 this.isAppResourceReq = true;
 this.isExample = false;
+this.parsedArgs = new ArrayList<>();
+this.userArgs = new ArrayList<>();
   }
 
+  /**
+   * This constructor is used when invoking spark-submit; it parses and 
validates arguments
+   * provided by the user on the command line.
+   */
   SparkSubmitCommandBuilder(List args) {
 this.allowsMixedArguments = false;
-this.sparkArgs = new ArrayList<>();
+this.parsedArgs = new ArrayList<>();
 boolean isExample = false;
 List submitArgs = args;
+this.userArgs = null;
--- End diff --

Consider Collections.emptyList(). I see these two constructors cover two 
different use cases. An abstract base class with two derived classes could 
express these two use cases better but I know it is out of scope for now. Does 
it make sense to create a Jira ticket for refactoring this?





[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179814761
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -289,27 +288,26 @@ private[deploy] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, S
 }
--- End diff --

This might be a good candidate to use your new error method instead of 
throwing the Exception directly. There might be a client catching both 
Exception and SparkException and doing very different things, but I guess that 
is a very unlikely case.





[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179825806
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -88,7 +88,8 @@
   SparkLauncher.NO_RESOURCE);
   }
 
-  final List sparkArgs;
+  final List userArgs;
--- End diff --

Consider making it private and accessing via methods.





[GitHub] spark pull request #20925: [SPARK-22941][core] Do not exit JVM when submit f...

2018-04-06 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20925#discussion_r179834098
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -400,6 +419,11 @@ private boolean isThriftServer(String mainClass) {
   private class OptionParser extends SparkSubmitOptionParser {
 
 boolean isAppResourceReq = true;
+boolean errorOnUnknownArgs;
--- End diff --

private





[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20816
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88994/
Test FAILed.





[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20816
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20816
  
**[Test build #88994 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88994/testReport)**
 for PR 20816 at commit 
[`7fe9329`](https://github.com/apache/spark/commit/7fe93295df5627f2fc4e712b71aa9ce75383d410).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-04-06 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20319
  
@smurakozi Thanks for the PR!  I have bandwidth to review this now.  Do you 
have time to rebase this to fix the merge conflicts?





[GitHub] spark issue #20987: [SPARK-23816][CORE] Killed tasks should ignore FetchFail...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20987
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20828
  
**[Test build #88999 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88999/testReport)**
 for PR 20828 at commit 
[`6d424ff`](https://github.com/apache/spark/commit/6d424ff67f22581ebbf240ac54089d1dee8e82b0).





[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20904#discussion_r179824556
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -102,10 +102,11 @@ object KolmogorovSmirnovTest {
*/
   @Since("2.4.0")
   @varargs
-  def test(dataset: DataFrame, sampleCol: String, distName: String, 
params: Double*): DataFrame = {
+  def test(dataset: Dataset[_], sampleCol: String, distName: String, 
params: Double*)
--- End diff --

nit: This doesn't fit scala style; please get familiar with the style we 
use for multi-line function headers!  Just check out other parts of MLlib for 
examples.





[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20904#discussion_r179831482
  
--- Diff: python/pyspark/ml/stat.py ---
@@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"):
 return _java2py(sc, javaCorrObj.corr(*args))
 
 
+class KolmogorovSmirnovTest(object):
+"""
+.. note:: Experimental
+
+Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled 
from a continuous
+distribution.
+
+By comparing the largest difference between the empirical cumulative
+distribution of the sample data and the theoretical distribution we 
can provide a test for the
+the null hypothesis that the sample data comes from that theoretical 
distribution.
+
+:param dataset:
+  a dataset or a dataframe containing the sample of data to test.
+:param sampleCol:
+  Name of sample column in dataset, of any numerical type.
+:param distName:
+  a `string` name for a theoretical distribution, currently only 
support "norm".
+:param params:
+  a list of `Double` values specifying the parameters to be used for 
the theoretical
+  distribution
+:return:
+  A dataframe that contains the Kolmogorov-Smirnov test result for the 
input sampled data.
+  This DataFrame will contain a single Row with the following fields:
+  - `pValue: Double`
+  - `statistic: Double`
+
+>>> from pyspark.ml.stat import KolmogorovSmirnovTest
+>>> dataset = [[-1.0], [0.0], [1.0]]
+>>> dataset = spark.createDataFrame(dataset, ['sample'])
+>>> ksResult = KolmogorovSmirnovTest.test(dataset, 'sample', 'norm', 
0.0, 1.0).collect()[0]
--- End diff --

nit: use first() instead of collect()[0]





[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20904#discussion_r179833156
  
--- Diff: python/pyspark/ml/stat.py ---
@@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"):
 return _java2py(sc, javaCorrObj.corr(*args))
 
 
+class KolmogorovSmirnovTest(object):
+"""
+.. note:: Experimental
+
+Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled 
from a continuous
+distribution.
+
+By comparing the largest difference between the empirical cumulative
+distribution of the sample data and the theoretical distribution we 
can provide a test for the
+the null hypothesis that the sample data comes from that theoretical 
distribution.
+
+:param dataset:
+  a dataset or a dataframe containing the sample of data to test.
+:param sampleCol:
+  Name of sample column in dataset, of any numerical type.
+:param distName:
+  a `string` name for a theoretical distribution, currently only 
support "norm".
+:param params:
+  a list of `Double` values specifying the parameters to be used for 
the theoretical
--- End diff --

I realized we should list what the parameters are, both here and in the 
Scala docs.
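
For reference while writing those docs, here is a small plain-Python sketch of the statistic itself (not the Spark implementation). It assumes the two values passed for "norm" are the mean and the standard deviation; that reading is an assumption of this sketch, and with 0.0 and 1.0 it makes no difference. The names `normal_cdf` and `ks_statistic` are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Same three-point sample as the doctest in the diff above.
statistic = ks_statistic([-1.0, 0.0, 1.0], normal_cdf)
```

Spelling out which positional value is which in the docs would save readers from having to infer it from an example like this.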





[GitHub] spark issue #20987: [SPARK-23816][CORE] Killed tasks should ignore FetchFail...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20987
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88991/
Test FAILed.





[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20904#discussion_r179830986
  
--- Diff: python/pyspark/ml/stat.py ---
@@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"):
 return _java2py(sc, javaCorrObj.corr(*args))
 
 
+class KolmogorovSmirnovTest(object):
+"""
+.. note:: Experimental
+
+Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled 
from a continuous
+distribution.
+
+By comparing the largest difference between the empirical cumulative
+distribution of the sample data and the theoretical distribution we 
can provide a test for the
+the null hypothesis that the sample data comes from that theoretical 
distribution.
+
+:param dataset:
+  a dataset or a dataframe containing the sample of data to test.
--- End diff --

nit: dataset -> Dataset, dataframe -> DataFrame (It's nice to write class 
names the way they are defined.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20904#discussion_r179832593
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -81,7 +81,7 @@ object KolmogorovSmirnovTest {
* Java-friendly version of `test(dataset: DataFrame, sampleCol: String, 
cdf: Double => Double)`
*/
   @Since("2.4.0")
-  def test(dataset: DataFrame, sampleCol: String,
+  def test(dataset: Dataset[_], sampleCol: String,
 cdf: Function[java.lang.Double, java.lang.Double]): DataFrame = {
--- End diff --

I guess I missed this before.  Would you mind fixing the scala style here 
too?





[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20904#discussion_r179824228
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -59,7 +59,7 @@ object KolmogorovSmirnovTest {
* distribution of the sample data and the theoretical distribution we 
can provide a test for the
* the null hypothesis that the sample data comes from that theoretical 
distribution.
*
-   * @param dataset a `DataFrame` containing the sample of data to test
+   * @param dataset A dataset or a dataframe containing the sample of data 
to test
--- End diff --

nit: It's nicer to keep single back quotes ``` `DataFrame` ``` to make 
these show up as code in docs for clarity.  No need to get rid of that.





[GitHub] spark pull request #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff te...

2018-04-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20904#discussion_r179832114
  
--- Diff: python/pyspark/ml/stat.py ---
@@ -134,6 +134,65 @@ def corr(dataset, column, method="pearson"):
 return _java2py(sc, javaCorrObj.corr(*args))
 
 
+class KolmogorovSmirnovTest(object):
+"""
+.. note:: Experimental
+
+Conduct the two-sided Kolmogorov Smirnov (KS) test for data sampled 
from a continuous
+distribution.
+
+By comparing the largest difference between the empirical cumulative
+distribution of the sample data and the theoretical distribution we 
can provide a test for the
+the null hypothesis that the sample data comes from that theoretical 
distribution.
+
+:param dataset:
--- End diff --

I see you're following the example of ChiSquareTest, but this Param 
documentation belongs with the test method, not the class.  Could you please 
shift it?  (Feel free to correct it for ChiSquareTest here or in another PR.)





[GitHub] spark issue #20987: [SPARK-23816][CORE] Killed tasks should ignore FetchFail...

2018-04-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20987
  
**[Test build #88991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/testReport)**
 for PR 20987 at commit 
[`b387552`](https://github.com/apache/spark/commit/b387552f7c2a546ac7290be6da007678875814d7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20828: [SPARK-23687][SS] Add a memory source for continuous pro...

2018-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20828
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88998/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


