[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132368646
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

When I modified it to 1600, the result is:
max function length of wholestagecodegen:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
----------------------------------------------------------------------------------------------
codegen = F                                       467 /  507        1.4        712.7      1.0X
codegen = T maxLinesPerFunction = 1600           3191 / 3238        0.2       4868.7      0.1X
codegen = T maxLinesPerFunction = 1500            449 /  482        1.5        685.2      1.0X
   685.2   1.0X


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132368484
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n "
+        + s"$treeString")
+      return child.execute()
+    }
--- End diff --

I think it can be tested by the "max function length of wholestagecodegen" 
benchmark added in AggregateBenchmark.scala, thanks.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18810
  
Btw, can you change `[sql]` to `[SQL]` in title?





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132367400
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

I'd rather not change the current behavior of whole-stage codegen. With this 
default, user queries might suddenly and unintentionally stop running under 
whole-stage codegen. Shall we make `-1` the default and skip the function 
length check when this config is negative?
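
As a rough illustration of the suggestion above (not code from the PR), the check could treat a negative limit as "no limit". The object and method names below are hypothetical and chosen only for this sketch:

```scala
object LineLimitSketch {
  /**
   * Returns true only when a limit is configured (non-negative)
   * and the generated function has more lines than that limit.
   * A negative limit means the length check is skipped entirely.
   */
  def exceedsLineLimit(lineCount: Int, maxLinesPerFunction: Int): Boolean =
    maxLinesPerFunction >= 0 && lineCount > maxLinesPerFunction
}
```

With `-1` as the default, `exceedsLineLimit(5000, -1)` is false, so existing queries would keep running under whole-stage codegen unless a user opts in by setting a limit.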





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132367041
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

I'm not confident about this default value. Is it too small?





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-08-09 Thread zhengruifeng
Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/17995





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132366896
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -89,6 +89,14 @@ object CodeFormatter {
     }
     new CodeAndComment(code.result().trim(), map)
   }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
+    val commentReg =
+      ("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" +   // strip /*comment*/
+        """([ |\t]*?\/\/[\s\S]*?\n)""").r   // strip //comment
--- End diff --

nit: align `// strip //comment` with above `// strip /*comment*/`.
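
For readers following along, here is a minimal standalone sketch of what a helper like this might do. The quoted hunk ends right after the regex, so everything after `commentReg` (the two replacement steps and the blank-line pattern) is an assumption made for illustration, and `StripCommentsSketch` is a hypothetical wrapper object:

```scala
object StripCommentsSketch {
  // Same two alternatives as in the quoted diff:
  // block comments first, then line comments up to and including the newline.
  private val commentReg =
    ("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" +
      """([ |\t]*?\/\/[\s\S]*?\n)""").r

  def stripExtraNewLinesAndComments(input: String): String = {
    // Assumption: drop every comment match from the source text.
    val codeWithoutComment = commentReg.replaceAllIn(input, "")
    // Assumption: collapse runs of blank lines into a single newline.
    codeWithoutComment.replaceAll("""\n\s*\n""", "\n")
  }
}
```

Note that the line-comment alternative consumes the trailing newline as well, so a statement followed by a `//` comment keeps its line only if something else supplies the line break. That affects any line count computed on the stripped text.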





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132366187
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n "
+        + s"$treeString")
+      return child.execute()
+    }
--- End diff --

We need to add a test that creates a query with a long generated function 
and checks that whole-stage codegen is disabled for it.





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132365359
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a function that will be supported before" +
+      " deactivating whole-stage codegen.")
--- End diff --

OK, updated, thanks.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80475/testReport)** for PR 18810 at commit [`ce544a5`](https://github.com/apache/spark/commit/ce544a56dbeaa9fecb66706f3d2bad97280835bd).





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132365401
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
--- End diff --

OK, updated, thanks.





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132365436
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
+    classFunctions.exists { case (className, functions) =>
+      functions.exists{ case (name, code) =>
+        val codeWithoutComments = CodeFormatter.stripExtraNewLinesAndComments(code)
+        codeWithoutComments.count(_ == '\n') > SQLConf.get.maxLinesPerFunction
+      }
+    }
+  }
+  /**
--- End diff --

Ok, added, thanks
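
The nested `exists` in the quoted hunk can be tried outside Spark with a small standalone sketch. This is only an illustration under stated assumptions: `ClassFunctions` is a hypothetical stand-in for the context's `classFunctions` map, the limit is passed in as a parameter instead of being read from `SQLConf`, and comment stripping is left out so the example stays self-contained:

```scala
object TooLongFunctionSketch {
  // Hypothetical stand-in: class name -> (function name -> generated source).
  type ClassFunctions = Map[String, Map[String, String]]

  /** True if any function in any class has more newlines than the limit. */
  def existsTooLongFunction(
      classFunctions: ClassFunctions,
      maxLinesPerFunction: Int): Boolean =
    classFunctions.exists { case (_, functions) =>
      functions.exists { case (_, code) =>
        // Line count is approximated by the number of newline characters.
        code.count(_ == '\n') > maxLinesPerFunction
      }
    }
}
```

Because `exists` short-circuits, the scan stops at the first function over the limit rather than counting lines for every generated function.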


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132364612
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }
 
     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

Btw, some strange behaviors might occur:

scala> dfFromFile.filter($"_corrupt_record".isNotNull).show
+-----+---------------+
|field|_corrupt_record|
+-----+---------------+
| null| {"field": "3"}|
+-----+---------------+

scala> dfFromFile.filter($"_corrupt_record".isNotNull).select("_corrupt_record").show
+---------------+
|_corrupt_record|
+---------------+
+---------------+








[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132363994
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
+    classFunctions.exists { case (className, functions) =>
+      functions.exists{ case (name, code) =>
+        val codeWithoutComments = CodeFormatter.stripExtraNewLinesAndComments(code)
+        codeWithoutComments.count(_ == '\n') > SQLConf.get.maxLinesPerFunction
+      }
+    }
+  }
+  /**
--- End diff --

Add one more space





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132363687
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }
 
     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

Oh, got it. One issue with this behavior is that we can't easily retrieve 
only the corrupt records with queries like `dfFromFile.select("_corrupt_record")`. 
This behavior is also inconsistent with RDD-based manipulation.





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132363283
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }
 
     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

Ah, I mean they produced 0 and 3 respectively, as described in the PR 
description. I just double-checked.





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132361425
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }
 
     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

I've not tried 1.6.3 or 1.5.2. So @HyukjinKwon, do you mean the above code 
returns 1 for isNotNull and 2 for isNull?





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80472/
Test FAILed.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132361043
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1572,7 +1588,8 @@ def test_java_params(self):
             for name, cls in inspect.getmembers(module, inspect.isclass):
                 if not name.endswith('Model') and issubclass(cls, JavaParams)\
                         and not inspect.isabstract(cls):
-                    self.check_params(cls())
+                    # NOTE: disable check_params_exist until there is parity with Scala API
+                    ParamTests.check_params(self, cls(), check_params_exist=False)
--- End diff --

This skips the param test for Model classes. Should we do a similar check for all models?





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80472/testReport)** for PR 18810 at commit [`d44a2f8`](https://github.com/apache/spark/commit/d44a2f8499b4f7b9235fd138349005a4e3c960a5).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18900
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132360895
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
--- End diff --

> isTooLongGeneratedFunction

Nit: remove `()` 
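
For context, the nit follows the common Scala convention that a parameterless method with no side effects is declared, and called, without `()`. A minimal sketch of the resulting shape, using a hypothetical `GeneratedFunctions` class that is not part of Spark:

```scala
object ParenlessAccessorSketch {
  // Hypothetical stand-in for a codegen context; only the method shape matters here.
  final class GeneratedFunctions(lineCounts: Seq[Int], maxLines: Int) {
    // Declared without `()`: a pure query that changes no state.
    def isTooLongGeneratedFunction: Boolean = lineCounts.exists(_ > maxLines)
  }
}
```

A caller would then write `ctx.isTooLongGeneratedFunction`, so the check reads like a property of the context.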





[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2017-08-09 Thread debugger87
Github user debugger87 commented on the issue:

https://github.com/apache/spark/pull/18900
  
@cloud-fan could you please help me to review this PR?





[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-09 Thread debugger87
GitHub user debugger87 opened a pull request:

https://github.com/apache/spark/pull/18900

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

## What changes were proposed in this pull request?

Set createTime for every Hive partition created by Spark SQL, so that it can 
be used to manage the data lifecycle in the Hive warehouse.

## How was this patch tested?

No tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/debugger87/spark fix/set-create-time-for-hive-partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18900


commit 71a660ac8dad869d9ba3b4e206b74f5c44660ee6
Author: debugger87 
Date:   2017-08-10T04:17:00Z

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition







[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132360710
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a function that will be supported before" +
+      " deactivating whole-stage codegen.")
--- End diff --

> The maximum lines of a single Java function generated by whole-stage 
codegen. When the generated function exceeds this threshold, the whole-stage 
codegen is deactivated for this subtree of the current query plan. 

Could you also update the code comments in the other places based on my 
above update?





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132360643
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -1325,7 +1325,7 @@ def __init__(self, featuresCol="features", labelCol="label", predictionCol="pred
         super(MultilayerPerceptronClassifier, self).__init__()
         self._java_obj = self._new_java_obj(
             "org.apache.spark.ml.classification.MultilayerPerceptronClassifier", self.uid)
-        self._setDefault(maxIter=100, tol=1E-4, blockSize=128, stepSize=0.03, solver="l-bfgs")
+        self._setDefault(maxIter=100, tol=1E-6, blockSize=128, stepSize=0.03, solver="l-bfgs")
--- End diff --

It looks like 1e-6 is the correct default value.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80471/
Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80471 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80471/testReport)**
 for PR 18810 at commit 
[`d3238e9`](https://github.com/apache/spark/commit/d3238e9800f73b39b55e47419c5409b8111ea080).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132360069
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -417,6 +417,54 @@ def test_logistic_regression_check_thresholds(self):
 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
 )
 
+@staticmethod
+def check_params(test_self, py_stage, check_params_exist=True):
+"""
+Checks common requirements for Params.params:
+  - set of params exist in Java and Python and are ordered by names
+  - param parent has the same UID as the object's UID
+  - default param value from Java matches value in Python
+  - optionally check if all params from Java also exist in Python
+"""
+py_stage_str = "%s %s" % (type(py_stage), py_stage)
+if not hasattr(py_stage, "_to_java"):
+return
+java_stage = py_stage._to_java()
+if java_stage is None:
+return
+test_self.assertEqual(py_stage.uid, java_stage.uid(), 
msg=py_stage_str)
+if check_params_exist:
+param_names = [p.name for p in py_stage.params]
+java_params = list(java_stage.params())
+java_param_names = [jp.name() for jp in java_params]
+test_self.assertEqual(
+param_names, sorted(java_param_names),
+"Param list in Python does not match Java for %s:\nJava = 
%s\nPython = %s"
+% (py_stage_str, java_param_names, param_names))
--- End diff --

Are lines 436-443 the only change to `check_params`?





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132359678
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -370,6 +370,15 @@ case class WholeStageCodegenExec(child: SparkPlan) 
extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
 val (ctx, cleanedSource) = doCodeGen()
+val existLongFunction = ctx.existTooLongFunction
+if (existLongFunction) {
+  logWarning(s"Found too long generated codes and JIT optimization 
might not work, " +
+s"Whole-stage codegen disabled for this plan, " +
+s"You can change the config spark.sql.codegen.MaxFunctionLength " +
+s"to adjust the function length limit:\n "
--- End diff --

Please remove the unnecessary `s` prefixes.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132359369
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -144,7 +158,9 @@ def _transfer_params_from_java(self):
 if self._java_obj.hasParam(param.name):
 java_param = self._java_obj.getParam(param.name)
 # SPARK-14931: Only check set params back to avoid default 
params mismatch.
-if self._java_obj.isSet(java_param):
+if self._java_obj.isSet(java_param) or (
+# SPARK-10931: Temporary fix for params that have 
a default in Java
+self._java_obj.hasDefault(java_param) and not 
self.isDefined(param)):
--- End diff --

This change turns a default value of a param on the Java side into a 
user-provided param value on the Python side.
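
The concern can be illustrated with a toy model in plain Python. `ToyParams` and `transfer_from_java` below are hypothetical stand-ins, not pyspark classes; they only mirror the condition shown in the diff and assume that a transferred value is written with the ordinary setter.

```python
# Toy model of a params holder; NOT pyspark's implementation.
class ToyParams:
    def __init__(self):
        self._param_map = {}          # values set explicitly by the user
        self._default_param_map = {}  # default values

    def set_default(self, name, value):
        self._default_param_map[name] = value

    def set(self, name, value):
        self._param_map[name] = value

    def is_set(self, name):
        return name in self._param_map

    def has_default(self, name):
        return name in self._default_param_map

    def is_defined(self, name):
        return self.is_set(name) or self.has_default(name)


def transfer_from_java(py, java):
    """Copy set params, plus Java defaults that Python does not define."""
    for name in set(java._param_map) | set(java._default_param_map):
        if java.is_set(name) or (java.has_default(name) and not py.is_defined(name)):
            value = java._param_map.get(name, java._default_param_map.get(name))
            py.set(name, value)  # the value lands in the user-set map


java_side = ToyParams()
java_side.set_default("tol", 1e-6)  # only a default on the "Java" side
python_side = ToyParams()
transfer_from_java(python_side, java_side)
```

After the transfer, `python_side.is_set("tol")` is true even though nobody set `tol` explicitly, and `python_side.has_default("tol")` is false, which is the mismatch described above.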





[GitHub] spark issue #17972: [SPARK-20723][ML]Add intermediate storage level to tree ...

2017-08-09 Thread phatak-dev
Github user phatak-dev commented on the issue:

https://github.com/apache/spark/pull/17972
  
@MLnick Any updates on this?





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132358656
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -263,7 +284,8 @@ def _fit_java(self, dataset):
 
 def _fit(self, dataset):
 java_model = self._fit_java(dataset)
-return self._create_model(java_model)
+model = self._create_model(java_model)
+return self._copyValues(model)
--- End diff --

Here, I think this copies values from the estimator to the created model. 
So are we assuming that the params of the estimator and the model are the same?
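
If the two param sets can differ, one defensive option is to copy only the params that the target also declares. The sketch below shows that idea in plain Python; `copy_values` and the example param names are hypothetical, and whether `_copyValues` behaves this way is not established in this thread.

```python
def copy_values(source_params, target_param_names):
    """Copy only the values whose param name also exists on the target."""
    return {
        name: value
        for name, value in source_params.items()
        if name in target_param_names
    }


# Made-up example: the estimator has a param the model does not declare.
estimator_params = {"maxIter": 100, "tol": 1e-6, "solver": "l-bfgs"}
model_param_names = {"maxIter", "tol"}
copied = copy_values(estimator_params, model_param_names)
```

Here `solver` is silently skipped because the model does not declare it; raising an error instead would be the stricter alternative.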





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80470/
Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80470 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80470/testReport)**
 for PR 18810 at commit 
[`d0c753a`](https://github.com/apache/spark/commit/d0c753a5d3f5fbb5e14da0eebbd5e9bd3778126c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-09 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17849
  
Sorry, let me try and take a look tomorrow.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132357684
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
 paramMap.put([pair])
 return paramMap
 
+def _create_params_from_java(self):
+"""
+SPARK-10931: Temporary fix to create params that are defined in 
the Java obj but not here
+"""
+java_params = list(self._java_obj.params())
+from pyspark.ml.param import Param
+for java_param in java_params:
+java_param_name = java_param.name()
+if not hasattr(self, java_param_name):
--- End diff --

If `self` already has an attribute with the same name that is not a `Param`, 
should we handle that case, for example by throwing an exception?
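
A minimal sketch of what such a check could look like, in plain Python. The `Param` and `Stage` classes here are simplified stand-ins (not the pyspark classes), and `create_param_from_java` is a hypothetical helper; choosing `TypeError` is also just an assumption.

```python
_MISSING = object()  # sentinel so that an attribute set to None is not treated as absent


class Param:
    """Simplified stand-in for a param descriptor."""

    def __init__(self, parent, name, doc):
        self.parent, self.name, self.doc = parent, name, doc


class Stage:
    """Stand-in stage with an ordinary attribute that could clash."""

    summary = "not a Param"


def create_param_from_java(stage, java_param_name, java_param_doc):
    existing = getattr(stage, java_param_name, _MISSING)
    if existing is _MISSING:
        # No attribute with this name yet: create the param.
        param = Param(stage, java_param_name, java_param_doc)
        setattr(stage, java_param_name, param)
        return param
    if not isinstance(existing, Param):
        # Same name, but not a Param: refuse instead of silently skipping.
        raise TypeError(
            "attribute %r exists on %s but is not a Param"
            % (java_param_name, type(stage).__name__))
    return existing
```

Calling `create_param_from_java(Stage(), "summary", "doc")` raises, whereas a plain `hasattr` check would skip the name without any signal.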





[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18544
  
**[Test build #80474 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80474/testReport)**
 for PR 18544 at commit 
[`c41475e`](https://github.com/apache/spark/commit/c41475e3c5a217e5778bbddcd1b4a4210ce5d180).





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132357070
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
 ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 }
 
 (file: PartitionedFile) => {
-  val parser = new JacksonParser(actualSchema, parsedOptions)
+  // SPARK-21610: when the `requiredSchema` only contains 
`_corrupt_record`,
--- End diff --

I am actually rather -0 on this change. Neither the current way nor the 
previous way sounds quite compelling to me, but the current way at least does 
arguably unnecessary parsing tries, and we have had this behaviour for a long 
time (at least, I tried this in 1.6.3 and 1.5.2):

```scala
import org.apache.spark.sql.types._

val schema = new StructType().add("field", ByteType).add("_corrupt_record", 
StringType)
val file = "/tmp/sample.json"
val dfFromFile = sqlContext.read.schema(schema).json(file)
dfFromFile.filter($"_corrupt_record".isNotNull).count()
dfFromFile.filter($"_corrupt_record".isNull).count()
```






[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18899
  
**[Test build #80473 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)**
 for PR 18899 at commit 
[`5dc5c89`](https://github.com/apache/spark/commit/5dc5c89242a0c2a5ac6a693c3703eef8ee160616).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80473/
Test PASSed.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132355421
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
 paramMap.put([pair])
 return paramMap
 
+def _create_params_from_java(self):
+"""
+SPARK-10931: Temporary fix to create params that are defined in 
the Java obj but not here
+"""
+java_params = list(self._java_obj.params())
+from pyspark.ml.param import Param
+for java_param in java_params:
+java_param_name = java_param.name()
+if not hasattr(self, java_param_name):
+param = Param(self, java_param_name, java_param.doc())
+setattr(param, "created_from_java_param", True)
--- End diff --

BTW, would you mind if I ask where `created_from_java_param` is used? 





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-08-09 Thread weiqingy
Github user weiqingy commented on the issue:

https://github.com/apache/spark/pull/17342
  
@steveloughran Thanks Steve.





[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...

2017-08-09 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/18893
  
Since they're both small and this is already open I'd say leave it, unless 
someone ends up having issues with one of the fixes





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132352189
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
 ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 }
 
 (file: PartitionedFile) => {
-  val parser = new JacksonParser(actualSchema, parsedOptions)
+  // SPARK-21610: when the `requiredSchema` only contains 
`_corrupt_record`,
--- End diff --

What do you think? @cloud-fan @HyukjinKwon 





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-09 Thread lvdongr
Github user lvdongr commented on the issue:

https://github.com/apache/spark/pull/18756
  
Do you mean we could give each value type its own default, such as 0 for 
int and "" for string? Or that we set the default values when defining the 
table? @gatorsmile @maropu I set the default to NULL because that is how Hive 
handles the "insert into ..." statement, and I want to stay consistent with 
Hive.





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18895
  
@byakuinss Please add a doc test in `DataFrame.replace`. There is an 
example `df4.na.replace('Alice', None).show()`. We want to make sure it works 
with default value. Thanks.







[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17849
  
Oh, wait, this does not look like it requires much ML knowledge. I will try to take a pass.





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17849
  
I am more of a backend developer who works together with data scientists, 
so my ML knowledge is limited (I am studying hard :)). I will leave a few 
comments if there are some nits once someone starts to review, so that they can 
be addressed together. 

cc @viirya, who I believe knows ML a bit, and @zero323, who I believe should be 
able to review this (but is inactive now): are you maybe able to make a 
pass on this one?





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18899
  
**[Test build #80473 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)**
 for PR 18899 at commit 
[`5dc5c89`](https://github.com/apache/spark/commit/5dc5c89242a0c2a5ac6a693c3703eef8ee160616).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread mpjlu
GitHub user mpjlu opened a pull request:

https://github.com/apache/spark/pull/18899

[SPARK-21680][ML][MLLIB]optimzie Vector coompress

## What changes were proposed in this pull request?

When Vector.compressed is used to convert a Vector to a SparseVector, 
performance is much lower than with Vector.toSparse.
This is because Vector.compressed scans the values three times, whereas 
Vector.toSparse needs only two scans.
When the vector is long, there is a significant performance difference 
between the two methods.
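
The pass-counting argument can be sketched in plain Python. This is an illustration, not Spark's Scala code: the function names, the tuple representation of a sparse vector, and the size heuristic in `compressed` are assumptions. The point is that a nonzero count computed once can be handed to the sparse conversion instead of being recomputed.

```python
def count_nonzeros(values):
    """One full scan over the values."""
    return sum(1 for v in values if v != 0.0)


def to_sparse(values, nnz=None):
    """Return (size, indices, data); count nonzeros only if nnz was not supplied."""
    if nnz is None:
        nnz = count_nonzeros(values)  # extra scan when called on its own
    indices = [0] * nnz               # pre-sized, as a typed array would be
    data = [0.0] * nnz
    k = 0
    for i, v in enumerate(values):    # final scan that fills the arrays
        if v != 0.0:
            indices[k] = i
            data[k] = v
            k += 1
    return len(values), indices, data


def compressed(values):
    """Pick a representation, reusing the single nonzero count."""
    nnz = count_nonzeros(values)              # the only counting scan
    if 1.5 * (nnz + 1.0) < len(values):       # assumed size heuristic
        return to_sparse(values, nnz)         # no second count needed
    return list(values)
```

For example, `compressed([0.0] * 9 + [3.0])` counts once and then fills the sparse arrays in one more scan, instead of counting twice before filling.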

## How was this patch tested?

The existing unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpjlu/spark optVectorCompress

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18899.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18899


commit 5dc5c89242a0c2a5ac6a693c3703eef8ee160616
Author: Peng Meng 
Date:   2017-08-10T01:59:17Z

optimzie Vector coompress







[GitHub] spark issue #18648: [SPARK-21428] Turn IsolatedClientLoader off while using ...

2017-08-09 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18648
  
ping @jiangxb1987 @cloud-fan, any more suggestions?





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18630
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80468/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18630
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18630
  
**[Test build #80468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80468/testReport)**
 for PR 18630 at commit 
[`c0b0a7d`](https://github.com/apache/spark/commit/c0b0a7d79ca27bbcf91245b3d80070d5d4188174).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...

2017-08-09 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18893
  
@ajbozarth Do we need another PR to separate these? If necessary, I will 
do that.





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347436
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -356,6 +356,18 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if the length of codegen function is too long or not
--- End diff --

Ok, I have modified it, thanks





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347148
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -356,6 +356,18 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if the length of codegen function is too long or not
+   * It will count the lines of every codegen function, if there is a 
function of length
+   * greater than spark.sql.codegen.MaxFunctionLength, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
+classFunctions.exists { case (className, functions) =>
+  functions.exists{ case (name, code) =>
+CodeFormatter.stripExtraNewLines(code).count(_ == '\n') > 
SQLConf.get.maxFunctionLength
--- End diff --

Ok, I have modified it to count lines without comments and extra new lines
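
As a rough illustration of the counting rule described above (not the PR's actual Scala code), the sketch below counts the lines of a generated function body while skipping blank lines and comment lines. The names `count_code_lines` and `exceeds_limit` are hypothetical, and the comment handling is deliberately simplified: only `//` lines and `/* ... */` blocks that open at the start of a line are recognised.

```python
def count_code_lines(code: str) -> int:
    """Count lines that are neither blank nor comments.

    Simplification (assumption): a comment is a line that starts with `//`,
    or a `/* ... */` block whose opener is the first token on its line.
    """
    count = 0
    in_block_comment = False
    for raw in code.splitlines():
        line = raw.strip()
        if in_block_comment:
            # Still inside a /* ... */ block; look for its end.
            if "*/" in line:
                in_block_comment = False
            continue
        if not line or line.startswith("//"):
            continue
        if line.startswith("/*"):
            # Block comment opens here; it may also close on the same line.
            if "*/" not in line[2:]:
                in_block_comment = True
            continue
        count += 1
    return count


def exceeds_limit(code: str, max_lines: int) -> bool:
    """Return True when the function body has more code lines than allowed."""
    return count_code_lines(code) > max_lines
```

With a check of this shape, the threshold compares only real statements, so heavily commented generated code is not penalised.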





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347198
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_FUNCTION_LEN = 
buildConf("spark.sql.codegen.MaxFunctionLength")
--- End diff --

Ok, I have modified it, thanks





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80472 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80472/testReport)**
 for PR 18810 at commit 
[`d44a2f8`](https://github.com/apache/spark/commit/d44a2f8499b4f7b9235fd138349005a4e3c960a5).





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347018
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -301,6 +301,61 @@ class AggregateBenchmark extends BenchmarkBase {
 */
   }
 
+  ignore("max function length of wholestagecodegen") {
+val N = 20 << 15
+
+val benchmark = new Benchmark("max function length of 
wholestagecodegen", N)
+def f(): Unit = sparkSession.range(N)
+  .selectExpr(
+"id",
+"(id & 1023) as k1",
+"cast(id & 1023 as double) as k2",
+"cast(id & 1023 as int) as k3",
+"case when id > 100 and id <= 200 then 1 else 0 end as v1",
+"case when id > 200 and id <= 300 then 1 else 0 end as v2",
+"case when id > 300 and id <= 400 then 1 else 0 end as v3",
+"case when id > 400 and id <= 500 then 1 else 0 end as v4",
+"case when id > 500 and id <= 600 then 1 else 0 end as v5",
+"case when id > 600 and id <= 700 then 1 else 0 end as v6",
+"case when id > 700 and id <= 800 then 1 else 0 end as v7",
+"case when id > 800 and id <= 900 then 1 else 0 end as v8",
+"case when id > 900 and id <= 1000 then 1 else 0 end as v9",
+"case when id > 1000 and id <= 1100 then 1 else 0 end as v10",
+"case when id > 1100 and id <= 1200 then 1 else 0 end as v11",
+"case when id > 1200 and id <= 1300 then 1 else 0 end as v12",
+"case when id > 1300 and id <= 1400 then 1 else 0 end as v13",
+"case when id > 1400 and id <= 1500 then 1 else 0 end as v14",
+"case when id > 1500 and id <= 1600 then 1 else 0 end as v15",
+"case when id > 1600 and id <= 1700 then 1 else 0 end as v16",
+"case when id > 1700 and id <= 1800 then 1 else 0 end as v17",
+"case when id > 1800 and id <= 1900 then 1 else 0 end as v18")
+  .groupBy("k1", "k2", "k3")
+  .sum()
+  .collect()
+
+benchmark.addCase(s"codegen = F") { iter =>
+  sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
+  f()
+}
+
+benchmark.addCase(s"codegen = T") { iter =>
+  sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
+  sparkSession.conf.set("spark.sql.codegen.MaxFunctionLength", "1")
--- End diff --

OK, I have added a test that uses the default value of 1500, thanks.





[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...

2017-08-09 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18893
  
test this please





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80471 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80471/testReport)**
 for PR 18810 at commit 
[`d3238e9`](https://github.com/apache/spark/commit/d3238e9800f73b39b55e47419c5409b8111ea080).





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80469 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80469/testReport)**
 for PR 18895 at commit 
[`8af1e15`](https://github.com/apache/spark/commit/8af1e15f37c750dda53542b5a854f832ff006773).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80469/
Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80470 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80470/testReport)**
 for PR 18810 at commit 
[`d0c753a`](https://github.com/apache/spark/commit/d0c753a5d3f5fbb5e14da0eebbd5e9bd3778126c).





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18756
  
In most cases of `SELECT` statements, `default_value` is `NULL` by 
default, so I first thought non-specified columns would be filled with `NULL`. 
Anyway, is there still a chance to implement the concept of `DEFAULT`, too?
```
postgresql doc:
DEFAULT default_expr
...
The default expression will be used in any insert operation that does not 
specify a value 
for the column. If there is no default for a column, then the default is 
null.
```
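
To make the quoted semantics concrete, here is a minimal sketch (in Python, purely illustrative and not Spark code) of how an insert could fill columns that the statement does not specify: use the column's declared default if there is one, otherwise null. The function `fill_row` and its parameters are hypothetical names chosen for this example, and `None` stands in for SQL `NULL`.

```python
from typing import Any, Dict, List, Optional


def fill_row(
    columns: List[str],
    defaults: Dict[str, Any],
    specified: Dict[str, Any],
) -> Dict[str, Optional[Any]]:
    """Build a full row from the values an INSERT actually specifies.

    Columns missing from `specified` take their declared default; columns
    without a default become None (standing in for SQL NULL).
    """
    unknown = set(specified) - set(columns)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    row: Dict[str, Optional[Any]] = {}
    for col in columns:
        if col in specified:
            row[col] = specified[col]
        else:
            row[col] = defaults.get(col)  # None when no default is declared
    return row
```

For example, with columns `id`, `name`, `flag` and a default only on `flag`, an insert that specifies just `id` would leave `name` null and set `flag` to its default.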





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80469 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80469/testReport)**
 for PR 18895 at commit 
[`8af1e15`](https://github.com/apache/spark/commit/8af1e15f37c750dda53542b5a854f832ff006773).





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
ok to test





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
Could we add the example in the doctest (under 1362L) so that this can be 
tested and shown in the documentation?





[GitHub] spark issue #18882: [SPARK-21652][SQL] Filter out meaningless constraints in...

2017-08-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18882
  
Is there any ongoing work on cost-based inference? Anyway, thanks! I'll close 
this for now.





[GitHub] spark pull request #18882: [SPARK-21652][SQL] Filter out meaningless constra...

2017-08-09 Thread maropu
Github user maropu closed the pull request at:

https://github.com/apache/spark/pull/18882





[GitHub] spark issue #18882: [SPARK-21652][SQL] Filter out meaningless constraints in...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18882
  
Thanks for working on it, but the inferred one is not useless. The removal 
has to be cost based.





[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...

2017-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18820





[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18820
  
Thanks! Merging to master.





[GitHub] spark issue #18898: [SPARK-21245][ML] Resolve code duplication for classific...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18898
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18898: [SPARK-21245][ML] Resolve code duplication for cl...

2017-08-09 Thread bravo-zhang
GitHub user bravo-zhang opened a pull request:

https://github.com/apache/spark/pull/18898

[SPARK-21245][ML] Resolve code duplication for classification/regression 
summarizers

## Why the change?

In several places (LogReg, LinReg, SVC) in Spark ML, we collect summary 
information about training data using `MultivariateOnlineSummarizer` and 
`MulticlassSummarizer`. We have the same code appearing in several places 
(including test suites). We can eliminate this by creating a common 
implementation.

## What changes were proposed in this pull request?

1. Adds a new class `ml.stat.Summarizers.scala` with `def 
getRegressionSummarizers` and `def getClassificationSummarizers`, each of which 
provides a pair of feature and label summarizers.
This centralizes the duplicated code in `LinearRegression`, `LinearSVC`, 
`LogisticRegression` and `DifferentiableLossAggregatorSuite`.
2. Moves `MultiClassSummarizer.scala` (and its test suite) out of 
`LogisticRegression.scala` into a new file, `ml.stat.MultiClassSummarizer.scala`, 
because it is also used by `LinearSVC` and can be generalized.
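
The idea of a shared helper that returns a pair of feature and label summarizers can be sketched as follows. This is a toy Python stand-in, not the Scala code in this PR: `OnlineMeanSummarizer` and `get_regression_summarizers` are hypothetical names, and the summarizer only tracks a count and a running mean rather than the full set of statistics that `MultivariateOnlineSummarizer` provides.

```python
from typing import Iterable, List, Sequence, Tuple


class OnlineMeanSummarizer:
    """Toy multivariate summarizer: tracks a count and a running mean."""

    def __init__(self, dim: int) -> None:
        self.count = 0
        self.mean: List[float] = [0.0] * dim

    def add(self, values: Sequence[float]) -> "OnlineMeanSummarizer":
        self.count += 1
        for i, v in enumerate(values):
            # Incremental mean update: m += (x - m) / n
            self.mean[i] += (v - self.mean[i]) / self.count
        return self


def get_regression_summarizers(
    instances: Iterable[Tuple[float, Sequence[float]]],
    num_features: int,
) -> Tuple[OnlineMeanSummarizer, OnlineMeanSummarizer]:
    """One pass over (label, features) pairs; returns (feature, label) summarizers."""
    features = OnlineMeanSummarizer(num_features)
    labels = OnlineMeanSummarizer(1)
    for label, feats in instances:
        features.add(feats)
        labels.add([label])
    return features, labels
```

Each estimator would then call the one helper instead of repeating the aggregation loop, which is the duplication this change aims to remove.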

## How was this patch tested?

`ml.stat.SummarizersSuite.scala`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bravo-zhang/spark spark-21245

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18898


commit 1f5209f7e40c520e1c6b6b5943ef87fde7d5b254
Author: bravo-zhang 
Date:   2017-08-09T16:05:23Z

Resolve code duplication for classification/regression summarizers







[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18630
  
Ok, thanks for checking.  It doesn't look like it's coming from your 
changes, so I'm sure it's just me.





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-09 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18734
  
huzzah! I'm in the middle of getting some code working for a talk tomorrow 
so I'll circle back on this on Friday. If @davies has any opinions though it 
would be great to hear them.





[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

2017-08-09 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18734#discussion_r132334455
  
--- Diff: python/pyspark/cloudpickle.py ---
@@ -397,42 +625,7 @@ def save_global(self, obj, name=None, 
pack=struct.pack):
 
 typ = type(obj)
 if typ is not obj and isinstance(obj, (type, types.ClassType)):
-d = dict(obj.__dict__)  # copy dict proxy to a dict
-if not isinstance(d.get('__dict__', None), property):
-# don't extract dict that are properties
-d.pop('__dict__', None)
-d.pop('__weakref__', None)
-
-# hack as __new__ is stored differently in the __dict__
-new_override = d.get('__new__', None)
-if new_override:
-d['__new__'] = obj.__new__
-
-# workaround for namedtuple (hijacked by PySpark)
-if getattr(obj, '_is_namedtuple_', False):
-self.save_reduce(_load_namedtuple, (obj.__name__, 
obj._fields))
-return
-
-self.save(_load_class)
-self.save_reduce(typ, (obj.__name__, obj.__bases__, 
{"__doc__": obj.__doc__}), obj=obj)
-d.pop('__doc__', None)
-# handle property and staticmethod
-dd = {}
-for k, v in d.items():
--- End diff --

Gentle re-ping to @davies - do you have an opinion on this?





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18630
  
This is how I build things:

./build/mvn -Pmesos -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean 
package

#  -DskipTests clean package

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre/

./dev/make-distribution.sh --name 18630 --tgz -Phadoop-2.7 -Pmesos 






[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18630
  
@BryanCutler Sure, check here; it works:
https://gist.github.com/skonto/dc2070d1529c97ec5de32e99983a834f





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18630
  
Maybe it was just something with my env, but I was running it locally. Can 
you verify that works too? Just don't specify the `--master` conf, and run 
from your Spark home dir.





[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16158
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18630
  
> spark-2.3.0-SNAPSHOT-bin-18630/bin$ ./spark-shell --verbose --master 
spark://ip-10-10-1-79:7077 
Using properties file: null
Parsed arguments:
  master                  spark://ip-10-10-1-79:7077
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.repl.Main
  primaryResource         spark-shell
  name                    Spark shell
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:
  


Main class:
org.apache.spark.repl.Main
Arguments:

System properties:
(SPARK_SUBMIT,true)
(spark.app.name,Spark shell)
(spark.jars,)
(spark.submit.deployMode,client)
(spark.master,spark://ip-10-10-1-79:7077)
Classpath elements:



Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/09 23:28:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://10.10.1.79:4040
Spark context available as 'sc' (master = spark://ip-10-10-1-79:7077, app id = app-20170809232804-0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/
 
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 






[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80467/
Test PASSed.





[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16158
  
**[Test build #80467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80467/testReport)** for PR 16158 at commit [`72aea62`](https://github.com/apache/spark/commit/72aea626bb1fef4a2834e1054bac99451f04c0e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18630
  
Yeah, just by running `bin/spark-shell` it failed immediately with that 
error.  I double-checked by rebuilding and got the same thing, but I'm not sure 
whether it was something from your changes or not.  Are you able to start up the shell?





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18630
  
@BryanCutler you just started spark shell and it failed? How can I 
reproduce it?





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18630
  
Sure, Python support could be added at a later point; I was just wondering 
whether it would be only a small addition to what's already here, but no problem.  Btw, 
after checking out this PR I tried spark-shell and got the error below.  Not 
sure if it was my environment, but after switching back to master it worked fine.
```
bin/spark-shell 
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.hadoop.conf.Configuration.(Configuration.java:178)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:324)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:155)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
```
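
One way to narrow down whether a `NoClassDefFoundError` like the one above comes from the local environment is to check which jars in the build's jar directory actually contain the missing class. The helper below is only a rough sketch for that check: the function name `jars_containing` is made up for illustration, and the directory you pass in is an assumption about where your build places its jars.

```python
import zipfile
from pathlib import Path


def jars_containing(jar_dir, class_name):
    """Return the file names of jars under jar_dir that contain class_name.

    Hypothetical diagnostic helper, not part of Spark.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return hits
```

Calling it with the jar directory of the checked-out build and `"org.apache.commons.logging.LogFactory"` returns an empty list if no jar there provides the class, which would point at a missing dependency in that directory rather than at the shell itself.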





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18630
  
I wasn't really expecting python support to be added here. I wonder if 
there's a bug open for that.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132330891
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -263,7 +284,8 @@ def _fit_java(self, dataset):
 
     def _fit(self, dataset):
         java_model = self._fit_java(dataset)
-        return self._create_model(java_model)
+        model = self._create_model(java_model)
+        return self._copyValues(model)
--- End diff --

This is the crucial line being added in this PR.  Without this, if a Python 
model defines a param (matching one from Scala), then when the model is fit in 
Scala that param value will never be sent back to Python.
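
To make the effect of that line concrete, here is a toy sketch in plain Python. It does not use PySpark: `Params`, `ToyModel`, `ToyEstimator`, and `copy_values_to` are hypothetical stand-ins invented for illustration, and the behavior is a simplified assumption about what copying values onto the fitted model achieves.

```python
class Params:
    """Toy param container (hypothetical; not pyspark.ml.param.Params)."""

    def __init__(self, **declared):
        # Declared params and their default values.
        self._defaults = dict(declared)
        # Values explicitly set by the user.
        self._values = {}

    def set(self, name, value):
        if name not in self._defaults:
            raise KeyError(name)
        self._values[name] = value
        return self

    def get(self, name):
        return self._values.get(name, self._defaults[name])

    def copy_values_to(self, other):
        """Copy explicitly set values for params that `other` also declares."""
        for name, value in self._values.items():
            if name in other._defaults:
                other._values[name] = value
        return other


class ToyModel(Params):
    pass


class ToyEstimator(Params):
    def _create_model(self):
        # The model declares the same param but starts from its default.
        return ToyModel(maxIter=10)

    def fit_without_copy(self):
        # Mirrors the removed line: the model is returned as created.
        return self._create_model()

    def fit_with_copy(self):
        # Mirrors the added lines: set values are copied onto the model.
        model = self._create_model()
        return self.copy_values_to(model)


est = ToyEstimator(maxIter=10).set("maxIter", 50)
stale = est.fit_without_copy().get("maxIter")
synced = est.fit_with_copy().get("maxIter")
print(stale, synced)
```

In this sketch the model returned without the copy step still reports the default, while the copied one reports the value that was set on the estimator.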





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18630
  
@BryanCutler @vanzin From a quick look, DriverWrapper needs refactoring 
to make things testable. 
Py files are resolved in client mode; let's fix it in another PR (I could 
do it). The docs 
(https://spark.apache.org/docs/latest/submitting-applications.html) state:
"Currently, standalone mode does not support cluster mode for Python 
applications."
So is the file distribution the only thing to do? I haven't scoped the work 
needed to support Python apps.





[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

2017-08-09 Thread rgbkrk
Github user rgbkrk commented on the issue:

https://github.com/apache/spark/pull/18734
  
Just a note that we just shipped the fixes from @HyukjinKwon within 
cloudpickle (as v0.4.0), so we're at least roughly in sync now. 😄 





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-09 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/17849
  
ping @holdenk, also @HyukjinKwon if you are able to take a look




