[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132376473
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n "
+        + s"$treeString")
+      return child.execute()
+    }
--- End diff --

When we check "ctx.isTooLongGeneratedFunction" in doExecute, the 
WholeStageCodegenExec node has already been generated, so there must be a 
WholeStageCodegenExec node at this point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18899
  
This isn't what was proposed in the JIRA?





[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r132375612
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
     tpart.setTableName(ht.getTableName)
     tpart.setValues(partValues.asJava)
     tpart.setSd(storageDesc)
+    tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
--- End diff --

This is to Hive, how about from Hive? 
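
To make the two directions concrete, here is a minimal sketch of the conversion in both directions. It assumes, as the diff above suggests, that the create time is passed to Hive as whole seconds in an `Int`; the object and method names are hypothetical and are not part of `HiveClientImpl`.

```scala
// Hypothetical helper, not Spark code: converts a create time between
// JVM milliseconds and the whole-second Int value passed to Hive.
object CreateTime {
  // To Hive: drop the millisecond part and narrow to Int, as in the diff.
  def toHiveSeconds(millis: Long): Int = (millis / 1000).toInt

  // From Hive: widen to Long before multiplying so the result cannot overflow Int.
  def fromHiveSeconds(seconds: Int): Long = seconds.toLong * 1000
}
```

The round trip loses sub-second precision, so a value read back from Hive will generally differ from the original millisecond timestamp by less than one second.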





[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80477/testReport)** for PR 18902 at commit [`f6f166f`](https://github.com/apache/spark/commit/f6f166fef4e17db7e36ccecf41aebe3443e9fef5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80478/testReport)** for PR 18902 at commit [`660c2db`](https://github.com/apache/spark/commit/660c2dbc3e800a8f8fe4bc1b36a72ccdc37a778e).





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132375138
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n "
+        + s"$treeString")
+      return child.execute()
+    }
--- End diff --

We can check whether there is a `WholeStageCodegenExec` node in the physical 
plan of the query. `WholeStageCodegenSuite` has a few examples you can take a 
look at.
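
As a rough illustration of the check described above, the sketch below searches a plan tree for a codegen node. The `Plan`, `Scan`, `Aggregate`, and `WholeStageCodegen` types are simplified, hypothetical stand-ins so that the example runs without Spark; in a real suite the same idea would presumably be applied to the query's executed physical plan.

```scala
// Hypothetical, simplified stand-ins for physical plan nodes (not Spark classes).
sealed trait Plan {
  def children: Seq[Plan]

  // Depth-first search for the first node that satisfies the predicate.
  def find(p: Plan => Boolean): Option[Plan] =
    if (p(this)) Some(this)
    else children.foldLeft(Option.empty[Plan])((acc, c) => acc.orElse(c.find(p)))
}
case class Scan(table: String) extends Plan { val children: Seq[Plan] = Nil }
case class Aggregate(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
case class WholeStageCodegen(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }

object PlanChecks {
  // True when any node in the tree is a whole-stage codegen node.
  def usesWholeStageCodegen(plan: Plan): Boolean =
    plan.find(_.isInstanceOf[WholeStageCodegen]).isDefined
}
```

A regression test could then assert that the node is present for a short generated function and absent once the length limit is exceeded.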





[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80477/
Test FAILed.





[GitHub] spark pull request #18901: [SPARK-21689][YARN] Download user jar from remote...

2017-08-09 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18901#discussion_r132374726
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -516,6 +516,10 @@ object SparkSubmit extends CommandLineUtils {
     if (deployMode == CLIENT || isYarnCluster) {
       childMainClass = args.mainClass
       if (isUserJar(args.primaryResource)) {
+        val hadoopConf = new HadoopConfiguration()
+        args.primaryResource =
+          Option(args.primaryResource).map(
+            downloadFile(_, targetDir, args.sparkProperties, hadoopConf)).orNull
         childClasspath += args.primaryResource
       }
       if (args.jars != null) { childClasspath ++= args.jars.split(",") }
--- End diff --

I think in your scenario we should also download the jars specified with 
`--jars` to local disk and add them to the classpath.
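
A minimal sketch of that suggestion, under stated assumptions: the `download` function is a hypothetical stand-in for whatever helper fetches a remote file and returns its local path, and "remote" is assumed to mean any URI scheme other than `file` or `local`. None of these names come from `SparkSubmit` itself.

```scala
import java.net.URI

// Hypothetical helper, not SparkSubmit code.
object JarResolver {
  // Stand-in for a real download helper: maps a remote URI to a local path.
  type Downloader = String => String

  // Assumption: a path with no scheme, or with scheme "file" or "local", is already local.
  def isRemote(path: String): Boolean = {
    val scheme = Option(new URI(path).getScheme).getOrElse("file")
    scheme != "file" && scheme != "local"
  }

  // Splits a comma-separated --jars value and downloads only the remote entries.
  // A null value (no --jars given) yields an empty list.
  def resolveJars(jars: String, download: Downloader): Seq[String] =
    Option(jars).toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(j => if (isRemote(j)) download(j) else j)
}
```

The resolved list could then be appended to the child classpath in the same place where `args.jars` is split today.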





[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132374541
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n "
+        + s"$treeString")
+      return child.execute()
+    }
--- End diff --

@viirya, it is hard for me to check whether whole-stage codegen is disabled or 
not. Could you give me some suggestions? Thanks.





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132373300
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n "
+        + s"$treeString")
+      return child.execute()
+    }
--- End diff --

AggregateBenchmark is more of a benchmark than a test. It won't run every 
time. We need a test to prevent regressions introduced by future changes.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80476/
Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80476/testReport)** for PR 18810 at commit [`08f5ddf`](https://github.com/apache/spark/commit/08f5ddf0442793a63beff7f9e3970fc8bb92a47d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80477/testReport)** for PR 18902 at commit [`f6f166f`](https://github.com/apache/spark/commit/f6f166fef4e17db7e36ccecf41aebe3443e9fef5).





[GitHub] spark pull request #18902: [SPARK-21690][ML] one-pass imputer

2017-08-09 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/18902

[SPARK-21690][ML] one-pass imputer

## What changes were proposed in this pull request?
parallelize the computation of all columns
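
As an illustration of what computing all columns in one pass can look like, here is a minimal plain-Scala sketch that accumulates per-column sums and counts while iterating over the rows once. It is not the patch's implementation: the function name, the row representation as `Array[Double]`, and the treatment of `NaN` as always missing are assumptions made for the example.

```scala
// Hypothetical sketch, not the code in this pull request.
object OnePassMean {
  // Computes the mean of every column in a single pass over the rows,
  // skipping entries equal to `missing` (NaN is always treated as missing).
  def columnMeans(
      rows: Iterator[Array[Double]],
      numCols: Int,
      missing: Double = Double.NaN): Array[Double] = {
    val sums = new Array[Double](numCols)
    val counts = new Array[Long](numCols)
    rows.foreach { row =>
      var i = 0
      while (i < numCols) {
        val v = row(i)
        if (!v.isNaN && v != missing) { sums(i) += v; counts(i) += 1 }
        i += 1
      }
    }
    // A column with no valid entries has no mean; report NaN for it.
    Array.tabulate(numCols)(i => if (counts(i) == 0) Double.NaN else sums(i) / counts(i))
  }
}
```

The point of the single pass is that the data is scanned once regardless of how many columns are imputed, instead of once per column.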

## How was this patch tested?
existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark parallelize_imputer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18902


commit c4042ad23bcf94b758ee8c345c15fc85037cbdb9
Author: Zheng RuiFeng 
Date:   2017-06-07T04:26:41Z

create pr

commit 4c35bda0e073084a608df8e8bc28c4dae5a1fc5b
Author: Zheng RuiFeng 
Date:   2017-06-07T04:30:26Z

handle missing

commit 9a6ac59d5191a57a9b0b671414a2dbac1a3c3b3d
Author: Zheng RuiFeng 
Date:   2017-06-07T05:33:30Z

use summary

commit f6f166fef4e17db7e36ccecf41aebe3443e9fef5
Author: Zheng RuiFeng 
Date:   2017-06-07T06:22:58Z

x







[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80476/testReport)** for PR 18810 at commit [`08f5ddf`](https://github.com/apache/spark/commit/08f5ddf0442793a63beff7f9e3970fc8bb92a47d).





[GitHub] spark pull request #18901: [SPARK-21689][YARN] Download user jar from remote...

2017-08-09 Thread caneGuy
GitHub user caneGuy opened a pull request:

https://github.com/apache/spark/pull/18901

[SPARK-21689][YARN] Download user jar from remote in case get hadoop token 
…

…failed

## What changes were proposed in this pull request?

When using YARN cluster mode to scan HBase, there is a case that does not 
work: if the user jar is stored on HDFS, the local classpath has no HBase 
classes, so fetching the HBase token fails. When the job is later submitted to 
YARN, it fails because it has no token to access the HBase table. I mocked 
three cases:
1: The user jar is on the classpath and includes HBase
`17/08/10 13:48:03 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400050 for token HDFS_DELEGATION_TOKEN
17/08/10 13:48:03 INFO security.HadoopDelegationTokenManager: Service hive
17/08/10 13:48:03 INFO security.HadoopDelegationTokenManager: Service hbase
17/08/10 13:48:05 INFO security.HBaseDelegationTokenProvider: Attempting to fetch HBase security token.`

The logs show that the token is obtained normally.


2: The user jar is on HDFS
`17/08/10 13:43:58 WARN security.HBaseDelegationTokenProvider: Class org.apache.hadoop.hbase.HBaseConfiguration not found.
17/08/10 13:43:58 INFO security.HBaseDelegationTokenProvider: Failed to get token from service hbase
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.security.token.TokenUtil
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokens(HBaseDelegationTokenProvider.scala:41)
    at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anonfun$obtainDelegationTokens$2.apply(HadoopDelegationTokenManager.scala:112)
    at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anonfun$obtainDelegationTokens$2.apply(HadoopDelegationTokenManager.scala:109)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)`


The logs show that fetching the token fails with a ClassNotFoundException.

If we download the user jar from the remote location first, things work 
correctly. So this patch downloads the user jar from the remote location when 
running in YARN cluster mode.
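
The failure in case 2 comes down to a class not being loadable on the submitting side. As a small, hedged illustration (the helper name is hypothetical and this is not Spark code), a reflective availability check can be sketched like this:

```scala
import scala.util.Try

// Hypothetical helper, not Spark code.
object ClasspathCheck {
  // True when the named class can be loaded through this object's class loader.
  // ClassNotFoundException is caught by Try and reported as "not available".
  def isClassAvailable(className: String): Boolean =
    Try(Class.forName(className)).isSuccess
}
```

Calling `ClasspathCheck.isClassAvailable("org.apache.hadoop.hbase.HBaseConfiguration")` before submission would presumably return false when the user jar exists only on HDFS, which matches the warning in the log above.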


## How was this patch tested?

Manually tested by executing spark-submit scripts with different user jars.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/caneGuy/spark zhoukang/download-userjar

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18901.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18901


commit 31fb394f983313c2ee767bf68220041fa6c84b2e
Author: zhoukang 
Date:   2017-08-09T10:42:43Z

[SPARK][YARN] Download user jar from remote in case get hadoop token failed







[GitHub] spark issue #18901: [SPARK-21689][YARN] Download user jar from remote in cas...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18901
  
Can one of the admins verify this patch?





[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18544
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132370096
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -89,6 +89,14 @@ object CodeFormatter {
     }
     new CodeAndComment(code.result().trim(), map)
   }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
+    val commentReg =
+      ("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" +   // strip /*comment*/
+        """([ |\t]*?\/\/[\s\S]*?\n)""").r   // strip //comment
--- End diff --

OK, modified. Thanks.
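
For readers following the diff, here is a self-contained sketch of the general technique it uses: remove comments with a regular expression, then collapse the blank lines left behind. The patterns below are deliberately simpler than the ones in the diff and are not the patch's code; the object name is hypothetical, and comment markers inside string literals are not handled.

```scala
// Hypothetical sketch of regex-based comment stripping, not the code in the diff.
object CommentStripper {
  // Either a block comment (non-greedy, may span lines) or a line comment up to end of line.
  private val commentRe = """(?s)/\*.*?\*/|//[^\n]*""".r
  // A newline followed by optional whitespace and another newline, i.e. one or more blank lines.
  private val blankLinesRe = """\n\s*\n""".r

  def strip(input: String): String = {
    val noComments = commentRe.replaceAllIn(input, "")
    blankLinesRe.replaceAllIn(noComments, "\n").trim
  }
}
```

Counting lines on the stripped text, rather than on the raw generated source, keeps comments and blank lines from inflating a function's measured length.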





[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18544
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80474/
Test FAILed.





[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18544
  
**[Test build #80474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80474/testReport)** for PR 18544 at commit [`c41475e`](https://github.com/apache/spark/commit/c41475e3c5a217e5778bbddcd1b4a4210ce5d180).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132368646
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

When I changed it to 1600, the result was:

max function length of wholestagecodegen:  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
codegen = F                                     467 /  507         1.4         712.7       1.0X
codegen = T maxLinesPerFunction = 1600         3191 / 3238         0.2        4868.7       0.1X
codegen = T maxLinesPerFunction = 1500          449 /  482         1.5         685.2       1.0X





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132368484
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n "
+        + s"$treeString")
+      return child.execute()
+    }
--- End diff --

I think it can be tested by the "max function length of wholestagecodegen" 
case added in AggregateBenchmark.scala. Thanks.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18810
  
Btw, can you change `[sql]` to `[SQL]` in title?





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132367400
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

I would prefer not to change the current behavior of whole-stage codegen. This 
might suddenly cause user code to stop running in whole-stage codegen 
unintentionally. Shall we make `-1` the default and skip the function length 
check if this config is negative?
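
The proposal above can be sketched as a small predicate. This is only an illustration of the suggested semantics, assuming length is measured in lines of the generated function body; the object and method names are hypothetical and do not come from the patch.

```scala
// Hypothetical sketch of the proposed semantics, not the patch's code.
object FunctionLengthCheck {
  // Returns true when the function body has more lines than maxLines.
  // A negative maxLines (for example the proposed default of -1) disables the check,
  // so existing queries keep running under whole-stage codegen.
  def isTooLong(functionBody: String, maxLines: Int): Boolean =
    maxLines >= 0 && functionBody.split("\n").length > maxLines
}
```

With `-1` as the default, only users who opt in by setting a non-negative limit would see whole-stage codegen disabled for long functions.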





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132367041
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = 
buildConf("spark.sql.codegen.maxLinesPerFunction")
+.internal()
+.doc("The maximum lines of a single Java function generated by 
whole-stage codegen. " +
+  "When the generated function exceeds this threshold, " +
+  "the whole-stage codegen is deactivated for this subtree of the 
current query plan.")
+.intConf
+.createWithDefault(1500)
--- End diff --

I'm not confident about this default value. Is it too small?





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-08-09 Thread zhengruifeng
Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/17995





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132366896
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ---
@@ -89,6 +89,14 @@ object CodeFormatter {
 }
 new CodeAndComment(code.result().trim(), map)
   }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
+val commentReg =
+  ("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" +   // strip /*comment*/
+"""([ |\t]*?\/\/[\s\S]*?\n)""").r   // strip //comment
--- End diff --

nit: align `// strip //comment` with above `// strip /*comment*/`.
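
For readers who want to see what a helper like `stripExtraNewLinesAndComments` is meant to achieve, here is a rough Python analogue. It is a sketch under assumptions, not the PR's Scala code: the patterns are simplified, the function name is only borrowed for illustration, and comment markers inside string literals are not handled.

```python
import re

# Block comments (/* ... */) or line comments (// ... up to, not including, the newline).
COMMENT_RE = re.compile(r"[ \t]*/\*[\s\S]*?\*/[ \t]*|[ \t]*//[^\n]*")
# Runs of blank lines that remain once comments are removed.
EXTRA_NEWLINES_RE = re.compile(r"\n\s*\n")


def strip_extra_newlines_and_comments(code):
    """Remove comments, then collapse consecutive blank lines into one newline."""
    without_comments = COMMENT_RE.sub("", code)
    return EXTRA_NEWLINES_RE.sub("\n", without_comments)


sample = (
    "int a = 1; // init\n"
    "/* block\n"
    "   comment */\n"
    "\n"
    "  // full line\n"
    "int b = 2;\n"
)

cleaned = strip_extra_newlines_and_comments(sample)
print(cleaned)
print(cleaned.count("\n"))  # number of code lines left to compare against a limit
```

This sketch keeps the newline that ends a `//` comment, so a trailing comment does not merge two statements onto one line before the lines are counted.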





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132366187
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) 
extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
 val (ctx, cleanedSource) = doCodeGen()
+if (ctx.isTooLongGeneratedFunction) {
+  logWarning("Found too long generated codes and JIT optimization 
might not work, " +
+"Whole-stage codegen disabled for this plan, " +
+"You can change the config spark.sql.codegen.MaxFunctionLength " +
+"to adjust the function length limit:\n "
++ s"$treeString")
+  return child.execute()
+}
--- End diff --

We need to add a test in which we create a query with a long generated 
function and check that whole-stage codegen is disabled for it. 





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132365359
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = 
buildConf("spark.sql.codegen.maxLinesPerFunction")
+.internal()
+.doc("The maximum lines of a function that will be supported before" +
+  " deactivating whole-stage codegen.")
--- End diff --

OK, updated, thanks.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80475 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80475/testReport)**
 for PR 18810 at commit 
[`ce544a5`](https://github.com/apache/spark/commit/ce544a56dbeaa9fecb66706f3d2bad97280835bd).





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132365401
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater 
than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a 
function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return 
true.
+   */
+  def existTooLongFunction(): Boolean = {
--- End diff --

OK, updated, thanks.





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132365436
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater 
than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a 
function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return 
true.
+   */
+  def existTooLongFunction(): Boolean = {
+classFunctions.exists { case (className, functions) =>
+  functions.exists{ case (name, code) =>
+val codeWithoutComments = 
CodeFormatter.stripExtraNewLinesAndComments(code)
+codeWithoutComments.count(_ == '\n') > 
SQLConf.get.maxLinesPerFunction
+  }
+}
+  }
+  /**
--- End diff --

Ok, added, thanks





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132364612
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
 ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 }
 
 (file: PartitionedFile) => {
-  val parser = new JacksonParser(actualSchema, parsedOptions)
+  // SPARK-21610: when the `requiredSchema` only contains 
`_corrupt_record`,
--- End diff --

Btw, some strange behaviors might occur:

scala> dfFromFile.filter($"_corrupt_record".isNotNull).show
+-----+---------------+
|field|_corrupt_record|
+-----+---------------+
| null| {"field": "3"}|
+-----+---------------+

scala> dfFromFile.filter($"_corrupt_record".isNotNull).select("_corrupt_record").show
+---------------+
|_corrupt_record|
+---------------+
+---------------+








[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132363994
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater 
than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a 
function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return 
true.
+   */
+  def existTooLongFunction(): Boolean = {
+classFunctions.exists { case (className, functions) =>
+  functions.exists{ case (name, code) =>
+val codeWithoutComments = 
CodeFormatter.stripExtraNewLinesAndComments(code)
+codeWithoutComments.count(_ == '\n') > 
SQLConf.get.maxLinesPerFunction
+  }
+}
+  }
+  /**
--- End diff --

Add one more space





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132363687
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
 ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 }
 
 (file: PartitionedFile) => {
-  val parser = new JacksonParser(actualSchema, parsedOptions)
+  // SPARK-21610: when the `requiredSchema` only contains 
`_corrupt_record`,
--- End diff --

Oh. Got it. One issue with this behavior is that we can't easily retrieve 
only the corrupt records with queries like 
`dfFromFile.select("_corrupt_record")`. This behavior is also inconsistent 
with RDD-based manipulation.





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132363283
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
 ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 }
 
 (file: PartitionedFile) => {
-  val parser = new JacksonParser(actualSchema, parsedOptions)
+  // SPARK-21610: when the `requiredSchema` only contains 
`_corrupt_record`,
--- End diff --

Ah, I mean they produced 0 and 3 respectively, as described in the PR 
description. I just double-checked. 





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132361425
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
 ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
 }
 
 (file: PartitionedFile) => {
-  val parser = new JacksonParser(actualSchema, parsedOptions)
+  // SPARK-21610: when the `requiredSchema` only contains 
`_corrupt_record`,
--- End diff --

I haven't tried 1.6.3 or 1.5.2. So @HyukjinKwon, do you mean the above code 
returns 1 for isNotNull and 2 for isNull?





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80472/
Test FAILed.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132361043
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1572,7 +1588,8 @@ def test_java_params(self):
 for name, cls in inspect.getmembers(module, inspect.isclass):
 if not name.endswith('Model') and issubclass(cls, 
JavaParams)\
 and not inspect.isabstract(cls):
-self.check_params(cls())
+# NOTE: disable check_params_exist until there is 
parity with Scala API
+ParamTests.check_params(self, cls(), 
check_params_exist=False)
--- End diff --

This skips the param test for models. Should we do a similar check for all models?





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80472 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80472/testReport)**
 for PR 18810 at commit 
[`d44a2f8`](https://github.com/apache/spark/commit/d44a2f8499b4f7b9235fd138349005a4e3c960a5).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18900
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132360895
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -356,6 +356,19 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if there is a codegen function the lines of which is greater 
than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a 
function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return 
true.
+   */
+  def existTooLongFunction(): Boolean = {
--- End diff --

> isTooLongGeneratedFunction

Nit: remove `()` 





[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2017-08-09 Thread debugger87
Github user debugger87 commented on the issue:

https://github.com/apache/spark/pull/18900
  
@cloud-fan could you please help review this PR?





[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-09 Thread debugger87
GitHub user debugger87 opened a pull request:

https://github.com/apache/spark/pull/18900

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

## What changes were proposed in this pull request?

Set createTime for every Hive partition created in Spark SQL, which can 
be used to manage the data lifecycle in a Hive warehouse.

## How was this patch tested?

No tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/debugger87/spark 
fix/set-create-time-for-hive-partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18900


commit 71a660ac8dad869d9ba3b4e206b74f5c44660ee6
Author: debugger87 
Date:   2017-08-10T04:17:00Z

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition







[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132360710
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = 
buildConf("spark.sql.codegen.maxLinesPerFunction")
+.internal()
+.doc("The maximum lines of a function that will be supported before" +
+  " deactivating whole-stage codegen.")
--- End diff --

> The maximum lines of a single Java function generated by whole-stage 
codegen. When the generated function exceeds this threshold, the whole-stage 
codegen is deactivated for this subtree of the current query plan. 

Could you also update the code comments in the other places to match my 
suggested wording above?





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132360643
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -1325,7 +1325,7 @@ def __init__(self, featuresCol="features", 
labelCol="label", predictionCol="pred
 super(MultilayerPerceptronClassifier, self).__init__()
 self._java_obj = self._new_java_obj(
 
"org.apache.spark.ml.classification.MultilayerPerceptronClassifier", self.uid)
-self._setDefault(maxIter=100, tol=1E-4, blockSize=128, 
stepSize=0.03, solver="l-bfgs")
+self._setDefault(maxIter=100, tol=1E-6, blockSize=128, 
stepSize=0.03, solver="l-bfgs")
--- End diff --

Looks like 1e-6 is the correct default value.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80471/
Test FAILed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80471 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80471/testReport)**
 for PR 18810 at commit 
[`d3238e9`](https://github.com/apache/spark/commit/d3238e9800f73b39b55e47419c5409b8111ea080).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132360069
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -417,6 +417,54 @@ def test_logistic_regression_check_thresholds(self):
 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
 )
 
+@staticmethod
+def check_params(test_self, py_stage, check_params_exist=True):
+"""
+Checks common requirements for Params.params:
+  - set of params exist in Java and Python and are ordered by names
+  - param parent has the same UID as the object's UID
+  - default param value from Java matches value in Python
+  - optionally check if all params from Java also exist in Python
+"""
+py_stage_str = "%s %s" % (type(py_stage), py_stage)
+if not hasattr(py_stage, "_to_java"):
+return
+java_stage = py_stage._to_java()
+if java_stage is None:
+return
+test_self.assertEqual(py_stage.uid, java_stage.uid(), 
msg=py_stage_str)
+if check_params_exist:
+param_names = [p.name for p in py_stage.params]
+java_params = list(java_stage.params())
+java_param_names = [jp.name() for jp in java_params]
+test_self.assertEqual(
+param_names, sorted(java_param_names),
+"Param list in Python does not match Java for %s:\nJava = 
%s\nPython = %s"
+% (py_stage_str, java_param_names, param_names))
--- End diff --

Are lines 436-443 the only change to `check_params`?





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132359678
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,15 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
 
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    val existLongFunction = ctx.existTooLongFunction
+    if (existLongFunction) {
+      logWarning(s"Found too long generated codes and JIT optimization might not work, " +
+        s"Whole-stage codegen disabled for this plan, " +
+        s"You can change the config spark.sql.codegen.MaxFunctionLength " +
+        s"to adjust the function length limit:\n "
--- End diff --

Please remove the unnecessary `s` interpolators; these string literals contain nothing to interpolate.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132359369
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -144,7 +158,9 @@ def _transfer_params_from_java(self):
             if self._java_obj.hasParam(param.name):
                 java_param = self._java_obj.getParam(param.name)
                 # SPARK-14931: Only check set params back to avoid default params mismatch.
-                if self._java_obj.isSet(java_param):
+                if self._java_obj.isSet(java_param) or (
+                        # SPARK-10931: Temporary fix for params that have a default in Java
+                        self._java_obj.hasDefault(java_param) and not self.isDefined(param)):
--- End diff --

This change will turn a param's default value on the Java side into a 
user-provided param value on the Python side.
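
To make the concern concrete, here is a minimal sketch in plain Python. `ToyParams` and `transfer_from_java` are hypothetical stand-ins written for this illustration and are not the pyspark API; they only mimic the shape of the patched condition, assuming that the transferred value ends up in the Python-side "set" map.

```python
class ToyParams:
    """Hypothetical param holder; NOT pyspark.ml.param.Params."""

    def __init__(self):
        self._default = {}  # values that come from defaults
        self._set = {}      # values that were set explicitly

    def set_default(self, name, value):
        self._default[name] = value

    def set(self, name, value):
        self._set[name] = value

    def is_set(self, name):
        return name in self._set

    def has_default(self, name):
        return name in self._default

    def is_defined(self, name):
        return self.is_set(name) or self.has_default(name)


def transfer_from_java(java_side, py_side, names):
    """Copy set values, plus Java defaults the Python side does not define."""
    for name in names:
        if java_side.is_set(name):
            py_side.set(name, java_side._set[name])
        elif java_side.has_default(name) and not py_side.is_defined(name):
            # The Java default is written into the Python *set* map here,
            # so it is no longer distinguishable from a user-provided value.
            py_side.set(name, java_side._default[name])


java_side = ToyParams()
java_side.set_default("maxIter", 100)  # only a default on the "Java" side
py_side = ToyParams()
transfer_from_java(java_side, py_side, ["maxIter"])
```

After the transfer, `py_side.is_set("maxIter")` is true while `java_side.is_set("maxIter")` is still false, which is the mismatch described above.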





[GitHub] spark issue #17972: [SPARK-20723][ML]Add intermediate storage level to tree ...

2017-08-09 Thread phatak-dev
Github user phatak-dev commented on the issue:

https://github.com/apache/spark/pull/17972
  
@MLnick Any updates on this?





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132358656
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -263,7 +284,8 @@ def _fit_java(self, dataset):
 
     def _fit(self, dataset):
         java_model = self._fit_java(dataset)
-        return self._create_model(java_model)
+        model = self._create_model(java_model)
+        return self._copyValues(model)
--- End diff --

I think this is going to copy values from the estimator to the created 
model. So are we assuming that the params in the estimator and the model are the same?
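
As an illustration of why the assumption matters, here is a toy sketch in plain Python (not pyspark code; `copy_values` and the param names are made up for the example). It copies only the values whose names the target also declares and reports the rest, which is one possible way to handle an estimator whose params are a superset of the model's.

```python
def copy_values(source_params, target_names):
    """Copy only the values whose names the target also declares.

    source_params: dict of param name -> value on the (toy) estimator.
    target_names: set of param names the (toy) model declares.
    Returns the copied values and the sorted names that were skipped.
    """
    copied = {}
    skipped = []
    for name, value in source_params.items():
        if name in target_names:
            copied[name] = value
        else:
            skipped.append(name)
    return copied, sorted(skipped)


estimator_params = {"maxIter": 10, "regParam": 0.1, "solver": "auto"}
model_param_names = {"maxIter", "regParam"}
copied, skipped = copy_values(estimator_params, model_param_names)
```

If the two param sets were guaranteed to be identical, `skipped` would always be empty and the membership check would be unnecessary.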





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80470/
Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18810
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80470 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80470/testReport)**
 for PR 18810 at commit 
[`d0c753a`](https://github.com/apache/spark/commit/d0c753a5d3f5fbb5e14da0eebbd5e9bd3778126c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-09 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17849
  
Sorry, let me try and take a look tomorrow.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132357684
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
                 paramMap.put([pair])
         return paramMap
 
+    def _create_params_from_java(self):
+        """
+        SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here
+        """
+        java_params = list(self._java_obj.params())
+        from pyspark.ml.param import Param
+        for java_param in java_params:
+            java_param_name = java_param.name()
+            if not hasattr(self, java_param_name):
--- End diff --

If `self` already has an attribute with the same name that is not a `Param`, 
should we handle that case, for example by throwing an exception?
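
A minimal sketch of the check I have in mind, in plain Python. `ToyParam`, `ToyStage`, and `create_params_from_names` are hypothetical names used only for this illustration, and raising `TypeError` is just one possible choice.

```python
class ToyParam:
    """Hypothetical stand-in for a param descriptor (not pyspark's Param)."""

    def __init__(self, name, doc=""):
        self.name = name
        self.doc = doc


class ToyStage:
    """Hypothetical stage with one ordinary attribute and no params yet."""

    def __init__(self):
        self.uid = "stage_1"  # a plain string, not a ToyParam


def create_params_from_names(stage, java_param_names):
    """Create missing params; fail loudly on a non-param name collision."""
    for name in java_param_names:
        if not hasattr(stage, name):
            setattr(stage, name, ToyParam(name))
        elif not isinstance(getattr(stage, name), ToyParam):
            raise TypeError(
                "attribute %r exists on %s but is not a param"
                % (name, type(stage).__name__))


stage = ToyStage()
create_params_from_names(stage, ["maxIter"])  # creates a new ToyParam
try:
    create_params_from_names(stage, ["uid"])  # collides with the string attribute
    collision_error = None
except TypeError as exc:
    collision_error = str(exc)
```

With only the `hasattr` check, the `uid` collision would be skipped silently instead of being reported.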





[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18544
  
**[Test build #80474 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80474/testReport)**
 for PR 18544 at commit 
[`c41475e`](https://github.com/apache/spark/commit/c41475e3c5a217e5778bbddcd1b4a4210ce5d180).





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132357070
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }
 
     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

I am actually rather -0 on this change. Neither the current way nor the 
previous way sounds quite compelling to me, but the current way at least does 
arguably unnecessary parsing attempts, and we have had this behaviour for a long 
time (at least I tried this in 1.6.3 and 1.5.2):

```scala
import org.apache.spark.sql.types._

val schema = new StructType().add("field", ByteType).add("_corrupt_record", 
StringType)
val file = "/tmp/sample.json"
val dfFromFile = sqlContext.read.schema(schema).json(file)
dfFromFile.filter($"_corrupt_record".isNotNull).count()
dfFromFile.filter($"_corrupt_record".isNull).count()
```






[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18899
  
**[Test build #80473 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)**
 for PR 18899 at commit 
[`5dc5c89`](https://github.com/apache/spark/commit/5dc5c89242a0c2a5ac6a693c3703eef8ee160616).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18899
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80473/
Test PASSed.





[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17849#discussion_r132355421
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap):
                 paramMap.put([pair])
         return paramMap
 
+    def _create_params_from_java(self):
+        """
+        SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here
+        """
+        java_params = list(self._java_obj.params())
+        from pyspark.ml.param import Param
+        for java_param in java_params:
+            java_param_name = java_param.name()
+            if not hasattr(self, java_param_name):
+                param = Param(self, java_param_name, java_param.doc())
+                setattr(param, "created_from_java_param", True)
--- End diff --

BTW, would you mind if I ask where `created_from_java_param` is used? 





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-08-09 Thread weiqingy
Github user weiqingy commented on the issue:

https://github.com/apache/spark/pull/17342
  
@steveloughran Thanks Steve.





[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...

2017-08-09 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/18893
  
Since they're both small and this is already open I'd say leave it, unless 
someone ends up having issues with one of the fixes





[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

2017-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18865#discussion_r132352189
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }
 
     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

What do you think? @cloud-fan @HyukjinKwon 





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-09 Thread lvdongr
Github user lvdongr commented on the issue:

https://github.com/apache/spark/pull/18756
  
Do you mean we can provide different default values for different types, 
like 0 for int and "" for string? Or that we set the default values when 
defining the table? @gatorsmile @maropu I set the default to NULL because the 
"insert into ..." statement in Hive handles it this way, and I want to be 
consistent with Hive.





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18895
  
@byakuinss Please add a doc test to `DataFrame.replace`. There is an existing 
example, `df4.na.replace('Alice', None).show()`. We want to make sure it also 
works with the default value. Thanks.







[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17849
  
Oh, wait, this does not look like it requires much ML knowledge. I will try to make a pass.





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17849
  
I am more of a backend developer who works together with data scientists, so 
my ML knowledge is limited (I am studying hard :)). I will leave a few comments 
if there are some nits once someone starts to review, so that they can be 
addressed together.

cc @viirya, who I believe knows ML a bit, and @zero323, who I believe should be 
able to review this (but is inactive now): would either of you be able to make 
a pass on this one?





[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18899
  
**[Test build #80473 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)**
 for PR 18899 at commit 
[`5dc5c89`](https://github.com/apache/spark/commit/5dc5c89242a0c2a5ac6a693c3703eef8ee160616).





[GitHub] spark pull request #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-09 Thread mpjlu
GitHub user mpjlu opened a pull request:

https://github.com/apache/spark/pull/18899

[SPARK-21680][ML][MLLIB]optimzie Vector coompress

## What changes were proposed in this pull request?

When Vector.compressed is used to convert a Vector to a SparseVector, the 
performance is much lower than with Vector.toSparse.
This is because Vector.compressed has to scan the values three times, 
whereas Vector.toSparse only needs two scans.
When the vector is long, there is a significant performance 
difference between these two methods.
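
The scan counts can be illustrated with a small plain-Python sketch. This is not Spark code: `CountingValues`, `to_sparse`, and both `compressed_*` functions are hypothetical, and the size threshold `1.5 * (nnz + 1.0) < size` is an assumption made only for the example. The idea is that the naive version counts the non-zeros, then calls a conversion that counts them again before filling, while the other version passes the count along.

```python
class CountingValues:
    """Wraps a list and counts how many full scans are started over it."""

    def __init__(self, values):
        self._values = list(values)
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def num_nonzeros(values):
    return sum(1 for v in values if v != 0.0)


def to_sparse(values, nnz=None):
    """Return (size, indices, data); counts the non-zeros unless nnz is given."""
    if nnz is None:
        # In Scala this count would size the output arrays; here it is kept
        # only so that the number of scans mirrors the description.
        nnz = num_nonzeros(values)
    indices, data = [], []
    for i, v in enumerate(values):  # the fill scan
        if v != 0.0:
            indices.append(i)
            data.append(v)
    return len(values), indices, data


def compressed_naive(values):
    nnz = num_nonzeros(values)            # scan 1
    if 1.5 * (nnz + 1.0) < len(values):   # assumed threshold
        return to_sparse(values)          # scans 2 and 3
    return list(values)


def compressed_reusing_count(values):
    nnz = num_nonzeros(values)            # scan 1
    if 1.5 * (nnz + 1.0) < len(values):   # assumed threshold
        return to_sparse(values, nnz)     # scan 2 only
    return list(values)


naive_input = CountingValues([0.0] * 9 + [2.0])
naive_result = compressed_naive(naive_input)
reuse_input = CountingValues([0.0] * 9 + [2.0])
reuse_result = compressed_reusing_count(reuse_input)
```

Both calls return the same sparse triple, but `naive_input.scans` ends at 3 and `reuse_input.scans` ends at 2.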

## How was this patch tested?

The existing UT


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpjlu/spark optVectorCompress

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18899.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18899


commit 5dc5c89242a0c2a5ac6a693c3703eef8ee160616
Author: Peng Meng 
Date:   2017-08-10T01:59:17Z

optimzie Vector coompress







[GitHub] spark issue #18648: [SPARK-21428] Turn IsolatedClientLoader off while using ...

2017-08-09 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18648
  
ping @jiangxb1987 @cloud-fan any more suggestions?





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18630
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80468/
Test PASSed.





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18630
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18630
  
**[Test build #80468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80468/testReport)**
 for PR 18630 at commit 
[`c0b0a7d`](https://github.com/apache/spark/commit/c0b0a7d79ca27bbcf91245b3d80070d5d4188174).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...

2017-08-09 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18893
  
@ajbozarth Do we need another PR to separate these? If necessary, I will 
do that.





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347436
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -356,6 +356,18 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if the length of codegen function is too long or not
--- End diff --

Ok, I have modified it, thanks





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347148
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,18 @@ class CodegenContext {
   private val placeHolderToComments = new mutable.HashMap[String, String]
 
   /**
+   * Returns if the length of codegen function is too long or not
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.MaxFunctionLength, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
+    classFunctions.exists { case (className, functions) =>
+      functions.exists{ case (name, code) =>
+        CodeFormatter.stripExtraNewLines(code).count(_ == '\n') > SQLConf.get.maxFunctionLength
--- End diff --

OK, I have modified it to count lines excluding comments and extra newlines.
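
For illustration only, the counting rule can be sketched in plain Python as below. This is not the Scala code in the PR: `count_code_lines` and `exists_too_long_function` are hypothetical names, and the regex-based comment stripping is a simplification that would also cut `//` or `/*` sequences appearing inside string literals.

```python
import re


def count_code_lines(code):
    """Count lines that still hold code after dropping comments and blanks."""
    # Drop /* ... */ block comments first, then // line comments.
    # Simplification: comment markers inside string literals are not handled.
    no_block = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    return sum(1 for line in no_line.splitlines() if line.strip())


def exists_too_long_function(functions, max_len):
    """functions: dict of function name -> generated source text."""
    return any(count_code_lines(code) > max_len for code in functions.values())


sample = "/* header */\nint a = 1;\n\n\n// note\nint b = 2; // trailing\n"
sample_count = count_code_lines(sample)
```

Here `sample` has seven raw lines when split on `'\n'`, but only the two assignment lines are counted.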


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347198
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val WHOLESTAGE_MAX_FUNCTION_LEN = 
buildConf("spark.sql.codegen.MaxFunctionLength")
--- End diff --

Ok, I have modified it, thanks





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80472/testReport)** for PR 18810 at commit [`d44a2f8`](https://github.com/apache/spark/commit/d44a2f8499b4f7b9235fd138349005a4e3c960a5).





[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...

2017-08-09 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18810#discussion_r132347018
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
@@ -301,6 +301,61 @@ class AggregateBenchmark extends BenchmarkBase {
 */
   }
 
+  ignore("max function length of wholestagecodegen") {
+val N = 20 << 15
+
+val benchmark = new Benchmark("max function length of wholestagecodegen", N)
+def f(): Unit = sparkSession.range(N)
+  .selectExpr(
+"id",
+"(id & 1023) as k1",
+"cast(id & 1023 as double) as k2",
+"cast(id & 1023 as int) as k3",
+"case when id > 100 and id <= 200 then 1 else 0 end as v1",
+"case when id > 200 and id <= 300 then 1 else 0 end as v2",
+"case when id > 300 and id <= 400 then 1 else 0 end as v3",
+"case when id > 400 and id <= 500 then 1 else 0 end as v4",
+"case when id > 500 and id <= 600 then 1 else 0 end as v5",
+"case when id > 600 and id <= 700 then 1 else 0 end as v6",
+"case when id > 700 and id <= 800 then 1 else 0 end as v7",
+"case when id > 800 and id <= 900 then 1 else 0 end as v8",
+"case when id > 900 and id <= 1000 then 1 else 0 end as v9",
+"case when id > 1000 and id <= 1100 then 1 else 0 end as v10",
+"case when id > 1100 and id <= 1200 then 1 else 0 end as v11",
+"case when id > 1200 and id <= 1300 then 1 else 0 end as v12",
+"case when id > 1300 and id <= 1400 then 1 else 0 end as v13",
+"case when id > 1400 and id <= 1500 then 1 else 0 end as v14",
+"case when id > 1500 and id <= 1600 then 1 else 0 end as v15",
+"case when id > 1600 and id <= 1700 then 1 else 0 end as v16",
+"case when id > 1700 and id <= 1800 then 1 else 0 end as v17",
+"case when id > 1800 and id <= 1900 then 1 else 0 end as v18")
+  .groupBy("k1", "k2", "k3")
+  .sum()
+  .collect()
+
+benchmark.addCase(s"codegen = F") { iter =>
+  sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
+  f()
+}
+
+benchmark.addCase(s"codegen = T") { iter =>
+  sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
+  sparkSession.conf.set("spark.sql.codegen.MaxFunctionLength", "1")
--- End diff --

OK, I have added a test that uses the default value of 1500, thanks.





[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...

2017-08-09 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18893
  
test this please





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80471/testReport)** for PR 18810 at commit [`d3238e9`](https://github.com/apache/spark/commit/d3238e9800f73b39b55e47419c5409b8111ea080).





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80469/testReport)** for PR 18895 at commit [`8af1e15`](https://github.com/apache/spark/commit/8af1e15f37c750dda53542b5a854f832ff006773).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80469/
Test PASSed.





[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18810
  
**[Test build #80470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80470/testReport)** for PR 18810 at commit [`d0c753a`](https://github.com/apache/spark/commit/d0c753a5d3f5fbb5e14da0eebbd5e9bd3778126c).





[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...

2017-08-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18756
  
In most cases of `SELECT` statements, `default_value` is `NULL` by default, so I first thought non-specified columns were filled with `NULL`. Anyway, do we still have a chance to implement the concept of `DEFAULT`, too?
```
postgresql doc:
DEFAULT default_expr
...
The default expression will be used in any insert operation that does not specify a value for the column.
If there is no default for a column, then the default is null.
```
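
The `DEFAULT` semantics quoted above can be illustrated with a small standalone sketch. It is not Spark code and not the PR's implementation: `DefaultValueSketch`, `Column`, and `fillRow` are hypothetical names introduced only for illustration, and defaults are modelled as plain values rather than expressions.

```scala
object DefaultValueSketch {
  // Hypothetical column description: a name plus an optional default value.
  final case class Column(name: String, default: Option[Any])

  // Builds a full row for an insert that specifies only some columns.
  // A missing column takes its declared default, or null if it has none.
  def fillRow(schema: Seq[Column], specified: Map[String, Any]): Seq[Any] =
    schema.map { col =>
      specified.getOrElse(col.name, col.default.getOrElse(null))
    }
}
```

For a schema with an `id` column without a default and a `status` column defaulting to `"new"`, inserting only `id = 1` would yield the row `(1, "new")`, while inserting only `status` would leave `id` as null.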





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80469/testReport)** for PR 18895 at commit [`8af1e15`](https://github.com/apache/spark/commit/8af1e15f37c750dda53542b5a854f832ff006773).





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
ok to test





[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
Could we add the example in the doctest (under 1362L) so that this can be 
tested and shown in the documentation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18882: [SPARK-21652][SQL] Filter out meaningless constraints in...

2017-08-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18882
  
Any activity for cost-based inference? Anyway, thanks! I'll close this for 
now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18882: [SPARK-21652][SQL] Filter out meaningless constra...

2017-08-09 Thread maropu
Github user maropu closed the pull request at:

https://github.com/apache/spark/pull/18882




