[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132376473

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n " +
+        s"$treeString")
+      return child.execute()
+    }
--- End diff --

When we check `ctx.isTooLongGeneratedFunction` in doExecute, the WholeStageCodegenExec node has already been generated, so there must be a WholeStageCodegenExec node at this point.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB] optimize Vector compress
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899

This isn't what was proposed in the JIRA?
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r132375612

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
   tpart.setTableName(ht.getTableName)
   tpart.setValues(partValues.asJava)
   tpart.setSd(storageDesc)
+  tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
--- End diff --

This covers the conversion to Hive; how about the conversion from Hive?
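For context on the line under review: Hive's metastore stores a partition's `createTime` as a 32-bit count of epoch seconds, which is why the patch divides the millisecond clock by 1000 and narrows to `Int`. A minimal sketch of the conversion (the helper name here is illustrative, not Spark's API):

```scala
object CreateTimeSketch {
  // Hive stores createTime as seconds since the epoch, as an Int,
  // so the millisecond clock value is divided by 1000 and narrowed.
  def toHiveCreateTime(millis: Long): Int = (millis / 1000).toInt
}
```

In the patch this would be invoked as `toHiveCreateTime(System.currentTimeMillis())` at partition-conversion time.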
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18902

**[Test build #80477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80477/testReport)** for PR 18902 at commit [`f6f166f`](https://github.com/apache/spark/commit/f6f166fef4e17db7e36ccecf41aebe3443e9fef5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18902

**[Test build #80478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80478/testReport)** for PR 18902 at commit [`660c2db`](https://github.com/apache/spark/commit/660c2dbc3e800a8f8fe4bc1b36a72ccdc37a778e).
[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132375138

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n " +
+        s"$treeString")
+      return child.execute()
+    }
--- End diff --

We can check if there is a `WholeStageCodegenExec` node in the physical plan of the query. `WholeStageCodegenSuite` has a few examples you can take a look at.
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18902

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80477/
Test FAILed.
[GitHub] spark pull request #18901: [SPARK-21689][YARN] Download user jar from remote...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/18901#discussion_r132374726

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -516,6 +516,10 @@ object SparkSubmit extends CommandLineUtils {
   if (deployMode == CLIENT || isYarnCluster) {
     childMainClass = args.mainClass
     if (isUserJar(args.primaryResource)) {
+      val hadoopConf = new HadoopConfiguration()
+      args.primaryResource =
+        Option(args.primaryResource).map(
+          downloadFile(_, targetDir, args.sparkProperties, hadoopConf)).orNull
       childClasspath += args.primaryResource
     }
     if (args.jars != null) { childClasspath ++= args.jars.split(",") }
--- End diff --

I think in your scenario we should also download the jars specified with `--jars` to local and add them to the classpath.
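The suggestion above is that the `--jars` list would need the same treatment as the primary resource. A hedged sketch of mapping the comma-separated list through a download step; `downloadFile` is stubbed here with an illustrative local directory, whereas Spark's real helper also takes the target dir, Spark properties, and Hadoop conf:

```scala
object JarsDownloadSketch {
  // Stub standing in for SparkSubmit's downloadFile(uri, targetDir, props, hadoopConf):
  // remote URIs are "downloaded" to a hypothetical local dir, local paths pass through.
  def downloadFile(path: String): String =
    if (path.startsWith("hdfs://")) "/tmp/local/" + path.split("/").last else path

  // The shape of the suggested change: map every --jars entry through the download
  // before adding it to childClasspath.
  def localizeJars(jars: String): String =
    jars.split(",").map(downloadFile).mkString(",")
}
```

With this shape, `args.jars.split(",")` entries on HDFS would land on the local filesystem before the classpath is built, which is what makes the HBase token classes visible at submit time.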
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18902

Merged build finished. Test FAILed.
[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132374541

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n " +
+        s"$treeString")
+      return child.execute()
+    }
--- End diff --

@viirya, it is hard for me to check whether whole-stage codegen is disabled or not; would you give me some suggestions? Thanks.
[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132373300

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n " +
+        s"$treeString")
+      return child.execute()
+    }
--- End diff --

AggregateBenchmark is more like a benchmark than a test; it won't run every time. We need a test to prevent regressions introduced by future changes.
[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18810

Merged build finished. Test FAILed.
[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18810

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80476/
Test FAILed.
[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810

**[Test build #80476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80476/testReport)** for PR 18810 at commit [`08f5ddf`](https://github.com/apache/spark/commit/08f5ddf0442793a63beff7f9e3970fc8bb92a47d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18902

**[Test build #80477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80477/testReport)** for PR 18902 at commit [`f6f166f`](https://github.com/apache/spark/commit/f6f166fef4e17db7e36ccecf41aebe3443e9fef5).
[GitHub] spark pull request #18902: [SPARK-21690][ML] one-pass imputer
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/18902

[SPARK-21690][ML] one-pass imputer

## What changes were proposed in this pull request?
parallelize the computation of all columns

## How was this patch tested?
existing tests

You can merge this pull request into a Git repository by running:
  $ git pull https://github.com/zhengruifeng/spark parallelize_imputer
Alternatively you can review and apply these changes as the patch at:
  https://github.com/apache/spark/pull/18902.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
  This closes #18902

commit c4042ad23bcf94b758ee8c345c15fc85037cbdb9
Author: Zheng RuiFeng
Date: 2017-06-07T04:26:41Z
  create pr

commit 4c35bda0e073084a608df8e8bc28c4dae5a1fc5b
Author: Zheng RuiFeng
Date: 2017-06-07T04:30:26Z
  handle missing

commit 9a6ac59d5191a57a9b0b671414a2dbac1a3c3b3d
Author: Zheng RuiFeng
Date: 2017-06-07T05:33:30Z
  use summary

commit f6f166fef4e17db7e36ccecf41aebe3443e9fef5
Author: Zheng RuiFeng
Date: 2017-06-07T06:22:58Z
  x
[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810

**[Test build #80476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80476/testReport)** for PR 18810 at commit [`08f5ddf`](https://github.com/apache/spark/commit/08f5ddf0442793a63beff7f9e3970fc8bb92a47d).
[GitHub] spark pull request #18901: [SPARK-21689][YARN] Download user jar from remote...
GitHub user caneGuy opened a pull request: https://github.com/apache/spark/pull/18901

[SPARK-21689][YARN] Download user jar from remote in case get hadoop token failed

## What changes were proposed in this pull request?
When using yarn cluster mode and we need to scan HBase, there is a case which does not work: if we put the user jar on HDFS, the local classpath will have no HBase classes, which makes fetching the HBase token fail. Then later, when the job is submitted to YARN, it fails since it has no token to access the HBase table. I mocked three cases:

1) user jar is on the classpath, and has HBase:
`17/08/10 13:48:03 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400050 for token HDFS_DELEGATION_TOKEN
17/08/10 13:48:03 INFO security.HadoopDelegationTokenManager: Service hive
17/08/10 13:48:03 INFO security.HadoopDelegationTokenManager: Service hbase
17/08/10 13:48:05 INFO security.HBaseDelegationTokenProvider: Attempting to fetch HBase security token.`
The logs show we can get the token normally.

2) user jar on HDFS:
`17/08/10 13:43:58 WARN security.HBaseDelegationTokenProvider: Class org.apache.hadoop.hbase.HBaseConfiguration not found.
17/08/10 13:43:58 INFO security.HBaseDelegationTokenProvider: Failed to get token from service hbase
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.security.token.TokenUtil
  at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokens(HBaseDelegationTokenProvider.scala:41)
  at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anonfun$obtainDelegationTokens$2.apply(HadoopDelegationTokenManager.scala:112)
  at org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anonfun$obtainDelegationTokens$2.apply(HadoopDelegationTokenManager.scala:109)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)`
The logs show that getting the token failed with a ClassNotFoundException.

If we download the user jar from remote first, things work correctly. So this patch downloads the user jar from remote when in yarn cluster mode.

## How was this patch tested?
Manually tested by executing spark-submit scripts with different user jars.

You can merge this pull request into a Git repository by running:
  $ git pull https://github.com/caneGuy/spark zhoukang/download-userjar
Alternatively you can review and apply these changes as the patch at:
  https://github.com/apache/spark/pull/18901.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
  This closes #18901

commit 31fb394f983313c2ee767bf68220041fa6c84b2e
Author: zhoukang
Date: 2017-08-09T10:42:43Z
  [SPARK][YARN] Download user jar from remote in case get hadoop token failed
[GitHub] spark issue #18901: [SPARK-21689][YARN] Download user jar from remote in cas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18901

Can one of the admins verify this patch?
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544

Merged build finished. Test FAILed.
[GitHub] spark pull request #18810: [SPARK-21603][SQL]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132370096

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -89,6 +89,14 @@ object CodeFormatter {
   }
   new CodeAndComment(code.result().trim(), map)
 }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
+    val commentReg =
+      ("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" +  // strip /*comment*/
+       """([ |\t]*?\/\/[\s\S]*?\n)""").r             // strip //comment
--- End diff --

Ok, modified, thanks.
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80474/
Test FAILed.
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18544

**[Test build #80474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80474/testReport)** for PR 18544 at commit [`c41475e`](https://github.com/apache/spark/commit/c41475e3c5a217e5778bbddcd1b4a4210ce5d180).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132368646

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
     "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
+
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

When I modified it to 1600, the result is:

max function length of wholestagecodegen:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
codegen = F                                      467 /  507        1.4        712.7      1.0X
codegen = T maxLinesPerFunction = 1600          3191 / 3238        0.2       4868.7      0.1X
codegen = T maxLinesPerFunction = 1500           449 /  482        1.5        685.2      1.0X
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132368484

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n " +
+        s"$treeString")
+      return child.execute()
+    }
--- End diff --

I think it can be tested by the "max function length of wholestagecodegen" benchmark added in AggregateBenchmark.scala, thanks.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18810

Btw, can you change `[sql]` to `[SQL]` in the title?
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132367400

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
     "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
+
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

I tend not to change the current behavior of whole-stage codegen. This might unintentionally stop user code from running with whole-stage codegen. Shall we make `-1` the default and skip the function length check if this config is negative?
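The `-1` sentinel suggested above could be checked roughly as in this sketch; the names are illustrative, not the actual Spark implementation:

```scala
object MaxLinesSketch {
  // -1 (or any negative value) disables the limit, preserving current behavior;
  // otherwise the function is "too long" once it exceeds the configured line count.
  def isTooLongGeneratedFunction(functionLines: Int, maxLinesPerFunction: Int): Boolean =
    maxLinesPerFunction >= 0 && functionLines > maxLinesPerFunction
}
```

With a default of -1, existing plans keep running under whole-stage codegen unless a user opts in to the limit.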
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132367041

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,14 @@ object SQLConf {
     "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)
+
+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
+      "When the generated function exceeds this threshold, " +
+      "the whole-stage codegen is deactivated for this subtree of the current query plan.")
+    .intConf
+    .createWithDefault(1500)
--- End diff --

I'm not confident about this default value. Is it too small?
[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...
Github user zhengruifeng closed the pull request at:
https://github.com/apache/spark/pull/17995
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132366896

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -89,6 +89,14 @@ object CodeFormatter {
     }
     new CodeAndComment(code.result().trim(), map)
   }
+
+  def stripExtraNewLinesAndComments(input: String): String = {
+    val commentReg =
+      ("""([ |\t]*?\/\*[\s|\S]*?\*\/[ |\t]*?)|""" + // strip /*comment*/
+      """([ |\t]*?\/\/[\s\S]*?\n)""").r // strip //comment
--- End diff --

nit: align `// strip //comment` with the `// strip /*comment*/` above.
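As a self-contained illustration of what the comment-stripping pass under review does, here is a simplified sketch (the regex is deliberately simpler than the one in the diff, and the object name is assumed):

```scala
// Strip /* ... */ block comments and // line comments from generated code,
// so that comment lines do not count toward the function length limit.
object CommentStripper {
  private val commentReg =
    ("""([ \t]*/\*[\s\S]*?\*/[ \t]*)|""" + // strip /*comment*/
     """([ \t]*//.*\n?)""").r              // strip //comment

  def strip(input: String): String = commentReg.replaceAllIn(input, "")
}
```

For example, `CommentStripper.strip("int a = 1; // init\n/* block */\nint b = 2;\n")` removes both comments and leaves only the two statement lines, which is what makes the subsequent newline count meaningful.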
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132366187

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,14 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co

   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    if (ctx.isTooLongGeneratedFunction) {
+      logWarning("Found too long generated codes and JIT optimization might not work, " +
+        "Whole-stage codegen disabled for this plan, " +
+        "You can change the config spark.sql.codegen.MaxFunctionLength " +
+        "to adjust the function length limit:\n " ++ s"$treeString")
+      return child.execute()
+    }
--- End diff --

We need to add a test that creates a query with a long generated function and checks that whole-stage codegen is disabled for it.
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132365359

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)

+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a function that will be supported before" +
+      " deactivating whole-stage codegen.")
--- End diff --

OK, updated, thanks.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18810

**[Test build #80475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80475/testReport)** for PR 18810 at commit [`ce544a5`](https://github.com/apache/spark/commit/ce544a56dbeaa9fecb66706f3d2bad97280835bd).
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132365401

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {

   private val placeHolderToComments = new mutable.HashMap[String, String]

   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
--- End diff --

OK, updated, thanks.
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132365436

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {

   private val placeHolderToComments = new mutable.HashMap[String, String]

   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
+    classFunctions.exists { case (className, functions) =>
+      functions.exists { case (name, code) =>
+        val codeWithoutComments = CodeFormatter.stripExtraNewLinesAndComments(code)
+        codeWithoutComments.count(_ == '\n') > SQLConf.get.maxLinesPerFunction
+      }
+    }
+  }
+
+  /**
--- End diff --

OK, added, thanks.
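The length check being reviewed above reduces to a nested `exists` over the generated functions. A self-contained toy version (the real code reads `classFunctions` from `CodegenContext` state and the limit from `SQLConf`; here both are plain parameters) can be sketched as:

```scala
// Toy version of the "too long function" check: return true if any generated
// function's body spans more lines than the configured limit.
object TooLongSketch {
  def existTooLongFunction(
      classFunctions: Map[String, Map[String, String]],
      maxLinesPerFunction: Int): Boolean =
    classFunctions.exists { case (_, functions) =>
      functions.exists { case (_, code) =>
        // Count newlines as a proxy for the function's line count.
        code.count(_ == '\n') > maxLinesPerFunction
      }
    }

  def main(args: Array[String]): Unit = {
    val fns = Map("GeneratedIterator" -> Map("processNext" -> "a;\nb;\nc;\n"))
    println(existTooLongFunction(fns, 2))  // true: 3 lines > 2
    println(existTooLongFunction(fns, 10)) // false: well under the limit
  }
}
```

Comment stripping (the `stripExtraNewLinesAndComments` call in the diff) matters here precisely because it keeps large comment blocks from inflating the newline count.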
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/18865#discussion_r132364612

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }

     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

Btw, some strange behaviors might occur:

    scala> dfFromFile.filter($"_corrupt_record".isNotNull).show
    +-----+---------------+
    |field|_corrupt_record|
    +-----+---------------+
    | null| {"field": "3"}|
    +-----+---------------+

    scala> dfFromFile.filter($"_corrupt_record".isNotNull).select("_corrupt_record").show
    +---------------+
    |_corrupt_record|
    +---------------+
    +---------------+
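The inconsistency viirya shows comes from column pruning: when the required schema contains only `_corrupt_record`, the parser has no data fields to convert, so no record can fail and the corrupt column comes back empty. A toy model (plain Scala, not Spark APIs; all names here are illustrative) of that mechanism:

```scala
// Toy model of SPARK-21610: the corrupt-record column is populated only when
// the parser actually attempts to convert data fields.
object CorruptRecordSketch {
  val records = Seq("""{"field": 1}""", """{"field": 2}""", """{"field": "3"}""")

  // Stand-in for a real parse failure ("3" is a string where an int is expected).
  def isMalformed(json: String): Boolean = json.contains("\"3\"")

  def corruptColumn(requiredDataFields: Int): Seq[Option[String]] =
    records.map { r =>
      // With zero data fields requested (pruned schema), nothing is parsed,
      // so nothing ever fails and the column is always None.
      if (requiredDataFields > 0 && isMalformed(r)) Some(r) else None
    }

  def main(args: Array[String]): Unit = {
    println(corruptColumn(1).count(_.isDefined)) // 1: full schema reports the corrupt record
    println(corruptColumn(0).count(_.isDefined)) // 0: pruned schema reports none
  }
}
```

This is why `dfFromFile.select("_corrupt_record")` alone cannot retrieve the corrupt rows: the pruned plan never exercises the parse path that would flag them.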
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132363994

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {

   private val placeHolderToComments = new mutable.HashMap[String, String]

   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
+    classFunctions.exists { case (className, functions) =>
+      functions.exists{ case (name, code) =>
+        val codeWithoutComments = CodeFormatter.stripExtraNewLinesAndComments(code)
+        codeWithoutComments.count(_ == '\n') > SQLConf.get.maxLinesPerFunction
+      }
+    }
+  }
+
+  /**
--- End diff --

Add one more space
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/18865#discussion_r132363687

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }

     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

Oh. Got it. One issue with this behavior is that we can't easily retrieve only the corrupt records with queries like `dfFromFile.select("_corrupt_record")`. This behavior is also inconsistent with RDD-based manipulation.
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/18865#discussion_r132363283

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }

     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

Ah, I mean they produced 0 and 3 for each, as described in the PR description. I just double-checked.
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/18865#discussion_r132361425

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala ---
@@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
     }

     (file: PartitionedFile) => {
-      val parser = new JacksonParser(actualSchema, parsedOptions)
+      // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`,
--- End diff --

I've not tried 1.6.3 or 1.5.2. So @HyukjinKwon, do you mean the above code returns 1 for `isNotNull` and 2 for `isNull`?
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18810

Merged build finished. Test FAILed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18810

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80472/
Test FAILed.
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17849#discussion_r132361043

--- Diff: python/pyspark/ml/tests.py ---
@@ -1572,7 +1588,8 @@ def test_java_params(self):
         for name, cls in inspect.getmembers(module, inspect.isclass):
             if not name.endswith('Model') and issubclass(cls, JavaParams)\
                     and not inspect.isabstract(cls):
-                self.check_params(cls())
+                # NOTE: disable check_params_exist until there is parity with Scala API
+                ParamTests.check_params(self, cls(), check_params_exist=False)
--- End diff --

This skips the param test for Model. Should we do a similar check for all models?
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18810

**[Test build #80472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80472/testReport)** for PR 18810 at commit [`d44a2f8`](https://github.com/apache/spark/commit/d44a2f8499b4f7b9235fd138349005a4e3c960a5).

 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18900

Can one of the admins verify this patch?
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132360895

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -356,6 +356,19 @@ class CodegenContext {

   private val placeHolderToComments = new mutable.HashMap[String, String]

   /**
+   * Returns if there is a codegen function the lines of which is greater than maxLinesPerFunction
+   * It will count the lines of every codegen function, if there is a function of length
+   * greater than spark.sql.codegen.maxLinesPerFunction, it will return true.
+   */
+  def existTooLongFunction(): Boolean = {
--- End diff --

> isTooLongGeneratedFunction

Nit: remove `()`
[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...
Github user debugger87 commented on the issue:
https://github.com/apache/spark/pull/18900

@cloud-fan could you please help review this PR?
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
GitHub user debugger87 opened a pull request:
https://github.com/apache/spark/pull/18900

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

## What changes were proposed in this pull request?

Set createTime for every Hive partition created in Spark SQL, which could be used to manage data lifecycle in the Hive warehouse.

## How was this patch tested?

No tests

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/debugger87/spark fix/set-create-time-for-hive-partition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18900.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18900

commit 71a660ac8dad869d9ba3b4e206b74f5c44660ee6
Author: debugger87
Date: 2017-08-10T04:17:00Z

    [SPARK-21687][SQL] Spark SQL should set createTime for Hive partition
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132360710

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -572,6 +572,13 @@ object SQLConf {
       "disable logging or -1 to apply no limit.")
     .createWithDefault(1000)

+  val WHOLESTAGE_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxLinesPerFunction")
+    .internal()
+    .doc("The maximum lines of a function that will be supported before" +
+      " deactivating whole-stage codegen.")
--- End diff --

> The maximum lines of a single Java function generated by whole-stage codegen. When the generated function exceeds this threshold, the whole-stage codegen is deactivated for this subtree of the current query plan.

Could you also update the code comments in the other places based on my above update?
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17849#discussion_r132360643

--- Diff: python/pyspark/ml/classification.py ---
@@ -1325,7 +1325,7 @@ def __init__(self, featuresCol="features", labelCol="label", predictionCol="pred
         super(MultilayerPerceptronClassifier, self).__init__()
         self._java_obj = self._new_java_obj(
             "org.apache.spark.ml.classification.MultilayerPerceptronClassifier", self.uid)
-        self._setDefault(maxIter=100, tol=1E-4, blockSize=128, stepSize=0.03, solver="l-bfgs")
+        self._setDefault(maxIter=100, tol=1E-6, blockSize=128, stepSize=0.03, solver="l-bfgs")
--- End diff --

Looks like 1e-6 is the correct default value.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18810

Merged build finished. Test FAILed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18810

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80471/
Test FAILed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18810

**[Test build #80471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80471/testReport)** for PR 18810 at commit [`d3238e9`](https://github.com/apache/spark/commit/d3238e9800f73b39b55e47419c5409b8111ea080).

 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17849#discussion_r132360069

--- Diff: python/pyspark/ml/tests.py ---
@@ -417,6 +417,54 @@ def test_logistic_regression_check_thresholds(self):
             LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
         )

+    @staticmethod
+    def check_params(test_self, py_stage, check_params_exist=True):
+        """
+        Checks common requirements for Params.params:
+          - set of params exist in Java and Python and are ordered by names
+          - param parent has the same UID as the object's UID
+          - default param value from Java matches value in Python
+          - optionally check if all params from Java also exist in Python
+        """
+        py_stage_str = "%s %s" % (type(py_stage), py_stage)
+        if not hasattr(py_stage, "_to_java"):
+            return
+        java_stage = py_stage._to_java()
+        if java_stage is None:
+            return
+        test_self.assertEqual(py_stage.uid, java_stage.uid(), msg=py_stage_str)
+        if check_params_exist:
+            param_names = [p.name for p in py_stage.params]
+            java_params = list(java_stage.params())
+            java_param_names = [jp.name() for jp in java_params]
+            test_self.assertEqual(
+                param_names, sorted(java_param_names),
+                "Param list in Python does not match Java for %s:\nJava = %s\nPython = %s"
+                % (py_stage_str, java_param_names, param_names))
--- End diff --

Is line 436-443 the only change to `check_params`?
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18810#discussion_r132359678

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -370,6 +370,15 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co

   override def doExecute(): RDD[InternalRow] = {
     val (ctx, cleanedSource) = doCodeGen()
+    val existLongFunction = ctx.existTooLongFunction
+    if (existLongFunction) {
+      logWarning(s"Found too long generated codes and JIT optimization might not work, " +
+        s"Whole-stage codegen disabled for this plan, " +
+        s"You can change the config spark.sql.codegen.MaxFunctionLength " +
+        s"to adjust the function length limit:\n "
--- End diff --

Please remove the useless `s`
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17849#discussion_r132359369

--- Diff: python/pyspark/ml/wrapper.py ---
@@ -144,7 +158,9 @@ def _transfer_params_from_java(self):
             if self._java_obj.hasParam(param.name):
                 java_param = self._java_obj.getParam(param.name)
                 # SPARK-14931: Only check set params back to avoid default params mismatch.
-                if self._java_obj.isSet(java_param):
+                if self._java_obj.isSet(java_param) or (
+                        # SPARK-10931: Temporary fix for params that have a default in Java
+                        self._java_obj.hasDefault(java_param) and not self.isDefined(param)):
--- End diff --

This change makes a param's default value on the Java side look like a user-provided param value on the Python side.
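The concern above is about provenance, not value: after the transfer, both sides agree on the effective value, but the Python side can no longer tell that it was a default. A toy model in plain Scala (all names assumed, not the real ML Param machinery):

```scala
// Toy model of a Params container that distinguishes defaults from
// user-set values, as the ML Param machinery does on both sides.
case class ParamsModel(defaults: Map[String, Any], set: Map[String, Any]) {
  // Effective value: user-set wins, otherwise fall back to the default.
  def get(name: String): Option[Any] = set.get(name).orElse(defaults.get(name))
  def isSet(name: String): Boolean = set.contains(name)
}

object ParamTransferSketch {
  def main(args: Array[String]): Unit = {
    val javaSide = ParamsModel(defaults = Map("tol" -> 1e-6), set = Map.empty)
    // The patched transfer copies an unset-but-defaulted Java param into
    // the Python-side *set* map:
    val pySide = ParamsModel(defaults = Map.empty, set = javaSide.defaults ++ javaSide.set)

    println(javaSide.get("tol") == pySide.get("tol"))     // true: same effective value
    println((javaSide.isSet("tol"), pySide.isSet("tol"))) // (false,true): provenance differs
  }
}
```

Losing the default/set distinction matters because later logic (e.g. `isSet`-guarded copies or explain output) treats user-set params differently from defaults.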
[GitHub] spark issue #17972: [SPARK-20723][ML]Add intermediate storage level to tree ...
Github user phatak-dev commented on the issue:
https://github.com/apache/spark/pull/17972

@MLnick Any updates on this?
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17849#discussion_r132358656 --- Diff: python/pyspark/ml/wrapper.py --- @@ -263,7 +284,8 @@ def _fit_java(self, dataset): def _fit(self, dataset): java_model = self._fit_java(dataset) -return self._create_model(java_model) +model = self._create_model(java_model) +return self._copyValues(model) --- End diff -- Here I think it is going to copy values from the estimator to the created model. So I think we assume that the params in estimator and model are the same?
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18810 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80470/ Test PASSed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18810 Merged build finished. Test PASSed.
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810 **[Test build #80470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80470/testReport)** for PR 18810 at commit [`d0c753a`](https://github.com/apache/spark/commit/d0c753a5d3f5fbb5e14da0eebbd5e9bd3778126c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17849 Sorry, let me try and take a look tomorrow.
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17849#discussion_r132357684 --- Diff: python/pyspark/ml/wrapper.py --- @@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap): paramMap.put([pair]) return paramMap +def _create_params_from_java(self): +""" +SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here +""" +java_params = list(self._java_obj.params()) +from pyspark.ml.param import Param +for java_param in java_params: +java_param_name = java_param.name() +if not hasattr(self, java_param_name): --- End diff -- If `self` contains an attribute with the same name that is not a `Param`, should we handle it, e.g., by throwing an exception?
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18544 **[Test build #80474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80474/testReport)** for PR 18544 at commit [`c41475e`](https://github.com/apache/spark/commit/c41475e3c5a217e5778bbddcd1b4a4210ce5d180).
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18865#discussion_r132357070 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala --- @@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister { } (file: PartitionedFile) => { - val parser = new JacksonParser(actualSchema, parsedOptions) + // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`, --- End diff -- I am actually rather -0 on this change. Both the current way and the previous way sound not quite compelling to me, but the current way at least only does arguably unnecessary parsing attempts, and we started to have this behaviour a long time ago (at least I tried this in 1.6.3 and 1.5.2): ```scala import org.apache.spark.sql.types._ val schema = new StructType().add("field", ByteType).add("_corrupt_record", StringType) val file = "/tmp/sample.json" val dfFromFile = sqlContext.read.schema(schema).json(file) dfFromFile.filter($"_corrupt_record".isNotNull).count() dfFromFile.filter($"_corrupt_record".isNull).count() ```
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Merged build finished. Test PASSed.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)** for PR 18899 at commit [`5dc5c89`](https://github.com/apache/spark/commit/5dc5c89242a0c2a5ac6a693c3703eef8ee160616). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80473/ Test PASSed.
[GitHub] spark pull request #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Pa...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17849#discussion_r132355421 --- Diff: python/pyspark/ml/wrapper.py --- @@ -135,6 +135,20 @@ def _transfer_param_map_to_java(self, pyParamMap): paramMap.put([pair]) return paramMap +def _create_params_from_java(self): +""" +SPARK-10931: Temporary fix to create params that are defined in the Java obj but not here +""" +java_params = list(self._java_obj.params()) +from pyspark.ml.param import Param +for java_param in java_params: +java_param_name = java_param.name() +if not hasattr(self, java_param_name): +param = Param(self, java_param_name, java_param.doc()) +setattr(param, "created_from_java_param", True) --- End diff -- BTW, would you mind if I ask where `created_from_java_param` is used?
[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user weiqingy commented on the issue: https://github.com/apache/spark/pull/17342 @steveloughran Thanks Steve.
[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/18893 Since they're both small and this is already open I'd say leave it, unless someone ends up having issues with one of the fixes
[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18865#discussion_r132352189 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala --- @@ -114,7 +114,16 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister { } (file: PartitionedFile) => { - val parser = new JacksonParser(actualSchema, parsedOptions) + // SPARK-21610: when the `requiredSchema` only contains `_corrupt_record`, --- End diff -- What do you think? @cloud-fan @HyukjinKwon
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 You mean we can provide different default values for different types, like 0 for int and "" for string? Or do we set the default values when defining the table? @gatorsmile @maropu I set the default to null because the "insert into ..." statement in Hive handles it this way, and I want to stay consistent with Hive.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18895 @byakuinss Please add a doc test in `DataFrame.replace`. There is an example `df4.na.replace('Alice', None).show()`. We want to make sure it works with the default value. Thanks.
[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17849 Oh, wait, this doesn't look like it requires much ML knowledge. Will try to give it a pass.
[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17849 I am rather a backend developer and work together with data scientists, so my ML knowledge is limited (am studying hard :)). I will leave a few comments if there are some nits and someone starts to review, so that they can be addressed together. cc @viirya, who I believe knows a bit of ML, and @zero323, who I believe should be able to review this (but is inactive now): are you maybe able to make a pass over this one?
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80473/testReport)** for PR 18899 at commit [`5dc5c89`](https://github.com/apache/spark/commit/5dc5c89242a0c2a5ac6a693c3703eef8ee160616).
[GitHub] spark pull request #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/18899 [SPARK-21680][ML][MLLIB]optimzie Vector coompress ## What changes were proposed in this pull request? When using Vector.compressed to convert a Vector to a SparseVector, the performance is much lower than Vector.toSparse. This is because Vector.compressed has to scan the values three times, whereas Vector.toSparse only needs two. When the vector is long, there is a significant performance difference between the two methods. ## How was this patch tested? The existing UT You can merge this pull request into a Git repository by running: $ git pull https://github.com/mpjlu/spark optVectorCompress Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18899.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18899 commit 5dc5c89242a0c2a5ac6a693c3703eef8ee160616 Author: Peng Meng Date: 2017-08-10T01:59:17Z optimzie Vector coompress
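The two-scan idea behind a `toSparse`-style conversion can be sketched in plain Scala (a hedged illustration, not Spark's actual `Vectors` code): one scan counts the non-zeros so the sparse arrays can be sized exactly, and a second scan copies the entries; any extra scan, such as one to compare dense vs. sparse storage sizes, adds proportionally to the cost on long vectors.

```scala
// Hedged sketch of dense-to-sparse conversion in exactly two scans.
val values = Array(0.0, 3.0, 0.0, 0.0, 7.5)

// Scan 1: count non-zeros so the output arrays are sized exactly.
val nnz = values.count(_ != 0.0)

// Scan 2: copy non-zero entries into parallel index/value arrays.
val indices = new Array[Int](nnz)
val sparseValues = new Array[Double](nnz)
var k = 0
for (i <- values.indices if values(i) != 0.0) {
  indices(k) = i
  sparseValues(k) = values(i)
  k += 1
}
println(indices.mkString(","))  // the non-zero positions: 1,4
```

Avoiding a third pass over `values` is exactly the saving the PR description claims for long vectors.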
[GitHub] spark issue #18648: [SPARK-21428] Turn IsolatedClientLoader off while using ...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18648 ping @jiangxb1987 @cloud-fan any more suggestions?
[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18630 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80468/ Test PASSed.
[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18630 Merged build finished. Test PASSed.
[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18630 **[Test build #80468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80468/testReport)** for PR 18630 at commit [`c0b0a7d`](https://github.com/apache/spark/commit/c0b0a7d79ca27bbcf91245b3d80070d5d4188174). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18893 @ajbozarth Do we need another PR to separate these? If necessary, I will do that.
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132347436 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -356,6 +356,18 @@ class CodegenContext { private val placeHolderToComments = new mutable.HashMap[String, String] /** + * Returns if the length of codegen function is too long or not --- End diff -- Ok, I have modified it, thanks
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132347148 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -356,6 +356,18 @@ class CodegenContext { private val placeHolderToComments = new mutable.HashMap[String, String] /** + * Returns if the length of codegen function is too long or not + * It will count the lines of every codegen function, if there is a function of length + * greater than spark.sql.codegen.MaxFunctionLength, it will return true. + */ + def existTooLongFunction(): Boolean = { +classFunctions.exists { case (className, functions) => + functions.exists{ case (name, code) => +CodeFormatter.stripExtraNewLines(code).count(_ == '\n') > SQLConf.get.maxFunctionLength --- End diff -- Ok, I have modified it to count lines without comments and extra new lines
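The "count lines without comments and extra new lines" idea can be sketched in plain Scala. This is a hedged approximation with a hypothetical helper name, not the actual `CodeFormatter` code from the PR: it simply ignores blank lines and single-line `//` comments when counting.

```scala
// Hedged sketch: count "effective" lines of a generated function,
// skipping blank lines and single-line comments.
// `effectiveLineCount` is a hypothetical name, not Spark's API.
def effectiveLineCount(code: String): Int =
  code.linesIterator
    .map(_.trim)
    .count(line => line.nonEmpty && !line.startsWith("//"))

val generated =
  """int add(int a, int b) {
    |  // add two ints
    |
    |  return a + b;
    |}""".stripMargin

// The comment and the blank line are excluded from the count.
println(effectiveLineCount(generated))
```

A count like this, compared against the configured threshold, is what decides whether whole-stage codegen is disabled for the plan.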
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132347198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -572,6 +572,13 @@ object SQLConf { "disable logging or -1 to apply no limit.") .createWithDefault(1000) + val WHOLESTAGE_MAX_FUNCTION_LEN = buildConf("spark.sql.codegen.MaxFunctionLength") --- End diff -- Ok, I have modified it, thanks
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810 **[Test build #80472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80472/testReport)** for PR 18810 at commit [`d44a2f8`](https://github.com/apache/spark/commit/d44a2f8499b4f7b9235fd138349005a4e3c960a5).
[GitHub] spark pull request #18810: [SPARK-21603][sql]The wholestage codegen will be ...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/18810#discussion_r132347018 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -301,6 +301,61 @@ class AggregateBenchmark extends BenchmarkBase { */ } + ignore("max function length of wholestagecodegen") { +val N = 20 << 15 + +val benchmark = new Benchmark("max function length of wholestagecodegen", N) +def f(): Unit = sparkSession.range(N) + .selectExpr( +"id", +"(id & 1023) as k1", +"cast(id & 1023 as double) as k2", +"cast(id & 1023 as int) as k3", +"case when id > 100 and id <= 200 then 1 else 0 end as v1", +"case when id > 200 and id <= 300 then 1 else 0 end as v2", +"case when id > 300 and id <= 400 then 1 else 0 end as v3", +"case when id > 400 and id <= 500 then 1 else 0 end as v4", +"case when id > 500 and id <= 600 then 1 else 0 end as v5", +"case when id > 600 and id <= 700 then 1 else 0 end as v6", +"case when id > 700 and id <= 800 then 1 else 0 end as v7", +"case when id > 800 and id <= 900 then 1 else 0 end as v8", +"case when id > 900 and id <= 1000 then 1 else 0 end as v9", +"case when id > 1000 and id <= 1100 then 1 else 0 end as v10", +"case when id > 1100 and id <= 1200 then 1 else 0 end as v11", +"case when id > 1200 and id <= 1300 then 1 else 0 end as v12", +"case when id > 1300 and id <= 1400 then 1 else 0 end as v13", +"case when id > 1400 and id <= 1500 then 1 else 0 end as v14", +"case when id > 1500 and id <= 1600 then 1 else 0 end as v15", +"case when id > 1600 and id <= 1700 then 1 else 0 end as v16", +"case when id > 1700 and id <= 1800 then 1 else 0 end as v17", +"case when id > 1800 and id <= 1900 then 1 else 0 end as v18") + .groupBy("k1", "k2", "k3") + .sum() + .collect() + +benchmark.addCase(s"codegen = F") { iter => + sparkSession.conf.set("spark.sql.codegen.wholeStage", "false") + f() +} + +benchmark.addCase(s"codegen = T") { iter => + sparkSession.conf.set("spark.sql.codegen.wholeStage", "true") + sparkSession.conf.set("spark.sql.codegen.MaxFunctionLength", "1") --- End diff -- Ok, I have added a test using the default number 1500, thanks.
[GitHub] spark issue #18893: [SPARK-21675][WebUI]Add a navigation bar at the bottom o...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18893 test this please
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810 **[Test build #80471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80471/testReport)** for PR 18810 at commit [`d3238e9`](https://github.com/apache/spark/commit/d3238e9800f73b39b55e47419c5409b8111ea080).
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18895 Merged build finished. Test PASSed.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18895 **[Test build #80469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80469/testReport)** for PR 18895 at commit [`8af1e15`](https://github.com/apache/spark/commit/8af1e15f37c750dda53542b5a854f832ff006773). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80469/
[GitHub] spark issue #18810: [SPARK-21603][sql]The wholestage codegen will be much sl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18810 **[Test build #80470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80470/testReport)** for PR 18810 at commit [`d0c753a`](https://github.com/apache/spark/commit/d0c753a5d3f5fbb5e14da0eebbd5e9bd3778126c).
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18756 In most cases of `SELECT` statements, `default_value` is `NULL` by default, so I first thought non-specified columns were filled with `NULL`. Anyway, do we still have a chance to implement the concept of `DEFAULT`, too? ``` postgresql doc: DEFAULT default_expr ... The default expression will be used in any insert operation that does not specify a value for the column. If there is no default for a column, then the default is null. ```
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18895 **[Test build #80469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80469/testReport)** for PR 18895 at commit [`8af1e15`](https://github.com/apache/spark/commit/8af1e15f37c750dda53542b5a854f832ff006773).
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18895 ok to test
[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18895 Could we add the example in the doctest (under 1362L) so that this can be tested and shown in the documentation?
[GitHub] spark issue #18882: [SPARK-21652][SQL] Filter out meaningless constraints in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18882 Any activity for cost-based inference? Anyway, thanks! I'll close this for now.
[GitHub] spark pull request #18882: [SPARK-21652][SQL] Filter out meaningless constra...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/18882