[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...
Github user squito commented on the issue: https://github.com/apache/spark/pull/19848 i dunno what the requirements are -- I was hoping you would know the hadoop committer semantics better than me! I suppose a uuid is really the only get something globally unique, as you could even have multiple independent spark contexts. I have seen a committer creating a temp directory based on the ID, so you could end up with a collision with them both writing to the same dir. anyway, I'm willling to set this aside as a rare case, the fix here is still a huge improvement. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19971 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84913/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19971 **[Test build #84913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84913/testReport)** for PR 19971 at commit [`dacb790`](https://github.com/apache/spark/commit/dacb79039c2abb09a082b817d7c52f2e0902de56). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19681#discussion_r156987695 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusStore.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.ui + +import java.lang.{Long => JLong} +import java.util.Date + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +import com.fasterxml.jackson.databind.annotation.JsonDeserialize + +import org.apache.spark.{JobExecutionStatus, SparkConf} +import org.apache.spark.scheduler.SparkListener +import org.apache.spark.status.AppStatusPlugin +import org.apache.spark.status.KVUtils.KVIndexParam +import org.apache.spark.ui.SparkUI +import org.apache.spark.util.Utils +import org.apache.spark.util.kvstore.KVStore + +/** + * Provides a view of a KVStore with methods that make it easy to query SQL-specific state. There's + * no state kept in this class, so it's ok to have multiple instances of it in an application. + */ +private[sql] class SQLAppStatusStore( +store: KVStore, +listener: Option[SQLAppStatusListener] = None) { + + def executionsList(): Seq[SQLExecutionUIData] = { +store.view(classOf[SQLExecutionUIData]).asScala.toSeq + } + + def execution(executionId: Long): Option[SQLExecutionUIData] = { +try { + Some(store.read(classOf[SQLExecutionUIData], executionId)) +} catch { + case _: NoSuchElementException => None +} + } + + def executionsCount(): Long = { +store.count(classOf[SQLExecutionUIData]) + } + + def executionMetrics(executionId: Long): Map[Long, String] = { +def metricsFromStore(): Option[Map[Long, String]] = { + val exec = store.read(classOf[SQLExecutionUIData], executionId) + Option(exec.metricValues) +} + +metricsFromStore() + .orElse(listener.flatMap(_.liveExecutionMetrics(executionId))) + // Try a second time in case the execution finished while this method is trying to + // get the metrics. + .orElse(metricsFromStore()) + .getOrElse(Map()) + } + + def planGraph(executionId: Long): SparkPlanGraph = { +store.read(classOf[SparkPlanGraphWrapper], executionId).toSparkPlanGraph() + } + +} + +/** + * An AppStatusPlugin for handling the SQL UI and listeners. + */ +private[sql] class SQLAppStatusPlugin extends AppStatusPlugin { + + override def setupListeners( + conf: SparkConf, + store: KVStore, + addListenerFn: SparkListener => Unit, + live: Boolean): Unit = { +// For live applications, the listener is installed in [[setupUI]]. This also avoids adding +// the listener when the UI is disabled. Force installation during testing, though. +if (!live || Utils.isTesting) { + val listener = new SQLAppStatusListener(conf, store, live, None) + addListenerFn(listener) +} + } + + override def setupUI(ui: SparkUI): Unit = { --- End diff -- Do we have a clear rule about when `setupListeners` is called and when `setupUI` is called? Here we register `SQLAppStatusListener` in both `setupListeners` and `setupUI`, will we register it twice? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19792 Will take another look soon tomorrow. Sorry that is getting delayed again and again but I just realised this code path is a little bit tricky .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19975: [SPARK-22781][SS] Support creating streaming dataset wit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19975 Hi, @tdas and @zsxwing . Could you review this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19792 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19792 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84918/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19792 **[Test build #84918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84918/testReport)** for PR 19792 at commit [`6d171dd`](https://github.com/apache/spark/commit/6d171dda179ecdbe95dbc959c961397e08b8b537). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84912/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19980 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19980 **[Test build #84912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84912/testReport)** for PR 19980 at commit [`c4d9a41`](https://github.com/apache/spark/commit/c4d9a41bae0bdcbb8d336a7fca012ad4c7e1efc1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19976: [SPARK-22660][BUILD] Use position() and limit() to fix a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19976 **[Test build #4011 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4011/testReport)** for PR 19976 at commit [`bcc4c1a`](https://github.com/apache/spark/commit/bcc4c1af60c820877e83592f1bd6b38db498a17f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19976: [SPARK-22660][BUILD] Use position() and limit() to fix a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19976 **[Test build #4010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4010/testReport)** for PR 19976 at commit [`bcc4c1a`](https://github.com/apache/spark/commit/bcc4c1af60c820877e83592f1bd6b38db498a17f). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84910/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #84910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84910/testReport)** for PR 19977 at commit [`a252bab`](https://github.com/apache/spark/commit/a252babda0035b32d4fcb8d76cafeadc5e2549a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19811 **[Test build #84919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84919/testReport)** for PR 19811 at commit [`a9d40e9`](https://github.com/apache/spark/commit/a9d40e9665bbe5e31a1c3dbd76e675c01730b487). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19971 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19980 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19792 **[Test build #84918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84918/testReport)** for PR 19792 at commit [`6d171dd`](https://github.com/apache/spark/commit/6d171dda179ecdbe95dbc959c961397e08b8b537). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19971 **[Test build #84917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84917/testReport)** for PR 19971 at commit [`44abd2f`](https://github.com/apache/spark/commit/44abd2f541701a1ff3cc4e10e686f0de56400a98). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19792: [SPARK-22566][PYTHON] Better error message for `_...
Github user gberger commented on a diff in the pull request: https://github.com/apache/spark/pull/19792#discussion_r156971910 --- Diff: python/pyspark/sql/types.py --- @@ -1083,7 +1083,11 @@ def _infer_schema(row): elif hasattr(row, "_fields"): # namedtuple items = zip(row._fields, tuple(row)) else: -names = ['_%d' % i for i in range(1, len(row) + 1)] +if names is None: +names = ['_%d' % i for i in range(1, len(row) + 1)] +elif len(names) < len(row): +names = names[:] --- End diff -- Yes, I did not want to modify the original list since `.extend` is an in-place operation. However, session.py#602 already creates a copy of the list passed by the user, so this copying in `_infer_schema` is actually not necessary. Removing now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19971: [SPARK-22774][SQL][Test] Add compilation check in...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19971#discussion_r156970863 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala --- @@ -348,13 +351,41 @@ class TPCDSQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfte "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99") + private def checkGeneratedCode(plan: SparkPlan): Unit = { +val codegenSubtrees = new collection.mutable.HashSet[WholeStageCodegenExec]() +plan foreach { + case s: WholeStageCodegenExec => +codegenSubtrees += s + case s => s +} +codegenSubtrees.toSeq.foreach { subtree => + val code = subtree.doCodeGen()._2 + try { +// Just check the generated code can be properly compiled +CodeGenerator.compile(code) + } catch { +case e: Exception => + val msg = +s""" + |failed to compile: + |Subtree: + |$subtree + |Generated code: + |${CodeFormatter.format(code)} + """ --- End diff -- Good catch, while I am using IntelliJ... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19861#discussion_r156970195 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2UtilsSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2 + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.SharedSQLContext + +class DataSourceV2UtilsSuite extends QueryTest with SharedSQLContext { + + private val keyPrefix = "userDefinedDataSource" + + test("method withSessionConfig() should propagate session configs correctly") { +// Only match configs with keys start with "spark.datasource.${keyPrefix}". +withSQLConf(s"spark.datasource.$keyPrefix.foo.bar" -> "false", --- End diff -- We don't need this heavy `withConf` stuff, just ``` val conf = new SQLConf conf.set(...) ... ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression testsui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84911/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression testsui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19979 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression testsui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19979 **[Test build #84911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84911/testReport)** for PR 19979 at commit [`47dccdd`](https://github.com/apache/spark/commit/47dccdd2531f8f84f38c75139d9223c682a089d6). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AFTSurvivalRegressionSuite extends MLTest with DefaultReadWriteTest ` * `class DecisionTreeRegressorSuite extends MLTest with DefaultReadWriteTest ` * `class GBTRegressorSuite extends MLTest with DefaultReadWriteTest ` * `class GeneralizedLinearRegressionSuite extends MLTest with DefaultReadWriteTest ` * `class IsotonicRegressionSuite extends MLTest with DefaultReadWriteTest ` * ` case class CheckAnswerRowsByFunc(` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19861#discussion_r156969856 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import java.util.regex.Pattern + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.internal.SQLConf + +private[sql] object DataSourceV2Utils extends Logging { + + /** + * Helper method that turns session configs with config keys that start with + * `spark.datasource.$keyPrefix` into k/v pairs, the k/v pairs will be used to create data source + * options. + * A session config `spark.datasource.$keyPrefix.xxx -> yyy` will be transformed into + * `xxx -> yyy`. + * + * @param keyPrefix the data source config key prefix to be matched + * @param conf the session conf + * @return an immutable map that contains all the extracted and transformed k/v pairs. + */ + def withSessionConfig( --- End diff -- how about we do more things here? ``` def extractSessionConfigs(ds: DataSourceV2, conf: SQLConf) = ds match { case cs: SessionConfigSupport => ... case _ => Map.empty } ``` and the caller side ``` val options = new DataSourceV2Options((extraOptions ++ extractSessionConfigs(ds, conf)).asJava) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19811 Another use case is for frequently-executed variable (e.g. row variable) for performance. We want to keep it as non-array variable. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19861: [SPARK-22387][SQL] Propagate session configs to d...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19861#discussion_r156969297 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -238,7 +239,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { if (classOf[DataSourceV2].isAssignableFrom(cls)) { cls.newInstance() match { case ds: WriteSupport => - val options = new DataSourceV2Options(extraOptions.asJava) + val dataSource = cls.newInstance() --- End diff -- we can use `ds` directly here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156968572 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -138,21 +138,50 @@ class CodegenContext { /** * Holding expressions' mutable states like `MonotonicallyIncreasingID.count` as a - * 3-tuple: java type, variable name, code to init it. - * As an example, ("int", "count", "count = 0;") will produce code: + * 2-tuple: java type, variable name. + * As an example, ("int", "count") will produce code: * {{{ * private int count; * }}} - * as a member variable, and add - * {{{ - * count = 0; - * }}} - * to the constructor. + * as a member variable * * They will be kept as member variables in generated classes like `SpecificProjection`. */ - val mutableStates: mutable.ArrayBuffer[(String, String, String)] = -mutable.ArrayBuffer.empty[(String, String, String)] + val mutableStates: mutable.ArrayBuffer[(String, String)] = +mutable.ArrayBuffer.empty[(String, String)] + + // An map keyed by mutable states' types holds the status of mutableStateArray + val mutableStateArrayMap: mutable.Map[String, MutableStateArrays] = +mutable.Map.empty[String, MutableStateArrays] + + // An array holds the code that will initialize each state + val mutableStateInitCodes: mutable.ArrayBuffer[String] = --- End diff -- Sure, I would appreciate it if you put the link to a note --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19971: [SPARK-22774][SQL][Test] Add compilation check in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19971#discussion_r156968031 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala --- @@ -348,13 +351,41 @@ class TPCDSQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfte "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99") + private def checkGeneratedCode(plan: SparkPlan): Unit = { +val codegenSubtrees = new collection.mutable.HashSet[WholeStageCodegenExec]() +plan foreach { + case s: WholeStageCodegenExec => +codegenSubtrees += s + case s => s +} +codegenSubtrees.toSeq.foreach { subtree => + val code = subtree.doCodeGen()._2 + try { +// Just check the generated code can be properly compiled +CodeGenerator.compile(code) + } catch { +case e: Exception => + val msg = +s""" + |failed to compile: + |Subtree: + |$subtree + |Generated code: + |${CodeFormatter.format(code)} + """ --- End diff -- Actually this should be generated by IDE automatically... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156967829 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -353,8 +353,8 @@ case class FileSourceScanExec( } val numOutputRows = metricTerm(ctx, "numOutputRows") // PhysicalRDD always just has one input -val input = ctx.freshName("input") -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", v => s"$v = inputs[0];", + forceInline = true) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19971: [SPARK-22774][SQL][Test] Add compilation check in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19971#discussion_r156967793 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala --- @@ -348,13 +351,41 @@ class TPCDSQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfte "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99") + private def checkGeneratedCode(plan: SparkPlan): Unit = { +val codegenSubtrees = new collection.mutable.HashSet[WholeStageCodegenExec]() +plan foreach { + case s: WholeStageCodegenExec => +codegenSubtrees += s + case s => s +} +codegenSubtrees.toSeq.foreach { subtree => + val code = subtree.doCodeGen()._2 + try { +// Just check the generated code can be properly compiled +CodeGenerator.compile(code) + } catch { +case e: Exception => + val msg = +s""" + |failed to compile: + |Subtree: + |$subtree + |Generated code: + |${CodeFormatter.format(code)} + """ --- End diff -- `strpMargin` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156967588 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -110,8 +110,8 @@ case class RowDataSourceScanExec( override protected def doProduce(ctx: CodegenContext): String = { val numOutputRows = metricTerm(ctx, "numOutputRows") // PhysicalRDD always just has one input -val input = ctx.freshName("input") -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", v => s"$v = inputs[0];", + forceInline = true) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156967475 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -68,30 +68,26 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport { */ // TODO: return ColumnarBatch.Rows instead override protected def doProduce(ctx: CodegenContext): String = { -val input = ctx.freshName("input") // PhysicalRDD always just has one input -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", + v => s"$v = inputs[0];", forceInline = true) --- End diff -- In the original implementation, an assignment that refers to a variable was inlined. Let me stop inline. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19811 My major concern is how to reason about when we need to force inline. I can understand the use case of keeping the variable name unchanged(e.g. `MapObjects`), but what about other cases? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156966971 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1205,14 +1203,14 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B // try conf from env, otherwise create a new one val env = s"${classOf[SparkEnv].getName}.get()" val sparkConf = s"new ${classOf[SparkConf].getName}()" -val serializerInit = s""" - if ($env == null) { -$serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance(); +val serializer = ctx.addMutableState(serializerInstanceClass, "serializerForDecode", + v => s""" + if ($env == null) { + $v = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance(); } else { - $serializer = ($serializerInstanceClass) new $serializerClass($env.conf()).newInstance(); + $v = ($serializerInstanceClass) new $serializerClass($env.conf()).newInstance(); } - """ -ctx.addMutableState(serializerInstanceClass, serializer, serializerInit) + """, forceInline = true) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156967021 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -138,21 +138,50 @@ class CodegenContext { /** * Holding expressions' mutable states like `MonotonicallyIncreasingID.count` as a - * 3-tuple: java type, variable name, code to init it. - * As an example, ("int", "count", "count = 0;") will produce code: + * 2-tuple: java type, variable name. + * As an example, ("int", "count") will produce code: * {{{ * private int count; * }}} - * as a member variable, and add - * {{{ - * count = 0; - * }}} - * to the constructor. + * as a member variable * * They will be kept as member variables in generated classes like `SpecificProjection`. */ - val mutableStates: mutable.ArrayBuffer[(String, String, String)] = -mutable.ArrayBuffer.empty[(String, String, String)] + val mutableStates: mutable.ArrayBuffer[(String, String)] = +mutable.ArrayBuffer.empty[(String, String)] + + // An map keyed by mutable states' types holds the status of mutableStateArray + val mutableStateArrayMap: mutable.Map[String, MutableStateArrays] = +mutable.Map.empty[String, MutableStateArrays] + + // An array holds the code that will initialize each state + val mutableStateInitCodes: mutable.ArrayBuffer[String] = --- End diff -- yea let's use `code`, I picked `codes` previously by mistake, you can also fix all of them. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156966787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1159,14 +1158,14 @@ case class EncodeUsingSerializer(child: Expression, kryo: Boolean) // try conf from env, otherwise create a new one val env = s"${classOf[SparkEnv].getName}.get()" val sparkConf = s"new ${classOf[SparkConf].getName}()" -val serializerInit = s""" - if ($env == null) { -$serializer = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance(); - } else { - $serializer = ($serializerInstanceClass) new $serializerClass($env.conf()).newInstance(); - } - """ -ctx.addMutableState(serializerInstanceClass, serializer, serializerInit) +val serializer = ctx.addMutableState(serializerInstanceClass, "serializerForEncode", v => + s""" + |if ($env == null) { + | $v = ($serializerInstanceClass) new $serializerClass($sparkConf).newInstance(); + |} else { + | $v = ($serializerInstanceClass) new $serializerClass($env.conf()).newInstance(); + |} + """, forceInline = true) --- End diff -- In the original implementation, constructor with non-constant value is inlined. Let me stop inline. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19751 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19751 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84909/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156966742 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -163,11 +192,49 @@ class CodegenContext { * the list of default imports available. * Also, generic type arguments are accepted but ignored. * @param variableName Name of the field. - * @param initCode The statement(s) to put into the init() method to initialize this field. + * @param initFunc Function includes statement(s) to put into the init() method to initialize + * this field. The argument is the name of the mutable state variable. * If left blank, the field will be default-initialized. + * @param forceInline whether the declaration and initialization code may be inlined rather than + *compacted. Please set `true` into forceInline, if you want to access the + *status fast (e.g. frequently accessed) or if you want to use the original + *variable name + * @param useFreshName If false and inline is true, the name is not changed + * @return the name of the mutable state variable, which is either the original name if the + * variable is inlined to the outer class, or an array access if the variable is to be + * stored in an array of variables of the same type and initialization. + * There are two use cases. One is to use the original name for global variable instead + * of fresh name. Second is to use the original initialization statement since it is + * complex (e.g. allocate multi-dimensional array or object constructor has varibles). + * Primitive type variables will be inlined into outer class when the total number of + * mutable variables is less than `CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD` + * the max size of an array for compaction is given by + * `CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT`. */ - def addMutableState(javaType: String, variableName: String, initCode: String = ""): Unit = { -mutableStates += ((javaType, variableName, initCode)) + def addMutableState( + javaType: String, + variableName: String, + initFunc: String => String = _ => "", + forceInline: Boolean = false, + useFreshName: Boolean = true): String = { +val varName = if (useFreshName) freshName(variableName) else variableName --- End diff -- isn't it an existing problem? Let's fix it in another PR to make this PR more consistent with the previous code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19751 **[Test build #84909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84909/testReport)** for PR 19751 at commit [`b02ea2c`](https://github.com/apache/spark/commit/b02ea2c4a96e27918c02f75c19d6639daae7cb2d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692 you are right, then I don't know if there is any valid use case for inferring join condition from literals... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156964736 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -484,18 +484,17 @@ case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCa override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, time => { val cal = classOf[Calendar].getName - val c = ctx.freshName("cal") val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") - ctx.addMutableState(cal, c, + val c = ctx.addMutableState(cal, "cal", v => s""" - $c = $cal.getInstance($dtu.getTimeZone("UTC")); - $c.setFirstDayOfWeek($cal.MONDAY); - $c.setMinimalDaysInFirstWeek(4); + |$v = $cal.getInstance($dtu.getTimeZone("UTC")); + |$v.setFirstDayOfWeek($cal.MONDAY); + |$v.setMinimalDaysInFirstWeek(4); """) --- End diff -- good catch, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAndRead
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19941 **[Test build #84916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84916/testReport)** for PR 19941 at commit [`41a2c15`](https://github.com/apache/spark/commit/41a2c151c11a0d690626cae2a39623e712aae764). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156960449 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -490,20 +489,19 @@ case class DataSource( } /** - * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for - * the following reading. + * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use + * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table. */ - def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = { + def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = { if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) { throw new AnalysisException("Cannot save interval data type into external storage.") } providingClass.newInstance() match { - case dataSource: CreatableRelationProvider => + case dataSource: RelationProvider => --- End diff -- Ah, I see. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156959928 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -138,21 +138,50 @@ class CodegenContext { /** * Holding expressions' mutable states like `MonotonicallyIncreasingID.count` as a - * 3-tuple: java type, variable name, code to init it. - * As an example, ("int", "count", "count = 0;") will produce code: + * 2-tuple: java type, variable name. + * As an example, ("int", "count") will produce code: * {{{ * private int count; * }}} - * as a member variable, and add - * {{{ - * count = 0; - * }}} - * to the constructor. + * as a member variable * * They will be kept as member variables in generated classes like `SpecificProjection`. */ - val mutableStates: mutable.ArrayBuffer[(String, String, String)] = -mutable.ArrayBuffer.empty[(String, String, String)] + val mutableStates: mutable.ArrayBuffer[(String, String)] = +mutable.ArrayBuffer.empty[(String, String)] + + // An map keyed by mutable states' types holds the status of mutableStateArray + val mutableStateArrayMap: mutable.Map[String, MutableStateArrays] = +mutable.Map.empty[String, MutableStateArrays] + + // An array holds the code that will initialize each state + val mutableStateInitCodes: mutable.ArrayBuffer[String] = --- End diff -- I don't think it is a big deal, but I remember a note by @gatorsmile who advised not to use it anymore. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156958793 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala --- @@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand( val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists try { - sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query)) - dataSource.writeAndRead(saveMode, query) --- End diff -- You're right, test the query "INSERT OVERWRITE DIRECTORY '/home/liyuanjian/tmp' USING json SELECT 1 AS a, 'c' as b;". ![image](https://user-images.githubusercontent.com/4833765/33997144-6d159446-e11e-11e7-8bae-f485873d84c3.png) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/19893 The last suspicious big group of threads (at least for me) is broadcast-exchange.* but as I've seen this is not false positive because the threadpool never stopped. In BroadcastExchangeExec:141 a new daemon thread pool created in case of broadcast join which dies when jvm dies. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156954383 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -138,21 +138,50 @@ class CodegenContext { /** * Holding expressions' mutable states like `MonotonicallyIncreasingID.count` as a - * 3-tuple: java type, variable name, code to init it. - * As an example, ("int", "count", "count = 0;") will produce code: + * 2-tuple: java type, variable name. + * As an example, ("int", "count") will produce code: * {{{ * private int count; * }}} - * as a member variable, and add - * {{{ - * count = 0; - * }}} - * to the constructor. + * as a member variable * * They will be kept as member variables in generated classes like `SpecificProjection`. */ - val mutableStates: mutable.ArrayBuffer[(String, String, String)] = -mutable.ArrayBuffer.empty[(String, String, String)] + val mutableStates: mutable.ArrayBuffer[(String, String)] = +mutable.ArrayBuffer.empty[(String, String)] + + // An map keyed by mutable states' types holds the status of mutableStateArray + val mutableStateArrayMap: mutable.Map[String, MutableStateArrays] = +mutable.Map.empty[String, MutableStateArrays] + + // An array holds the code that will initialize each state + val mutableStateInitCodes: mutable.ArrayBuffer[String] = --- End diff -- You are right in English. We are seeing some `codes` variable in source files. Is this renaming a big deal? WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156953610 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratedProjectionSuite.scala --- @@ -219,4 +219,31 @@ class GeneratedProjectionSuite extends SparkFunSuite { // - one is the mutableRow assert(globalVariables.length == 3) } + + test("SPARK-18016: generated projections on wider table requiring state compaction") { +val N = 6000 +val wideRow1 = new GenericInternalRow((0 until N).toArray[Any]) +val schema1 = StructType((1 to N).map(i => StructField("", IntegerType))) +val wideRow2 = new GenericInternalRow( + (0 until N).map(i => UTF8String.fromString(i.toString)).toArray[Any]) +val schema2 = StructType((1 to N).map(i => StructField("", StringType))) +val joined = new JoinedRow(wideRow1, wideRow2) +val joinedSchema = StructType(schema1 ++ schema2) +val nested = new JoinedRow(InternalRow(joined, joined), joined) +val nestedSchema = StructType( + Seq(StructField("", joinedSchema), StructField("", joinedSchema)) ++ joinedSchema) + +val safeProj = FromUnsafeProjection(nestedSchema) +val result = safeProj(nested) + +// test generated MutableProjection +val exprs = nestedSchema.fields.zipWithIndex.map { case (f, i) => + BoundReference(i, f.dataType, true) +} +val mutableProj = GenerateMutableProjection.generate(exprs) --- End diff -- cc @maropu --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156953675 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -163,11 +192,49 @@ class CodegenContext { * the list of default imports available. * Also, generic type arguments are accepted but ignored. * @param variableName Name of the field. - * @param initCode The statement(s) to put into the init() method to initialize this field. + * @param initFunc Function includes statement(s) to put into the init() method to initialize + * this field. The argument is the name of the mutable state variable. * If left blank, the field will be default-initialized. + * @param forceInline whether the declaration and initialization code may be inlined rather than + *compacted. Please set `true` into forceInline, if you want to access the + *status fast (e.g. frequently accessed) or if you want to use the original + *variable name + * @param useFreshName If false and inline is true, the name is not changed + * @return the name of the mutable state variable, which is either the original name if the + * variable is inlined to the outer class, or an array access if the variable is to be + * stored in an array of variables of the same type and initialization. + * There are two use cases. One is to use the original name for global variable instead + * of fresh name. Second is to use the original initialization statement since it is + * complex (e.g. allocate multi-dimensional array or object constructor has varibles). + * Primitive type variables will be inlined into outer class when the total number of + * mutable variables is less than `CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD` + * the max size of an array for compaction is given by + * `CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT`. */ - def addMutableState(javaType: String, variableName: String, initCode: String = ""): Unit = { -mutableStates += ((javaType, variableName, initCode)) + def addMutableState( + javaType: String, + variableName: String, + initFunc: String => String = _ => "", + forceInline: Boolean = false, + useFreshName: Boolean = true): String = { +val varName = if (useFreshName) freshName(variableName) else variableName --- End diff -- Good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19980 LGTM pending Jenkins. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19893 **[Test build #84915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84915/testReport)** for PR 19893 at commit [`ef00796`](https://github.com/apache/spark/commit/ef007969a576dd5f84440ce78a325d6617dc5278). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/19893 Seems like the new feature caught some false positives in SQL: ``` = THREAD AUDIT POST ACTION CALLED WITHOUT PRE ACTION IN SUITE o.a.s.sql.sources.DataSourceAnalysisSuite = = THREAD AUDIT POST ACTION CALLED WITHOUT PRE ACTION IN SUITE o.a.s.sql.SessionStateSuite = ``` Fixed them as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19893 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84907/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19893 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19893 **[Test build #84907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84907/testReport)** for PR 19893 at commit [`e2da334`](https://github.com/apache/spark/commit/e2da334560f764af71b6677f2e4c7c772063c229). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19861: [SPARK-22387][SQL] Propagate session configs to data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19861 **[Test build #84914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84914/testReport)** for PR 19861 at commit [`52aaf51`](https://github.com/apache/spark/commit/52aaf51a9ef0d3b2517ee26cff58d7f281433881). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19971 **[Test build #84913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84913/testReport)** for PR 19971 at commit [`dacb790`](https://github.com/apache/spark/commit/dacb79039c2abb09a082b817d7c52f2e0902de56). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19971: [SPARK-22774][SQL][Test] Add compilation check in...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19971#discussion_r156945302 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala --- @@ -29,7 +32,7 @@ import org.apache.spark.util.Utils * This test suite ensures all the TPC-DS queries can be successfully analyzed and optimized * without hitting the max iteration threshold. */ -class TPCDSQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll { +class TPCDSQuerySuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll with Logging { --- End diff -- nit: drop `Logging` and the import --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156943716 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -68,30 +68,26 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport { */ // TODO: return ColumnarBatch.Rows instead override protected def doProduce(ctx: CodegenContext): String = { -val input = ctx.freshName("input") // PhysicalRDD always just has one input -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", + v => s"$v = inputs[0];", forceInline = true) // metrics val numOutputRows = metricTerm(ctx, "numOutputRows") val scanTimeMetric = metricTerm(ctx, "scanTime") -val scanTimeTotalNs = ctx.freshName("scanTime") -ctx.addMutableState(ctx.JAVA_LONG, scanTimeTotalNs, s"$scanTimeTotalNs = 0;") +val scanTimeTotalNs = ctx.addMutableState(ctx.JAVA_LONG, "scanTime") --- End diff -- I see. Let us leave default value as a comment for clarity. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19792: [SPARK-22566][PYTHON] Better error message for `_...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19792#discussion_r156941615 --- Diff: python/pyspark/sql/types.py --- @@ -1083,7 +1083,11 @@ def _infer_schema(row): elif hasattr(row, "_fields"): # namedtuple items = zip(row._fields, tuple(row)) else: -names = ['_%d' % i for i in range(1, len(row) + 1)] +if names is None: +names = ['_%d' % i for i in range(1, len(row) + 1)] +elif len(names) < len(row): +names = names[:] --- End diff -- Hm .. why we do this? to copy? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692 I am not sure we can infer ``a == b`` if ``a in (0, 2, 3, 4)`` and ``b in (0, 2, 3, 4)``. table 'a' ``` a1 a2 1 2 3 3 4 5 ``` table 'b' ``` b1 b2 1 -1 2 -2 3 -4 ``` ``` SELECT * FROM a, b WHERE a1 in (1, 2) AND b1 in (1, 2) // 1 2 1 -1 // 1 2 2 -2 ``` ``` SELECT * FROM a JOIN b ON a.a1 = b.b1 WHERE a1 in (1, 2) AND b1 in (1, 2) // 1 2 1 -1 ``` Do I miss anything? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156941474 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -163,11 +192,49 @@ class CodegenContext { * the list of default imports available. * Also, generic type arguments are accepted but ignored. * @param variableName Name of the field. - * @param initCode The statement(s) to put into the init() method to initialize this field. + * @param initFunc Function includes statement(s) to put into the init() method to initialize + * this field. The argument is the name of the mutable state variable. * If left blank, the field will be default-initialized. + * @param forceInline whether the declaration and initialization code may be inlined rather than + *compacted. Please set `true` into forceInline, if you want to access the + *status fast (e.g. frequently accessed) or if you want to use the original + *variable name + * @param useFreshName If false and inline is true, the name is not changed + * @return the name of the mutable state variable, which is either the original name if the + * variable is inlined to the outer class, or an array access if the variable is to be + * stored in an array of variables of the same type and initialization. + * There are two use cases. One is to use the original name for global variable instead + * of fresh name. Second is to use the original initialization statement since it is + * complex (e.g. allocate multi-dimensional array or object constructor has varibles). + * Primitive type variables will be inlined into outer class when the total number of + * mutable variables is less than `CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD` + * the max size of an array for compaction is given by + * `CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT`. */ - def addMutableState(javaType: String, variableName: String, initCode: String = ""): Unit = { -mutableStates += ((javaType, variableName, initCode)) + def addMutableState( + javaType: String, + variableName: String, + initFunc: String => String = _ => "", + forceInline: Boolean = false, + useFreshName: Boolean = true): String = { +val varName = if (useFreshName) freshName(variableName) else variableName --- End diff -- Since I noticed that most of caller sides executes `freshName`, I decided to use the new style that can simply caller code. If a developer want to guarantee the given name is unique at caller site (currently, they are only several cases), it is OK by using `useFreshName = true`. Do we need redundant code at caller side? WDYT? @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156940683 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratedProjectionSuite.scala --- @@ -219,4 +219,31 @@ class GeneratedProjectionSuite extends SparkFunSuite { // - one is the mutableRow assert(globalVariables.length == 3) } + + test("SPARK-18016: generated projections on wider table requiring state compaction") { +val N = 6000 +val wideRow1 = new GenericInternalRow((0 until N).toArray[Any]) +val schema1 = StructType((1 to N).map(i => StructField("", IntegerType))) +val wideRow2 = new GenericInternalRow( + (0 until N).map(i => UTF8String.fromString(i.toString)).toArray[Any]) +val schema2 = StructType((1 to N).map(i => StructField("", StringType))) +val joined = new JoinedRow(wideRow1, wideRow2) +val joinedSchema = StructType(schema1 ++ schema2) +val nested = new JoinedRow(InternalRow(joined, joined), joined) +val nestedSchema = StructType( + Seq(StructField("", joinedSchema), StructField("", joinedSchema)) ++ joinedSchema) + +val safeProj = FromUnsafeProjection(nestedSchema) +val result = safeProj(nested) + +// test generated MutableProjection +val exprs = nestedSchema.fields.zipWithIndex.map { case (f, i) => + BoundReference(i, f.dataType, true) +} +val mutableProj = GenerateMutableProjection.generate(exprs) --- End diff -- Current `GenerateUnsafeProjection` uses only the fixed number of mutable states. IIUC, we cannot do such a test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19980 **[Test build #84912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84912/testReport)** for PR 19980 at commit [`c4d9a41`](https://github.com/apache/spark/commit/c4d9a41bae0bdcbb8d336a7fca012ad4c7e1efc1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSe...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/19980 [SPARK-22785][SQL] remove ColumnVector.anyNullsSet ## What changes were proposed in this pull request? `ColumnVector.anyNullsSet` is not called anywhere except tests, and we can easily replace it with `ColumnVector.numNulls > 0` ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19980.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19980 commit c4d9a41bae0bdcbb8d336a7fca012ad4c7e1efc1 Author: Wenchen FanDate: 2017-12-14T12:53:20Z remove ColumnVector.anyNullsSet --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19980: [SPARK-22785][SQL] remove ColumnVector.anyNullsSet
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19980 cc @ueshin @kiszk @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156935514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -138,21 +138,50 @@ class CodegenContext { /** * Holding expressions' mutable states like `MonotonicallyIncreasingID.count` as a - * 3-tuple: java type, variable name, code to init it. - * As an example, ("int", "count", "count = 0;") will produce code: + * 2-tuple: java type, variable name. + * As an example, ("int", "count") will produce code: * {{{ * private int count; * }}} - * as a member variable, and add - * {{{ - * count = 0; - * }}} - * to the constructor. + * as a member variable * * They will be kept as member variables in generated classes like `SpecificProjection`. */ - val mutableStates: mutable.ArrayBuffer[(String, String, String)] = -mutable.ArrayBuffer.empty[(String, String, String)] + val mutableStates: mutable.ArrayBuffer[(String, String)] = +mutable.ArrayBuffer.empty[(String, String)] + + // An map keyed by mutable states' types holds the status of mutableStateArray + val mutableStateArrayMap: mutable.Map[String, MutableStateArrays] = +mutable.Map.empty[String, MutableStateArrays] + + // An array holds the code that will initialize each state + val mutableStateInitCodes: mutable.ArrayBuffer[String] = --- End diff -- nit: since code is uncountable, maybe we can rename to `mutableStatesInitCode` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19974: [SPARK-22779][sql] Resolve default values for fal...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19974#discussion_r156935029 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala --- @@ -280,4 +280,34 @@ class SQLConfSuite extends QueryTest with SharedSQLContext { spark.sessionState.conf.clear() } + + test("SPARK-22779: correctly compute default value for fallback configs") { +val fallback = SQLConf.buildConf("spark.sql.__test__.spark_22779") + .fallbackConf(SQLConf.PARQUET_COMPRESSION) --- End diff -- Yea sounds like this PR is fixing a non-existing issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156932945 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -71,17 +71,15 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport { } override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = { -val stopEarly = ctx.freshName("stopEarly") -ctx.addMutableState(ctx.JAVA_BOOLEAN, stopEarly, s"$stopEarly = false;") +val stopEarly = ctx.addMutableState(ctx.JAVA_BOOLEAN, "stopEarly") ctx.addNewFunction("stopEarly", s""" @Override protected boolean stopEarly() { return $stopEarly; } """, inlineToOuterClass = true) -val countTerm = ctx.freshName("count") -ctx.addMutableState(ctx.JAVA_INT, countTerm, s"$countTerm = 0;") +val countTerm = ctx.addMutableState(ctx.JAVA_INT, "count") --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156932923 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -71,17 +71,15 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport { } override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = { -val stopEarly = ctx.freshName("stopEarly") -ctx.addMutableState(ctx.JAVA_BOOLEAN, stopEarly, s"$stopEarly = false;") +val stopEarly = ctx.addMutableState(ctx.JAVA_BOOLEAN, "stopEarly") --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156932789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -163,11 +192,49 @@ class CodegenContext { * the list of default imports available. * Also, generic type arguments are accepted but ignored. * @param variableName Name of the field. - * @param initCode The statement(s) to put into the init() method to initialize this field. + * @param initFunc Function includes statement(s) to put into the init() method to initialize + * this field. The argument is the name of the mutable state variable. * If left blank, the field will be default-initialized. + * @param forceInline whether the declaration and initialization code may be inlined rather than + *compacted. Please set `true` into forceInline, if you want to access the + *status fast (e.g. frequently accessed) or if you want to use the original + *variable name + * @param useFreshName If false and inline is true, the name is not changed + * @return the name of the mutable state variable, which is either the original name if the + * variable is inlined to the outer class, or an array access if the variable is to be + * stored in an array of variables of the same type and initialization. + * There are two use cases. One is to use the original name for global variable instead + * of fresh name. Second is to use the original initialization statement since it is + * complex (e.g. allocate multi-dimensional array or object constructor has varibles). + * Primitive type variables will be inlined into outer class when the total number of + * mutable variables is less than `CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD` + * the max size of an array for compaction is given by + * `CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT`. */ - def addMutableState(javaType: String, variableName: String, initCode: String = ""): Unit = { -mutableStates += ((javaType, variableName, initCode)) + def addMutableState( + javaType: String, + variableName: String, + initFunc: String => String = _ => "", + forceInline: Boolean = false, + useFreshName: Boolean = true): String = { +val varName = if (useFreshName) freshName(variableName) else variableName --- End diff -- I think this can be moved in the if for clarity --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156932214 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -68,30 +68,26 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport { */ // TODO: return ColumnarBatch.Rows instead override protected def doProduce(ctx: CodegenContext): String = { -val input = ctx.freshName("input") // PhysicalRDD always just has one input -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", + v => s"$v = inputs[0];", forceInline = true) // metrics val numOutputRows = metricTerm(ctx, "numOutputRows") val scanTimeMetric = metricTerm(ctx, "scanTime") -val scanTimeTotalNs = ctx.freshName("scanTime") -ctx.addMutableState(ctx.JAVA_LONG, scanTimeTotalNs, s"$scanTimeTotalNs = 0;") +val scanTimeTotalNs = ctx.addMutableState(ctx.JAVA_LONG, "scanTime") val columnarBatchClz = classOf[ColumnarBatch].getName -val batch = ctx.freshName("batch") -ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;") +val batch = ctx.addMutableState(columnarBatchClz, "batch") -val idx = ctx.freshName("batchIdx") -ctx.addMutableState(ctx.JAVA_INT, idx, s"$idx = 0;") -val colVars = output.indices.map(i => ctx.freshName("colInstance" + i)) +val idx = ctx.addMutableState(ctx.JAVA_INT, "batchIdx") --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156932172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -68,30 +68,26 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport { */ // TODO: return ColumnarBatch.Rows instead override protected def doProduce(ctx: CodegenContext): String = { -val input = ctx.freshName("input") // PhysicalRDD always just has one input -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", + v => s"$v = inputs[0];", forceInline = true) // metrics val numOutputRows = metricTerm(ctx, "numOutputRows") val scanTimeMetric = metricTerm(ctx, "scanTime") -val scanTimeTotalNs = ctx.freshName("scanTime") -ctx.addMutableState(ctx.JAVA_LONG, scanTimeTotalNs, s"$scanTimeTotalNs = 0;") +val scanTimeTotalNs = ctx.addMutableState(ctx.JAVA_LONG, "scanTime") --- End diff -- I see that the initialization is not needed since 0 is the default value, but maybe we can leave it for clarity? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19976: [SPARK-22660][BUILD] Use position() and limit() to fix a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19976 **[Test build #4010 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4010/testReport)** for PR 19976 at commit [`bcc4c1a`](https://github.com/apache/spark/commit/bcc4c1af60c820877e83592f1bd6b38db498a17f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19978: [SPARK-22784][CORE] Configure reading buffer size in Spa...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19978 That seems quite large, but I'm only speculating. If it defaulted to something that's just not tiny, would that help? 1M events? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156930498 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -189,15 +255,37 @@ class CodegenContext { def declareMutableStates(): String = { // It's possible that we add same mutable state twice, e.g. the `mergeExpressions` in // `TypedAggregateExpression`, we should call `distinct` here to remove the duplicated ones. -mutableStates.distinct.map { case (javaType, variableName, _) => +val inlinedStates = mutableStates.distinct.map { case (javaType, variableName) => s"private $javaType $variableName;" -}.mkString("\n") +} + +val arrayStates = mutableStateArrayMap.flatMap { case (javaType, mutableStateArrays) => + val numArrays = mutableStateArrays.arrayNames.size + mutableStateArrays.arrayNames.zipWithIndex.map { case (arrayName, index) => +val length = if (index + 1 == numArrays) { + mutableStateArrays.getCurrentIndex +} else { + CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT --- End diff -- sorry, stupid question. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156930403 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -163,11 +192,49 @@ class CodegenContext { * the list of default imports available. * Also, generic type arguments are accepted but ignored. * @param variableName Name of the field. - * @param initCode The statement(s) to put into the init() method to initialize this field. + * @param initFunc Function includes statement(s) to put into the init() method to initialize + * this field. The argument is the name of the mutable state variable. * If left blank, the field will be default-initialized. + * @param forceInline whether the declaration and initialization code may be inlined rather than + *compacted. Please set `true` into forceInline, if you want to access the + *status fast (e.g. frequently accessed) or if you want to use the original + *variable name + * @param useFreshName If false and inline is true, the name is not changed + * @return the name of the mutable state variable, which is either the original name if the + * variable is inlined to the outer class, or an array access if the variable is to be + * stored in an array of variables of the same type and initialization. + * There are two use cases. One is to use the original name for global variable instead + * of fresh name. Second is to use the original initialization statement since it is + * complex (e.g. allocate multi-dimensional array or object constructor has varibles). + * Primitive type variables will be inlined into outer class when the total number of + * mutable variables is less than `CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD` + * the max size of an array for compaction is given by + * `CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT`. */ - def addMutableState(javaType: String, variableName: String, initCode: String = ""): Unit = { -mutableStates += ((javaType, variableName, initCode)) + def addMutableState( + javaType: String, + variableName: String, + initFunc: String => String = _ => "", + forceInline: Boolean = false, + useFreshName: Boolean = true): String = { +val varName = if (useFreshName) freshName(variableName) else variableName --- End diff -- instead of calling `freshName` here and adding a `useFreshName` parameter, can we follow the previous style and ask the caller side to guarantee the given name is unique? i.e. call `freshName` at caller side --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19971 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84906/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19971: [SPARK-22774][SQL][Test] Add compilation check into TPCD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19971 **[Test build #84906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84906/testReport)** for PR 19971 at commit [`7d2c599`](https://github.com/apache/spark/commit/7d2c5995070dd0ef56d3d75bca30d3fbfc8443d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression testsui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19979 **[Test build #84911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84911/testReport)** for PR 19979 at commit [`47dccdd`](https://github.com/apache/spark/commit/47dccdd2531f8f84f38c75139d9223c682a089d6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19979: [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression ...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19979 [SPARK-22644][ML][TEST][FOLLOW-UP] ML regression testsuite add StructuredStreaming test ## What changes were proposed in this pull request? ML regression testsuite add StructuredStreaming test In order to make testsuite easier to modify, new helper function added in `MLTest`: ``` def testTransformerByGlobalCheckFunc[A : Encoder]( dataframe: DataFrame, transformer: Transformer, firstResultCol: String, otherResultCols: String*) (globalCheckFunction: Seq[Row] => Unit): Unit ``` ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/WeichenXu123/spark ml_stream_test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19979.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19979 commit 47dccdd2531f8f84f38c75139d9223c682a089d6 Author: WeichenXuDate: 2017-12-14T12:14:56Z init pr --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19978: [SPARK-22784][CORE] Configure reading buffer size in Spa...
Github user MikhailErofeev commented on the issue: https://github.com/apache/spark/pull/19978 I don't mind to just set it to a higher value. Moreover, the current default value (2048) is small in any case. For my log files, 30M buffer was the best value (a bigger one did not bring a lot of speedup), although for other files the optimal value could be bigger. What do you think? Is it ok to keep the value as 30M? With 50 cores it could eat 1.5G. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156928231 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -110,8 +110,8 @@ case class RowDataSourceScanExec( override protected def doProduce(ctx: CodegenContext): String = { val numOutputRows = metricTerm(ctx, "numOutputRows") // PhysicalRDD always just has one input -val input = ctx.freshName("input") -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", v => s"$v = inputs[0];", + forceInline = true) --- End diff -- why inline --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156928270 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -353,8 +353,8 @@ case class FileSourceScanExec( } val numOutputRows = metricTerm(ctx, "numOutputRows") // PhysicalRDD always just has one input -val input = ctx.freshName("input") -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", v => s"$v = inputs[0];", + forceInline = true) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156928205 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -68,30 +68,26 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport { */ // TODO: return ColumnarBatch.Rows instead override protected def doProduce(ctx: CodegenContext): String = { -val input = ctx.freshName("input") // PhysicalRDD always just has one input -ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];") +val input = ctx.addMutableState("scala.collection.Iterator", "input", + v => s"$v = inputs[0];", forceInline = true) --- End diff -- why inline? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156926848 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratedProjectionSuite.scala --- @@ -219,4 +219,31 @@ class GeneratedProjectionSuite extends SparkFunSuite { // - one is the mutableRow assert(globalVariables.length == 3) } + + test("SPARK-18016: generated projections on wider table requiring state compaction") { +val N = 6000 +val wideRow1 = new GenericInternalRow((0 until N).toArray[Any]) +val schema1 = StructType((1 to N).map(i => StructField("", IntegerType))) +val wideRow2 = new GenericInternalRow( + (0 until N).map(i => UTF8String.fromString(i.toString)).toArray[Any]) +val schema2 = StructType((1 to N).map(i => StructField("", StringType))) +val joined = new JoinedRow(wideRow1, wideRow2) +val joinedSchema = StructType(schema1 ++ schema2) +val nested = new JoinedRow(InternalRow(joined, joined), joined) +val nestedSchema = StructType( + Seq(StructField("", joinedSchema), StructField("", joinedSchema)) ++ joinedSchema) + +val safeProj = FromUnsafeProjection(nestedSchema) +val result = safeProj(nested) + +// test generated MutableProjection +val exprs = nestedSchema.fields.zipWithIndex.map { case (f, i) => + BoundReference(i, f.dataType, true) +} +val mutableProj = GenerateMutableProjection.generate(exprs) --- End diff -- shall we also test `GenerateUnsafeProjection`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #84910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84910/testReport)** for PR 19977 at commit [`a252bab`](https://github.com/apache/spark/commit/a252babda0035b32d4fcb8d76cafeadc5e2549a3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156926555 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratedProjectionSuite.scala --- @@ -219,4 +219,31 @@ class GeneratedProjectionSuite extends SparkFunSuite { // - one is the mutableRow assert(globalVariables.length == 3) } + + test("SPARK-18016: generated projections on wider table requiring state compaction") { +val N = 6000 +val wideRow1 = new GenericInternalRow((0 until N).toArray[Any]) +val schema1 = StructType((1 to N).map(i => StructField("", IntegerType))) +val wideRow2 = new GenericInternalRow( + (0 until N).map(i => UTF8String.fromString(i.toString)).toArray[Any]) --- End diff -- nit: `Array[Any].fill(N)(i => UTF8String.fromString(i.toString))` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156926377 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratedProjectionSuite.scala --- @@ -219,4 +219,31 @@ class GeneratedProjectionSuite extends SparkFunSuite { // - one is the mutableRow assert(globalVariables.length == 3) } + + test("SPARK-18016: generated projections on wider table requiring state compaction") { +val N = 6000 +val wideRow1 = new GenericInternalRow((0 until N).toArray[Any]) --- End diff -- (0 until N).toArray[Any] -> new Array[Any](N) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156926146 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -401,4 +401,26 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper { ctx.addReferenceObj("foo", foo) assert(ctx.mutableStates.isEmpty) } + + test("SPARK-18016: def mutable states by using an array") { +val ctx1 = new CodegenContext +for (i <- 1 to CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD + 10) { + ctx1.addMutableState(ctx1.JAVA_INT, "i", v => s"$v = $i;") +} +assert(ctx1.mutableStates.size == CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD) +// When the number of primitive type mutable states is over the threshold, others are +// allocated into an array + assert(ctx1.mutableStateArrayMap.get(ctx1.JAVA_INT).get.arrayNames.size == 1) +assert(ctx1.mutableStateInitCodes.size == CodeGenerator.OUTER_CLASS_VARIABLES_THRESHOLD + 10) + +val ctx2 = new CodegenContext +for (i <- 1 to CodeGenerator.MUTABLESTATEARRAY_SIZE_LIMIT + 10) { + ctx2.addMutableState("InternalRow[]", "r", v => s"$v = new InternalRow[$i];") +} +// When the number of non-primitive type mutable states is over the threshold, others are +// allocated into a new array + assert(ctx2.mutableStateArrayMap.get("InternalRow[]").get.arrayNames.size == 2) --- End diff -- and `assert(ctx2.mutableStateArrayMap("InternalRow[]").getCurrentIndex == 10)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14151 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84905/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14151 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org