[GitHub] spark pull request #20564: [SPARK-23378][SQL] move setCurrentDatabase from H...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20564#discussion_r167392786

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -598,8 +598,22 @@ private[hive] class HiveClientImpl(
      db: String,
      table: String,
      newParts: Seq[CatalogTablePartition]): Unit = withHiveState {
-    val hiveTable = toHiveTable(getTable(db, table), Some(userName))
-    shim.alterPartitions(client, table, newParts.map { p => toHivePartition(p, hiveTable) }.asJava)
+    // Note: Before altering table partitions in Hive, you *must* set the current database
+    // to the one that contains the table of interest. Otherwise you will end up with the
+    // most helpful error message ever: "Unable to alter partition. alter is not possible."
+    // See HIVE-2742 for more detail.
+    val original = state.getCurrentDatabase
+    if (databaseExists(db)) {
+      state.setCurrentDatabase(db)
+    } else {
+      throw new NoSuchDatabaseException(db)
+    }
--- End diff --

Instead of copying the code from `setCurrentDatabase`, why not call it directly?

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
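The refactoring the reviewer suggests might look roughly like the sketch below. This is a hedged sketch against Spark's internal `HiveClientImpl`, not a standalone compilable unit; restoring the original current database in a `try`/`finally` is an assumption about the rest of the PR, not something stated in this comment.

```scala
// Sketch only: reuse setCurrentDatabase (which already checks databaseExists and
// throws NoSuchDatabaseException) instead of duplicating its body here.
override def alterPartitions(
    db: String,
    table: String,
    newParts: Seq[CatalogTablePartition]): Unit = withHiveState {
  val original = state.getCurrentDatabase
  try {
    // Must set the current database before altering partitions; see HIVE-2742.
    setCurrentDatabase(db)
    val hiveTable = toHiveTable(getTable(db, table), Some(userName))
    shim.alterPartitions(client, table,
      newParts.map { p => toHivePartition(p, hiveTable) }.asJava)
  } finally {
    state.setCurrentDatabase(original)
  }
}
```

This keeps the existence check and the `NoSuchDatabaseException` in one place, so the two code paths cannot drift apart.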
[GitHub] spark pull request #20564: [SPARK-23378][SQL] move setCurrentDatabase from H...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20564#discussion_r167392724

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -1107,11 +1107,6 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     }
   }
-    // Note: Before altering table partitions in Hive, you *must* set the current database
-    // to the one that contains the table of interest. Otherwise you will end up with the
-    // most helpful error message ever: "Unable to alter partition. alter is not possible."
-    // See HIVE-2742 for more detail.
-    client.setCurrentDatabase(db)
--- End diff --

This does not change the behavior; it just restores the original current database in the Hive client. I do not think the current implementation has any issue, because we do not rely on the current database in the Hive client. However, I am also fine with restoring the original. This change does no harm and just makes the code more stable for future refactoring.

---
[GitHub] spark pull request #20407: [SPARK-23124][SQL] Allow to disable BroadcastNest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20407#discussion_r167392674

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -156,6 +156,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val ALLOW_NESTEDJOIN_FALLBACK = buildConf("spark.sql.join.broadcastJoinFallback.enabled")
+    .internal()
+    .doc("When true (default), if the other options are not available, fallback to try and use " +
+      "BroadcastNestedLoopJoin as join strategy. This can cause OOM which can be a problem " +
+      "in some scenarios, eg. when running the thriftserver. Turn to false to disable it: an " +
+      "AnalysisException will be thrown.")
--- End diff --

I see, and in general I agree with you, but in this case `BroadcastNestedLoopJoin` is used as a fallback even though we are over the threshold for broadcast joins, so it is very likely to throw an OOM. The rationale of this choice seems to be: there is no other working join implementation, so use this one even though an OOM is likely; if the OOM does not occur we were lucky and the query ran, otherwise it throws an OOM instead of an `AnalysisException` (so the scenario cannot run either way). That is fine for a normal Spark application, but for long-running ones like STS an OOM is much worse than an `AnalysisException`. I know this is not the solution to all OOM problems, but the reason for this PR is that when this rule was introduced, the choice was made without considering use cases like STS. Do you agree?

---
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user mafernandez-stratio commented on the issue: https://github.com/apache/spark/pull/17185 @gatorsmile Great, thanks for the feedback!

---
[GitHub] spark pull request #20407: [SPARK-23124][SQL] Allow to disable BroadcastNest...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20407#discussion_r167392509

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -156,6 +156,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val ALLOW_NESTEDJOIN_FALLBACK = buildConf("spark.sql.join.broadcastJoinFallback.enabled")
+    .internal()
+    .doc("When true (default), if the other options are not available, fallback to try and use " +
+      "BroadcastNestedLoopJoin as join strategy. This can cause OOM which can be a problem " +
+      "in some scenarios, eg. when running the thriftserver. Turn to false to disable it: an " +
+      "AnalysisException will be thrown.")
--- End diff --

OOM could be triggered in various cases, and `BroadcastNestedLoopJoin` is widely used in many of them. If this is the case in your scenario, you can do it in your fork. Also, this PR only helps a very specific case; I do not think it resolves the general issue.

---
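For reference, disabling the proposed fallback from an application might look like the line below. This is hypothetical: the config name is taken from the diff above, it is marked `.internal()`, and the PR was still under review at the time of this thread, so the setting may not exist in any released Spark version.

```scala
// Hypothetical: turn off the BroadcastNestedLoopJoin fallback proposed in this PR,
// so an AnalysisException is raised instead of risking an OOM (e.g. in the thriftserver).
spark.conf.set("spark.sql.join.broadcastJoinFallback.enabled", "false")
```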
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Merged build finished. Test FAILed.

---
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20561

**[Test build #87279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87279/testReport)** for PR 20561 at commit [`8dab79a`](https://github.com/apache/spark/commit/8dab79a46d9201ec0e43e60d6ef841bf91f4c616).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.

---
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Merged build finished. Test FAILed.

---
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87280/

---
[GitHub] spark issue #20561: [SPARK-23376][SQL] creating UnsafeKVExternalSorter with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20561 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87279/

---
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20531

**[Test build #87280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87280/testReport)** for PR 20531 at commit [`07f2d78`](https://github.com/apache/spark/commit/07f2d781022a44186fe5ab4c5621d80acce0711f).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.

---