[GitHub] spark issue #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue from ht...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17083

Ping @vanzin, do you have any further comments? Thanks a lot.
[GitHub] spark issue #17128: [SPARK-18352][DOCS] wholeFile JSON update doc and progra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17128

Merged build finished. Test PASSed.
[GitHub] spark issue #17128: [SPARK-18352][DOCS] wholeFile JSON update doc and progra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17128

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73729/
[GitHub] spark issue #17128: [SPARK-18352][DOCS] wholeFile JSON update doc and progra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17128

**[Test build #73729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73729/testReport)** for PR 17128 at commit [`f5daeae`](https://github.com/apache/spark/commit/f5daeae056fdae4ef42282206173f8484498968e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17034

**[Test build #73738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73738/testReport)** for PR 17034 at commit [`31293b2`](https://github.com/apache/spark/commit/31293b2dc9483b8bcf7639420a23fc4f2b219598).
[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17122

**[Test build #73737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73737/testReport)** for PR 17122 at commit [`6697928`](https://github.com/apache/spark/commit/6697928e4ff8cf93c4b63c9b6e4b18bec4a2f87a).
[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16971

**[Test build #73739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73739/testReport)** for PR 16971 at commit [`2071aae`](https://github.com/apache/spark/commit/2071aaec4cb3805c2cebbf2732f274d881182f3d).
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r103865522

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
```
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
```
--- End diff --

ok~ let me fix it~
[GitHub] spark pull request #17124: [SPARK-19779][SS]Delete needless tmp file after r...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/17124#discussion_r103864707

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
```
@@ -282,8 +282,12 @@ private[state] class HDFSBackedStateStoreProvider(
     // target file will break speculation, skipping the rename step is the only choice. It's still
     // semantically correct because Structured Streaming requires rerunning a batch should
     // generate the same output. (SPARK-19677)
+    // Also, a tmp file of delta file that generated by the first batch after restart
```
--- End diff --

This comment is not 100% correct; this may also happen in a speculation task. This PR is just a follow-up to delete the temp file that #17012 forgot to delete. IMO, there is no need to add a comment for it.
[GitHub] spark pull request #17124: [SPARK-19779][SS]Delete needless tmp file after r...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/17124#discussion_r103865389

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala ---
```
@@ -295,6 +295,28 @@ class StateStoreSuite extends SparkFunSuite with BeforeAndAfter with PrivateMeth
     provider.getStore(0).commit()
   }

+  test("SPARK-19779: A tmp file of delta file should not be reserved on HDFS " +
```
--- End diff --

Instead of adding a new test, I prefer to just add several lines to the above `SPARK-19677: Committing a delta file atop an existing one should not fail on HDFS`. E.g.

```
test("SPARK-19677: Committing a delta file atop an existing one should not fail on HDFS") {
  val conf = new Configuration()
  conf.set("fs.fake.impl", classOf[RenameLikeHDFSFileSystem].getName)
  conf.set("fs.default.name", "fake:///")
  val provider = newStoreProvider(hadoopConf = conf)
  provider.getStore(0).commit()
  provider.getStore(0).commit()

  // Verify we don't leak temp files
  val tempFiles = FileUtils.listFiles(new File(provider.id.checkpointLocation), null, true)
    .asScala.filter(_.getName.contains("temp-"))
  assert(tempFiles.isEmpty)
}
```
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103865408

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
```
@@ -128,6 +129,15 @@ case class CreateViewCommand(
     qe.assertAnalyzed()
     val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
```
--- End diff --

Can we fix it on the parser side? cc @hvanhovell
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r103865312

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
```
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
```
--- End diff --

but it will only be used in `getDatabase`, and we can save a metastore call to get the default database.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103865206

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
```
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
+  protected var shouldStopRequired: Boolean = false
+
```
--- End diff --

ditto
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103865202

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
```
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
```
--- End diff --

Updated comments around here
[GitHub] spark issue #17097: [SPARK-19765][SQL] UNCACHE TABLE should re-cache all cac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17097

**[Test build #73736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73736/testReport)** for PR 17097 at commit [`e881f29`](https://github.com/apache/spark/commit/e881f29bf5839af2f2ed723ccdb77516c795ef90).
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103864843

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala ---
```
@@ -193,7 +193,7 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester {
     assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue)
   }

-  test("changePrecision() on compact decimal should respect rounding mode") {
+  test("changePrecision/toPrecission on compact decimal should respect rounding mode") {
```
--- End diff --

Thanks, fixed
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103864772

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
```
@@ -362,17 +374,13 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   def abs: Decimal = if (this.compare(Decimal.ZERO) < 0) this.unary_- else this

   def floor: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
+      .getOrElse(clone())
   }

   def ceil: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_CEILING)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_CEILING)
+      .getOrElse(clone())
```
--- End diff --

See my comment above
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103864736

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
```
@@ -362,17 +374,13 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   def abs: Decimal = if (this.compare(Decimal.ZERO) < 0) this.unary_- else this

   def floor: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
+      .getOrElse(clone())
```
--- End diff --

You're right, thanks. My suggestion is to raise an internal error if setting new precision in `floor` or `ceil` would fail.
[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16910#discussion_r103864544

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
```
@@ -1588,6 +1590,153 @@ class HiveDDLSuite
     }
   }

+  test("insert data to a hive serde table which has a non-existing location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""
+             |CREATE TABLE t(a string, b int)
+             |USING hive
+             |LOCATION '$dir'
```
--- End diff --

ok~
[GitHub] spark issue #17132: [SPARK-19792][webui]In the Master Page,the column named ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17132

**[Test build #3591 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3591/testReport)** for PR 17132 at commit [`6794b6b`](https://github.com/apache/spark/commit/6794b6bfc1def36c70471c75f6c2f9b188b23add).
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103864520

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala ---
```
@@ -233,6 +233,18 @@ class MathFunctionsSuite extends QueryTest with SharedSQLContext {
     )
   }

+  test("round/bround with data frame from a local Seq of Product") {
+    val df = spark.createDataFrame(Seq(NumericRow(BigDecimal("5.9"))))
```
--- End diff --

Actually, the problem occurs only when creating a data frame from `Product`. I was unable to reproduce the issue with `Seq(BigDecimal("5.9")).toDF("value")`.
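[Editor's note] For context, a minimal sketch of the two construction paths being compared in the comment above. `NumericRow` is assumed to be a simple case class wrapping a `BigDecimal`, and `spark` an active `SparkSession`; both are assumptions matching the quoted test, not code from the PR:

```
// Hypothetical repro sketch; NumericRow is assumed to be a case class like this.
case class NumericRow(value: BigDecimal)

// Path that triggered the issue: DataFrame built from a local Seq of Product.
val dfFromProduct = spark.createDataFrame(Seq(NumericRow(BigDecimal("5.9"))))

// Path that did not reproduce it: toDF on a plain Seq of BigDecimal.
import spark.implicits._
val dfFromToDF = Seq(BigDecimal("5.9")).toDF("value")
```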
[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17076

@yanboliang yeah I agree we can do it in this PR.
[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17076

+1 @MLnick. It's a three-line change; shall we update it here?
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103864255

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
```
@@ -362,17 +374,13 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   def abs: Decimal = if (this.compare(Decimal.ZERO) < 0) this.unary_- else this

   def floor: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
```
--- End diff --

Theoretically, it should be `Some`. On the other hand, if something goes wrong when setting the new precision in `floor` or `ceil`, I would raise an internal error:

```
def floor: Decimal = if (scale == 0) this else {
  val newPrecision = DecimalType.bounded(precision - scale + 1, 0).precision
  toPrecision(newPrecision, 0, ROUND_FLOOR).getOrElse(
    throw new AnalysisException(s"Overflow when setting precision to $newPrecision"))
}
```
[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17081#discussion_r103864206

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
```
@@ -364,7 +364,12 @@ case class DataSource(
         catalogTable.get,
         catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
     } else {
-      new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
```
--- End diff --

ok, I think it is more reasonable~ thanks~
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17034#discussion_r103864147

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
```
@@ -27,6 +27,9 @@ import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions.{col, lit}
+import org.apache.spark.sql.types.{ByteType, DecimalType, FloatType, IntegerType, LongType,
+  ShortType}
```
--- End diff --

The style rule is generally to use `_` when you're importing >= 5 things. You can revert it, thanks!
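[Editor's note] To illustrate the style rule being cited: with six types coming from one package, the explicit import list in the diff would collapse to a single wildcard.

```
// With >= 5 names imported from one package, the style rule prefers a wildcard:
import org.apache.spark.sql.types._
```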
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r103863975

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
```
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
```
--- End diff --

if we pass a defaultDB, it seems like we introduce an instance of defaultDB, as we discussed above
[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17076

@imatiach-msft `LinearRegression`, `LogisticRegression` and `AFTSurvivalRegression` do not have the `lazy` - they only do `private val gradientSumArray ...`, so they would need to be updated.
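[Editor's note] For illustration, a hedged sketch of the change under discussion: marking the aggregator's scratch buffer `lazy` so it is allocated on first use rather than serialized eagerly with the closure. The class below is a stand-in, not the real aggregator; only the field name comes from the comment above.

```
// Sketch only: a stand-in aggregator showing the eager vs. lazy field.
class ExampleAggregator(numFeatures: Int) extends Serializable {
  // Before (as in LinearRegression et al.): allocated eagerly, so the array
  // travels with the serialized object even before any instance is added.
  // private val gradientSumArray = Array.ofDim[Double](numFeatures)

  // After: lazy allocation defers the buffer until add() first touches it.
  private lazy val gradientSumArray: Array[Double] = Array.ofDim[Double](numFeatures)

  def add(gradient: Array[Double]): this.type = {
    var i = 0
    while (i < numFeatures) { gradientSumArray(i) += gradient(i); i += 1 }
    this
  }
}
```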
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103863856

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
```
@@ -128,6 +129,15 @@ case class CreateViewCommand(
     qe.assertAnalyzed()
     val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
```
--- End diff --

```
queryNoWith
    : insertInto? queryTerm queryOrganization    #singleInsertQuery
    | fromClause multiInsertQueryBody+           #multiInsertQuery
    ;
```

Seems we have mixed them together.
[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17034

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73731/
[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17034

Merged build finished. Test PASSed.
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103863771

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
```
@@ -223,12 +223,24 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   }

   /**
+   * Create new `Decimal` with given precision and scale.
+   *
+   * @return `Some(decimal)` if successful or `None` if overflow would occur
+   */
+  private[sql] def toPrecision(precision: Int, scale: Int,
```
--- End diff --

Fixed, thanks
[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17034

**[Test build #73731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73731/testReport)** for PR 17034 at commit [`0185b45`](https://github.com/apache/spark/commit/0185b454aaa043406c39d1e8f19c98d3d345a836).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103863738

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
```
@@ -339,36 +339,34 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   }

   /**
-   * Change the precision / scale in a given decimal to those set in `decimalType` (if any),
-   * returning null if it overflows or modifying `value` in-place and returning it if successful.
+   * Create new `Decimal` with precision and scale given in `decimalType` (if any),
+   * returning null if it overflows or creating a new `value` and returning it if successful.
    *
-   * NOTE: this modifies `value` in-place, so don't call it on external data.
    */
-  private[this] def changePrecision(value: Decimal, decimalType: DecimalType): Decimal = {
-    if (value.changePrecision(decimalType.precision, decimalType.scale)) value else null
-  }
+  private[this] def toPrecision(value: Decimal, decimalType: DecimalType): Decimal =
+    value.toPrecision(decimalType.precision, decimalType.scale).orNull

   private[this] def castToDecimal(from: DataType, target: DecimalType): Any => Any = from match {
     case StringType =>
       buildCast[UTF8String](_, s => try {
-        changePrecision(Decimal(new JavaBigDecimal(s.toString)), target)
+        toPrecision(Decimal(new JavaBigDecimal(s.toString)), target)
       } catch {
         case _: NumberFormatException => null
       })
     case BooleanType =>
-      buildCast[Boolean](_, b => changePrecision(if (b) Decimal.ONE else Decimal.ZERO, target))
+      buildCast[Boolean](_, b => toPrecision(if (b) Decimal.ONE else Decimal.ZERO, target))
     case DateType =>
       buildCast[Int](_, d => null) // date can't cast to decimal in Hive
     case TimestampType =>
       // Note that we lose precision here.
-      buildCast[Long](_, t => changePrecision(Decimal(timestampToDouble(t)), target))
+      buildCast[Long](_, t => toPrecision(Decimal(timestampToDouble(t)), target))
     case dt: DecimalType =>
-      b => changePrecision(b.asInstanceOf[Decimal].clone(), target)
```
--- End diff --

Nope, there is one more here:

```
case BooleanType =>
  buildCast[Boolean](_, b => toPrecision(if (b) Decimal.ONE else Decimal.ZERO, target))
```

Both `ONE` and `ZERO` are singletons, so changing precision on them in place is not a good idea.
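[Editor's note] A small sketch of why the in-place variant is unsafe on those constants, assuming `Decimal.ONE` is the shared singleton and `changePrecision` mutates its receiver, as the discussion above describes:

```
// Decimal.ONE is one shared instance; changePrecision mutates its receiver,
// so calling it directly on the constant would corrupt it for every later use.
val shared = Decimal.ONE
// shared.changePrecision(3, 2)     // unsafe: would rewrite the shared ONE in place

// Safe alternative: copy first, then mutate only the private copy. The new
// toPrecision in this PR achieves the same by returning a fresh value instead.
val viaClone = shared.clone()
viaClone.changePrecision(3, 2)
```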
[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17095#discussion_r103863691

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
```
@@ -254,7 +254,18 @@ class SessionCatalog(
     val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
     val table = formatTableName(tableDefinition.identifier.table)
     validateName(table)
-    val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+
+    val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
```
--- End diff --

Yes, this logic should be applied to all of them. The database path already contains this logic; shall I add the partition logic in another PR?
[GitHub] spark issue #17132: [SPARK-19792][webui]In the Master Page,the column named ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17132

Can one of the admins verify this patch?
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103863428

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
```
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
+  protected var shouldStopRequired: Boolean = false
+
```
--- End diff --

Please add a simple comment.
[GitHub] spark pull request #17132: [SPARK-19792][webui]In the Master Page,the column...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/17132

[SPARK-19792][webui] In the Master Page, the column named "Memory per Node", I think it is not all right

Signed-off-by: liuxian

## What changes were proposed in this pull request?

Open the Spark web page; on the Master Page there are two tables: the Running Applications table and the Completed Applications table. The column named "Memory per Node" is not right, because a node may have more than one executor. So it should be named "Memory per Executor"; otherwise it is easy for users to misunderstand.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark wid-lx-0302

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17132.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17132
[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17076#discussion_r103863345

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
```
@@ -463,6 +458,8 @@ private class LinearSVCAggregator(
    */
   def add(instance: Instance): this.type = { instance match { case Instance(label, weight, features) =>
+      require(numFeatures == features.size, s"Dimensions mismatch when adding new instance." +
+        s" Expecting $numFeatures but got ${features.size}.")
       if (weight == 0.0) return this
```
--- End diff --

Yes, good catch - LoR and LinR both have this check.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103863351

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
```
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
```
--- End diff --

Btw, the usual style is:

```
/**
 *
 *
 */
```
[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...
Github user wojtek-szymanski commented on a diff in the pull request: https://github.com/apache/spark/pull/17075#discussion_r103863388

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
```
@@ -339,36 +339,34 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   }

   /**
-   * Change the precision / scale in a given decimal to those set in `decimalType` (if any),
-   * returning null if it overflows or modifying `value` in-place and returning it if successful.
+   * Create new `Decimal` with precision and scale given in `decimalType` (if any),
+   * returning null if it overflows or creating a new `value` and returning it if successful.
    *
-   * NOTE: this modifies `value` in-place, so don't call it on external data.
    */
-  private[this] def changePrecision(value: Decimal, decimalType: DecimalType): Decimal = {
-    if (value.changePrecision(decimalType.precision, decimalType.scale)) value else null
-  }
+  private[this] def toPrecision(value: Decimal, decimalType: DecimalType): Decimal =
+    value.toPrecision(decimalType.precision, decimalType.scale).orNull

   private[this] def castToDecimal(from: DataType, target: DecimalType): Any => Any = from match {
     case StringType =>
       buildCast[UTF8String](_, s => try {
-        changePrecision(Decimal(new JavaBigDecimal(s.toString)), target)
+        toPrecision(Decimal(new JavaBigDecimal(s.toString)), target)
```
--- End diff --

agree
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103863369

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
```
@@ -128,6 +129,15 @@ case class CreateViewCommand(
     qe.assertAnalyzed()
     val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
```
--- End diff --

Hmm, why is `INSERT INTO ...` a query?
[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17095#discussion_r103863244

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
```
@@ -254,7 +254,18 @@ class SessionCatalog(
     val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
     val table = formatTableName(tableDefinition.identifier.table)
     validateName(table)
-    val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+
+    val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
```
--- End diff --

shall we apply it to all locations, like the database location and partition location?
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r103863070

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
```
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
```
--- End diff --

we still have conf/hadoopConf in `InMemoryCatalog` and `HiveExternalCatalog`; we can just add one more parameter.
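[Editor's note] A compile-level sketch of the shape being debated: the concrete catalogs keep their own confs and simply forward them to the base constructor. The class names and the simplified `getDatabase` signature below are illustrative stand-ins, not the real Spark API:

```
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

// Base class takes the confs once, so methods like getDatabase can use them
// without an extra metastore round trip.
abstract class ExternalCatalogSketch(conf: SparkConf, hadoopConf: Configuration) {
  def getDatabase(db: String): String // simplified for the sketch
}

// Concrete catalogs already hold these confs; they just forward them.
class InMemoryCatalogSketch(
    conf: SparkConf = new SparkConf,
    hadoopConf: Configuration = new Configuration)
  extends ExternalCatalogSketch(conf, hadoopConf) {
  override def getDatabase(db: String): String = db
}
```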
[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16910#discussion_r103862924

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
```
@@ -1588,6 +1590,153 @@ class HiveDDLSuite
     }
   }

+  test("insert data to a hive serde table which has a non-existing location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""
+             |CREATE TABLE t(a string, b int)
+             |USING hive
+             |LOCATION '$dir'
```
--- End diff --

can we just call `dir.delete` before creating this table?
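[Editor's note] A sketch of the suggested tweak, reusing the suite helpers quoted above (`withTable`, `withTempDir`, `spark`); the closing of the SQL string is an assumption, since the quoted hunk is truncated:

```
test("insert data to a hive serde table which has a non-existing location should succeed") {
  withTable("t") {
    withTempDir { dir =>
      // Delete the directory up front so LOCATION points at a non-existing path.
      dir.delete()
      spark.sql(
        s"""
           |CREATE TABLE t(a string, b int)
           |USING hive
           |LOCATION '$dir'
         """.stripMargin)
    }
  }
}
```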
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r103862862

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
```
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
```
--- End diff --

I think conf/hadoopConf is more useful; later logic can use it, and its subclasses also have these two confs.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103862946

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
```
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
```
--- End diff --

Your comment style looks weird. Please put `true...` in the `/* ... */`.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103862749

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
```
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
```
--- End diff --

?
[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16910

**[Test build #73735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73735/testReport)** for PR 16910 at commit [`a4f771a`](https://github.com/apache/spark/commit/a4f771a60f0c716e1811acab5fffead1929d8e80).
[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17122

**[Test build #73734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73734/testReport)** for PR 17122 at commit [`9528ccc`](https://github.com/apache/spark/commit/9528ccc2d63d8c657d74f455dd2589d8e883d51c).
[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17081#discussion_r103862631

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
```
@@ -364,7 +364,12 @@ case class DataSource(
         catalogTable.get,
         catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
     } else {
-      new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
```
--- End diff --

I'd like to create the file status cache as a local variable, pass it to `getOrInferFileFormatSchema`, then use it here. It's much easier to reason about the lifetime of this cache this way.
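[Editor's note] A rough sketch of the threading cloud-fan describes, placed inside `resolveRelation`; the extra parameters on `getOrInferFileFormatSchema` and `InMemoryFileIndex` are assumptions for illustration, not the committed signatures:

```
// Sketch only: the cache's lifetime becomes the enclosing call, rather than a
// lazy field on DataSource whose lifetime is harder to reason about.
val fileStatusCache = FileStatusCache.getOrCreate(sparkSession)
val (dataSchema, partitionSchema) =
  getOrInferFileFormatSchema(format, fileStatusCache)   // assumed extra parameter
val fileIndex = new InMemoryFileIndex(
  sparkSession, globbedPaths, options, Some(partitionSchema), fileStatusCache)
```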
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103862423

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
  def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    throw new UnsupportedOperationException
  }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
--- End diff --

??
[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17081#discussion_r103862351

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -86,7 +86,7 @@ case class DataSource(
  lazy val providingClass: Class[_] = DataSource.lookupDataSource(className)
  lazy val sourceInfo: SourceInfo = sourceSchema()
  private val caseInsensitiveOptions = CaseInsensitiveMap(options)
-
+  private lazy val fileStatusCache = FileStatusCache.getOrCreate(sparkSession)
--- End diff --

What's the lifetime of this cache?
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103862311

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -434,6 +434,17 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
    val input = ctx.freshName("input")
    // Right now, Range is only used when there is one upstream.
    ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
+
+    val localIdx = ctx.freshName("localIdx")
+    val localEnd = ctx.freshName("localEnd")
+    val range = ctx.freshName("range")
+    // we need to place consume() before calling isShouldStopRequired
--- End diff --

Thank you, done.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103862282

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
  def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    throw new UnsupportedOperationException
  }
+
+  /* for optimization */
+  var shouldStopRequired: Boolean = false
--- End diff --

Sure, done.
[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17095#discussion_r103862289

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -254,7 +254,18 @@ class SessionCatalog(
    val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
    val table = formatTableName(tableDefinition.identifier.table)
    validateName(table)
-    val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+
+    val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
--- End diff --

If the location has no scheme, such as hdfs or file, then when we restore it from the metastore we don't know which filesystem the table is stored on.
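A short sketch of the qualification step being described, reusing the `makeQualified` pattern that appears elsewhere in this thread; the helper name and the `hadoopConf` parameter are illustrative:

```Scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Sketch only: a scheme-less location such as "/warehouse/db1/t1" is made
// fully qualified (e.g. "hdfs://nn:8020/warehouse/db1/t1") before the table
// definition is stored, so the filesystem is recoverable on restore.
def qualifyLocation(location: String, hadoopConf: Configuration): String = {
  val path = new Path(location)
  val fs = path.getFileSystem(hadoopConf)
  path.makeQualified(fs.getUri, fs.getWorkingDirectory).toString
}
```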
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103862257

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -77,6 +77,7 @@ trait CodegenSupport extends SparkPlan {
   */
  final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
    this.parent = parent
+
--- End diff --

Good catch. Done.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103862272

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
  def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    throw new UnsupportedOperationException
  }
+
+  /* for optimization */
--- End diff --

I see. Done.
[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17095#discussion_r103862062

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -254,7 +254,18 @@ class SessionCatalog(
    val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
    val table = formatTableName(tableDefinition.identifier.table)
    validateName(table)
-    val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db)))
+
+    val newTableDefinition = if (tableDefinition.storage.locationUri.isDefined) {
--- End diff --

But why do we have to store the fully qualified path? What do we gain from this?
[GitHub] spark pull request #17127: [SPARK-19734][PYTHON][ML] Correct OneHotEncoder d...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17127
[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17081#discussion_r103861992

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -122,7 +122,7 @@ case class DataSource(
        val qualified = hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
        SparkHadoopUtil.get.globPathIfNecessary(qualified)
      }.toArray
-      new InMemoryFileIndex(sparkSession, globbedPaths, options, None)
+      new InMemoryFileIndex(sparkSession, globbedPaths, options, None, fileStatusCache)
--- End diff --

This also impacts the streaming code path. If it is fine for streaming, the code changes look good to me.
[GitHub] spark issue #17127: [SPARK-19734][PYTHON][ML] Correct OneHotEncoder doc stri...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17127 Merged into master, thanks.
[GitHub] spark pull request #17104: [MINOR][ML] Fix comments in LSH Examples and Pyth...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17104
[GitHub] spark issue #17104: [MINOR][ML] Fix comments in LSH Examples and Python API
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17104 LGTM, merged into master. Thanks.
[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16910 OK, I'll do it now ~ it was fine yesterday...
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r103861521

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
 *
 * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
 */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

How about we just pass in a `defaultDB: CatalogDatabase`? Then we don't need to add the `protected def warehousePath: String`.
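For comparison, a sketch of the alternative constructor being proposed; whether implementations need anything beyond the default database is left open in the thread:

```Scala
// Sketch only: pass the fully built default database in, instead of conf +
// hadoopConf plus a warehousePath hook that each implementation must supply.
abstract class ExternalCatalog(defaultDB: CatalogDatabase) {
  // getDatabase(SessionCatalog.DEFAULT_DATABASE) can return defaultDB directly,
  // so no `protected def warehousePath: String` is needed.
}
```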
[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17081#discussion_r103861424

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -364,7 +364,12 @@ case class DataSource(
          catalogTable.get,
          catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
      } else {
-        new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
+        new InMemoryFileIndex(
+          sparkSession,
+          globbedPaths,
+          options,
+          Some(partitionSchema),
+          fileStatusCache)
--- End diff --

```Scala
new InMemoryFileIndex(
  sparkSession, globbedPaths, options, Some(partitionSchema), fileStatusCache)
```

This is also valid.
[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17001#discussion_r103861355

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -74,7 +77,19 @@ abstract class ExternalCatalog {
   */
  def alterDatabase(dbDefinition: CatalogDatabase): Unit

-  def getDatabase(db: String): CatalogDatabase
+  def getDatabase(db: String): CatalogDatabase = {
+    val database = getDatabaseInternal(db)
+    // The default database's location always uses the warehouse path.
+    // Since the location of database stored in metastore is qualified,
+    // we also make the warehouse location qualified.
+    if (db == SessionCatalog.DEFAULT_DATABASE) {
--- End diff --

makes sense
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103861360

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -434,6 +434,17 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
    val input = ctx.freshName("input")
    // Right now, Range is only used when there is one upstream.
    ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
+
+    val localIdx = ctx.freshName("localIdx")
+    val localEnd = ctx.freshName("localEnd")
+    val range = ctx.freshName("range")
+    // we need to place consume() before calling isShouldStopRequired
--- End diff --

It would be better to describe the reason why consume() may modify `shouldStopRequired`.
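A sketch of the ordering constraint being asked about: `consume()` can set `shouldStopRequired` as a side effect, so the flag must be read only after the loop body has been generated. All names below are illustrative, not the PR's actual code:

```Scala
// Sketch only, inside a doProduce implementation: generate the body first
// (consume may flip shouldStopRequired), then decide whether the emitted
// loop needs a shouldStop() check at all.
override protected def doProduce(ctx: CodegenContext): String = {
  val localIdx = ctx.freshName("localIdx")
  val localEnd = ctx.freshName("localEnd")
  val body = consume(ctx, Seq.empty)  // side effect: may set shouldStopRequired = true
  val check = if (isShouldStopRequired) "if (shouldStop()) return;" else ""
  s"""
     |for (long $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
     |  $body
     |  $check
     |}
   """.stripMargin
}
```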
[GitHub] spark pull request #16938: [SPARK-19583][SQL]CTAS for data source table with...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16938
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17081 **[Test build #73733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73733/testReport)** for PR 17081 at commit [`9a73947`](https://github.com/apache/spark/commit/9a73947efea334ba0cfc5b5508003807a93ff806).
[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16910 Can you resolve the conflict?
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103861241

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
  def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    throw new UnsupportedOperationException
  }
+
+  /* for optimization */
+  var shouldStopRequired: Boolean = false
--- End diff --

Please add `protected`.
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73726/
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17081 Merged build finished. Test PASSed.
[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17081 **[Test build #73726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73726/testReport)** for PR 17081 at commit [`60fa037`](https://github.com/apache/spark/commit/60fa03757d223f833e2fa161326a48a9015d4c6c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16938 thanks, merging to master!
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103860895

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -77,6 +77,7 @@ trait CodegenSupport extends SparkPlan {
   */
  final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
    this.parent = parent
+
--- End diff --

Extra blank line.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103860938

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
  def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    throw new UnsupportedOperationException
  }
+
+  /* for optimization */
--- End diff --

This deserves a better comment.
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103860516

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
    qe.assertAnalyzed()
    val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

The SQL parser only allows `CREATE VIEW AS query` here, and a query can only be a `SELECT ...`, an `INSERT INTO ...`, or a CTE, so perhaps we don't have to consider other commands here.
[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17076 @sethah Thanks for the good catch. I verified this optimization and found it indeed reduces the size of shuffle data. This looks good to me. BTW, like @MLnick's suggestion, could you add the lazy evaluation for the gradient array to all the other aggregators in this PR? Since it's a small change, I'd prefer to do it here. Thanks.
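A hedged sketch of the lazy-gradient idea being requested; the class and member names below are hypothetical and this is not the PR's actual code:

```Scala
import org.apache.spark.ml.linalg.DenseVector

// Sketch only: keep compact running sums during add/merge, and materialize
// the scaled gradient vector lazily, once, after all merges have finished,
// instead of shipping a derived array with every partial result.
class ExampleAggregator(numFeatures: Int) extends Serializable {
  private var weightSum = 0.0
  private val gradientSumArray = new Array[Double](numFeatures)

  // Lazily evaluated: only computed when first requested, typically on the
  // driver after treeAggregate has combined all partitions.
  lazy val gradient: DenseVector = {
    val result = gradientSumArray.clone()
    var i = 0
    while (i < result.length) { result(i) /= weightSum; i += 1 }
    new DenseVector(result)
  }
}
```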
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103860095

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
    qe.assertAnalyzed()
    val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

Shall we forbid all commands? e.g. `CREATE VIEW xxx AS CREATE TABLE ...` should also be disallowed, right?
[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17081#discussion_r103859650

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -364,7 +364,8 @@ case class DataSource(
          catalogTable.get,
          catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
      } else {
-        new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
+        new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema),
+          fileStatusCache)
--- End diff --

Nit: indentation issue.
[GitHub] spark issue #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17131 **[Test build #73732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73732/consoleFull)** for PR 17131 at commit [`4975ac7`](https://github.com/apache/spark/commit/4975ac7f3a6a714c80e5f875ab54dd60f4aa22a5).
[GitHub] spark issue #17119: [SPARK-19784][SQL][WIP]refresh table after alter the loc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17119 Merged build finished. Test FAILed.
[GitHub] spark issue #17119: [SPARK-19784][SQL][WIP]refresh table after alter the loc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73727/
[GitHub] spark issue #17119: [SPARK-19784][SQL][WIP]refresh table after alter the loc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17119 **[Test build #73727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73727/testReport)** for PR 17119 at commit [`be98a0f`](https://github.com/apache/spark/commit/be98a0fabc9244ccb9e376ac8e7aef5125675c9b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17131 ok to test
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103859056

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
    qe.assertAnalyzed()
    val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
+        throw new AnalysisException("Creating a view as insert into a table is not allowed")
--- End diff --

It would be nice to put the view name in the error message.
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103858978

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
    qe.assertAnalyzed()
    val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

`_: InsertIntoHadoopFsRelationCommand`
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103858997

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
    qe.assertAnalyzed()
    val analyzedPlan = qe.analyzed

+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
+        throw new AnalysisException("Creating a view as insert into a table is not allowed")
+      case i: InsertIntoDataSourceCommand =>
--- End diff --

The same here.
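Putting gatorsmile's suggestions together, a sketch of the revised guard; `name` stands for the view identifier and is assumed to be in scope:

```Scala
// Sketch only: wildcard patterns (the bound value is unused) and the view
// name included in the error message.
analyzedPlan match {
  case _: InsertIntoHadoopFsRelationCommand | _: InsertIntoDataSourceCommand =>
    throw new AnalysisException(
      s"Creating view $name as insert into a table is not allowed")
  case _ => // a plain query; continue creating the view
}
```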
[GitHub] spark issue #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17131 Can one of the admins verify this patch?
[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17034 **[Test build #73731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73731/testReport)** for PR 17034 at commit [`0185b45`](https://github.com/apache/spark/commit/0185b454aaa043406c39d1e8f19c98d3d345a836).
[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17122 **[Test build #73730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73730/testReport)** for PR 17122 at commit [`5ff8dca`](https://github.com/apache/spark/commit/5ff8dcae1bce0b553d4aefc563addc001e6a6691).
[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17125#discussion_r103858555

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -604,7 +604,14 @@ class Analyzer(
    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
      case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
-        i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+        val newTable = EliminateSubqueryAliases(lookupTableFromCatalog(u))
+        // Inserting into a view is not allowed, we should throw an AnalysisException.
+        newTable match {
+          case v: View =>
+            u.failAnalysis(s"${v.desc.identifier} is a view, inserting into a view is not allowed")
--- End diff --

Can we move this to `PreprocessTableInsertion`?
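For illustration, how the guard might look if it lived in a post-resolution rule such as `PreprocessTableInsertion`, as suggested; the rule shown here is a sketch with a hypothetical name, not that rule's actual implementation:

```Scala
// Sketch only: reject inserts whose resolved target is a view, outside of
// ResolveRelations, where the relation has already been looked up.
object RejectInsertIntoView extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    case InsertIntoTable(v: View, _, _, _, _) =>
      throw new AnalysisException(
        s"${v.desc.identifier} is a view; inserting into a view is not allowed")
  }
}
```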
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103858494

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -77,6 +77,10 @@ trait CodegenSupport extends SparkPlan {
   */
  final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
    this.parent = parent
+
+    // to track the existence of apply() call in the current produce-consume cycle
+    // if apply is not called (e.g. in aggregation), we can skip shoudStop in the inner-most loop
+    parent.shouldStopRequired = false
--- End diff --

I wanted to ensure that `produce()` starts with `parent.shouldStopRequired = false`. This is because I was afraid that another produce-consume cycle might set `shouldStopRequired` to true if there is more than one produce-consume cycle under one parent. However, in most cases this would not happen, so for simplicity I eliminated this.
[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17122#discussion_r103858474

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -69,6 +69,7 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    val stopEarly = ctx.freshName("stopEarly")
    ctx.addMutableState("boolean", stopEarly, s"$stopEarly = false;")
+    shouldStopRequired = true // loop may break early even without append in loop body
--- End diff --

Good catch. This implementation was based on a slightly older revision, which did not yet have the `stopEarly()` method. Removed this line.
[GitHub] spark pull request #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias col...
GitHub user stanzhai opened a pull request: https://github.com/apache/spark/pull/17131

[SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in INNER JOIN should not be folded by FoldablePropagation rule

This PR is a fix for branch-2.0. Refer to #17099. @gatorsmile

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/stanzhai/spark fix-inner-join-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17131.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17131

commit 4975ac7f3a6a714c80e5f875ab54dd60f4aa22a5
Author: Stan Zhai
Date: 2017-03-02T05:56:07Z

    fix innner join
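For context on the rule involved: `FoldablePropagation` replaces references to an alias of a foldable (constant) expression with the constant itself. Per the PR title, on branch-2.0 this substitution was also applied to constant alias columns under an INNER JOIN, where it produced incorrect results. The snippet below is a constructed illustration of the rule's normal action, not the reproduction from #17099:

```Scala
// Illustrative only: in the benign case the optimizer may rewrite
//   SELECT x, 'v' AS c FROM t ORDER BY c
// into
//   SELECT x, 'v' AS c FROM t ORDER BY 'v'
// because c is an alias of a literal. This backport prevents the same
// substitution from reaching constant alias columns referenced in
// INNER JOIN conditions.
spark.sql("SELECT x, 'v' AS c FROM t ORDER BY c").explain(true)
```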
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17034#discussion_r103858261

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -27,6 +27,8 @@ import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
--- End diff --

Yes, I will update this. Thanks for the review!
[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17034#discussion_r103858210

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
    }
  }

+  test("should support all NumericType censors, and not support other types") {
+    val df = spark.createDataFrame(Seq(
+      (0, Vectors.dense(0)),
+      (1, Vectors.dense(1)),
+      (2, Vectors.dense(2)),
+      (3, Vectors.dense(3)),
+      (4, Vectors.dense(4))
+    )).toDF("label", "features")
+      .withColumn("censor", lit(0.0))
+    val aft = new AFTSurvivalRegression().setMaxIter(1)
+    val expected = aft.fit(df)
+
+    val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DecimalType(10, 0))
+    types.foreach { t =>
+      val actual = aft.fit(df.select(col("label"), col("features"),
+        col("censor").cast(t)))
+      assert(expected.intercept === actual.intercept)
+      assert(expected.coefficients === actual.coefficients)
+    }
+
+    val dfWithStringCensors = spark.createDataFrame(Seq(
+      (0, Vectors.dense(0, 2, 3), "0")
+    )).toDF("label", "features", "censor")
+    val thrown = intercept[IllegalArgumentException] {
--- End diff --

This follows the implementation in `MLTestingUtils.checkNumericTypes`, so I prefer not to change it.