[GitHub] [spark] cloud-fan commented on a change in pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
cloud-fan commented on a change in pull request #29045:
URL: https://github.com/apache/spark/pull/29045#discussion_r454119852

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ##
@@ -179,12 +179,17 @@ class OrcFileFormat
       val fs = filePath.getFileSystem(conf)
       val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
-      val requestedColIdsOrEmptyFile =
+      val (requestedColIdsOrEmptyFile, sendActualSchema) =
         Utils.tryWithResource(OrcFile.createReader(filePath, readerOptions)) { reader =>
           OrcUtils.requestedColumnIds(
             isCaseSensitive, dataSchema, requiredSchema, reader, conf)
         }
+      if (sendActualSchema) {
+        resultSchemaString = OrcUtils.orcTypeDescriptionString(actualSchema)

Review comment:
do you mean we can't do column pruning in this case?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
zhengruifeng commented on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657984250

@huaxingao @WeichenXu123 @viirya What do you think about saving ~70% of RAM (Array[Int] -> Array[Byte]) at the cost of some regression (1% ~ 10%)?
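To make the trade-off in the question concrete, here is a minimal sketch of what "auto infer compact integer type" could look like in plain Scala. It is not taken from the PR: the names `CompactInts`, `ByteInts`, `ShortInts`, `IntInts`, and `compact` are hypothetical, and which arrays the PR actually compacts is not stated in this message. The idea is only that values are stored in the narrowest primitive array that can hold all of them, and widened back to `Int` on every read, which is one plausible source of a small slowdown.

```scala
// Hypothetical sketch, not Spark's implementation: store Int values in the
// narrowest primitive array that can represent all of them.
sealed trait CompactInts {
  def apply(i: Int): Int   // read a value back, widened to Int
  def bytesPerValue: Int   // payload size of one stored element
}

final case class ByteInts(values: Array[Byte]) extends CompactInts {
  def apply(i: Int): Int = values(i).toInt
  def bytesPerValue: Int = 1
}

final case class ShortInts(values: Array[Short]) extends CompactInts {
  def apply(i: Int): Int = values(i).toInt
  def bytesPerValue: Int = 2
}

final case class IntInts(values: Array[Int]) extends CompactInts {
  def apply(i: Int): Int = values(i)
  def bytesPerValue: Int = 4
}

object CompactInts {
  // Scan the data once per candidate width and pick the narrowest that fits.
  def compact(values: Array[Int]): CompactInts =
    if (values.forall(v => v >= Byte.MinValue && v <= Byte.MaxValue)) {
      ByteInts(values.map(_.toByte))
    } else if (values.forall(v => v >= Short.MinValue && v <= Short.MaxValue)) {
      ShortInts(values.map(_.toShort))
    } else {
      IntInts(values)
    }
}
```

An `Int` element takes 4 bytes and a `Byte` element 1 byte, so the array payload can shrink by up to 75% when every value fits in a byte, which is in the same range as the ~70% quoted above.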
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657984010
[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657984010
[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657927290 **[Test build #125799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125799/testReport)** for PR 28708 at commit [`5a0cd2a`](https://github.com/apache/spark/commit/5a0cd2abd316aacc601b9e8fa6e1406b67c55fb7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657937025 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/30410/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657983486

**[Test build #125799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125799/testReport)** for PR 28708 at commit [`5a0cd2a`](https://github.com/apache/spark/commit/5a0cd2abd316aacc601b9e8fa6e1406b67c55fb7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public final class MapOutputCommitMessage`
   * `case class IsExecutorAlive(executorId: String) extends CoarseGrainedClusterMessage`
   * `sealed trait LogisticRegressionSummary extends ClassificationSummary`
   * `sealed trait RandomForestClassificationSummary extends ClassificationSummary`
   * `class _ClassificationSummary(JavaWrapper):`
   * `class _TrainingSummary(JavaWrapper):`
   * `class _BinaryClassificationSummary(_ClassificationSummary):`
   * `class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable,`
   * `class LinearSVCSummary(_BinaryClassificationSummary):`
   * `class LinearSVCTrainingSummary(LinearSVCSummary, _TrainingSummary):`
   * `class LogisticRegressionSummary(_ClassificationSummary):`
   * `class LogisticRegressionTrainingSummary(LogisticRegressionSummary, _TrainingSummary):`
   * `class BinaryLogisticRegressionSummary(_BinaryClassificationSummary,`
   * `class RandomForestClassificationSummary(_ClassificationSummary):`
   * `class RandomForestClassificationTrainingSummary(RandomForestClassificationSummary,`
   * `class BinaryRandomForestClassificationSummary(_BinaryClassificationSummary):`
   * `class BinaryRandomForestClassificationTrainingSummary(BinaryRandomForestClassificationSummary,`
   * `class DisableHints(conf: SQLConf) extends RemoveAllHints(conf: SQLConf)`
   * `case class WithFields(`
   * `case class Hour(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField`
   * `case class Minute(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField`
   * `case class Second(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField`
   * `trait GetDateField extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant`
   * `case class DayOfYear(child: Expression) extends GetDateField`
   * `case class SecondsToTimestamp(child: Expression) extends UnaryExpression`
   * `case class Year(child: Expression) extends GetDateField`
   * `case class YearOfWeek(child: Expression) extends GetDateField`
   * `case class Quarter(child: Expression) extends GetDateField`
   * `case class Month(child: Expression) extends GetDateField`
   * `case class DayOfMonth(child: Expression) extends GetDateField`
   * `case class DayOfWeek(child: Expression) extends GetDateField`
   * `case class WeekDay(child: Expression) extends GetDateField`
   * `case class WeekOfYear(child: Expression) extends GetDateField`
   * `sealed trait UTCTimestamp extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant`
   * `case class FromUTCTimestamp(left: Expression, right: Expression) extends UTCTimestamp`
   * `case class ToUTCTimestamp(left: Expression, right: Expression) extends UTCTimestamp`
   * `sealed abstract class MergeAction extends Expression with Unevaluable`
   * `case class DeleteAction(condition: Option[Expression]) extends MergeAction`
   * `trait BaseScriptTransformationExec extends UnaryExecNode`
   * `abstract class BaseScriptTransformationWriterThread(`
   * `abstract class BaseScriptTransformIOSchema extends Serializable`
   * `case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan]`
   * `class StateStoreConf(`
   * `case class HiveScriptTransformationExec(`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
AmplabJenkins removed a comment on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657982803
[GitHub] [spark] SaurabhChawla100 edited a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 edited a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657978027

> Can you be more specific about the problem? Are you saying that the actual file schema doesn't match the table schema specified by the user?

For ORC data created by Hive, there are no field names in the physical schema. Please see the code below for reference.

https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133

This code sends the index of the column in the data schema. In the code below, however, we pass in the result schema, and that result schema does not have the index passed from OrcUtils.scala.

https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211

For example:

```
val u = """select date_dim.d_year from date_dim limit 5"""
spark.sql(u).collect
```

Here the index of `d_year` returned by OrcUtils.scala#L133 is 6, whereas the resultSchema passed in OrcFileFormat.scala#L211 has only one field, struct<`d_year`:int>. Using index 6 against a resultSchema of size 1 throws the exception:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, 192.168.0.103, executor driver): java.lang.ArrayIndexOutOfBoundsException: 6
	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
	at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
```
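The mismatch described in the comment above can be reproduced without Spark. The following is a minimal sketch under stated assumptions: the object `PrunedIndexSketch` and its helpers are hypothetical, the schemas are plain lists of field names, and the placeholder names `c0` to `c5` are invented so that `d_year` lands at position 6 as in the comment. It only illustrates why an index computed against the full data schema cannot be used against a pruned result schema; it is not the change made in the PR.

```scala
// Hypothetical illustration of the index mismatch; not Spark code.
object PrunedIndexSketch {
  // Made-up full data schema: `d_year` sits at position 6, as in the comment.
  val dataSchema: Seq[String] = Seq("c0", "c1", "c2", "c3", "c4", "c5", "d_year")

  // Pruned schema for `select d_year ...`: a single field.
  val resultSchema: Seq[String] = Seq("d_year")

  // Position of each required column in whichever schema is passed in.
  def columnIds(schema: Seq[String], required: Seq[String]): Seq[Int] =
    required.map(name => schema.indexOf(name))

  // True when every id is a valid position in the given schema.
  def idsFit(schema: Seq[String], ids: Seq[Int]): Boolean =
    ids.forall(id => id >= 0 && id < schema.length)
}

// Ids computed against the data schema point past the end of the result schema.
val idsFromDataSchema = PrunedIndexSketch.columnIds(
  PrunedIndexSketch.dataSchema, Seq("d_year"))
println(idsFromDataSchema)                                                  // List(6)
println(PrunedIndexSketch.idsFit(PrunedIndexSketch.resultSchema, idsFromDataSchema)) // false
```

Either the ids have to be recomputed against the schema that is actually handed to the reader, or the reader has to be given a schema in which those ids are valid. The diff under review appears to take the second route by sending the actual schema when `sendActualSchema` is set.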
[GitHub] [spark] AmplabJenkins commented on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
AmplabJenkins commented on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657982803
[GitHub] [spark] cloud-fan commented on pull request #29091: [SPARK-32258][SQL] Not duplicate normalization on children for float/double If/CaseWhen/Coalesce
cloud-fan commented on pull request #29091: URL: https://github.com/apache/spark/pull/29091#issuecomment-657982389 thanks, merging to master!
[GitHub] [spark] cloud-fan closed pull request #29091: [SPARK-32258][SQL] Not duplicate normalization on children for float/double If/CaseWhen/Coalesce
cloud-fan closed pull request #29091: URL: https://github.com/apache/spark/pull/29091
[GitHub] [spark] SparkQA commented on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
SparkQA commented on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657982312

**[Test build #125801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125801/testReport)** for PR 29090 at commit [`dfbce91`](https://github.com/apache/spark/commit/dfbce912c7371afae5e8f87bf18b5a3d7dbfca52).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
SparkQA removed a comment on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657931374 **[Test build #125801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125801/testReport)** for PR 29090 at commit [`dfbce91`](https://github.com/apache/spark/commit/dfbce912c7371afae5e8f87bf18b5a3d7dbfca52).
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657978027

> Can you be more specific about the problem? Are you saying that the actual file schema doesn't match the table schema specified by the user?

For ORC data created by Hive, there are no field names in the physical schema. Please see the code below for reference.

https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133

This code sends the index of the column in the data schema. In the code below, however, we pass in the result schema, and that result schema does not have the index passed from OrcUtils.scala.

https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211

For example:

```
val u = """select date_dim.d_date_id from date_dim limit 5"""
spark.sql(u).collect
```

Here the index of `d_date_id` returned by OrcUtils.scala#L133 is 2, whereas the resultSchema passed in OrcFileFormat.scala#L211 has only one field, struct<`d_date_id`:string>.
[GitHub] [spark] AmplabJenkins commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode
AmplabJenkins commented on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-657976343
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode
AmplabJenkins removed a comment on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-657976343
[GitHub] [spark] SparkQA removed a comment on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode
SparkQA removed a comment on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-657891341 **[Test build #125792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125792/testReport)** for PR 29077 at commit [`5459d58`](https://github.com/apache/spark/commit/5459d58d8ea68a8266f05366fc06eb3f6c062351).
[GitHub] [spark] SparkQA commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode
SparkQA commented on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-657975832

**[Test build #125792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125792/testReport)** for PR 29077 at commit [`5459d58`](https://github.com/apache/spark/commit/5459d58d8ea68a8266f05366fc06eb3f6c062351).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel
AmplabJenkins removed a comment on pull request #29088: URL: https://github.com/apache/spark/pull/29088#issuecomment-657974472 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125794/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel
AmplabJenkins removed a comment on pull request #29088: URL: https://github.com/apache/spark/pull/29088#issuecomment-657974464 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel
AmplabJenkins commented on pull request #29088: URL: https://github.com/apache/spark/pull/29088#issuecomment-657974464
[GitHub] [spark] SparkQA removed a comment on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel
SparkQA removed a comment on pull request #29088: URL: https://github.com/apache/spark/pull/29088#issuecomment-657904577 **[Test build #125794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125794/testReport)** for PR 29088 at commit [`6111a0a`](https://github.com/apache/spark/commit/6111a0a495fc1c0650a472d985ea221f8008f81f).
[GitHub] [spark] AmplabJenkins commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-657974354
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
AmplabJenkins removed a comment on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-657974354
[GitHub] [spark] SparkQA commented on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel
SparkQA commented on pull request #29088: URL: https://github.com/apache/spark/pull/29088#issuecomment-657974102

**[Test build #125794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125794/testReport)** for PR 29088 at commit [`6111a0a`](https://github.com/apache/spark/commit/6111a0a495fc1c0650a472d985ea221f8008f81f).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
SparkQA commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-657973941 **[Test build #125805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125805/testReport)** for PR 28901 at commit [`9b11aac`](https://github.com/apache/spark/commit/9b11aace28be8169e8eff1ce61810bc8250fc37d).
[GitHub] [spark] LantaoJin commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
LantaoJin commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-657972128 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
AmplabJenkins removed a comment on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-657967907
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZhuuuu commented on a change in pull request #29087:
URL: https://github.com/apache/spark/pull/29087#discussion_r454102278

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ##
@@ -496,7 +496,9 @@ fromStatementBody
 querySpecification
     : transformClause
       fromClause?
-      whereClause?           #transformQuerySpecification
+      whereClause?
+      aggregationClause?
+      havingClause?          #transformQuerySpecification

Review comment:
> Could you update the SQL doc, too?

Can we add this after everything else is done? We would also need to add a new page, like the one for `Where clause`.
[GitHub] [spark] AmplabJenkins commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
AmplabJenkins commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-657967907
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZhuuuu commented on a change in pull request #29087:
URL: https://github.com/apache/spark/pull/29087#discussion_r454045113

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ##
@@ -496,7 +496,9 @@ fromStatementBody
 querySpecification
     : transformClause
       fromClause?
-      whereClause?           #transformQuerySpecification
+      whereClause?
+      aggregationClause?
+      havingClause?          #transformQuerySpecification

Review comment:
> Could you update the SQL doc, too?

Yea.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZhuuuu commented on a change in pull request #29087:
URL: https://github.com/apache/spark/pull/29087#discussion_r454101882

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ##
@@ -2558,6 +2558,131 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
       }
     }
   }
+
+  test("SPARK-28227: test script transform with aggregation") {

Review comment:
> Could you move the tests into `SQLQueryTestSuite`?

This should wait for https://github.com/apache/spark/pull/29085, since we currently can't use script transform in sql/core.
[GitHub] [spark] SparkQA commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
SparkQA commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-657967512 **[Test build #125804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125804/testReport)** for PR 29095 at commit [`50510dd`](https://github.com/apache/spark/commit/50510ddc30bb9da42dd7700a55bd5ecec7d3620b).
[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-657966575
test:
```
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.classification._
import org.apache.spark.storage.StorageLevel

val df = spark.read.option("numFeatures", "2000").format("libsvm").load("/data1/Datasets/epsilon/epsilon_normalized.t").withColumn("label", (col("label")+1)/2)
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count

val rf = new RandomForestClassifier().setMaxDepth(10).setNumTrees(100)
val model = rf.fit(df)
model.save("/tmp/rf-model")

val rf2 = new RandomForestClassifier().setMaxDepth(20).setNumTrees(100)
val model2 = rf2.fit(df)
model2.save("/tmp/rf-model-d20")

val model = RandomForestClassificationModel.load("/tmp/rf-model")
val model2 = RandomForestClassificationModel.load("/tmp/rf-model-d20")

val vecs = df.select("features").rdd.map(row => row.getAs[Vector](0)).collect

val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model.predict)}; val end = System.currentTimeMillis; end - start
val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model2.predict)}; val end = System.currentTimeMillis; end - start
```
Results (durations):
this PR: 167640, 404901
Master: 187645, 416243
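For reference, the relative improvement implied by the durations above can be worked out with a few lines of Scala. This is only a sketch: it assumes the figures are elapsed milliseconds (they come from `System.currentTimeMillis` differences in the test script), and the names `SpeedupSketch` and `percentFaster` are made up for illustration. With these inputs the reduction comes out at roughly 11% for the depth-10 model and roughly 3% for the depth-20 model.

```scala
object SpeedupSketch {
  // Relative reduction in elapsed time, expressed as a percentage of the baseline.
  def percentFaster(baselineMs: Long, candidateMs: Long): Double =
    (baselineMs - candidateMs).toDouble / baselineMs * 100.0

  def main(args: Array[String]): Unit = {
    // Durations copied from the comment above: master vs. this PR.
    println(f"maxDepth=10: ${percentFaster(187645L, 167640L)}%.1f%% less time")
    println(f"maxDepth=20: ${percentFaster(416243L, 404901L)}%.1f%% less time")
  }
}
```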
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-657965897 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125793/ Test FAILed.
[GitHub] [spark] ulysses-you commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
ulysses-you commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r454099903
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ##
@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+    databaseName: Option[String],
+    functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    if (FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+      throw new AnalysisException(s"Cannot refresh builtin function $functionName")
+    }
+    if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, databaseName))) {
+      throw new AnalysisException(s"Cannot refresh temporary function $functionName")
+    }
+
+    val identifier = FunctionIdentifier(
+      functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+    // we only refresh the permanent function.
+    if (catalog.isPersistentFunction(identifier)) {
+      // register overwrite function.
+      val func = catalog.getFunctionMetadata(identifier)
+      catalog.registerFunction(func, true)
+    } else {
+      // function is not exists, clear cached function.
+      catalog.unregisterFunction(identifier, true)
+      throw new NoSuchFunctionException(identifier.database.get, functionName)

Review comment:
This just keeps the same behavior as `refresh table`; the latter also throws `NoSuchTableException`.
[GitHub] [spark] zhengruifeng opened a new pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng opened a new pull request #29095: URL: https://github.com/apache/spark/pull/29095

### What changes were proposed in this pull request?
use while-loop instead of the recursive way

### Why are the changes needed?
3% ~ 10% faster

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing testsuites
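To illustrate the kind of change the description refers to, here is a minimal sketch of decision-tree prediction written both recursively and with a while-loop. It is not the code from the pull request: `Node`, `Leaf`, `Internal`, and `TreeSketch` are hypothetical, simplified stand-ins rather than Spark's tree classes, and the split rule (go left when the feature value is at most the threshold) is an assumption chosen for the example.

```scala
// Hypothetical, simplified tree types (not Spark's classes).
sealed trait Node {
  // Recursive variant: each internal node delegates to one of its children.
  def predictRecursive(features: Array[Double]): Double
}

final class Leaf(val prediction: Double) extends Node {
  def predictRecursive(features: Array[Double]): Double = prediction
}

final class Internal(
    val featureIndex: Int,
    val threshold: Double,
    val left: Node,
    val right: Node) extends Node {
  def predictRecursive(features: Array[Double]): Double =
    if (features(featureIndex) <= threshold) left.predictRecursive(features)
    else right.predictRecursive(features)
}

object TreeSketch {
  // Iterative variant: walk down from the root with a while-loop
  // instead of making one virtual call per tree level.
  def predictLoop(root: Node, features: Array[Double]): Double = {
    var node = root
    while (node.isInstanceOf[Internal]) {
      val n = node.asInstanceOf[Internal]
      node = if (features(n.featureIndex) <= n.threshold) n.left else n.right
    }
    node.asInstanceOf[Leaf].prediction
  }
}
```

Both variants visit the same nodes and return the same leaf value; the loop only avoids the chain of nested calls, which is presumably where a saving of the reported size would come from.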
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-657965895 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
AmplabJenkins commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-657965895
[GitHub] [spark] SparkQA removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
SparkQA removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-657893231 **[Test build #125793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125793/testReport)** for PR 28840 at commit [`c129a54`](https://github.com/apache/spark/commit/c129a545b6ec92117728439e83842fccb54a6a66).
[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
SparkQA commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-657965588 **[Test build #125793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125793/testReport)** for PR 28840 at commit [`c129a54`](https://github.com/apache/spark/commit/c129a545b6ec92117728439e83842fccb54a6a66).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] viirya commented on pull request #29091: [SPARK-32258][SQL] Not duplicate normalization on children for float/double If/CaseWhen/Coalesce
viirya commented on pull request #29091: URL: https://github.com/apache/spark/pull/29091#issuecomment-657965371 cc @cloud-fan
[GitHub] [spark] ulysses-you commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
ulysses-you commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r454098536
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ##
@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+    databaseName: Option[String],
+    functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    if (FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+      throw new AnalysisException(s"Cannot refresh builtin function $functionName")
+    }
+    if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, databaseName))) {
+      throw new AnalysisException(s"Cannot refresh temporary function $functionName")
+    }
+
+    val identifier = FunctionIdentifier(
+      functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+    // we only refresh the permanent function.
+    if (catalog.isPersistentFunction(identifier)) {
+      // register overwrite function.
+      val func = catalog.getFunctionMetadata(identifier)
+      catalog.registerFunction(func, true)
+    } else {
+      // function is not exists, clear cached function.

Review comment:
If the function is already in the cache, a query can still work after we drop the function with the Hive client. I think `refresh` should invalidate the cache and keep it consistent with the Hive metastore.
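The refresh semantics discussed in this thread can be sketched with a toy model. Everything below is an assumption made for illustration: `FunctionCatalogSketch`, its two maps, and the use of `NoSuchElementException` are hypothetical stand-ins, not Spark's `SessionCatalog` API or `NoSuchFunctionException`. The sketch only shows the behavior being proposed: re-register from the metastore when the function still exists, otherwise drop the stale cached entry and then report that the function is missing.

```scala
import scala.collection.mutable

// Toy model: `metastore` stands in for the persistent function store,
// `cache` for the per-session function registry. Both are hypothetical.
final class FunctionCatalogSketch {
  val metastore: mutable.Map[String, String] = mutable.Map.empty
  val cache: mutable.Map[String, String] = mutable.Map.empty

  def refresh(name: String): Unit =
    metastore.get(name) match {
      case Some(definition) =>
        // Function still exists: overwrite whatever the cache holds.
        cache(name) = definition
      case None =>
        // Function was dropped elsewhere: clear the stale cache entry first,
        // then signal the missing function (stand-in exception type).
        cache.remove(name)
        throw new NoSuchElementException(s"Function $name not found")
    }
}
```

In this model, a function dropped directly in the metastore stays usable from the cache until `refresh` is called, which matches the situation described in the comment above.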
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657964078 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125800/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster
AmplabJenkins commented on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-657964151
[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657964073
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster
AmplabJenkins removed a comment on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-657964151
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657964073 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster
AmplabJenkins removed a comment on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-657890028 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125778/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster
SparkQA commented on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-657963838 **[Test build #125803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125803/testReport)** for PR 28939 at commit [`449df2b`](https://github.com/apache/spark/commit/449df2b92e5ad0dac6ea8dd83233450946a39df2).
[GitHub] [spark] SparkQA removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657927315 **[Test build #125800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125800/testReport)** for PR 27428 at commit [`20ad143`](https://github.com/apache/spark/commit/20ad143c620ef75e8d446f8f1e595992a1959b4a).
[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657963736 **[Test build #125800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125800/testReport)** for PR 27428 at commit [`20ad143`](https://github.com/apache/spark/commit/20ad143c620ef75e8d446f8f1e595992a1959b4a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
AmplabJenkins removed a comment on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657962666
[GitHub] [spark] AmplabJenkins commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
AmplabJenkins commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657962666
[GitHub] [spark] SparkQA removed a comment on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
SparkQA removed a comment on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657876739 **[Test build #125791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125791/testReport)** for PR 29064 at commit [`5501213`](https://github.com/apache/spark/commit/5501213c0525aa6c0556a9bdae90edd0facf8025).
[GitHub] [spark] SparkQA commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
SparkQA commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657962108 **[Test build #125791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125791/testReport)** for PR 29064 at commit [`5501213`](https://github.com/apache/spark/commit/5501213c0525aa6c0556a9bdae90edd0facf8025).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-657959411
[GitHub] [spark] SparkQA removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-657953435 **[Test build #125802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125802/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77).
[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-657959411
[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-657959340 **[Test build #125802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125802/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class InheritableThread(threading.Thread):`
[GitHub] [spark] cloud-fan closed pull request #29053: [SPARK-32241][SQL] Remove empty children of union
cloud-fan closed pull request #29053: URL: https://github.com/apache/spark/pull/29053
[GitHub] [spark] cloud-fan commented on pull request #29053: [SPARK-32241][SQL] Remove empty children of union
cloud-fan commented on pull request #29053: URL: https://github.com/apache/spark/pull/29053#issuecomment-657958705 thanks, merging to master!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29002: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
AmplabJenkins removed a comment on pull request #29002: URL: https://github.com/apache/spark/pull/29002#issuecomment-657958140 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125798/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29002: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
AmplabJenkins commented on pull request #29002: URL: https://github.com/apache/spark/pull/29002#issuecomment-657958137
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29002: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
AmplabJenkins removed a comment on pull request #29002: URL: https://github.com/apache/spark/pull/29002#issuecomment-657958137 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29002: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
SparkQA removed a comment on pull request #29002: URL: https://github.com/apache/spark/pull/29002#issuecomment-657916150 **[Test build #125798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125798/testReport)** for PR 29002 at commit [`d768385`](https://github.com/apache/spark/commit/d768385caac9c79c456de87a4afd72298dda46db).
[GitHub] [spark] SparkQA commented on pull request #29002: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
SparkQA commented on pull request #29002: URL: https://github.com/apache/spark/pull/29002#issuecomment-657957833 **[Test build #125798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125798/testReport)** for PR 29002 at commit [`d768385`](https://github.com/apache/spark/commit/d768385caac9c79c456de87a4afd72298dda46db).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
cloud-fan commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r454087942
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ##
@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+    databaseName: Option[String],
+    functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    if (FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+      throw new AnalysisException(s"Cannot refresh builtin function $functionName")
+    }
+    if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, databaseName))) {
+      throw new AnalysisException(s"Cannot refresh temporary function $functionName")
+    }
+
+    val identifier = FunctionIdentifier(
+      functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+    // we only refresh the permanent function.
+    if (catalog.isPersistentFunction(identifier)) {
+      // register overwrite function.
+      val func = catalog.getFunctionMetadata(identifier)
+      catalog.registerFunction(func, true)
+    } else {
+      // function is not exists, clear cached function.

Review comment:
BTW does the query fail if it tries to use such a function?
[GitHub] [spark] cloud-fan commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
cloud-fan commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r454087814

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ##

@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+    databaseName: Option[String],
+    functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    if (FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+      throw new AnalysisException(s"Cannot refresh builtin function $functionName")
+    }
+    if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, databaseName))) {
+      throw new AnalysisException(s"Cannot refresh temporary function $functionName")
+    }
+
+    val identifier = FunctionIdentifier(
+      functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+    // we only refresh the permanent function.
+    if (catalog.isPersistentFunction(identifier)) {
+      // register overwrite function.
+      val func = catalog.getFunctionMetadata(identifier)
+      catalog.registerFunction(func, true)
+    } else {
+      // function is not exists, clear cached function.
+      catalog.unregisterFunction(identifier, true)
+      throw new NoSuchFunctionException(identifier.database.get, functionName)

Review comment: If it's a valid use case, why do we throw an exception here?
[GitHub] [spark] cloud-fan commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
cloud-fan commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r454087683

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ##

@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+    databaseName: Option[String],
+    functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    if (FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+      throw new AnalysisException(s"Cannot refresh builtin function $functionName")
+    }
+    if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, databaseName))) {
+      throw new AnalysisException(s"Cannot refresh temporary function $functionName")
+    }
+
+    val identifier = FunctionIdentifier(
+      functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+    // we only refresh the permanent function.
+    if (catalog.isPersistentFunction(identifier)) {
+      // register overwrite function.
+      val func = catalog.getFunctionMetadata(identifier)
+      catalog.registerFunction(func, true)
+    } else {
+      // function is not exists, clear cached function.

Review comment: Do you mean the function does not exist in the metastore/catalog, and we need to clear the cache entry?
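The three review comments above all concern the `else` branch of `RefreshFunctionCommand.run`. As a rough illustration of the behavior being discussed (refresh the cached entry when the function still exists in the metastore, otherwise evict the stale entry and report an error), here is a minimal sketch. It is a toy model only: `ToyCatalog`, `metastore`, `cache`, `lookup`, and `refresh` are hypothetical names and do not correspond to Spark's `SessionCatalog` API, and whether throwing is the right choice is exactly the open question in the review.

```scala
import scala.collection.mutable

// Toy model (all names hypothetical, not Spark's SessionCatalog API):
// `metastore` is the source of truth, `cache` is the session-local registry.
final class ToyCatalog(metastore: mutable.Map[String, String]) {
  private val cache = mutable.Map.empty[String, String]

  /** Returns the cached definition, if any. */
  def lookup(name: String): Option[String] = cache.get(name)

  /** Re-reads `name` from the metastore; evicts a stale cache entry if it is gone. */
  def refresh(name: String): Unit = metastore.get(name) match {
    case Some(definition) =>
      // The function still exists: overwrite whatever was cached.
      cache(name) = definition
    case None =>
      // The function was dropped elsewhere: clear the cache entry, then fail.
      cache.remove(name)
      throw new NoSuchElementException(s"Function $name not found")
  }
}
```

In this sketch a later `lookup` of the dropped function returns `None`, so a query using it would no longer see the stale definition.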
[GitHub] [spark] cloud-fan commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
cloud-fan commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657954315

> The reason behind this initBatch is not getting the schema that is needed to find out the column value in OrcFileFormat.scala

Can you be more specific about the problem? Are you saying that the actual file schema doesn't match the table schema specified by the user?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-657953861
[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-657953861
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29094: [SPARK-24983][SQL] limit number of leaf expressions in a single project when collapse project to prevent driver oom
AmplabJenkins removed a comment on pull request #29094: URL: https://github.com/apache/spark/pull/29094#issuecomment-657953317 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29094: [SPARK-24983][SQL] limit number of leaf expressions in a single project when collapse project to prevent driver oom
AmplabJenkins commented on pull request #29094: URL: https://github.com/apache/spark/pull/29094#issuecomment-657953631 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29094: [SPARK-24983][SQL] limit number of leaf expressions in a single project when collapse project to prevent driver oom
AmplabJenkins commented on pull request #29094: URL: https://github.com/apache/spark/pull/29094#issuecomment-657953317 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-657953435 **[Test build #125802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125802/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77).
[GitHub] [spark] constzhou opened a new pull request #29094: [SPARK-24983][SQL] limit number of leaf expressions in a single project when collapse project to prevent driver oom
constzhou opened a new pull request #29094: URL: https://github.com/apache/spark/pull/29094 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] GuoPhilipse commented on pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
GuoPhilipse commented on pull request #29056: URL: https://github.com/apache/spark/pull/29056#issuecomment-657942650 cc @maropu
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657937021 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657937021
[GitHub] [spark] AmplabJenkins commented on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
AmplabJenkins commented on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657931688
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
AmplabJenkins removed a comment on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657931688
[GitHub] [spark] srowen commented on a change in pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel
srowen commented on a change in pull request #28960: URL: https://github.com/apache/spark/pull/28960#discussion_r454062744

## File path: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ##

@@ -226,45 +239,48 @@ object GradientDescent extends Logging {
     var converged = false // indicates whether converged based on convergenceTol
     var i = 1
-    while (!converged && i <= numIterations) {
-      val bcWeights = data.context.broadcast(weights)
-      // Sample a subset (fraction miniBatchFraction) of the total data
-      // compute and sum up the subgradients on this subset (this is one map-reduce)
-      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
-        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
-          seqOp = (c, v) => {
-            // c: (grad, loss, count), v: (label, features)
-            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
-            (c._1, c._2 + l, c._3 + 1)
-          },
-          combOp = (c1, c2) => {
-            // c: (grad, loss, count)
-            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
-          })
-      bcWeights.destroy()
-
-      if (miniBatchSize > 0) {
-        /**
-         * lossSum is computed using the weights from the previous iteration
-         * and regVal is the regularization value computed in the previous iteration as well.
-         */
-        stochasticLossHistory += lossSum / miniBatchSize + regVal
-        val update = updater.compute(
-          weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble),
-          stepSize, i, regParam)
-        weights = update._1
-        regVal = update._2
-
-        previousWeights = currentWeights
-        currentWeights = Some(weights)
-        if (previousWeights != None && currentWeights != None) {
-          converged = isConverged(previousWeights.get,
-            currentWeights.get, convergenceTol)
+    breakable {
+      while (i <= numIterations + 1) {
+        val bcWeights = data.context.broadcast(weights)
+        // Sample a subset (fraction miniBatchFraction) of the total data
+        // compute and sum up the subgradients on this subset (this is one map-reduce)
+        val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
+          .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
+            seqOp = (c, v) => {

Review comment: Yeah, it's a little unusual unless it significantly simplifies the code. Can `!converged` be added back to the while condition, and then turn the `if (X) break` condition below into `if (!X) { ... code that follows ...}`? It should be the same, as `i` will increment and end the loop right after anyway.
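To make the suggestion above concrete, here is a minimal sketch of a loop that keeps the `!converged` flag in the `while` condition and guards the remaining work with `if (!X) { ... }` instead of using `breakable`/`break`. It is a toy stand-in, not the `GradientDescent` code under review: it minimizes f(w) = (w - 3)^2, and the names `LoopSketch`, `descend`, and `tol` are assumptions made for illustration.

```scala
// Toy stand-in for the loop under review (names are illustrative, not Spark APIs):
// minimize f(w) = (w - 3)^2 with plain gradient descent.
object LoopSketch {
  /** Returns (final weight, number of iterations actually run). */
  def descend(numIterations: Int, stepSize: Double, tol: Double): (Double, Int) = {
    var w = 0.0
    var converged = false
    var i = 1
    // The flag lives in the loop condition, so no `breakable`/`break` is needed.
    while (!converged && i <= numIterations) {
      val grad = 2.0 * (w - 3.0)
      // `done` plays the role of `X` in `if (X) break`.
      val done = math.abs(grad) < tol
      if (!done) {
        // "code that follows": only runs when we are not stopping.
        w -= stepSize * grad
      }
      converged = done
      i += 1
    }
    (w, i - 1)
  }
}
```

Calling `LoopSketch.descend(100, 0.25, 1e-6)` stops well before the iteration cap once the gradient is small, while `LoopSketch.descend(5, 0.25, 1e-6)` runs all five iterations.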
[GitHub] [spark] SparkQA commented on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
SparkQA commented on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657931374 **[Test build #125801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125801/testReport)** for PR 29090 at commit [`dfbce91`](https://github.com/apache/spark/commit/dfbce912c7371afae5e8f87bf18b5a3d7dbfca52).
[GitHub] [spark] HyukjinKwon commented on pull request #29090: [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option
HyukjinKwon commented on pull request #29090: URL: https://github.com/apache/spark/pull/29090#issuecomment-657929538 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657927691
[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657927691
[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657927315 **[Test build #125800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125800/testReport)** for PR 27428 at commit [`20ad143`](https://github.com/apache/spark/commit/20ad143c620ef75e8d446f8f1e595992a1959b4a).
[GitHub] [spark] holdenk commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
holdenk commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657927412 Thanks for doing this, awesome work :)
[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
holdenk commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657927249 Looking at this I believe all of the changes requested have been addressed. I'm going to get this PR up to date with the current development now that the SPIP has passed and if there are no other issues by the time that's done I intend to merge this PR.
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-657927290 **[Test build #125799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125799/testReport)** for PR 28708 at commit [`5a0cd2a`](https://github.com/apache/spark/commit/5a0cd2abd316aacc601b9e8fa6e1406b67c55fb7).
[GitHub] [spark] HyukjinKwon closed pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
HyukjinKwon closed pull request #28957: URL: https://github.com/apache/spark/pull/28957
[GitHub] [spark] zhengruifeng commented on a change in pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel
zhengruifeng commented on a change in pull request #28960: URL: https://github.com/apache/spark/pull/28960#discussion_r454058321

## File path: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ##

@@ -226,45 +239,48 @@ object GradientDescent extends Logging {
     var converged = false // indicates whether converged based on convergenceTol
     var i = 1
-    while (!converged && i <= numIterations) {
-      val bcWeights = data.context.broadcast(weights)
-      // Sample a subset (fraction miniBatchFraction) of the total data
-      // compute and sum up the subgradients on this subset (this is one map-reduce)
-      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
-        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
-          seqOp = (c, v) => {
-            // c: (grad, loss, count), v: (label, features)
-            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
-            (c._1, c._2 + l, c._3 + 1)
-          },
-          combOp = (c1, c2) => {
-            // c: (grad, loss, count)
-            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
-          })
-      bcWeights.destroy()
-
-      if (miniBatchSize > 0) {
-        /**
-         * lossSum is computed using the weights from the previous iteration
-         * and regVal is the regularization value computed in the previous iteration as well.
-         */
-        stochasticLossHistory += lossSum / miniBatchSize + regVal
-        val update = updater.compute(
-          weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble),
-          stepSize, i, regParam)
-        weights = update._1
-        regVal = update._2
-
-        previousWeights = currentWeights
-        currentWeights = Some(weights)
-        if (previousWeights != None && currentWeights != None) {
-          converged = isConverged(previousWeights.get,
-            currentWeights.get, convergenceTol)
+    breakable {
+      while (i <= numIterations + 1) {
+        val bcWeights = data.context.broadcast(weights)
+        // Sample a subset (fraction miniBatchFraction) of the total data
+        // compute and sum up the subgradients on this subset (this is one map-reduce)
+        val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
+          .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
+            seqOp = (c, v) => {

Review comment: nit: it seems that `breakable` is not used in Spark (except in two suites):
```
➜ spark git:(master) ag --scala 'breakable' .
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
2941: breakable {
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
142: breakable {
```
I am not sure whether it is suitable.
[GitHub] [spark] HyukjinKwon commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
HyukjinKwon commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657927040
[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
holdenk commented on a change in pull request #28708: URL: https://github.com/apache/spark/pull/28708#discussion_r454058089

## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ##

@@ -242,8 +244,7 @@ private[spark] class BlockManager(
   private var blockReplicationPolicy: BlockReplicationPolicy = _

-  private var blockManagerDecommissioning: Boolean = false
-  private var decommissionManager: Option[BlockManagerDecommissionManager] = None
+  @volatile private var decommissioner: Option[BlockManagerDecommissioner] = None

Review comment: I think I'm going to leave it volatile for now. I'd like to avoid remote block puts once we're in decommissioning, because we depend on not getting new blocks except from tasks to figure out when it is safe to exit.
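The reasoning in the comment above can be illustrated with a small sketch: a single `@volatile` `Option` field acts as both the "are we decommissioning?" flag and the handle to the decommissioner, so a thread serving a remote put sees the change without taking a lock. This is a simplified illustration under stated assumptions, not the actual `BlockManager` code: `ToyBlockManager`, `Decommissioner`, `startDecommissioning`, and `putRemoteBlock` are hypothetical names, and how the real code rejects puts is not shown in the diff.

```scala
// Hypothetical, simplified stand-ins; not the real BlockManager API.
final class Decommissioner

final class ToyBlockManager {
  // Written by the thread that starts decommissioning, read by threads serving puts.
  // @volatile makes the write visible to those readers without taking a lock.
  @volatile private var decommissioner: Option[Decommissioner] = None

  def startDecommissioning(): Unit = {
    decommissioner = Some(new Decommissioner)
  }

  def isDecommissioning: Boolean = decommissioner.isDefined

  /** Returns true if the remote block was accepted. */
  def putRemoteBlock(blockId: String): Boolean = {
    if (isDecommissioning) {
      false // refuse new remote blocks once decommissioning has begun
    } else {
      true // a real implementation would store the block here
    }
  }
}
```

Note that `@volatile` only guarantees visibility of the reference; it does not make a check-then-act sequence atomic, which is why this sketch does nothing more than read the flag on the put path.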
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13
AmplabJenkins removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657922148
[GitHub] [spark] AmplabJenkins commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13
AmplabJenkins commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657922148
[GitHub] [spark] SparkQA commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13
SparkQA commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657921783

**[Test build #125790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125790/testReport)** for PR 29078 at commit [`370dabe`](https://github.com/apache/spark/commit/370dabeca759f78237afe3c84c511f2b0904b228).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ContinuousRecordEndpoint(buckets: Seq[mutable.Seq[UnsafeRow]], lock: Object)`