[GitHub] SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455232870 **[Test build #101376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101376/testReport)** for PR 23117 at commit [`18c40d9`](https://github.com/apache/spark/commit/18c40d925d7e93cc9221f3358d5028bf5ba007bd). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gengliangwang commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
gengliangwang commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455232738 @cloud-fan @dongjoon-hyun @gatorsmile Thanks for the review. I will come up with the file write path migration very soon.
[GitHub] hvanhovell commented on issue #23512: [SPARK-26593][SQL] Use Proleptic Gregorian calendar in casting UTF8String to Date/TimestampType
hvanhovell commented on issue #23512: [SPARK-26593][SQL] Use Proleptic Gregorian calendar in casting UTF8String to Date/TimestampType URL: https://github.com/apache/spark/pull/23512#issuecomment-455245370 Merging to master. Thanks.
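SPARK-26593 is about the proleptic Gregorian calendar. That calendar applies today's Gregorian rules to every year, including years before October 1582. The hybrid calendar behind `java.sql.Date` instead switches to Julian rules before that date. The following is a rough illustration outside Spark, not Spark's casting code. Python's `datetime.date` is documented as using the proleptic Gregorian calendar. So days inside the 1582 cutover gap exist, and Julian-only leap years such as 1500 are not leap years:

```python
import calendar
from datetime import date

# Python's date type uses the proleptic Gregorian calendar: Gregorian rules
# are applied to every year, including years before the 1582 reform.
cutover_gap_day = date(1582, 10, 10)  # skipped in the hybrid Julian/Gregorian calendar

# In the hybrid calendar 1582-10-04 is immediately followed by 1582-10-15,
# so they are 1 day apart; in the proleptic Gregorian calendar they are 11 days apart.
gap = (date(1582, 10, 15) - date(1582, 10, 4)).days

# 1500 is a leap year under Julian rules but not under Gregorian rules.
is_1500_leap = calendar.isleap(1500)

print(cutover_gap_day.isoformat(), gap, is_1500_leap)
```

Differences like these appear when a string such as `'1582-10-10'` is cast to a date under one calendar and read back under the other.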
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455254529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101370/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455265084 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
SparkQA commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455264777 **[Test build #101377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101377/testReport)** for PR 23575 at commit [`df8cc7e`](https://github.com/apache/spark/commit/df8cc7ee5fb742abbaf34241a9d265d1af473211).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455265089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101377/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455265089 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101377/ Test PASSed.
[GitHub] SparkQA removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
SparkQA removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455255681 **[Test build #101377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101377/testReport)** for PR 23575 at commit [`df8cc7e`](https://github.com/apache/spark/commit/df8cc7ee5fb742abbaf34241a9d265d1af473211).
[GitHub] squito commented on issue #23510: [SPARK-26590][CORE] make fetch-block-to-disk backward compatible
squito commented on issue #23510: [SPARK-26590][CORE] make fetch-block-to-disk backward compatible URL: https://github.com/apache/spark/pull/23510#issuecomment-455237408 After thinking about this a bit more, I kinda agree with Tom that maybe it's not worth adding. My worry about trying to make it backwards compatible is that we might not test it regularly, so it could get inadvertently broken later on. I guess I'm fine either way.

> AFAIK the streaming response is very different from chunk fetch response.

Not really that different -- there is a small header (which is different in each case), followed by the bulk of the response, which is the actual data of the shuffle block (the same in both cases). Now, the *client* does very different things with that response based on the first header -- the data is always a stream at some level, but the client may decide to buffer it all into memory or not.
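squito's point is that both responses share the same shape: a small, type-specific header followed by the raw block bytes. Only the client's handling differs. The sketch below models that idea with a made-up framing: a 1-byte type tag plus an 8-byte body length. The tags, the layout, and the function names are hypothetical and are not Spark's network protocol. The sketch only shows how one wire format can be either buffered in memory or copied piece by piece to a sink:

```python
import io
import struct
from typing import BinaryIO, Optional

# Hypothetical type tags; Spark's real message types and encodings differ.
CHUNK_FETCH = 1
STREAM = 2

HEADER = struct.Struct(">BQ")  # 1-byte type tag + 8-byte big-endian body length


def encode_response(kind: int, body: bytes) -> bytes:
    """Frame a response as a small header followed by the block data."""
    return HEADER.pack(kind, len(body)) + body


def handle_response(wire: BinaryIO, sink: BinaryIO, piece_size: int = 4) -> Optional[bytes]:
    """Read the header, then either buffer the body or stream it into `sink`.

    Returns the whole body for CHUNK_FETCH responses; for STREAM responses it
    copies the body to `sink` in bounded pieces and returns None.
    """
    kind, length = HEADER.unpack(wire.read(HEADER.size))
    if kind == CHUNK_FETCH:
        # Same bytes on the wire, but the client keeps the whole block in memory.
        return wire.read(length)
    if kind == STREAM:
        # The client never holds more than `piece_size` bytes of the block at once.
        remaining = length
        while remaining > 0:
            piece = wire.read(min(piece_size, remaining))
            if not piece:
                raise EOFError("connection closed mid-body")
            sink.write(piece)
            remaining -= len(piece)
        return None
    raise ValueError(f"unknown response type {kind}")
```

In this model the data on the wire is identical after the header, which matches the comment's argument. Whether to buffer is purely a client-side decision.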
[GitHub] asfgit closed pull request #23512: [SPARK-26593][SQL] Use Proleptic Gregorian calendar in casting UTF8String to Date/TimestampType
asfgit closed pull request #23512: [SPARK-26593][SQL] Use Proleptic Gregorian calendar in casting UTF8String to Date/TimestampType URL: https://github.com/apache/spark/pull/23512
[GitHub] mgaido91 commented on a change in pull request #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
mgaido91 commented on a change in pull request #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#discussion_r248763381 ## File path: python/pyspark/sql/types.py

```diff
@@ -865,6 +865,8 @@ def _parse_datatype_json_string(json_string):
     >>> complex_maptype = MapType(complex_structtype,
     ...                           complex_arraytype, False)
     >>> check_datatype(complex_maptype)
+    >>> # Decimal with negative scale.
+    >>> check_datatype(DecimalType(1,-1))
```

Review comment: done, thanks!
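The new doctest checks that a `DecimalType` with a negative scale survives a round trip through its string form. The core of such a fix is letting the scale in a `decimal(p,s)` type string carry a minus sign. With a negative scale, a value is its unscaled integer times 10 raised to the minus scale, so `decimal(1,-1)` holds multiples of ten. The snippet below is a standalone illustration built around a hypothetical `parse_decimal` helper. It is not PySpark's implementation, which may differ:

```python
import re
from typing import Tuple

# Precision is a non-negative integer; the scale may be negative, e.g. decimal(1,-1).
_FIXED_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(-?\d+)\s*\)")


def parse_decimal(type_string: str) -> Tuple[int, int]:
    """Return (precision, scale) parsed from a string such as 'decimal(10,2)'."""
    match = _FIXED_DECIMAL.fullmatch(type_string.strip())
    if match is None:
        raise ValueError(f"not a fixed decimal type: {type_string!r}")
    return int(match.group(1)), int(match.group(2))
```

A pattern that accepts only `\d+` for the scale would reject `decimal(1,-1)`. This is the kind of failure the added doctest guards against.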
[GitHub] SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455158379 **[Test build #101370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101370/testReport)** for PR 23260 at commit [`a2977f9`](https://github.com/apache/spark/commit/a2977f9424cc6a3b4e7f0e33cea67e6194462d55).
[GitHub] AmplabJenkins removed a comment on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis
AmplabJenkins removed a comment on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis URL: https://github.com/apache/spark/pull/23571#issuecomment-455254025 Merged build finished. Test PASSed.
[GitHub] gatorsmile commented on issue #23551: [SPARK-26622][SQL] Revise SQL Metrics labels
gatorsmile commented on issue #23551: [SPARK-26622][SQL] Revise SQL Metrics labels URL: https://github.com/apache/spark/pull/23551#issuecomment-455284437 @juliuszsompolski was recently assigned something else. We can also ping other active contributors to do it. It is entirely up to you. :)
[GitHub] gatorsmile commented on issue #23551: [SPARK-26622][SQL] Revise SQL Metrics labels
gatorsmile commented on issue #23551: [SPARK-26622][SQL] Revise SQL Metrics labels URL: https://github.com/apache/spark/pull/23551#issuecomment-455285126 LGTM. Merged to master.
[GitHub] ipwright commented on issue #23567: [MINOR][BUILD] ensure call to translate_component has correct number of arguments
ipwright commented on issue #23567: [MINOR][BUILD] ensure call to translate_component has correct number of arguments URL: https://github.com/apache/spark/pull/23567#issuecomment-455234495 @srowen Thanks for the example, which is super helpful. Our Python team has a fix in mind, and we're on the case! P.S. I forgot to mention that you can activate LGTM automated code review from this page: https://lgtm.com/projects/g/apache/spark/ci/
[GitHub] srowen commented on issue #23567: [MINOR][BUILD] ensure call to translate_component has correct number of arguments
srowen commented on issue #23567: [MINOR][BUILD] ensure call to translate_component has correct number of arguments URL: https://github.com/apache/spark/pull/23567#issuecomment-455236842 I don't think we can enable it, as we don't have admin access, but we can check on the code status periodically.
[GitHub] hvanhovell commented on a change in pull request #23541: [SPARK-26618][SQL] Make typed Timestamp/Date literals consistent to casting
hvanhovell commented on a change in pull request #23541: [SPARK-26618][SQL] Make typed Timestamp/Date literals consistent to casting URL: https://github.com/apache/spark/pull/23541#discussion_r248756349 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

```diff
@@ -1555,9 +1554,25 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     try {
       valueType match {
         case "DATE" =>
-          Literal(Date.valueOf(value))
+          val castedValue = Cast(
+            Literal(value),
+            DateType,
+            Some(SQLConf.get.sessionLocalTimeZone)).eval()
```

Review comment: Please use `Option(SQLConf.get.sessionLocalTimeZone)`.
[GitHub] vanzin commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache.
vanzin commented on a change in pull request #23525: [SPARK-26595][core] Allow credential renewal based on kerberos ticket cache. URL: https://github.com/apache/spark/pull/23525#discussion_r248755998 ## File path: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala

```diff
@@ -97,28 +106,37 @@ private[spark] class HadoopDelegationTokenManager(
       ThreadUtils.newDaemonSingleThreadScheduledExecutor("Credential Renewal Thread")
     val ugi = UserGroupInformation.getCurrentUser()
-    if (ugi.isFromKeytab()) {
+    val tgtRenewalTask = if (ugi.isFromKeytab()) {
       // In Hadoop 2.x, renewal of the keytab-based login seems to be automatic, but in Hadoop 3.x,
       // it is configurable (see hadoop.kerberos.keytab.login.autorenewal.enabled, added in
       // HADOOP-9567). This task will make sure that the user stays logged in regardless of that
       // configuration's value. Note that checkTGTAndReloginFromKeytab() is a no-op if the TGT does
       // not need to be renewed yet.
-      val tgtRenewalTask = new Runnable() {
+      new Runnable() {
         override def run(): Unit = {
           ugi.checkTGTAndReloginFromKeytab()
```

Review comment: I was testing with Hadoop 2 and did not see the error you saw. But let me take another look at the code.
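The code comment in this hunk describes a periodic task. The task re-logs in from the keytab only when the ticket-granting ticket (TGT) actually needs renewal and is otherwise a no-op. The sketch below models only that decision. The `Ticket` class, the `check_and_relogin` function, and the 80% renewal threshold are assumptions made for illustration. They are not Hadoop's `UserGroupInformation` code:

```python
from dataclasses import dataclass
from typing import Callable

# Assumed fraction of the ticket lifetime after which a refresh is attempted.
RENEW_WINDOW = 0.8


@dataclass
class Ticket:
    start: float  # issue time, in seconds
    end: float    # expiry time, in seconds


def refresh_time(tgt: Ticket) -> float:
    """Point in the ticket's lifetime after which renewal is worthwhile (assumed rule)."""
    return tgt.start + RENEW_WINDOW * (tgt.end - tgt.start)


def check_and_relogin(tgt: Ticket, now: float, relogin: Callable[[float], Ticket]) -> Ticket:
    """Hypothetical body of the periodic task: a no-op until the refresh time is reached."""
    if now < refresh_time(tgt):
        return tgt  # TGT does not need renewal yet; nothing to do
    return relogin(now)  # obtain a fresh ticket
```

Because the check is cheap when nothing needs to happen, a scheduler can run a task like this frequently without forcing unnecessary logins.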
[GitHub] squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched URL: https://github.com/apache/spark/pull/22806#discussion_r248759101 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala

```diff
@@ -286,6 +286,33 @@ private[spark] class TaskSchedulerImpl(
     }
   }
+  override def completeTasks(
+    partitionId: Int, stageId: Int, taskInfo: TaskInfo, killTasks: Boolean): Unit = {
```

Review comment: nit: double-indent the params (4 spaces), each on its own line.
[GitHub] squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched URL: https://github.com/apache/spark/pull/22806#discussion_r248758945 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala

```diff
@@ -109,4 +109,13 @@ private[spark] trait TaskScheduler {
    */
   def applicationAttemptId(): Option[String]
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we do not see any Killed tasks due to
+   * TaskCommitDenied Exceptions showing up in the UI.
```

Review comment: I wouldn't mention the TaskCommitDenied bit here -- that's really just one very particular thing which can go wrong later. Also, I think it's worth mentioning that this method must be called from inside the DAGScheduler event loop, to ensure a consistent view of all task sets for the given stage.
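The proposed doc comment describes some bookkeeping. When any attempt of a task succeeds, its partition is marked complete in every task-set attempt of that stage. For a result stage, the other attempts still running on that partition are also killed. The toy model below captures only that logic. `TaskSetAttempt` and `complete_tasks` are hypothetical and far simpler than `TaskSchedulerImpl` and its task set managers. As the review notes, the real method must run inside the DAGScheduler event loop to get a consistent view of the stage's task sets; the model ignores concurrency entirely:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class TaskSetAttempt:
    """Hypothetical stand-in for one attempt of a stage's task set."""
    attempt_id: int
    completed: Set[int] = field(default_factory=set)
    # partition id -> ids of task attempts currently running for that partition
    running: Dict[int, List[int]] = field(default_factory=dict)


def complete_tasks(attempts: List[TaskSetAttempt], partition_id: int,
                   finished_task: int, is_result_stage: bool) -> List[Tuple[int, int]]:
    """Mark `partition_id` done in every attempt; return (attempt_id, task_id) pairs to kill."""
    to_kill: List[Tuple[int, int]] = []
    for ts in attempts:
        # Every attempt of the stage learns that this partition no longer needs work.
        ts.completed.add(partition_id)
        if is_result_stage:
            # Only result stages kill the other copies still running on this partition.
            others = [t for t in ts.running.get(partition_id, []) if t != finished_task]
            to_kill.extend((ts.attempt_id, t) for t in others)
    return to_kill
```

Marking the partition in all attempts, not just the one that produced the result, is what keeps a zombie task-set attempt from rescheduling work that is already finished.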
[GitHub] AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455275243 Build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455275248 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101365/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455275248 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101365/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455233480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7192/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455233462 Merged build finished. Test PASSed.
[GitHub] SparkQA removed a comment on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis
SparkQA removed a comment on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis URL: https://github.com/apache/spark/pull/23571#issuecomment-455156669 **[Test build #101368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101368/testReport)** for PR 23571 at commit [`73e0a25`](https://github.com/apache/spark/commit/73e0a25f409bacad2b7eaceaf97a2f3307a0db36).
[GitHub] AmplabJenkins commented on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S.
AmplabJenkins commented on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S. URL: https://github.com/apache/spark/pull/23573#issuecomment-455272493 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S.
AmplabJenkins commented on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S. URL: https://github.com/apache/spark/pull/23573#issuecomment-455272510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101372/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455275243 Build finished. Test FAILed.
[GitHub] SparkQA removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
SparkQA removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455142468 **[Test build #101365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101365/testReport)** for PR 19788 at commit [`8398120`](https://github.com/apache/spark/commit/8398120e2b7acd628fc164a9e0b74bcd4a6105e8).
[GitHub] SparkQA commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
SparkQA commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455275163 **[Test build #101365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101365/testReport)** for PR 19788 at commit [`8398120`](https://github.com/apache/spark/commit/8398120e2b7acd628fc164a9e0b74bcd4a6105e8).
* This patch **fails from timeout after a configured wait of `400m`**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
  * `case class ArrayShuffleBlockId(blockIds: Seq[ShuffleBlockId]) extends BlockId`
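SPARK-9853 optimizes shuffle fetches of contiguous partition IDs. When a reducer needs several consecutive partitions of the same map output, it can fetch them together instead of issuing one request per partition. The build output above lists a new `ArrayShuffleBlockId` that wraps a sequence of `ShuffleBlockId`s. The grouping step can be sketched as collapsing sorted reduce IDs into inclusive runs. The `contiguous_ranges` helper below is hypothetical, not the PR's code:

```python
from typing import Iterable, List, Tuple


def contiguous_ranges(reduce_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse partition ids into inclusive (start, end) runs of consecutive ids."""
    ranges: List[Tuple[int, int]] = []
    for rid in sorted(set(reduce_ids)):
        if ranges and rid == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], rid)  # extend the current run
        else:
            ranges.append((rid, rid))  # start a new run
    return ranges
```

Under this model, each returned run could be served by a single request covering that byte range of the map output file. A gap in the IDs starts a new request.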
[GitHub] srowen commented on issue #23550: [SPARK-26621][CORE]Use ConfigEntry for hardcoded configs for shuffle categories.
srowen commented on issue #23550: [SPARK-26621][CORE]Use ConfigEntry for hardcoded configs for shuffle categories. URL: https://github.com/apache/spark/pull/23550#issuecomment-455278203 Merged to master
[GitHub] HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455230929 retest this please
[GitHub] AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455233462 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455233480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7192/ Test PASSed.
[GitHub] SparkQA commented on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis
SparkQA commented on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis URL: https://github.com/apache/spark/pull/23571#issuecomment-455253277 **[Test build #101368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101368/testReport)** for PR 23571 at commit [`73e0a25`](https://github.com/apache/spark/commit/73e0a25f409bacad2b7eaceaf97a2f3307a0db36). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins commented on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis
AmplabJenkins commented on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis URL: https://github.com/apache/spark/pull/23571#issuecomment-455254025 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis
AmplabJenkins removed a comment on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis URL: https://github.com/apache/spark/pull/23571#issuecomment-455254036 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101368/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis
AmplabJenkins commented on issue #23571: [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis URL: https://github.com/apache/spark/pull/23571#issuecomment-455254036 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101368/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455254519 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455254529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101370/ Test PASSed.
[GitHub] squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched URL: https://github.com/apache/spark/pull/22806#discussion_r248770953 ## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ## @@ -1319,4 +1328,34 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B tsm.handleFailedTask(tsm.taskAttempts.head.head.taskId, TaskState.FAILED, TaskKilled("test")) assert(tsm.isZombie) } + + test("SPARK-25250 On successful completion of a task attempt on a partition id, kill other" + Review comment: This test isn't testing the important case that was missed before, and I think for now we don't want to kill tasks as part of this change. OTOH, we do need to take the "Completions in zombie tasksets update status of non-zombie taskset" test and move it to a test in DAGSchedulerSuite.
[GitHub] squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched URL: https://github.com/apache/spark/pull/22806#discussion_r248768327 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -286,6 +286,33 @@ private[spark] class TaskSchedulerImpl( } } + override def completeTasks( +partitionId: Int, stageId: Int, taskInfo: TaskInfo, killTasks: Boolean): Unit = { +taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm => + tsm.partitionToIndex.get(partitionId) match { +case Some(index) => + tsm.markPartitionCompleted(index, taskInfo) + if (killTasks) { +val taskInfoList = tsm.taskAttempts(index) +taskInfoList.filter(_.running).foreach { tInfo => + try { +killTaskAttempt(tInfo.taskId, false, + s"Partition $partitionId is already completed") + } catch { +case e: Exception => + logWarning(s"Unable to kill Task ID ${tInfo.taskId}.") + } +} + } + +case None => + throw new SparkException(s"No corresponding index found for" + +s" partition ID $partitionId in TaskSet ${tsm.name}. This is likely a bug" + Review comment: this should not be an exception, it should just be a no-op. You might have taskset 1 w/ partitions 1 - 100, then taskset 2 gets launched after some have completed from taskset 1 so it only runs partitions 10-100, and then taskset 3 gets launched with partitions 1-50 after a different failure.
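The review point above is that a partition missing from one attempt's taskset is expected, not a bug. A minimal plain-Python sketch of that scenario, under stated assumptions: `TaskSetSketch` and `complete_partition` are hypothetical stand-ins for `TaskSetManager` and `completeTasks`, keeping only the partition-to-index lookup, and a partition the taskset does not run is skipped instead of raising `SparkException`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TaskSetSketch:
    """Hypothetical stand-in for a TaskSetManager: it only tracks which
    partitions it runs and which of those are already completed."""
    name: str
    partitions: List[int]
    completed: Set[int] = field(default_factory=set)

    def __post_init__(self) -> None:
        # partition id -> task index within this taskset
        self.partition_to_index: Dict[int, int] = {
            p: i for i, p in enumerate(self.partitions)
        }

    def mark_partition_completed(self, index: int) -> None:
        self.completed.add(self.partitions[index])


def complete_partition(tasksets: List[TaskSetSketch], partition_id: int) -> List[str]:
    """Mark partition_id completed in every taskset of a stage that runs it.

    A taskset that does not contain the partition is skipped (a no-op)
    instead of raising, because later attempts may cover only a subset.
    Returns the names of the tasksets that were updated.
    """
    updated = []
    for tsm in tasksets:
        index = tsm.partition_to_index.get(partition_id)
        if index is None:
            continue  # partition not part of this taskset: nothing to do
        tsm.mark_partition_completed(index)
        updated.append(tsm.name)
    return updated


# The scenario from the review comment: three attempts with overlapping ranges.
ts1 = TaskSetSketch("taskset 1", list(range(1, 101)))  # partitions 1-100
ts2 = TaskSetSketch("taskset 2", list(range(10, 101)))  # partitions 10-100
ts3 = TaskSetSketch("taskset 3", list(range(1, 51)))  # partitions 1-50
stage = [ts1, ts2, ts3]

print(complete_partition(stage, 5))   # ['taskset 1', 'taskset 3']
print(complete_partition(stage, 75))  # ['taskset 1', 'taskset 2']
```

Treating the lookup miss as a skip keeps completion idempotent across attempts: each taskset only records the partitions it actually owns.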
[GitHub] AmplabJenkins removed a comment on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S.
AmplabJenkins removed a comment on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S. URL: https://github.com/apache/spark/pull/23573#issuecomment-455272493 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S.
AmplabJenkins removed a comment on issue #23573: [SPARK-26642][K8S] Add --num-executors option to spark-submit for Spark on K8S. URL: https://github.com/apache/spark/pull/23573#issuecomment-455272510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101372/ Test PASSed.
[GitHub] vanzin commented on issue #23523: [SPARK-26605][yarn] Update AM's credentials when creating tokens.
vanzin commented on issue #23523: [SPARK-26605][yarn] Update AM's credentials when creating tokens. URL: https://github.com/apache/spark/pull/23523#issuecomment-455284169 Ping?
[GitHub] HyukjinKwon commented on a change in pull request #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
HyukjinKwon commented on a change in pull request #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#discussion_r248756978 ## File path: python/pyspark/sql/types.py ## @@ -865,6 +865,8 @@ def _parse_datatype_json_string(json_string): >>> complex_maptype = MapType(complex_structtype, ... complex_arraytype, False) >>> check_datatype(complex_maptype) +>>> # Decimal with negative scale. +>>> check_datatype(DecimalType(1,-1)) Review comment: maybe at `test_types.py`
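The doctest under review checks that `DecimalType(1,-1)` survives a round trip through its serialized form. Assuming that form is a type string like `decimal(1,-1)`, a pattern that reads it back needs an optional minus sign on the scale group. A minimal pure-Python sketch; `DECIMAL_TYPE_PATTERN`, `parse_decimal_type_string` and `format_decimal_type_string` are hypothetical names for illustration, not PySpark API.

```python
import re

# Hypothetical pattern: the scale group accepts an optional leading minus sign,
# so "decimal(1,-1)" is recognised as well as "decimal(10,2)".
DECIMAL_TYPE_PATTERN = re.compile(r"decimal\(\s*(\d+)\s*,\s*(-?\d+)\s*\)")


def parse_decimal_type_string(type_string):
    """Return (precision, scale) for a 'decimal(p,s)' string, or None if it does not match."""
    match = DECIMAL_TYPE_PATTERN.fullmatch(type_string)
    if match is None:
        return None
    precision, scale = (int(group) for group in match.groups())
    return precision, scale


def format_decimal_type_string(precision, scale):
    """Inverse of parse_decimal_type_string, producing the 'decimal(p,s)' form."""
    return "decimal(%d,%d)" % (precision, scale)


# Round trip, similar in spirit to check_datatype(DecimalType(1,-1)).
serialized = format_decimal_type_string(1, -1)
print(serialized)                             # decimal(1,-1)
print(parse_decimal_type_string(serialized))  # (1, -1)
```

Without the `-?` in the scale group, the same round trip would return `None` for any negative scale.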
[GitHub] AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455255404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7193/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455255404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7193/ Test PASSed.
[GitHub] SparkQA commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
SparkQA commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455255681 **[Test build #101377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101377/testReport)** for PR 23575 at commit [`df8cc7e`](https://github.com/apache/spark/commit/df8cc7ee5fb742abbaf34241a9d265d1af473211).
[GitHub] AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins removed a comment on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455255393 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455255393 Merged build finished. Test PASSed.
[GitHub] BryanCutler commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
BryanCutler commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455257042 How about for `_parse_datatype_string`? It doesn't seem to work ```python In [15]: _parse_datatype_string("decimal(1, -1)") --- Py4JJavaError Traceback (most recent call last) ~/git/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 62 try: ---> 63 return f(*a, **kw) 64 except py4j.protocol.Py4JJavaError as e: ~/git/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 327 "An error occurred while calling {0}{1}{2}.\n". --> 328 format(target_id, ".", name), value) 329 else: Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.api.python.PythonSQLUtils.parseDataType. : org.apache.spark.sql.catalyst.parser.ParseException: extraneous input '-' expecting INTEGER_VALUE(line 1, pos 11) == SQL == decimal(1, -1) ---^^^ ```
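The `ParseException` above suggests the SQL parser sees `-1` as a `-` token followed by an integer, while the rule at that position only accepts `INTEGER_VALUE`. The toy recursive-descent parser below is a hypothetical illustration of that failure mode, not Spark's ANTLR grammar. With `allow_negative_scale=False` it rejects the sign at the same spot. Accepting an optional `-` before the scale is one way such a rule could be relaxed. `parse_decimal` and its flag are made-up names.

```python
import re


def parse_decimal(text, allow_negative_scale=False):
    """Parse 'decimal(p)' or 'decimal(p, s)' into (precision, scale).

    Hypothetical toy parser: integers, words and single punctuation marks
    become separate tokens, so "-1" arrives as the tokens "-" and "1".
    """
    tokens = re.findall(r"\d+|[A-Za-z_]+|\S", text)
    pos = 0

    def expect(predicate, description):
        nonlocal pos
        if pos >= len(tokens) or not predicate(tokens[pos]):
            found = tokens[pos] if pos < len(tokens) else "<EOF>"
            raise ValueError("unexpected %r, expecting %s" % (found, description))
        pos += 1
        return tokens[pos - 1]

    def integer_value():
        return int(expect(lambda t: t.isdigit(), "INTEGER_VALUE"))

    expect(lambda t: t.lower() == "decimal", "'decimal'")
    expect(lambda t: t == "(", "'('")
    precision = integer_value()
    scale = 0
    if tokens[pos:pos + 1] == [","]:
        pos += 1
        sign = 1
        # Only consume a leading '-' when the relaxed rule is enabled.
        if allow_negative_scale and tokens[pos:pos + 1] == ["-"]:
            pos += 1
            sign = -1
        scale = sign * integer_value()
    expect(lambda t: t == ")", "')'")
    if pos != len(tokens):
        raise ValueError("extraneous input %r" % tokens[pos])
    return precision, scale


try:
    parse_decimal("decimal(1, -1)")
except ValueError as e:
    print(e)  # unexpected '-', expecting INTEGER_VALUE

print(parse_decimal("decimal(1, -1)", allow_negative_scale=True))  # (1, -1)
```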
[GitHub] AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype
AmplabJenkins commented on issue #23575: [SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype URL: https://github.com/apache/spark/pull/23575#issuecomment-455265084 Merged build finished. Test PASSed.
[GitHub] asfgit closed pull request #23551: [SPARK-26622][SQL] Revise SQL Metrics labels
asfgit closed pull request #23551: [SPARK-26622][SQL] Revise SQL Metrics labels URL: https://github.com/apache/spark/pull/23551
[GitHub] SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455034804 **[Test build #101346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101346/testReport)** for PR 23260 at commit [`028fca1`](https://github.com/apache/spark/commit/028fca1be2879c4218e2ec1f583b8d3bec303117).
[GitHub] SparkQA removed a comment on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
SparkQA removed a comment on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455036470 **[Test build #101347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101347/testReport)** for PR 23383 at commit [`6e87532`](https://github.com/apache/spark/commit/6e875323a430cee190a458b8842adea44bb4e0b7).
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455078498 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455078344 Build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
AmplabJenkins commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455078580 Merged build finished. Test FAILed.
[GitHub] SparkQA removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
SparkQA removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455049068 **[Test build #101350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101350/testReport)** for PR 23545 at commit [`1f78144`](https://github.com/apache/spark/commit/1f7814478779ac71d72a112da85b39368ef03a30).
[GitHub] AmplabJenkins commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
AmplabJenkins commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455078454 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23550: [SPARK-26621][CORE]Use ConfigEntry for hardcoded configs for shuffle categories.
AmplabJenkins commented on issue #23550: [SPARK-26621][CORE]Use ConfigEntry for hardcoded configs for shuffle categories. URL: https://github.com/apache/spark/pull/23550#issuecomment-455078502 Merged build finished. Test FAILed.
[GitHub] SparkQA removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
SparkQA removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455032836 **[Test build #101345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101345/testReport)** for PR 19788 at commit [`5933bf8`](https://github.com/apache/spark/commit/5933bf88b4f34ef9e20bfe2584591565552c46ff).
[GitHub] SparkQA removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
SparkQA removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455036471 **[Test build #101348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101348/testReport)** for PR 23117 at commit [`18c40d9`](https://github.com/apache/spark/commit/18c40d925d7e93cc9221f3358d5028bf5ba007bd).
[GitHub] SparkQA removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
SparkQA removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455069071 **[Test build #101353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101353/testReport)** for PR 19788 at commit [`c75e016`](https://github.com/apache/spark/commit/c75e016715fbd10135d365811f42caeff4ab9e01).
[GitHub] AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455078564 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455078498 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455078280 Build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
AmplabJenkins commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455078591 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101347/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455078352 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101345/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
AmplabJenkins removed a comment on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455078580 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23550: [SPARK-26621][CORE]Use ConfigEntry for hardcoded configs for shuffle categories.
AmplabJenkins removed a comment on issue #23550: [SPARK-26621][CORE]Use ConfigEntry for hardcoded configs for shuffle categories. URL: https://github.com/apache/spark/pull/23550#issuecomment-455078511 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101349/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
AmplabJenkins removed a comment on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455078343 Build finished. Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455078500 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101348/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
AmplabJenkins removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455078459 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101350/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455078564 Merged build finished. Test FAILed.
[GitHub] SparkQA commented on issue #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
SparkQA commented on issue #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#issuecomment-455078952 **[Test build #101355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101355/testReport)** for PR 23519 at commit [`06af625`](https://github.com/apache/spark/commit/06af625bf37382bfe19d7d6f990163cfed1762ee).
[GitHub] AmplabJenkins removed a comment on issue #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
AmplabJenkins removed a comment on issue #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#issuecomment-455079486 Merged build finished. Test FAILed.
[GitHub] Deegue commented on a change in pull request #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
Deegue commented on a change in pull request #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#discussion_r248571444 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ## @@ -288,29 +285,119 @@ class HadoopTableReader( } } + /** + * The entry of creating a RDD. + */ + private def getRDD( +inputClassName: String, +localTableDesc: TableDesc, +inputPathStr: String): RDD[Writable] = { +if (isCreateNewHadoopRDD(inputClassName)) { + createNewHadoopRdd( +localTableDesc, +inputPathStr, +inputClassName) +} else { + createHadoopRdd( +localTableDesc, +inputPathStr, +inputClassName) +} + } + /** * Creates a HadoopRDD based on the broadcasted HiveConf and other job properties that will be * applied locally on each slave. */ private def createHadoopRdd( tableDesc: TableDesc, path: String, -inputFormatClass: Class[InputFormat[Writable, Writable]]): RDD[Writable] = { +inputClassName: String): RDD[Writable] = { val initializeJobConfFunc = HadoopTableReader.initializeLocalJobConfFunc(path, tableDesc) _ val rdd = new HadoopRDD( sparkSession.sparkContext, _broadcastedHadoopConf.asInstanceOf[Broadcast[SerializableConfiguration]], Some(initializeJobConfFunc), - inputFormatClass, + getInputFormat(inputClassName), classOf[Writable], classOf[Writable], _minSplitsPerRDD) // Only take the value (skip the key) because Hive works only with values. rdd.map(_._2) } + + /** + * Creates a HadoopRDD based on the broadcasted HiveConf and other job properties that will be + * applied locally on each slave. 
+ */ private def createNewHadoopRdd( +tableDesc: TableDesc, +path: String, +inputClassName: String): RDD[Writable] = { + +val initializeJobConfFunc = HadoopTableReader.initializeLocalJobConfFunc(path, tableDesc) _ + +val newJobConf = new JobConf(hadoopConf) +initializeJobConfFunc.apply(newJobConf) +val rdd = new NewHadoopRDD( + sparkSession.sparkContext, + getNewInputFormat(inputClassName), + classOf[Writable], + classOf[Writable], + newJobConf +) + +// Only take the value (skip the key) because Hive works only with values. +rdd.map(_._2) + } + + /** + * If `spark.sql.hive.fileInputFormat.enabled` is true, this function will optimize the input + * method while reading Hive tables. + * For old input format `org.apache.hadoop.mapred.InputFormat`. + */ + private def getInputFormat( +inputClassName: String): Class[org.apache.hadoop.mapred.InputFormat[Writable, Writable]] = { + +var ifc = Utils.classForName(inputClassName) + .asInstanceOf[java.lang.Class[org.apache.hadoop.mapred.InputFormat[Writable, Writable]]] +if (conf.getConf(HiveUtils.HIVE_INPUT_FORMAT_OPTIMIZER_ENABLED) && + "org.apache.hadoop.mapred.TextInputFormat".equals(inputClassName)) { +ifc = Utils.classForName("org.apache.hadoop.mapred.lib.CombineTextInputFormat") Review comment: You're right, I've removed this.
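The diff splits table reading into two paths. `createHadoopRdd` builds a `HadoopRDD` from an old-API `org.apache.hadoop.mapred.InputFormat`, and `createNewHadoopRdd` builds a `NewHadoopRDD` from a new-API `org.apache.hadoop.mapreduce.InputFormat`. `getRDD` chooses between them via `isCreateNewHadoopRDD`, whose body is not shown in the excerpt.

Below is a minimal sketch of how such a check might work, assuming it inspects the class hierarchy of the loaded input format. `OldApiInputFormat`, `NewApiInputFormat`, `InputFormatDispatch` and the example format classes are hypothetical stand-ins, so the snippet runs without Hadoop on the classpath. They mirror the real split: the old API is an interface and the new API is an abstract class.

```scala
// Hypothetical stand-in for org.apache.hadoop.mapred.InputFormat (an interface in Hadoop).
trait OldApiInputFormat
// Hypothetical stand-in for org.apache.hadoop.mapreduce.InputFormat (an abstract class in Hadoop).
abstract class NewApiInputFormat

// Hypothetical example formats, one per API.
class ExampleOldApiFormat extends OldApiInputFormat
class ExampleNewApiFormat extends NewApiInputFormat

// Which RDD constructor the reader should use.
sealed trait ReadPath
case object OldApiPath extends ReadPath // would map to HadoopRDD
case object NewApiPath extends ReadPath // would map to NewHadoopRDD

object InputFormatDispatch {
  // Load the class by name and decide the read path from its type hierarchy,
  // instead of assuming every input format implements the old API.
  def choosePath(inputClassName: String): ReadPath = {
    val cls = Class.forName(inputClassName)
    if (classOf[NewApiInputFormat].isAssignableFrom(cls)) NewApiPath
    else if (classOf[OldApiInputFormat].isAssignableFrom(cls)) OldApiPath
    else throw new IllegalArgumentException(s"$inputClassName is not an input format")
  }
}
```

Checking with `isAssignableFrom` before choosing a path avoids treating a new-API class as an old-API one. That is the kind of mismatch that surfaces as a `ClassCastException` once the format is instantiated and used.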
[GitHub] AmplabJenkins removed a comment on issue #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
AmplabJenkins removed a comment on issue #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#issuecomment-455079493 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101355/ Test FAILed.
[GitHub] SparkQA commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
SparkQA commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455082248 **[Test build #101359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101359/testReport)** for PR 23383 at commit [`6e87532`](https://github.com/apache/spark/commit/6e875323a430cee190a458b8842adea44bb4e0b7).
[GitHub] peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r248569542 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -205,30 +206,145 @@ class Analyzer( CleanupAliases) ) + object ResolveRecursiveReferneces extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = { + val recursiveTables = plan.collect { +case rt @ RecursiveTable(name, _, _) if rt.anchorResolved => name -> rt + }.toMap + + plan.resolveOperatorsUp { +case UnresolvedRecursiveReference(name) if recursiveTables.contains(name) => + RecursiveReference(name, recursiveTables(name).output.map(_.newInstance())) +case other => other + } +} + } + /** * Analyze cte definitions and substitute child plan with analyzed cte definitions. */ object CTESubstitution extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case With(child, relations) => + case With(child, relations, allowRecursion) => substituteCTE(child, relations.foldLeft(Seq.empty[(String, LogicalPlan)]) { case (resolved, (name, relation)) => -resolved :+ name -> executeSameContext(substituteCTE(relation, resolved)) -}) +val recursiveTableName = if (allowRecursion) Some(name) else None +resolved :+ + name -> executeSameContext(substituteCTE(relation, resolved, recursiveTableName)) +}, None) case other => other } -def substituteCTE(plan: LogicalPlan, cteRelations: Seq[(String, LogicalPlan)]): LogicalPlan = { - plan resolveOperatorsDown { -case u: UnresolvedRelation => - cteRelations.find(x => resolver(x._1, u.tableIdentifier.table)) -.map(_._2).getOrElse(u) -case other => - // This cannot be done in ResolveSubquery because ResolveSubquery does not know the CTE. 
- other transformExpressions { -case e: SubqueryExpression => - e.withNewPlan(substituteCTE(e.plan, cteRelations)) +def substituteCTE( +plan: LogicalPlan, +cteRelations: Seq[(String, LogicalPlan)], +recursiveTableName: Option[String]): LogicalPlan = { + def substitute( + plan: LogicalPlan, + inSubQuery: Boolean = false): (LogicalPlan, Boolean) = { +val references = mutable.Set.empty[UnresolvedRecursiveReference] + +def newReference(recursiveTableName: String) = { + val recursiveReference = UnresolvedRecursiveReference(recursiveTableName) + references += recursiveReference + + recursiveReference +} + +val newPlan = plan resolveOperatorsDown { + case u: UnresolvedRelation => +val table = u.tableIdentifier.table + +val recursiveReference = recursiveTableName.find(resolver(_, table)).map { name => + if (inSubQuery) { +throw new AnalysisException( + s"Recursive reference ${name} can't be used in a subquery") + } + + newReference(name) +} + +recursiveReference + .orElse(cteRelations.find(x => resolver(x._1, table)).map(_._2)) + .getOrElse(u) + + case other => +// This cannot be done in ResolveSubquery because ResolveSubquery does not know the CTE. 
+other transformExpressions { + case e: SubqueryExpression => e.withNewPlan(substitute(e.plan, true)._1) +} +} + +(newPlan, !references.isEmpty) + } + + plan match { +case SubqueryAlias(name, u: Union) if recursiveTableName.isDefined => + def combineUnions(union: Union): Seq[LogicalPlan] = union.children.flatMap { +case u: Union => combineUnions(u) +case o => Seq(o) } + + val substitutedTerms = combineUnions(u).map(substitute(_)) + val (anchorTerms, recursiveTerms) = substitutedTerms.partition(!_._2) + + if (!recursiveTerms.isEmpty) { +if (anchorTerms.isEmpty) { + throw new AnalysisException("There should be at least 1 anchor term defined in a " + +s"recursive query $name") +} + +val recursiveTermPlans = recursiveTerms.map(_._1) + +def traversePlanAndCheck( +plan: LogicalPlan, +isRecursiveReferenceAllowed: Boolean = true): Boolean = plan match { + case UnresolvedRecursiveReference(name) => +if (!isRecursiveReferenceAllowed) { + throw new AnalysisException(s"Wrong usage of recursive reference ${name}") +} +true + case Join(left, right,
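In `substituteCTE`, a recursive CTE body of the form `SubqueryAlias(name, Union)` goes through three steps:

- `combineUnions` flattens the nested unions into a list of terms.
- Each term is checked for references to the CTE's own name.
- The terms are partitioned into anchor terms and recursive terms, with an error if there are recursive terms but no anchor.

Below is a simplified sketch of that step on a hypothetical toy plan type. `Plan`, `Scan`, `Filter` and `Union` here are not Catalyst classes. The sketch also assumes case-insensitive name matching in place of Spark's configurable `resolver`.

```scala
// Hypothetical toy plan ADT, standing in for Catalyst's LogicalPlan.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Union(children: Seq[Plan]) extends Plan

object RecursiveTermSplit {
  // Flatten nested unions into a single list of terms, preserving order.
  def combineUnions(u: Union): Seq[Plan] = u.children.flatMap {
    case nested: Union => combineUnions(nested)
    case other         => Seq(other)
  }

  // True if the term reads from the recursive table anywhere in its subtree.
  // Case-insensitive matching is an assumption standing in for Spark's resolver.
  def referencesSelf(plan: Plan, name: String): Boolean = plan match {
    case Scan(table)      => table.equalsIgnoreCase(name)
    case Filter(_, child) => referencesSelf(child, name)
    case Union(children)  => children.exists(referencesSelf(_, name))
  }

  // Returns (anchor terms, recursive terms). A recursive query needs at least one anchor.
  def split(name: String, body: Union): (Seq[Plan], Seq[Plan]) = {
    val (recursive, anchor) = combineUnions(body).partition(referencesSelf(_, name))
    if (recursive.nonEmpty && anchor.isEmpty) {
      throw new IllegalArgumentException(
        s"There should be at least 1 anchor term defined in a recursive query $name")
    }
    (anchor, recursive)
  }
}
```

For example, `Union(Seq(Scan("base"), Union(Seq(Filter("n < 10", Scan("r")), Scan("other")))))` splits, for the CTE `r`, into the anchors `Scan("base")` and `Scan("other")` and the single recursive term `Filter("n < 10", Scan("r"))`.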
[GitHub] peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r248569468 ## File path: sql/core/src/test/resources/sql-tests/inputs/recursion.sql ## @@ -0,0 +1,291 @@ +-- List of configuration the test suite is run against: Review comment: Thank you @gatorsmile @mgaido91 @viirya for the feedback. What I will try to add in this PR is maybe a bit limited compared to what other DBs offer, but I hope it will still be a useful new feature of Spark SQL. And maybe we can extend it in follow-up tickets. I found this nice presentation: https://www.percona.com/live/plam16/sites/default/files/slides/CTEs_in_MariaDB_10.2.pdf where you can find some slides about linear (SQL standard) and non-linear recursion, which allows much more but is not compatible with the standard. I will try to look into the `with.sql` tests of postgres, provide some more tests and find the features still missing, and then we can decide which ones are must-haves in this PR, if that works for you. cc @mgaido91 @viirya
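In linear recursion, the recursive term references the CTE only once, so the query can be evaluated iteratively:

- Start from the anchor rows.
- Apply the recursive term only to the rows produced by the previous iteration.
- Stop when an iteration adds nothing new.

Below is a minimal sketch of that fixpoint loop with `UNION` (duplicate-eliminating) semantics, using plain Scala sets rather than Spark plans. `LinearRecursionSketch` and `evaluate` are hypothetical names. `UNION ALL` would keep duplicates instead, and so may never terminate on cyclic data.

```scala
object LinearRecursionSketch {
  // Evaluate a linear recursive query with UNION semantics.
  // `anchor` is the result of the non-recursive term; `step` plays the role of the
  // recursive term and is applied only to the rows added in the previous iteration.
  def evaluate[A](anchor: Set[A])(step: Set[A] => Set[A]): Set[A] = {
    @annotation.tailrec
    def loop(result: Set[A], delta: Set[A]): Set[A] =
      if (delta.isEmpty) result // no new rows: fixpoint reached
      else {
        val fresh = step(delta) -- result // keep only rows not seen before
        loop(result ++ fresh, fresh)
      }
    loop(anchor, anchor)
  }
}
```

As a usage example, consider `WITH RECURSIVE t(n) AS (SELECT 1 UNION SELECT n + 1 FROM t WHERE n < 5) SELECT * FROM t`. It corresponds to `LinearRecursionSketch.evaluate(Set(1))(frontier => frontier.collect { case n if n < 5 => n + 1 })`, which returns `Set(1, 2, 3, 4, 5)`. On a cyclic edge set, such as reachability over `1 -> 2, 2 -> 3, 3 -> 1`, the `-- result` step is what lets the loop terminate.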
[GitHub] gengliangwang commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
gengliangwang commented on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455081340 retest this please.
[GitHub] AmplabJenkins removed a comment on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
AmplabJenkins removed a comment on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#issuecomment-455080860 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
AmplabJenkins removed a comment on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#issuecomment-455080869 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7181/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
AmplabJenkins commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#issuecomment-455080860 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455080430 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455080434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7182/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
AmplabJenkins commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#issuecomment-455080869 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7181/ Test PASSed.
[GitHub] SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455078967 **[Test build #101356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101356/testReport)** for PR 23117 at commit [`18c40d9`](https://github.com/apache/spark/commit/18c40d925d7e93cc9221f3358d5028bf5ba007bd).
[GitHub] AmplabJenkins removed a comment on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path
AmplabJenkins removed a comment on issue #23383: [SPARK-23817][SQL] Create file source V2 framework and migrate ORC read path URL: https://github.com/apache/spark/pull/23383#issuecomment-455078591 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101347/ Test FAILed.
[GitHub] HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455078829 retest this please
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455078566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101346/ Test FAILed.
[GitHub] SparkQA commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
SparkQA commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#issuecomment-455078951 **[Test build #101354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101354/testReport)** for PR 23559 at commit [`cb436fc`](https://github.com/apache/spark/commit/cb436fcf2b1e4f9fdede080212c198c553817259).
[GitHub] AmplabJenkins removed a comment on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
AmplabJenkins removed a comment on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#issuecomment-455079130 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD
AmplabJenkins commented on issue #23559: [SPARK-26630][SQL] Fix ClassCastException in TableReader while creating HadoopRDD URL: https://github.com/apache/spark/pull/23559#issuecomment-455079133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7180/ Test PASSed.