[GitHub] [spark] LantaoJin commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
LantaoJin commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-539844688 retest this please. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on issue #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED
dilipbiswal commented on issue #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED URL: https://github.com/apache/spark/pull/26042#issuecomment-539844257 cc @cloud-fan
[GitHub] [spark] huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree
huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree URL: https://github.com/apache/spark/pull/25929#issuecomment-539840347 OK. I will add the _single_leading_underscore prefix to the classes you mentioned in the comments. Thanks!
[GitHub] [spark] itsvikramagr edited a comment on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr edited a comment on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation URL: https://github.com/apache/spark/pull/24922#issuecomment-539838772 > 1. we are using flatMapGroupsWithState; it causes a failure at the beginning Will update the PR with the fix. > 2. RocksDB checkpoint creation had a quite high time cost, sometimes > 20 secs, .. then I changed all of them to an ext4 partition; the result is much better: it is now < 10ms in most cases, but still sometimes > 100ms. For isolation and data consistency, we checkpoint the RocksDB state to local disk. As you have suggested, a good file system and SSD-based instance storage should be used to get the best performance. > 3. All spark executors get stuck when one executor tries to load a snapshot file from the spark checkpoint. Great catch. Let me look at it and make the appropriate changes.
[GitHub] [spark] zhengruifeng commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree
zhengruifeng commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree URL: https://github.com/apache/spark/pull/25929#issuecomment-539837239 @huaxingao Yes, I can reproduce your case. The 'private' classes can only be imported explicitly. I guess that is why it is a **weak** “internal use” indicator. I think we can add the _single_leading_underscore prefix, consistent with the Scala side.
[GitHub] [spark] huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree
huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree URL: https://github.com/apache/spark/pull/25929#issuecomment-539830362 @zhengruifeng Thanks for your comments. I didn't add _single_leading_underscore for classes that are used by other packages. I am a little fuzzy about this _single_leading_underscore usage: https://pep8.org/#descriptive-naming-styles says ```_single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore.``` This makes me feel that a class with _single_leading_underscore is for internal use only and is not intended to be used in other packages. However, if I explicitly import a _single_leading_underscore class, it works fine. For example, if I do ```from pyspark.ml.tree import *```, the _single_leading_underscore classes are not imported, but if I do ```from pyspark.ml.tree import _DecisionTreeModel, _DecisionTreeParams```, these classes are imported OK.
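The star-import behavior described above can be demonstrated with a tiny stand-in module (the `treelib` module and its classes here are hypothetical, built at runtime purely for illustration of the PEP 8 convention):

```python
import sys
import types

# Build a throwaway module containing one "private" and one public class.
# A single leading underscore is only a convention: it hides a name from
# `from M import *` (when no __all__ is defined) but not from explicit imports.
mod = types.ModuleType("treelib")
exec("class _DecisionTreeParams: pass\nclass DecisionTreeClassifier: pass", mod.__dict__)
sys.modules["treelib"] = mod

ns = {}
exec("from treelib import *", ns)
print("_DecisionTreeParams" in ns)    # underscore name is skipped by the star import
print("DecisionTreeClassifier" in ns)  # public name is imported

# An explicit import still works, since the underscore is not enforced.
exec("from treelib import _DecisionTreeParams", ns)
print("_DecisionTreeParams" in ns)
```

This matches the behavior reported in the comment: the underscore is a weak "internal use" indicator, not an access restriction.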
[GitHub] [spark] shivusondur commented on a change in pull request #25561: [SPARK-28810][DOC][SQL] Document SHOW TABLES in SQL Reference.
shivusondur commented on a change in pull request #25561: [SPARK-28810][DOC][SQL] Document SHOW TABLES in SQL Reference. URL: https://github.com/apache/spark/pull/25561#discussion_r332826762 ## File path: docs/sql-ref-syntax-aux-show-tables.md ## @@ -18,5 +18,86 @@ license: | See the License for the specific language governing permissions and limitations under the License. --- +### Description -**This page is under construction** +The `SHOW TABLES` statement returns all the tables for an optionally specified database. +Additionally, the output of this statement may be filtered by an optional matching +pattern. If no database is specified then the tables are returned from the +current database. + +### Syntax +{% highlight sql %} +SHOW TABLES [{FROM|IN} database_name] [LIKE 'regex_pattern'] +{% endhighlight %} + +### Parameters + + {FROM|IN} database_name + + Specifies the `database` name from which tables are listed. + + LIKE 'regex_pattern' + + Specifies the regex pattern that is used to filter out unwanted tables. +- The pattern is a regex except `*` and `|`characters Review comment: Done. Used the HTML ul and li tags and added the description like below: `Except for the `*` and `|` characters, the remaining characters follow the regular expression convention.`
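The pattern convention being documented (where `*` matches any characters and `|` separates alternatives, while everything else follows ordinary regex rules) can be sketched as follows. This is an illustrative stand-in for the documented behavior, not Spark's actual implementation:

```python
import re

def matches_table_pattern(pattern: str, name: str) -> bool:
    # Illustrative sketch of the documented SHOW TABLES ... LIKE convention:
    # split on `|` to get alternatives, turn each `*` into the regex `.*`,
    # and require a full match against the table name.
    regex = "|".join(p.replace("*", ".*") for p in pattern.split("|"))
    return re.fullmatch(regex, name) is not None

print(matches_table_pattern("sam*", "sample_table"))  # wildcard match
print(matches_table_pattern("sam*|suj", "suj"))       # alternative match
print(matches_table_pattern("sam*", "orders"))        # no match
```

Note this sketch does not escape other regex metacharacters in the pattern; per the doc text, those follow the regular expression convention anyway.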
[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539820403 Nope. Why do you collect all? It's up to your configuration. Back to the beginning: I fully understand your cluster's underlying issues. However, none of them blocks Apache Spark from supporting `Prometheus` metrics natively. 1. First, you can use the previously existing solution if you have one (`spark.ui.prometheus.enabled` is also `false` by default). 2. Second, your claims are too general. Not every user has that kind of gigantic cluster. Although a few big customers have some, there are also many satellite small-size clusters. I'm not sure about your metrics. Could you share with us the size of your clusters, the number of apps, and the number of metrics? Does it run on Apache Spark?
[GitHub] [spark] LantaoJin commented on issue #25971: [SPARK-29298][CORE] Separate block manager heartbeat endpoint from driver endpoint
LantaoJin commented on issue #25971: [SPARK-29298][CORE] Separate block manager heartbeat endpoint from driver endpoint URL: https://github.com/apache/spark/pull/25971#issuecomment-539820817 Thanks for the explanation @cloud-fan .
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539820493 I don't see any number from you so far here. :)
[GitHub] [spark] LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#discussion_r332823271 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -286,7 +287,9 @@ private[hive] class SparkExecuteStatementOperation( if (e.isInstanceOf[HiveSQLException]) { throw e.asInstanceOf[HiveSQLException] } else { -throw new HiveSQLException("Error running query: " + e.toString, e) +val root = ExceptionUtils.getRootCause(e) Review comment: Besides the null check, I've changed the code to the above style. @wangyum
[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539819286 > BTW, the following covers some gigantic clusters, but not all cases. There is a different and cheaper approach like `Federation`. We are using `Federation`, which fits our environment much better. I would like to hear more about this. Our experience with Federation is that if we do not filter out some metrics but simply federate the metrics from each child server, the federating Prometheus becomes the bottleneck. Federation provides a way to gather metrics from other Prometheus servers, but it does not solve the scalability issues. How many metrics do you have in your federation Prometheus?
[GitHub] [spark] imback82 commented on a change in pull request #26006: [SPARK-29279][SQL] Merge SHOW NAMESPACES and SHOW DATABASES code path
imback82 commented on a change in pull request #26006: [SPARK-29279][SQL] Merge SHOW NAMESPACES and SHOW DATABASES code path URL: https://github.com/apache/spark/pull/26006#discussion_r332823038 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -2694,7 +2694,7 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession { sparkContext.addSparkListener(listener) try { // Execute the command. - sql("show databases").head() + sql("EXPLAIN show databases").head() Review comment: Addressed by #26048
[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539818338 > * `storage.tsdb.retention.time`: https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects Thanks for the pointer. We are using a retention period, for sure. But that does not help to expire metrics once they have reached the Prometheus server. Also, it is a challenge to remove metrics from the [Prometheus registry](https://prometheus.io/docs/instrumenting/writing_clientlibs/#overall-structure) on the client side as well. Again, these challenges become more critical with high-cardinality metrics. This is why the Prometheus community suggests people not use unbounded values for a label.
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539818709 BTW, the following covers some gigantic clusters, but not all cases. There is a different and cheaper approach like `Federation`. We are using `Federation`, which fits our environment much better. > People use a highly scalable Prometheus (e.g. M3, Cortex, etc.) to handle Spark metrics.
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539817155 Sorry for the misleading naming. I meant the following. - `storage.tsdb.retention.time`: https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
[GitHub] [spark] LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#discussion_r332819911 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -286,7 +287,9 @@ private[hive] class SparkExecuteStatementOperation( if (e.isInstanceOf[HiveSQLException]) { throw e.asInstanceOf[HiveSQLException] } else { -throw new HiveSQLException("Error running query: " + e.toString, e) +val root = ExceptionUtils.getRootCause(e) Review comment: > ```scala > val rootCause = Option(ExceptionUtils.getRootCause(e)).getOrElse(e) > ``` It returns null only if the input `e` is null. Do we still need to add this `Option` wrapping?
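To make the null-handling discussion concrete, here is a small standalone sketch of the suggested pattern, using commons-lang3's `ExceptionUtils` with hypothetical exceptions (not the actual Spark code path):

```scala
import org.apache.commons.lang3.exception.ExceptionUtils

object RootCauseSketch {
  // Mirrors the suggested pattern: fall back to `e` itself if
  // getRootCause yields null (which happens only for null input).
  def rootCauseOrSelf(e: Throwable): Throwable =
    Option(ExceptionUtils.getRootCause(e)).getOrElse(e)

  def main(args: Array[String]): Unit = {
    val root = new IllegalStateException("root")
    val wrapped = new RuntimeException("wrapper", root)
    // A chained exception unwraps to its root cause...
    assert(rootCauseOrSelf(wrapped) eq root)
    // ...and an exception with no cause comes back unchanged either way.
    val bare = new RuntimeException("no cause")
    assert(rootCauseOrSelf(bare) eq bare)
  }
}
```

Either way the `Option` guard is cheap insurance: the result is non-null for any non-null `e`.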
[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-53981 > Second, you can use the `Prometheus` TTL feature, @yuecong . Have you tried that? Could you share the link on this one? We are not using the TTL feature for Prometheus yet.
[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539813010 > That is a general issue on Apache Spark monitoring instead of this PR, isn't it? So, I have three questions for you. > > 1. Do you use a custom Sink to monitor Apache Spark? > 2. Do you collect only cluster-wide metrics? > 3. Is it helpful for long-running app monitoring like structured streaming? I agree that it is a general challenge for Apache Spark monitoring with a normal Prometheus server. I would suggest just making its high cardinality clear. Maybe this is orthogonal to your PR; just my two cents. People use a highly scalable Prometheus (e.g. M3, Cortex, etc.) to handle Spark metrics. Also, if we had a custom exporter that let users push metrics to a distributed time series database or a pub-sub system (e.g. Kafka), it could solve this high-cardinality issue as well.
[GitHub] [spark] yaooqinn commented on a change in pull request #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
yaooqinn commented on a change in pull request #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#discussion_r332818335 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala ## @@ -177,13 +176,13 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter { } test("Waiting for app completion should stall on the watcher") { +kconf.sparkConf.set(WAIT_FOR_APP_COMPLETION, true) Review comment: I see, thanks
[GitHub] [spark] cloud-fan commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job
cloud-fan commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job URL: https://github.com/apache/spark/pull/26048#issuecomment-539811954 thanks, merging to master!
[GitHub] [spark] cloud-fan closed pull request #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job
cloud-fan closed pull request #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job URL: https://github.com/apache/spark/pull/26048
[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539811514 This PR doesn't collect new metrics; it only exposes the existing ones. So, the following is not about this PR. If you have a concern about the Apache Spark *driver*, you can file a new issue on that. > If the driver keeps all the metrics for all the Spark applications it runs, Second, from the `Prometheus` side, you can use the `Prometheus` TTL feature, @yuecong . Have you tried that?
[GitHub] [spark] advancedxy commented on a change in pull request #26058: [SPARK-10614][core] Add monotonic time to Clock interface.
advancedxy commented on a change in pull request #26058: [SPARK-10614][core] Add monotonic time to Clock interface. URL: https://github.com/apache/spark/pull/26058#discussion_r332817520 ## File path: core/src/main/scala/org/apache/spark/util/Clock.scala ## @@ -21,7 +21,14 @@ package org.apache.spark.util * An interface to represent clocks, so that they can be mocked out in unit tests. */ private[spark] trait Clock { + /** @return Current system time, in ms. */ def getTimeMillis(): Long + /** @return Current value of monotonic time source, in ns. */ + def nanoTime(): Long + /** Review comment: Nit: add blank line between methods?
[GitHub] [spark] advancedxy commented on a change in pull request #26058: [SPARK-10614][core] Add monotonic time to Clock interface.
advancedxy commented on a change in pull request #26058: [SPARK-10614][core] Add monotonic time to Clock interface. URL: https://github.com/apache/spark/pull/26058#discussion_r332817884 ## File path: core/src/main/scala/org/apache/spark/util/Clock.scala ## @@ -36,19 +43,23 @@ private[spark] class SystemClock extends Clock { * @return the same time (milliseconds since the epoch) * as is reported by `System.currentTimeMillis()` */ - def getTimeMillis(): Long = System.currentTimeMillis() + override def getTimeMillis(): Long = System.currentTimeMillis() + + /** + * @return value reported by `System.nanoTime()`. + */ + override def nanoTime(): Long = System.nanoTime() /** * @param targetTime block until the current time is at least this value * @return current system time when wait has completed */ - def waitTillTime(targetTime: Long): Long = { -var currentTime = 0L -currentTime = System.currentTimeMillis() + override def waitTillTime(targetTime: Long): Long = { +var currentTime = System.currentTimeMillis() var waitTime = targetTime - currentTime if (waitTime <= 0) { - return currentTime + return getTimeMillis() Review comment: Why not just `return currentTime`?
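The point of adding `nanoTime()` to the `Clock` trait is that tests can mock a monotonic source alongside wall-clock time. A minimal sketch, with a hand-rolled trait and `ManualTestClock` that are illustrative stand-ins for Spark's actual `Clock`/`ManualClock`:

```scala
// A stand-in for Spark's private Clock trait, for illustration only.
trait Clock {
  /** @return current system time, in ms. */
  def getTimeMillis(): Long

  /** @return current value of a monotonic time source, in ns. */
  def nanoTime(): Long
}

/** A deterministic clock for tests: both sources advance together. */
class ManualTestClock(private var nowMs: Long = 0L) extends Clock {
  override def getTimeMillis(): Long = nowMs
  override def nanoTime(): Long = nowMs * 1000000L // ms -> ns

  def advance(ms: Long): Unit = { nowMs += ms }
}

object ClockSketch {
  def main(args: Array[String]): Unit = {
    val clock = new ManualTestClock()
    val start = clock.nanoTime()
    clock.advance(250)
    // Elapsed time measured from the monotonic source is immune to
    // wall-clock adjustments (NTP, manual changes) on a real system.
    assert(clock.nanoTime() - start == 250L * 1000000L)
    assert(clock.getTimeMillis() == 250L)
  }
}
```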
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539811514 This PR doesn't collect new metrics; it only exposes the existing ones. So, the following is not about this PR. > If the driver keeps all the metrics for all the Spark applications it runs, Second, you can use the `Prometheus` TTL feature, @yuecong . Have you tried that?
[GitHub] [spark] firestarman commented on a change in pull request #25983: [SPARK-29327][MLLIB]Support specifying features via multiple columns
firestarman commented on a change in pull request #25983: [SPARK-29327][MLLIB] Support specifying features via multiple columns URL: https://github.com/apache/spark/pull/25983#discussion_r332817984 ## File path: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ## @@ -55,14 +55,50 @@ class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext { predictor.fit(df.select(col("label"), col("weight").cast(StringType), col("features"))) } } + + test("multiple columns for features should work well without side effect") { +// Should fail due to not supporting multiple columns +intercept[IllegalArgumentException] { + new MockPredictor(false).setFeaturesCol(Array("feature1", "feature2", "feature3")) +} + +// Only use multiple columns for features +val df = spark.createDataFrame(Seq( + (0, 1, 0, 2, 3), + (1, 2, 0, 3, 9), + (0, 3, 0, 2, 6) +)).toDF("label", "weight", "feature1", "feature2", "feature3") + +val predictor = new MockPredictor().setWeightCol("weight") + .setFeaturesCol(Array("feature1", "feature2", "feature3")) +predictor.fit(df) + +// Should fail due to wrong type for column "feature1" in schema +intercept[IllegalArgumentException] { + predictor.fit(df.select(col("label"), col("weight"), +col("feature1").cast(StringType), col("feature2"), col("feature3"))) +} + +val df2 = df.toDF("label", "weight", "features", "feature2", "feature3") +// Should fail due to missing "feature1" in schema +intercept[IllegalArgumentException] { + predictor.setFeaturesCol(Array("feature1", "feature2", "feature3")).fit(df2) +} + +// Should fail due to wrong type in schema for single column of features Review comment: Thanks for the review. Updated the comments. Actually, that's expected. I mean only the names ("feature2", "feature3") passed into `setFeaturesCol(Array)` are intended to be used as multiple columns. But "features" is provided in the "df2" schema, equal to the default value of the single-column name (just like calling `setFeaturesCol("features")`). Then my current design supposes users are trying to use both a single column and multiple columns, and does type checks for both of them. As said above, "features" is now used as the single column, and should be a "Vector" but is actually an "Int", so the test fails.
[GitHub] [spark] viirya commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex type pruning
viirya commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex type pruning URL: https://github.com/apache/spark/pull/20935#issuecomment-539811088 I think it is fine, as his last response was more than a year ago.
[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539810831 Do the metrics for a Spark application disappear after the application finishes? I guess the answer is no. If the driver keeps all the metrics for all the Spark applications it runs, will this not cause high memory usage for the driver? Do you have some tests on this?
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539810783 @yuecong . That is a general issue on Apache Spark monitoring instead of this PR, isn't it? So, I have three questions for you. 1. Do you use a custom Sink to monitor Apache Spark? 2. Do you collect only cluster-wide metrics? 3. Is it helpful for long-running app monitoring like structured streaming?
[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539809234 > 1. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"` For Prometheus, not just the metric name but also the labels count toward cardinality. Inside the Prometheus server's TSDB, the combination of metric name and labels is actually the key. See details [here](https://prometheus.io/docs/practices/naming/#labels): > CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-53980 Hi, @yuecong . Thank you for the review. 1. That was true in the old Prometheus plugin. So, Apache Spark 3.0.0 exposes this Prometheus metric on the driver port, instead of the executor port. I mean you are referring to the `executor` instead of the `driver`. Do you have a short-lived Spark driver which dies in `30s`? > As Prometheus uses pull model, how do you recommend people to use these metrics for some executors who get shut down immediately? Also how this will work for some short-lived(e.g. shorter than one Prometheus scrape interval, usually it is 30s) spark application? 2. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"` > It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. I don't think you mean the `Prometheus` dimension feature is high-cardinality.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#discussion_r332816528 ## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala ## @@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext { def executors(): String = { val sb = new StringBuilder val store = uiRoot.asInstanceOf[SparkUI].store -val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_") store.executorList(true).foreach { executor => - val prefix = s"metrics_${appId}_${executor.id}_executor_" - sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n") - sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n") - sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n") - sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n") - sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n") - sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n") - sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n") - sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n") - sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n") - sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n") - sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n") - sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n") - sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n") - sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n") - sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n") + val prefix = "metrics_executor_" + val labels = Seq( +"application_id" -> store.applicationInfo.id, +"application_name" -> store.applicationInfo.name, +"executor_id" -> executor.id + ).map { case (k, v) => s"""$k="$v }.mkString("{", ", ", "}") + sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n") + sb.append(s"${prefix}memoryUsed_Count$labels ${executor.memoryUsed}\n") Review comment: Thank you for the review, @viirya . Yes, of course, we can rename it freely because we start to support them natively.
[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539808598 > 1. That was true in the old Prometheus plugin. So, Apache Spark 3.0.0 exposes this Prometheus metric on the driver port, instead of the executor port This is awesome!
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-53980 @yuecong . 1. That's true in the old Prometheus plugin. So, Apache Spark 3.0.0 exposes this Prometheus metric on the driver port, instead of the executor port. I mean you are referring to the `executor` instead of the `driver`. Do you have a short-lived Spark driver which dies in `30s`? > As Prometheus uses pull model, how do you recommend people to use these metrics for some executors who get shut down immediately? Also how this will work for some short-lived(e.g. shorter than one Prometheus scrape interval, usually it is 30s) spark application? 2. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"` > It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. I don't think you mean the `Prometheus` dimension feature is high-cardinality.
[GitHub] [spark] viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#discussion_r332815572 ## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala ## @@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext { def executors(): String = { val sb = new StringBuilder val store = uiRoot.asInstanceOf[SparkUI].store -val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_") store.executorList(true).foreach { executor => - val prefix = s"metrics_${appId}_${executor.id}_executor_" - sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n") - sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n") - sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n") - sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n") - sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n") - sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n") - sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n") - sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n") - sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n") - sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n") - sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n") - sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n") - sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n") - sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n") - sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n") + val prefix = "metrics_executor_" + val labels = Seq( +"application_id" -> store.applicationInfo.id, +"application_name" -> store.applicationInfo.name, +"executor_id" -> executor.id + ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}") + sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n") + sb.append(s"${prefix}memoryUsed_Count$labels ${executor.memoryUsed}\n") Review comment: Not related to this PR, but why do they all end with `_Count`? For `rddBlocks` it is OK, but some seem unsuitable, like `memoryUsed_Count`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
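The change under review swaps identifiers out of the metric name and into Prometheus labels. A minimal standalone Python sketch (not the PR's Scala code; names are illustrative) of the two exposition styles:

```python
def exposition_line(metric, labels, value):
    """Render one line of the Prometheus text exposition format,
    e.g. name{key="value", ...} 42."""
    body = ", ".join(f'{k}="{v}"' for k, v in labels)
    return f"{metric}{{{body}}} {value}"

# Old style: identifiers baked into the metric name, so every new
# application/executor produces a brand-new metric name.
old = "metrics_app_1_driver_executor_rddBlocks_Count 4"

# New style: one stable metric name; identifiers travel as labels.
new = exposition_line(
    "metrics_executor_rddBlocks_Count",
    [("application_id", "app-1"), ("executor_id", "driver")],
    4,
)
print(new)
# metrics_executor_rddBlocks_Count{application_id="app-1", executor_id="driver"} 4
```

The label form lets PromQL aggregate across applications (e.g. `sum by (executor_id)`), which the name-embedded form cannot do.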
[GitHub] [spark] dongjoon-hyun closed pull request #26062: [SPARK-29401][CORE][ML][SQL][GRAPHX][TESTS] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
dongjoon-hyun closed pull request #26062: [SPARK-29401][CORE][ML][SQL][GRAPHX][TESTS] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples URL: https://github.com/apache/spark/pull/26062
[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591 @dongjoon-hyun Thanks for fixing this. I have several questions. 1. Short-lived metrics: since Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? And how will this work for short-lived Spark applications (e.g. shorter than one Prometheus scrape interval, usually 30s)? Check this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics for Prometheus. 2. High cardinality: it looks like you are using app_id as one of the labels, which will increase the cardinality of the Prometheus metrics. See more about Prometheus's cardinality issues [here](https://www.robustperception.io/cardinality-is-key) as well as in this [doc](https://prometheus.io/docs/practices/naming/#labels). Suppose a user points a central Prometheus server at their Spark applications with this PR: each new Spark application exposes N metrics (say 10) across M workers (20) on average. Since app_id changes every time, old series never disappear, so over time this adds up to millions or even billions of series, a heavy load for a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) that address this, but we should make the cardinality implications clear to users of such metrics. It would be great to give some suggestions on how users should consume these metrics in practice, especially on handling short-lived and high-cardinality metrics.
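The series growth described in the comment is easy to quantify. A rough back-of-the-envelope sketch, using the hypothetical N and M values from the comment (not measurements):

```python
def series_count(apps, executors_per_app, metrics_per_executor):
    """Total distinct time series a Prometheus server accumulates when
    application_id is a label: series from finished apps never merge
    with series from new ones, so counts only grow over time."""
    return apps * executors_per_app * metrics_per_executor

# One application: N = 10 metrics over M = 20 executors -> 200 series.
print(series_count(1, 20, 10))       # 200
# A cluster that has launched 10,000 applications over its lifetime.
print(series_count(10_000, 20, 10))  # 2000000
```

This is why remote-storage backends (M3, Cortex, Thanos) or a push-based intermediary are usually suggested for per-application labels on short-lived jobs.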
[GitHub] [spark] dongjoon-hyun closed pull request #26061: [SPARK-29392][CORE][SQL][STREAMING] Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of Symbol("foo")
dongjoon-hyun closed pull request #26061: [SPARK-29392][CORE][SQL][STREAMING] Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of Symbol("foo") URL: https://github.com/apache/spark/pull/26061
[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591 @dongjoon-hyun Thanks for fixing this. I have several questions. 1. Short-lived metrics: since Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? And how will this work for short-lived Spark applications (e.g. shorter than one Prometheus scrape interval, usually 30s)? Check this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics for Prometheus. 2. Cardinality: it looks like you are using app_id as one of the labels, which will increase the cardinality of the Prometheus metrics. See more about Prometheus's cardinality issues [here](https://www.robustperception.io/cardinality-is-key) as well as in this [doc](https://prometheus.io/docs/practices/naming/#labels). If a user uses a central Prometheus server to scrape their Spark applications with this PR, each new Spark application will have N metrics across M workers on average. This will cause a heavy load for a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) that address this, but we should make the cardinality implications clear to users of such metrics.
[GitHub] [spark] dongjoon-hyun commented on issue #26061: [SPARK-29392][CORE][SQL][STREAMING] Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of Symbol("foo")
dongjoon-hyun commented on issue #26061: [SPARK-29392][CORE][SQL][STREAMING] Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of Symbol("foo") URL: https://github.com/apache/spark/pull/26061#issuecomment-539804605 Merged to master.
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-539803942 Hi, @srowen , @dbtsai , @HyukjinKwon . Could you review this PR, please?
[GitHub] [spark] zhengruifeng opened a new pull request #26064: [SPARK-23578][ML][PYSPARK] Binarizer support multi-column
zhengruifeng opened a new pull request #26064: [SPARK-23578][ML][PYSPARK] Binarizer support multi-column URL: https://github.com/apache/spark/pull/26064 ### What changes were proposed in this pull request? Binarizer supports multiple columns by extending `HasInputCols`/`HasOutputCols`/`HasThreshold`/`HasThresholds`. ### Why are the changes needed? Similar algorithms in `ml.feature` already support multiple columns, such as `Bucketizer`/`StringIndexer`/`QuantileDiscretizer`. ### Does this PR introduce any user-facing change? Yes, it adds setters/getters for `thresholds`/`inputCols`/`outputCols`. ### How was this patch tested? Added test suites.
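The multi-column semantics being proposed can be illustrated without Spark. A plain-Python sketch (hypothetical helper, not the actual `Binarizer` API) of what one-threshold-per-column binarization, as `HasThresholds` implies, does to a batch of rows:

```python
def binarize_columns(rows, thresholds):
    """For each row, emit 1.0 where the value is strictly greater than
    that column's threshold, else 0.0 (one threshold per input column)."""
    return [
        [1.0 if v > t else 0.0 for v, t in zip(row, thresholds)]
        for row in rows
    ]

rows = [[0.1, 5.0], [0.9, 2.0]]
print(binarize_columns(rows, thresholds=[0.5, 3.0]))
# [[0.0, 1.0], [1.0, 0.0]]
```

The single-column `threshold` case is then just the length-one instance of the same operation, which is why extending the existing `HasThreshold` with `HasThresholds` covers both.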
[GitHub] [spark] kiszk commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex type pruning
kiszk commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex type pruning URL: https://github.com/apache/spark/pull/20935#issuecomment-539795529 @pwoody @HyukjinKwon @viirya May I take this over, since the author has not responded for a long time?
[GitHub] [spark] kiszk commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide
kiszk commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide URL: https://github.com/apache/spark/pull/26045#discussion_r332807321 ## File path: docs/sql-pyspark-pandas-with-arrow.md ## @@ -219,3 +219,14 @@ Note that a standard UDF (non-Pandas) will load timestamp data as Python datetime objects, which is different than a Pandas timestamp. It is recommended to use Pandas time series functionality when working with timestamps in `pandas_udf`s to get the best performance, see [here](https://pandas.pydata.org/pandas-docs/stable/timeseries.html) for details. + +### Compatibility Setting for PyArrow >= 0.15.0 and Spark 2.3.x, 2.4.x + +Since Arrow 0.15.0, a change in the binary IPC format requires an environment variable to be set in Review comment: How about adding a link to the Apache Arrow release blog, `http://arrow.apache.org/blog/2019/10/06/0.15.0-release/#columnar-streaming-protocol-change-since-0140`?
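For context on the workaround the doc section describes: the Arrow 0.15.0 release notes name `ARROW_PRE_0_15_IPC_FORMAT` as the switch that restores the pre-0.15 IPC stream format (assumption: this is the variable the doc refers to; it must reach both the driver and the Python workers, e.g. via `conf/spark-env.sh`). A minimal Python sketch:

```python
import os

# Restore the pre-0.15 Arrow IPC stream format so PyArrow >= 0.15.0
# can interoperate with Spark 2.3.x/2.4.x. Must be set before any
# Arrow serialization happens, and must be visible to executors too.
os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"
```

Setting it only in the driver process is not enough; the Python worker processes that run `pandas_udf`s need the same environment.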
[GitHub] [spark] gatorsmile commented on issue #26051: [SPARK-24640][SQL] Return `NULL` from `size(NULL)` by default
gatorsmile commented on issue #26051: [SPARK-24640][SQL] Return `NULL` from `size(NULL)` by default URL: https://github.com/apache/spark/pull/26051#issuecomment-539793779 @MaxGekk Could you submit a follow-up PR to update the migration guide?
[GitHub] [spark] imback82 commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job
imback82 commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job URL: https://github.com/apache/spark/pull/26048#issuecomment-539793532 I double-checked this. `V2TableWriteExec.writeWithV2` returns `sparkContext.emptyRDD`. In this case, `DAGScheduler.submitJob` will return without actually submitting a job. So there will be one job for `CREATE TABLE AS SELECT`. So theoretically, we could have just returned `sparkContext.emptyRDD` for some of the commands that don't return results (such as `USE`, etc.), but I think the new approach is cleaner (and we still need this new physical operator for `SHOW DATABASE`, etc.)
[GitHub] [spark] beliefer commented on issue #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
beliefer commented on issue #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression URL: https://github.com/apache/spark/pull/25416#issuecomment-539793201 @dongjoon-hyun @HyukjinKwon Could you help review this PR?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case' URL: https://github.com/apache/spark/pull/26053#issuecomment-539790518 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case' URL: https://github.com/apache/spark/pull/26053#issuecomment-539790526 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16909/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case' URL: https://github.com/apache/spark/pull/26053#issuecomment-539790518 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case' URL: https://github.com/apache/spark/pull/26053#issuecomment-539790526 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16909/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
SparkQA commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case' URL: https://github.com/apache/spark/pull/26053#issuecomment-539789913 **[Test build #111929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111929/testReport)** for PR 26053 at commit [`b066088`](https://github.com/apache/spark/commit/b066088ee9a7deba990327f92a82940b85bf6025).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#issuecomment-539789010 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#issuecomment-539789015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111924/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples URL: https://github.com/apache/spark/pull/26062#issuecomment-539788946 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples URL: https://github.com/apache/spark/pull/26062#issuecomment-539788951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111923/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#issuecomment-539789015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111924/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#issuecomment-539789010 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples URL: https://github.com/apache/spark/pull/26062#issuecomment-539788946 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples URL: https://github.com/apache/spark/pull/26062#issuecomment-539788951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111923/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
SparkQA removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#issuecomment-539749285 **[Test build #111924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111924/testReport)** for PR 24851 at commit [`c50b679`](https://github.com/apache/spark/commit/c50b679310de99bba38e94b092871b3c82dc0ce1).
[GitHub] [spark] zhengruifeng commented on issue #25909: [SPARK-29224]Implement Factorization Machines as a ml-pipeline component
zhengruifeng commented on issue #25909: [SPARK-29224]Implement Factorization Machines as a ml-pipeline component URL: https://github.com/apache/spark/pull/25909#issuecomment-539788608 @mob-ai Thanks for this work! But before you continue, I suggest you refer to the previous discussion in [SPARK-7008](https://issues.apache.org/jira/browse/SPARK-7008). I think you should provide some information such as convergence curves on common datasets, and show that mini-batch SGD is a good choice as an efficient solver. As to the PR itself, besides Owen's comments, I think there should be separate `FMClassifier` and `FMRegressor` classes.
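For reference, the model being implemented is compact. A minimal pure-Python sketch of the degree-2 factorization machine prediction function (following Rendle's standard formulation; names are illustrative and no Spark API is implied), using the O(kn) identity that rewrites the pairwise interactions:

```python
def fm_predict(x, w0, w, v):
    """Degree-2 factorization machine:
    y = w0 + sum_i w[i]*x[i]
           + 0.5 * sum_f ((sum_i v[i][f]*x[i])**2 - sum_i (v[i][f]*x[i])**2)
    where v[i] is the k-dimensional latent vector for feature i; the
    second term equals sum_{i<j} (v[i] . v[j]) * x[i] * x[j]."""
    k = len(v[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s2 = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise

# Two features, latent dimension 1: the pairwise term reduces to
# v0*v1*x0*x1 = 2*3*1*2 = 12; linear part is 0.5 + 1 - 2 = -0.5.
y = fm_predict([1.0, 2.0], w0=0.5, w=[1.0, -1.0], v=[[2.0], [3.0]])
print(y)  # 11.5
```

Training this with mini-batch SGD means differentiating through `w0`, `w`, and `v`, which is where the convergence questions raised above come in.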
[GitHub] [spark] SparkQA removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
SparkQA removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples URL: https://github.com/apache/spark/pull/26062#issuecomment-539741136 **[Test build #111923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111923/testReport)** for PR 26062 at commit [`c43aa71`](https://github.com/apache/spark/commit/c43aa711ee035891d1d6af9ff27786d35c76885a).
[GitHub] [spark] SparkQA commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
SparkQA commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#issuecomment-539788379 **[Test build #111924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111924/testReport)** for PR 24851 at commit [`c50b679`](https://github.com/apache/spark/commit/c50b679310de99bba38e94b092871b3c82dc0ce1). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class PropertyGraphReader(session: CypherSession) ` * `abstract class PropertyGraphWriter(val graph: PropertyGraph) `
[GitHub] [spark] SparkQA commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
SparkQA commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples URL: https://github.com/apache/spark/pull/26062#issuecomment-539788014 **[Test build #111923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111923/testReport)** for PR 26062 at commit [`c43aa71`](https://github.com/apache/spark/commit/c43aa711ee035891d1d6af9ff27786d35c76885a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] beliefer commented on issue #25963: [SPARK-28137][SQL] Add Postgresql function to_number.
beliefer commented on issue #25963: [SPARK-28137][SQL] Add Postgresql function to_number. URL: https://github.com/apache/spark/pull/25963#issuecomment-539787262 @dongjoon-hyun @wangyum Could you help review this PR?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now URL: https://github.com/apache/spark/pull/26041#issuecomment-539785681 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2
AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2 URL: https://github.com/apache/spark/pull/25984#issuecomment-539785727 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now URL: https://github.com/apache/spark/pull/26041#issuecomment-539785687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16907/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2
AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2 URL: https://github.com/apache/spark/pull/25984#issuecomment-539785731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16908/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2
AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2 URL: https://github.com/apache/spark/pull/25984#issuecomment-539785731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16908/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now URL: https://github.com/apache/spark/pull/26041#issuecomment-539785681 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now URL: https://github.com/apache/spark/pull/26041#issuecomment-539785687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16907/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2
AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2 URL: https://github.com/apache/spark/pull/25984#issuecomment-539785727 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
SparkQA commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now URL: https://github.com/apache/spark/pull/26041#issuecomment-539784395 **[Test build #111927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111927/testReport)** for PR 26041 at commit [`e1a6807`](https://github.com/apache/spark/commit/e1a680755475d0a4cfd60d5352b9af4dcf573dd7).
[GitHub] [spark] SparkQA commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2
SparkQA commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2 URL: https://github.com/apache/spark/pull/25984#issuecomment-539784367 **[Test build #111928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111928/testReport)** for PR 25984 at commit [`f7bd663`](https://github.com/apache/spark/commit/f7bd663b73c95d8d50691da455aea980630e112a).
[GitHub] [spark] wangyum commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2
wangyum commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2 URL: https://github.com/apache/spark/pull/25984#issuecomment-539783883 retest this please
[GitHub] [spark] beliefer edited a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer edited a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-539467518 @dongjoon-hyun @maropu @HyukjinKwon @wangyum Could you find time to follow up on this PR?
[GitHub] [spark] teeyog commented on issue #25287: [SPARK-28552][SQL]Identification of different dialects insensitive to case by JDBC URL prefix
teeyog commented on issue #25287: [SPARK-28552][SQL]Identification of different dialects insensitive to case by JDBC URL prefix URL: https://github.com/apache/spark/pull/25287#issuecomment-539783224 @maropu Hi, I got this error in the test build and I don't know how to solve it. Can you help me? Thank you. You can also check the Console Output. ``` [error] (spark/javaunidoc:doc) javadoc returned nonzero exit code [error] Total time: 98 s, completed Oct 7, 2019 8:24:15 PM [error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos unidoc ; received return code 1 ```
[GitHub] [spark] AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files
AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-539782072 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111922/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files
AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-539782070 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event log files
AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-539782072 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111922/ Test PASSed.