[GitHub] [spark] LantaoJin commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution

2019-10-08 Thread GitBox
LantaoJin commented on issue #25960: [SPARK-29283][SQL] Error message is hidden 
when query from JDBC, especially enabled adaptive execution
URL: https://github.com/apache/spark/pull/25960#issuecomment-539844688
 
 
   retest this please.





[GitHub] [spark] dilipbiswal commented on issue #26042: [SPARK-29092][SQL] Report additional information about DataSourceScanExec in EXPLAIN FORMATTED

2019-10-08 Thread GitBox
dilipbiswal commented on issue #26042: [SPARK-29092][SQL] Report additional 
information about DataSourceScanExec in EXPLAIN FORMATTED
URL: https://github.com/apache/spark/pull/26042#issuecomment-539844257
 
 
   cc @cloud-fan 





[GitHub] [spark] huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree

2019-10-08 Thread GitBox
huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py 
classes related to DecisionTree
URL: https://github.com/apache/spark/pull/25929#issuecomment-539840347
 
 
   OK. I will add a _single_leading_underscore to the classes you mentioned in the comments. Thanks!





[GitHub] [spark] itsvikramagr edited a comment on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation

2019-10-08 Thread GitBox
itsvikramagr edited a comment on issue #24922: [SPARK-28120][SS]  Rocksdb state 
storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-539838772
 
 
   > 1. We are using flatMapGroupsWithState, and it causes a failure at the beginning.
   
   Will update the PR with the fix.
   
   > 2. Creating a RocksDB checkpoint had a quite high time cost, sometimes > 20 secs. ... Then I changed all of them to an ext4 partition, and the result is much better: it is now < 10 ms in most cases, but can still sometimes be > 100 ms.
   
   For isolation and data consistency, we checkpoint the RocksDB state to local disk. As you have suggested, a good file system and SSD-based instance storage should be used to get the best performance.
   
   > 3. All Spark executors get stuck when one executor tries to load a snapshot file from the Spark checkpoint.
   
   Great catch. Let me look at it and make the appropriate changes.
   
   





[GitHub] [spark] zhengruifeng commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree

2019-10-08 Thread GitBox
zhengruifeng commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py 
classes related to DecisionTree
URL: https://github.com/apache/spark/pull/25929#issuecomment-539837239
 
 
   @huaxingao Yes, I can reproduce your case.
   The 'private' classes can only be imported explicitly. I guess that is why it is a **weak** “internal use” indicator.
   I think we can add the _single_leading_underscore to match the Scala side.





[GitHub] [spark] huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py classes related to DecisionTree

2019-10-08 Thread GitBox
huaxingao commented on issue #25929: [SPARK-29116][PYTHON][ML] Refactor py 
classes related to DecisionTree
URL: https://github.com/apache/spark/pull/25929#issuecomment-539830362
 
 
   @zhengruifeng 
   Thanks for your comments.
   I didn't add a _single_leading_underscore for classes that are used by other packages.
   I am a little fuzzy about the _single_leading_underscore usage:
   https://pep8.org/#descriptive-naming-styles has
   ```_single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore.```
   It makes me feel that a class with a _single_leading_underscore is for internal use only and is not intended to be used in other packages. However, if I explicitly import a _single_leading_underscore class, it works OK.
   For example, if I do
   ```from pyspark.ml.tree import *```, the _single_leading_underscore classes are not imported.
   If I do
   ```from pyspark.ml.tree import _DecisionTreeModel, _DecisionTreeParams```, these classes are imported OK.
   





[GitHub] [spark] shivusondur commented on a change in pull request #25561: [SPARK-28810][DOC][SQL] Document SHOW TABLES in SQL Reference.

2019-10-08 Thread GitBox
shivusondur commented on a change in pull request #25561: 
[SPARK-28810][DOC][SQL] Document SHOW TABLES in SQL Reference.
URL: https://github.com/apache/spark/pull/25561#discussion_r332826762
 
 

 ##
 File path: docs/sql-ref-syntax-aux-show-tables.md
 ##
 @@ -18,5 +18,86 @@ license: |
   See the License for the specific language governing permissions and
   limitations under the License.
 ---
+### Description
 
-**This page is under construction**
+The `SHOW TABLES` statement returns all the tables for an optionally specified 
database.
+Additionally, the output of this statement may be filtered by an optional 
matching
+pattern. If no database is specified then the tables are returned from the 
+current database.
+
+### Syntax
+{% highlight sql %}
+SHOW TABLES [{FROM|IN} database_name] [LIKE 'regex_pattern']
+{% endhighlight %}
+
+### Parameters
+
+  {FROM|IN} database_name
+  
+ Specifies the `database` name from which tables are listed.
+  
+  LIKE 'regex_pattern'
+  
+ Specifies the regex pattern that is used to filter out unwanted tables.
+- The pattern is a regex except `*` and `|`characters
 
 Review comment:
   Done.
   Used the HTML ul and li tags and added the description like below:
   `Except for the `*` and `|` characters, the remaining characters follow the regular expression convention.`
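
   (For context, a minimal sketch of the documented syntax in action via `spark.sql`, assuming a `spark` session as in spark-shell; the database and table names are hypothetical.)
   ```scala
   // Hypothetical database and table names, for illustration only.
   spark.sql("CREATE DATABASE IF NOT EXISTS userdb")
   spark.sql("CREATE TABLE IF NOT EXISTS userdb.sam (id INT) USING parquet")

   // List tables in userdb whose names match the pattern; per the doc above,
   // the pattern follows regex conventions except for the `*` and `|` characters.
   spark.sql("SHOW TABLES IN userdb LIKE 'sam*'").show()
   ```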





[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539820403
 
 
   Nope. Why do you collect all of them? It's up to your configuration.
   
   Back to the beginning, I fully understand your cluster's underlying issues.
   
   However, none of them blocks Apache Spark from supporting `Prometheus` metrics natively.
   1. First, you can keep using your previously existing solution if you have one (`spark.ui.prometheus.enabled` is `false` by default).
   2. Second, your claims are too general. Not every user has that kind of gigantic cluster. Although a few big customers have some, there are also many small satellite clusters.
   
   I'm not sure about your metrics. Could you share with us the size of your clusters, the number of apps, and the number of metrics? Does it run on Apache Spark?
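
   (For reference, a minimal sketch of opting in to this endpoint; the config key `spark.ui.prometheus.enabled` comes from the comment above, while the app name and the rest of the setup are hypothetical.)
   ```scala
   import org.apache.spark.SparkConf

   // The Prometheus endpoint is opt-in: `spark.ui.prometheus.enabled`
   // defaults to false, so existing deployments are unaffected.
   val conf = new SparkConf()
     .setAppName("prometheus-demo") // hypothetical application name
     .set("spark.ui.prometheus.enabled", "true")
   ```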





[GitHub] [spark] LantaoJin commented on issue #25971: [SPARK-29298][CORE] Separate block manager heartbeat endpoint from driver endpoint

2019-10-08 Thread GitBox
LantaoJin commented on issue #25971: [SPARK-29298][CORE] Separate block manager 
heartbeat endpoint from driver endpoint
URL: https://github.com/apache/spark/pull/25971#issuecomment-539820817
 
 
   Thanks for the explanation @cloud-fan .





[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539820493
 
 
   I haven't seen any numbers from you here so far. :)





[GitHub] [spark] LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution

2019-10-08 Thread GitBox
LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] 
Error message is hidden when query from JDBC, especially enabled adaptive 
execution
URL: https://github.com/apache/spark/pull/25960#discussion_r332823271
 
 

 ##
 File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 ##
 @@ -286,7 +287,9 @@ private[hive] class SparkExecuteStatementOperation(
   if (e.isInstanceOf[HiveSQLException]) {
 throw e.asInstanceOf[HiveSQLException]
   } else {
-throw new HiveSQLException("Error running query: " + e.toString, e)
+val root = ExceptionUtils.getRootCause(e)
 
 Review comment:
   Besides the null check, I've changed the code to the above style. @wangyum 





[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539819286
 
 
   > BTW, the following covers some gigantic clusters, but not all cases. There is a different and cheaper approach like `Federation`. We are using `Federation`, which fits our environment much better.
   
   I would like to hear more about this. Our experience with Federation is that if we do not filter out some metrics but just federate metrics from each child server, the Federation Prometheus becomes the bottleneck. Federation provides a way to gather metrics from other Prometheus servers, but it does not solve the scalability issues.
   How many metrics do you have in your Federation Prometheus?
   





[GitHub] [spark] imback82 commented on a change in pull request #26006: [SPARK-29279][SQL] Merge SHOW NAMESPACES and SHOW DATABASES code path

2019-10-08 Thread GitBox
imback82 commented on a change in pull request #26006: [SPARK-29279][SQL] Merge 
SHOW NAMESPACES and SHOW DATABASES code path
URL: https://github.com/apache/spark/pull/26006#discussion_r332823038
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
 ##
 @@ -2694,7 +2694,7 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession {
 sparkContext.addSparkListener(listener)
 try {
   // Execute the command.
-  sql("show databases").head()
+  sql("EXPLAIN show databases").head()
 
 Review comment:
   Addressed by #26048





[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539818338
 
 
   > * `storage.tsdb.retention.time`: 
https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
   
   Thanks for the pointer. We are using a retention period for sure, but this does not help to expire individual metrics once they are in the Prometheus server. Also, removing metrics from the [Prometheus registry](https://prometheus.io/docs/instrumenting/writing_clientlibs/#overall-structure) on the client side is a challenge as well. Again, these challenges become more critical with high-cardinality metrics. This is why the Prometheus community does not suggest using unbounded values for a label.





[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539818709
 
 
   BTW, the following covers some gigantic clusters, but not all cases. There is a different and cheaper approach like `Federation`. We are using `Federation`, which fits our environment much better.
   > People use a highly scalable Prometheus (e.g. M3, Cortex, etc.) to handle Spark metrics.





[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539817155
 
 
   Sorry for the misleading naming. I meant the following.
   - `storage.tsdb.retention.time`: 
https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects





[GitHub] [spark] LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution

2019-10-08 Thread GitBox
LantaoJin commented on a change in pull request #25960: [SPARK-29283][SQL] 
Error message is hidden when query from JDBC, especially enabled adaptive 
execution
URL: https://github.com/apache/spark/pull/25960#discussion_r332819911
 
 

 ##
 File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 ##
 @@ -286,7 +287,9 @@ private[hive] class SparkExecuteStatementOperation(
   if (e.isInstanceOf[HiveSQLException]) {
 throw e.asInstanceOf[HiveSQLException]
   } else {
-throw new HiveSQLException("Error running query: " + e.toString, e)
+val root = ExceptionUtils.getRootCause(e)
 
 Review comment:
   > ```scala
   > val rootCause = Option(ExceptionUtils.getRootCause(e)).getOrElse(e)
   > ```
   
   `getRootCause` returns null only if the input `e` is null. Do we still need the `Option` fallback?
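
   (For illustration, a minimal, self-contained sketch of the suggested fallback; the nested exception is hypothetical.)
   ```scala
   import org.apache.commons.lang3.exception.ExceptionUtils

   // Hypothetical nested exception, standing in for a failure wrapped by the engine.
   val e = new RuntimeException("wrapper", new IllegalStateException("real error"))

   // getRootCause walks the cause chain to the innermost throwable; Option(...)
   // guards against a null result, which (per the discussion) occurs only for
   // null input, and falls back to the original exception.
   val rootCause = Option(ExceptionUtils.getRootCause(e)).getOrElse(e)
   println(rootCause.getMessage) // prints "real error"
   ```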





[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-53981
 
 
   > Second, on the `Prometheus` side, you can use the `Prometheus` TTL feature, @yuecong. Have you tried that?
   
   Could you share a link for this one? We are not using the TTL feature for Prometheus yet.





[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539813010
 
 
   > That is a general issue with Apache Spark monitoring rather than this PR, isn't it? So, I have three questions for you.
   > 
   > 1. Do you use a custom Sink to monitor Apache Spark?
   > 2. Do you collect only cluster-wide metrics?
   > 3. Is it helpful for long-running app monitoring like structured streaming?
   
   I agree that it is a general challenge to monitor Apache Spark using a normal Prometheus server. I would suggest just making its high cardinality clear. Maybe this is orthogonal to your PR; just my two cents. People use a highly scalable Prometheus (e.g. M3, Cortex, etc.) to handle Spark metrics.
   
   Also, if we could have a custom exporter that allows users to use a push model to expose metrics to some distributed time-series database or a pub/sub system (e.g. Kafka), it could solve this high-cardinality issue as well.
   





[GitHub] [spark] yaooqinn commented on a change in pull request #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness

2019-10-08 Thread GitBox
yaooqinn commented on a change in pull request #25648: [SPARK-28947][K8S] 
Status logging not happens at an interval for liveness
URL: https://github.com/apache/spark/pull/25648#discussion_r332818335
 
 

 ##
 File path: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
 ##
 @@ -177,13 +176,13 @@ class ClientSuite extends SparkFunSuite with 
BeforeAndAfter {
   }
 
   test("Waiting for app completion should stall on the watcher") {
+kconf.sparkConf.set(WAIT_FOR_APP_COMPLETION, true)
 
 Review comment:
   I see, thanks





[GitHub] [spark] cloud-fan commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job

2019-10-08 Thread GitBox
cloud-fan commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands 
should not submit a spark job
URL: https://github.com/apache/spark/pull/26048#issuecomment-539811954
 
 
   thanks, merging to master!





[GitHub] [spark] cloud-fan closed pull request #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job

2019-10-08 Thread GitBox
cloud-fan closed pull request #26048: [SPARK-29373][SQL] DataSourceV2: Commands 
should not submit a spark job
URL: https://github.com/apache/spark/pull/26048
 
 
   





[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539811514
 
 
   This PR doesn't collect new metrics; it only exposes the existing ones. So, the following is not about this PR. If you have a concern about the Apache Spark *driver*, you can file a new issue for that.
   > If the driver keeps all the metrics for all the Spark applications running using the driver,
   
   Second, on the `Prometheus` side, you can use the `Prometheus` TTL feature, @yuecong. Have you tried that?





[GitHub] [spark] advancedxy commented on a change in pull request #26058: [SPARK-10614][core] Add monotonic time to Clock interface.

2019-10-08 Thread GitBox
advancedxy commented on a change in pull request #26058: [SPARK-10614][core] 
Add monotonic time to Clock interface.
URL: https://github.com/apache/spark/pull/26058#discussion_r332817520
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/util/Clock.scala
 ##
 @@ -21,7 +21,14 @@ package org.apache.spark.util
  * An interface to represent clocks, so that they can be mocked out in unit 
tests.
  */
 private[spark] trait Clock {
+  /** @return Current system time, in ms. */
   def getTimeMillis(): Long
+  /** @return Current value of monotonic time source, in ns. */
+  def nanoTime(): Long
+  /**
 
 Review comment:
   Nit: add a blank line between methods?





[GitHub] [spark] advancedxy commented on a change in pull request #26058: [SPARK-10614][core] Add monotonic time to Clock interface.

2019-10-08 Thread GitBox
advancedxy commented on a change in pull request #26058: [SPARK-10614][core] 
Add monotonic time to Clock interface.
URL: https://github.com/apache/spark/pull/26058#discussion_r332817884
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/util/Clock.scala
 ##
 @@ -36,19 +43,23 @@ private[spark] class SystemClock extends Clock {
* @return the same time (milliseconds since the epoch)
* as is reported by `System.currentTimeMillis()`
*/
-  def getTimeMillis(): Long = System.currentTimeMillis()
+  override def getTimeMillis(): Long = System.currentTimeMillis()
+
+  /**
+   * @return value reported by `System.nanoTime()`.
+   */
+  override def nanoTime(): Long = System.nanoTime()
 
   /**
* @param targetTime block until the current time is at least this value
* @return current system time when wait has completed
*/
-  def waitTillTime(targetTime: Long): Long = {
-var currentTime = 0L
-currentTime = System.currentTimeMillis()
+  override def waitTillTime(targetTime: Long): Long = {
+var currentTime = System.currentTimeMillis()
 
 var waitTime = targetTime - currentTime
 if (waitTime <= 0) {
-  return currentTime
+  return getTimeMillis()
 
 Review comment:
   Why not just `return currentTime`?
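
   (A self-contained sketch of the suggested early return; the trait is trimmed to the two methods involved, and the real wait loop, which polls in short slices, is reduced to a single sleep for brevity.)
   ```scala
   trait Clock {
     def getTimeMillis(): Long
     def waitTillTime(targetTime: Long): Long
   }

   class SystemClock extends Clock {
     override def getTimeMillis(): Long = System.currentTimeMillis()

     // Reviewer's suggestion applied: return the already-read currentTime on
     // the early-exit path instead of calling getTimeMillis() a second time.
     override def waitTillTime(targetTime: Long): Long = {
       val currentTime = System.currentTimeMillis()
       val waitTime = targetTime - currentTime
       if (waitTime <= 0) {
         return currentTime
       }
       Thread.sleep(waitTime) // the real implementation polls in slices
       getTimeMillis()
     }
   }
   ```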





[GitHub] [spark] firestarman commented on a change in pull request #25983: [SPARK-29327][MLLIB]Support specifying features via multiple columns

2019-10-08 Thread GitBox
firestarman commented on a change in pull request #25983: 
[SPARK-29327][MLLIB]Support specifying features via multiple columns
URL: https://github.com/apache/spark/pull/25983#discussion_r332817984
 
 

 ##
 File path: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala
 ##
 @@ -55,14 +55,50 @@ class PredictorSuite extends SparkFunSuite with 
MLlibTestSparkContext {
   predictor.fit(df.select(col("label"), col("weight").cast(StringType), 
col("features")))
 }
   }
+
+  test("multiple columns for features should work well without side effect") {
+// Should fail due to not supporting multiple columns
+intercept[IllegalArgumentException] {
+  new MockPredictor(false).setFeaturesCol(Array("feature1", "feature2", 
"feature3"))
+}
+
+// Only use multiple columns for features
+val df = spark.createDataFrame(Seq(
+  (0, 1, 0, 2, 3),
+  (1, 2, 0, 3, 9),
+  (0, 3, 0, 2, 6)
+)).toDF("label", "weight", "feature1", "feature2", "feature3")
+
+val predictor = new MockPredictor().setWeightCol("weight")
+  .setFeaturesCol(Array("feature1", "feature2", "feature3"))
+predictor.fit(df)
+
+// Should fail due to wrong type for column "feature1" in schema
+intercept[IllegalArgumentException] {
+  predictor.fit(df.select(col("label"), col("weight"),
+col("feature1").cast(StringType), col("feature2"), col("feature3")))
+}
+
+val df2 = df.toDF("label", "weight", "features", "feature2", "feature3")
+// Should fail due to missing "feature1" in schema
+intercept[IllegalArgumentException] {
+  predictor.setFeaturesCol(Array("feature1", "feature2", 
"feature3")).fit(df2)
+}
+
+// Should fail due to wrong type in schema for single column of features
 
 Review comment:
   Thanks for review.  Updated the comments
   Actually that's expected. I mean only the names ("feature2", "feature3") 
passed into `setFeaturesCol(Array)` are wanted to use as multiple columns. But 
"features" is provided in "df2" schema, equal to the default value of the 
single column name (just like calling `setFeaturesCol("features")`). Then my 
current design supposes users are trying to use both single column and multiple 
columns, and does type check for both of them. As said above ,"features" now is 
used as single column, and should be "Vector" but actually "Int", so the test 
fails.
   





[GitHub] [spark] viirya commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex type pruning

2019-10-08 Thread GitBox
viirya commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec 
complex type pruning
URL: https://github.com/apache/spark/pull/20935#issuecomment-539811088
 
 
   I think it is fine, as his last response was more than a year ago.





[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539810831
 
 
   Do the metrics for a Spark application disappear after the application finishes? I guess the answer is no. If the driver keeps all the metrics for all the Spark applications running using the driver, won't this cause high memory usage for the driver? Do you have some tests on this?





[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539810783
 
 
   @yuecong. That is a general issue with Apache Spark monitoring rather than this PR, isn't it? So, I have three questions for you.
   1. Do you use a custom Sink to monitor Apache Spark?
   2. Do you collect only cluster-wide metrics?
   3. Is it helpful for long-running app monitoring like structured streaming?





[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539809234
 
 
   > 1. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"`
   
   For Prometheus, not just the metric name but also the labels count toward high cardinality. Inside the Prometheus server's TSDB, the combination of metric name and labels is actually the key.
   See details [here](https://prometheus.io/docs/practices/naming/#labels).
   
   > CAUTION: Remember that every unique combination of key-value label pairs 
represents a new time series, which can dramatically increase the amount of 
data stored. Do not use labels to store dimensions with high cardinality (many 
different label values), such as user IDs, email addresses, or other unbounded 
sets of values.





[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-53980
 
 
   Hi, @yuecong. Thank you for the review.
   1. That was true in the old Prometheus plugin. So, Apache Spark 3.0.0 exposes this Prometheus metric on the driver port instead of the executor port. I mean, you are referring to the `executor` instead of the `driver`. Do you have a short-lived Spark driver which dies within `30s`?
   > As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   
   2. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"`
   > It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics.
   
   I don't think you mean the `Prometheus Dimension feature` is high-cardinality.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun commented on a change in pull request #26060: [SPARK-29400][CORE] 
Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#discussion_r332816528
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala
 ##########
 @@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
+        "executor_id" -> executor.id
+      ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
+      sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n")
+      sb.append(s"${prefix}memoryUsed_Count$labels ${executor.memoryUsed}\n")
 
 Review comment:
   Thank you for the review, @viirya . Yes, of course, we can rename them freely because we are starting to support them natively.
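   
   As a side note, here is a self-contained sketch (with hypothetical label values) of the label rendering used above, showing the exposition-format line it produces:
   
       object LabelRenderingDemo {
         def main(args: Array[String]): Unit = {
           // Hypothetical values standing in for store.applicationInfo and executor.id.
           val labels = Seq(
             "application_id" -> "app-20191008151625-0001",
             "application_name" -> "demo-app",
             "executor_id" -> "1"
           ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
           // Prints: metrics_executor_rddBlocks_Count{application_id="app-20191008151625-0001", application_name="demo-app", executor_id="1"} 0
           println(s"metrics_executor_rddBlocks_Count$labels 0")
         }
       }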





[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539808598
 
 
   > 1. That was true in the old Prometheus plugin. So, Apache Spark 3.0.0 
exposes this Prometheus metric on the driver port, instead of the executor port
   
   This is awesome!





[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-53980
 
 
   Hi, @yuecong . Thank you for the review.
   1. That was true in the old Prometheus plugin. So, Apache Spark 3.0.0 exposes this Prometheus metric on the driver port instead of the executor port. I mean you are referring to the `executor` instead of the `driver`. Do you have a short-lived Spark driver which dies within `30s`?
   >  As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   
   2. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"`
   > It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. 
   
   I don't think you mean the `Prometheus Dimension feature` itself is high-cardinality.





[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-53980
 
 
   @yuecong .
   1. That's true in the old Prometheus plugin. So, Apache Spark 3.0.0 exposes this Prometheus metric on the driver port instead of the executor port. I mean you are referring to the `executor` instead of the `driver`. Do you have a short-lived Spark driver which dies within `30s`?
   >  As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   
   2. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"`
   > It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. 
   
   I don't think you mean the `Prometheus Dimension feature` itself is high-cardinality.





[GitHub] [spark] dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-53980
 
 
   Hi, @yuecong . Thank you for the review.
   1. That's true in the old Prometheus plugin. So, Apache Spark 3.0.0 exposes this Prometheus metric on the driver port instead of the executor port. I mean you are referring to the `executor` instead of the `driver`. Do you have a short-lived Spark driver which dies within `30s`?
   >  As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   
   2. Please see this PR's description. The metric name is **unique** with cardinality 1 by using labels, `metrics_executor_rddBlocks_Count{application_id="app-20191008151625-"`
   > It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. 
   
   I don't think you mean the `Prometheus Dimension feature` itself is high-cardinality.





[GitHub] [spark] viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
viirya commented on a change in pull request #26060: [SPARK-29400][CORE] 
Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#discussion_r332815572
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala
 ##########
 @@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
+        "executor_id" -> executor.id
+      ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
+      sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n")
+      sb.append(s"${prefix}memoryUsed_Count$labels ${executor.memoryUsed}\n")
 
 Review comment:
   Not related to this PR, but why do they all end with `_Count`? For `rddBlocks` it is OK, but for some it seems unsuitable, like `memoryUsed_Count`.
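   
   For reference, the Prometheus naming guidelines linked earlier favor a base metric name with a unit suffix rather than a blanket `_Count`; a purely hypothetical renaming sketch:
   
       memoryUsed_Count   ->  spark_executor_memory_used_bytes
       totalGCTime_Value  ->  spark_executor_total_gc_time_milliseconds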





[GitHub] [spark] dongjoon-hyun closed pull request #26062: [SPARK-29401][CORE][ML][SQL][GRAPHX][TESTS] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples

2019-10-08 Thread GitBox
dongjoon-hyun closed pull request #26062: 
[SPARK-29401][CORE][ML][SQL][GRAPHX][TESTS] Replace calls to .parallelize 
Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
URL: https://github.com/apache/spark/pull/26062
 
 
   





[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591
 
 
   @dongjoon-hyun Thanks for fixing this. 
   I have several questions on this.
   
   1. Short-lived metrics
   As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   Check this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics for Prometheus.
   
   2. High cardinality
   It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. See more information about Prometheus's cardinality issue [here](https://www.robustperception.io/cardinality-is-key) as well as in this [doc](https://prometheus.io/docs/practices/naming/#labels).
   
   Suppose a user points a central Prometheus server at their Spark applications with this PR. Each new Spark application exposes N metrics (say 10) across M workers (20) on average. As app_id changes each time and old series never disappear, the series count will add up to millions and even billions of metrics over time; see the worked example below. This will cause a heavy load for a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) to address this issue, but we should make the cardinality clear for users of such metrics.
   
   It would be great to give some suggestions on how we want users to use such metrics in practice, especially on how to handle short-lived and high-cardinality metrics.
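   
   A worked example of the growth (hypothetical counts): with N = 10 metrics and M = 20 workers, each application run adds 10 x 20 = 200 new series; after 5,000 runs the server is tracking 1,000,000 series, and because app_id values are never reused, the old series never collapse back into existing ones.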





[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591
 
 
   @dongjoon-hyun Thanks for fixing this. 
   I have several questions on this.
   
   1. Short-lived metrics
   As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   Check this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics for Prometheus.
   
   2. Cardinality
   It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. See more information about Prometheus's cardinality issue [here](https://www.robustperception.io/cardinality-is-key) as well as in this [doc](https://prometheus.io/docs/practices/naming/#labels).
   
   Suppose a user points a central Prometheus server at their Spark applications with this PR. Each new Spark application exposes N metrics (say 10) across M workers (100) on average. As app_id changes each time and old series never disappear, the series count will add up to millions and even billions of metrics over time. This will cause a heavy load for a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) to address this issue, but we should make the cardinality clear for users of such metrics.





[GitHub] [spark] yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591
 
 
   @dongjoon-hyun Thanks for fixing this. 
   I have several questions on this.
   
   1. Short-lived metrics
   As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   Check this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics for Prometheus.
   
   2. Cardinality
   It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. See more information about Prometheus's cardinality issue [here](https://www.robustperception.io/cardinality-is-key) as well as in this [doc](https://prometheus.io/docs/practices/naming/#labels).
   
   Suppose a user points a central Prometheus server at their Spark applications with this PR. Each new Spark application exposes N metrics (say 10) across M workers (20) on average. As app_id changes each time and old series never disappear, the series count will add up to millions and even billions of metrics over time. This will cause a heavy load for a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) to address this issue, but we should make the cardinality clear for users of such metrics.





[GitHub] [spark] dongjoon-hyun closed pull request #26061: [SPARK-29392][CORE][SQL][STREAMING] Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of Symbol("foo")

2019-10-08 Thread GitBox
dongjoon-hyun closed pull request #26061: [SPARK-29392][CORE][SQL][STREAMING] 
Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of 
Symbol("foo")
URL: https://github.com/apache/spark/pull/26061
 
 
   





[GitHub] [spark] yuecong commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
yuecong commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591
 
 
   @dongjoon-hyun Thanks for fixing this. 
   I have several questions on this.
   
   1. Short-lived metrics
   As Prometheus uses a pull model, how do you recommend people use these metrics for executors that get shut down immediately? Also, how will this work for a short-lived (e.g. shorter than one Prometheus scrape interval, usually 30s) Spark application?
   Check this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics for Prometheus.
   
   2. Cardinality
   It looks like you are using app_id as one of the labels, which will increase the cardinality for Prometheus metrics. See more information about Prometheus's cardinality issue [here](https://www.robustperception.io/cardinality-is-key) as well as in this [doc](https://prometheus.io/docs/practices/naming/#labels).
   
   Suppose a user points a central Prometheus server at their Spark applications with this PR. Each new Spark application exposes N metrics across M workers on average. This will cause a heavy load for a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) to address this issue, but we should make the cardinality clear for users of such metrics.





[GitHub] [spark] dongjoon-hyun commented on issue #26061: [SPARK-29392][CORE][SQL][STREAMING] Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of Symbol("foo")

2019-10-08 Thread GitBox
dongjoon-hyun commented on issue #26061: [SPARK-29392][CORE][SQL][STREAMING] 
Remove symbol literal syntax 'foo, deprecated in Scala 2.13, in favor of 
Symbol("foo")
URL: https://github.com/apache/spark/pull/26061#issuecomment-539804605
 
 
   Merged to master.





[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels

2019-10-08 Thread GitBox
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve 
PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539803942
 
 
   Hi, @srowen , @dbtsai , @HyukjinKwon .
   Could you review this PR, please?





[GitHub] [spark] zhengruifeng opened a new pull request #26064: [SPARK-23578][ML][PYSPARK] Binarizer support multi-column

2019-10-08 Thread GitBox
zhengruifeng opened a new pull request #26064: [SPARK-23578][ML][PYSPARK] 
Binarizer support multi-column
URL: https://github.com/apache/spark/pull/26064
 
 
   ### What changes were proposed in this pull request?
   Make Binarizer support multi-column by extending `HasInputCols`/`HasOutputCols`/`HasThreshold`/`HasThresholds`.
   
   ### Why are the changes needed?
   Similar algorithms in `ml.feature` already support multi-column, like `Bucketizer`/`StringIndexer`/`QuantileDiscretizer`.
   
   ### Does this PR introduce any user-facing change?
   Yes, it adds setters/getters for `thresholds`/`inputCols`/`outputCols`; a usage sketch follows below.
   
   ### How was this patch tested?
   Added test suites.
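   
   A minimal usage sketch (hypothetical column names and thresholds, assuming the multi-column API follows `Bucketizer`'s pattern with one threshold per column pair, as proposed here):
   
       import org.apache.spark.ml.feature.Binarizer
   
       val binarizer = new Binarizer()
         .setInputCols(Array("f1", "f2"))
         .setOutputCols(Array("f1_bin", "f2_bin"))
         .setThresholds(Array(0.5, 10.0))
       // df is an existing DataFrame containing numeric columns f1 and f2
       val binarized = binarizer.transform(df)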





[GitHub] [spark] kiszk commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex type pruning

2019-10-08 Thread GitBox
kiszk commented on issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec 
complex type pruning
URL: https://github.com/apache/spark/pull/20935#issuecomment-539795529
 
 
   @pwoody @HyukjinKwon @viirya May I take this over, since the author has not responded for a long time?





[GitHub] [spark] kiszk commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide

2019-10-08 Thread GitBox
kiszk commented on a change in pull request #26045: [SPARK-29367][DOC] Add 
compatibility note for Arrow 0.15.0 to SQL guide
URL: https://github.com/apache/spark/pull/26045#discussion_r332807321
 
 

 ##########
 File path: docs/sql-pyspark-pandas-with-arrow.md
 ##########
 @@ -219,3 +219,14 @@ Note that a standard UDF (non-Pandas) will load timestamp data as Python datetime
 different than a Pandas timestamp. It is recommended to use Pandas time series functionality when
 working with timestamps in `pandas_udf`s to get the best performance, see
 [here](https://pandas.pydata.org/pandas-docs/stable/timeseries.html) for details.
+
+### Compatibility Setting for PyArrow >= 0.15.0 and Spark 2.3.x, 2.4.x
+
+Since Arrow 0.15.0, a change in the binary IPC format requires an environment variable to be set in
 
 Review comment:
   How about adding a link 
`http://arrow.apache.org/blog/2019/10/06/0.15.0-release/#columnar-streaming-protocol-change-since-0140`
 to the release blog of Apache Arrow?
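   
   (For context, the environment variable in question is `ARROW_PRE_0_15_IPC_FORMAT`, described in that release blog; the sketched workaround is to set `ARROW_PRE_0_15_IPC_FORMAT=1`, e.g. in `conf/spark-env.sh`, so that it reaches both the driver and the executors.)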





[GitHub] [spark] gatorsmile commented on issue #26051: [SPARK-24640][SQL] Return `NULL` from `size(NULL)` by default

2019-10-08 Thread GitBox
gatorsmile commented on issue #26051: [SPARK-24640][SQL] Return `NULL` from 
`size(NULL)` by default
URL: https://github.com/apache/spark/pull/26051#issuecomment-539793779
 
 
   @MaxGekk Could you submit a follow-up PR to update the migration guide?





[GitHub] [spark] imback82 commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands should not submit a spark job

2019-10-08 Thread GitBox
imback82 commented on issue #26048: [SPARK-29373][SQL] DataSourceV2: Commands 
should not submit a spark job
URL: https://github.com/apache/spark/pull/26048#issuecomment-539793532
 
 
   I double-checked this. `V2TableWriteExec.writeWithV2` returns 
`sparkContext.emptyRDD`. In this case, `DAGScheduler.submitJob` will return 
without actually submitting a job. So there will be one job for `CREATE TABLE 
AS SELECT`.
   
   So theoretically, we could have just returned `sparkContext.emptyRDD` for 
some of the commands that don't return results (such as `USE`, etc.), but I 
think the new approach is cleaner (and we still need this new physical operator 
for `SHOW DATABASE`, etc.)
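   
   A quick way to see why the empty RDD adds no job (a minimal sketch, assuming a live `SparkContext` named `sc`, e.g. in `spark-shell`):
   
       val empty = sc.emptyRDD[Int]
       println(empty.partitions.length)  // 0: there is nothing to compute
       empty.foreach(_ => ())            // returns immediately; zero tasks are scheduled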





[GitHub] [spark] beliefer commented on issue #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression

2019-10-08 Thread GitBox
beliefer commented on issue #25416: [SPARK-28330][SQL] Support ANSI SQL: result 
offset clause in query expression
URL: https://github.com/apache/spark/pull/25416#issuecomment-539793201
 
 
   @dongjoon-hyun @HyukjinKwon Could you help me review this PR?





[GitHub] [spark] AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW 
FUNCTIONS  show '!=', '<>' , 'between', 'case'
URL: https://github.com/apache/spark/pull/26053#issuecomment-539790518
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #26053: [SPARK-29379][SQL]SHOW 
FUNCTIONS  show '!=', '<>' , 'between', 'case'
URL: https://github.com/apache/spark/pull/26053#issuecomment-539790526
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16909/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS  show 
'!=', '<>' , 'between', 'case'
URL: https://github.com/apache/spark/pull/26053#issuecomment-539790518
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS  show 
'!=', '<>' , 'between', 'case'
URL: https://github.com/apache/spark/pull/26053#issuecomment-539790526
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16909/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'

2019-10-08 Thread GitBox
SparkQA commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS  show '!=', 
'<>' , 'between', 'case'
URL: https://github.com/apache/spark/pull/26053#issuecomment-539789913
 
 
   **[Test build #111929 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111929/testReport)**
 for PR 26053 at commit 
[`b066088`](https://github.com/apache/spark/commit/b066088ee9a7deba990327f92a82940b85bf6025).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark 
Graph API
URL: https://github.com/apache/spark/pull/24851#issuecomment-539789010
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark 
Graph API
URL: https://github.com/apache/spark/pull/24851#issuecomment-539789015
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111924/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls 
to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
URL: https://github.com/apache/spark/pull/26062#issuecomment-539788946
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls 
to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
URL: https://github.com/apache/spark/pull/26062#issuecomment-539788951
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111923/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph 
API
URL: https://github.com/apache/spark/pull/24851#issuecomment-539789015
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111924/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph 
API
URL: https://github.com/apache/spark/pull/24851#issuecomment-539789010
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] 
Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with 
Seqs of tuples
URL: https://github.com/apache/spark/pull/26062#issuecomment-539788946
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #26062: [SPARK-29401][CORE][ML] 
Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with 
Seqs of tuples
URL: https://github.com/apache/spark/pull/26062#issuecomment-539788951
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111923/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-08 Thread GitBox
SparkQA removed a comment on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph 
API
URL: https://github.com/apache/spark/pull/24851#issuecomment-539749285
 
 
   **[Test build #111924 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111924/testReport)**
 for PR 24851 at commit 
[`c50b679`](https://github.com/apache/spark/commit/c50b679310de99bba38e94b092871b3c82dc0ce1).





[GitHub] [spark] zhengruifeng commented on issue #25909: [SPARK-29224]Implement Factorization Machines as a ml-pipeline component

2019-10-08 Thread GitBox
zhengruifeng commented on issue #25909: [SPARK-29224]Implement Factorization 
Machines as a ml-pipeline component
URL: https://github.com/apache/spark/pull/25909#issuecomment-539788608
 
 
   @mob-ai Thanks for this work!
   But before you continue, I suggest you refer to the previous discussion in [SPARK-7008](https://issues.apache.org/jira/browse/SPARK-7008). I think you should provide some information like convergence curves on common datasets, and evidence that mini-batch SGD is a good choice as an efficient solver.
   As to the PR itself, besides owen's comments, I think there should be separate `FMClassifier` and `FMRegressor` classes.
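   
   For reference (the standard degree-2 FM model from Rendle, 2010, not specific to this PR's implementation), the prediction is
   
       yhat(x) = w0 + sum_i( w_i * x_i ) + sum_{i<j}( <v_i, v_j> * x_i * x_j )
   
   where each feature i has a k-dimensional latent vector v_i; the pairwise term can be rewritten to compute in O(k * n) time, which is what makes SGD-style solvers practical here.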





[GitHub] [spark] SparkQA removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples

2019-10-08 Thread GitBox
SparkQA removed a comment on issue #26062: [SPARK-29401][CORE][ML] Replace 
calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of 
tuples
URL: https://github.com/apache/spark/pull/26062#issuecomment-539741136
 
 
   **[Test build #111923 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111923/testReport)**
 for PR 26062 at commit 
[`c43aa71`](https://github.com/apache/spark/commit/c43aa711ee035891d1d6af9ff27786d35c76885a).





[GitHub] [spark] SparkQA commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API

2019-10-08 Thread GitBox
SparkQA commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
URL: https://github.com/apache/spark/pull/24851#issuecomment-539788379
 
 
   **[Test build #111924 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111924/testReport)**
 for PR 24851 at commit 
[`c50b679`](https://github.com/apache/spark/commit/c50b679310de99bba38e94b092871b3c82dc0ce1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `abstract class PropertyGraphReader(session: CypherSession) `
 * `abstract class PropertyGraphWriter(val graph: PropertyGraph) `





[GitHub] [spark] SparkQA commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to .parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples

2019-10-08 Thread GitBox
SparkQA commented on issue #26062: [SPARK-29401][CORE][ML] Replace calls to 
.parallelize Arrays of tuples, ambiguous in Scala 2.13, with Seqs of tuples
URL: https://github.com/apache/spark/pull/26062#issuecomment-539788014
 
 
   **[Test build #111923 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111923/testReport)**
 for PR 26062 at commit 
[`c43aa71`](https://github.com/apache/spark/commit/c43aa711ee035891d1d6af9ff27786d35c76885a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] beliefer commented on issue #25963: [SPARK-28137][SQL] Add Postgresql function to_number.

2019-10-08 Thread GitBox
beliefer commented on issue #25963: [SPARK-28137][SQL] Add Postgresql function 
to_number.
URL: https://github.com/apache/spark/pull/25963#issuecomment-539787262
 
 
   @dongjoon-hyun @wangyum Could you help me review this PR?





[GitHub] [spark] AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses 
Arrow R 0.14.1 in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041#issuecomment-539785681
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix 
incorrect dep in dev/deps/spark-deps-hadoop-3.2 
URL: https://github.com/apache/spark/pull/25984#issuecomment-539785727
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #26041: [SPARK-29403][INFRA][R] Uses 
Arrow R 0.14.1 in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041#issuecomment-539785687
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16907/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #25984: [WIP][SPARK-29308][BUILD] Fix 
incorrect dep in dev/deps/spark-deps-hadoop-3.2 
URL: https://github.com/apache/spark/pull/25984#issuecomment-539785731
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16908/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix 
incorrect dep in dev/deps/spark-deps-hadoop-3.2 
URL: https://github.com/apache/spark/pull/25984#issuecomment-539785731
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16908/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 
0.14.1 in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041#issuecomment-539785681
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 
0.14.1 in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041#issuecomment-539785687
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16907/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix 
incorrect dep in dev/deps/spark-deps-hadoop-3.2 
URL: https://github.com/apache/spark/pull/25984#issuecomment-539785727
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now

2019-10-08 Thread GitBox
SparkQA commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 
in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041#issuecomment-539784395
 
 
   **[Test build #111927 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111927/testReport)**
 for PR 26041 at commit 
[`e1a6807`](https://github.com/apache/spark/commit/e1a680755475d0a4cfd60d5352b9af4dcf573dd7).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2

2019-10-08 Thread GitBox
SparkQA commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep 
in dev/deps/spark-deps-hadoop-3.2 
URL: https://github.com/apache/spark/pull/25984#issuecomment-539784367
 
 
   **[Test build #111928 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111928/testReport)**
 for PR 25984 at commit 
[`f7bd663`](https://github.com/apache/spark/commit/f7bd663b73c95d8d50691da455aea980630e112a).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep in dev/deps/spark-deps-hadoop-3.2

2019-10-08 Thread GitBox
wangyum commented on issue #25984: [WIP][SPARK-29308][BUILD] Fix incorrect dep 
in dev/deps/spark-deps-hadoop-3.2 
URL: https://github.com/apache/spark/pull/25984#issuecomment-539783883
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer edited a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax

2019-10-08 Thread GitBox
beliefer edited a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... 
ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-539467518
 
 
   @dongjoon-hyun @maropu @HyukjinKwon @wangyum Could you find time to follow 
up on this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] teeyog commented on issue #25287: [SPARK-28552][SQL]Identification of different dialects insensitive to case by JDBC URL prefix

2019-10-08 Thread GitBox
teeyog commented on issue #25287: [SPARK-28552][SQL]Identification of different 
dialects insensitive to case by JDBC URL prefix
URL: https://github.com/apache/spark/pull/25287#issuecomment-539783224
 
 
   @maropu 
   Hi, I hit this error during the test build. I don't know how to solve it. 
Can you help me? Thank you.
   You can also check the Console Output.
   ```
   [error] (spark/javaunidoc:doc) javadoc returned nonzero exit code
   [error] Total time: 98 s, completed Oct 7, 2019 8:24:15 PM
   [error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt 
-Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl 
-Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos unidoc ; received return code 1
   ```
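   A minimal sketch for reproducing this locally (assuming a standard Spark 
checkout; the profile list is copied verbatim from the Jenkins log above):
   ```
   # Rebuild only the unified Java/Scala API docs with the same profiles
   # Jenkins uses; the javadoc failure should reproduce with the same
   # nonzero exit code reported in the log.
   ./build/sbt -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud \
     -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos unidoc
   ```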


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over 
event log files
URL: https://github.com/apache/spark/pull/25670#issuecomment-539782072
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111922/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files

2019-10-08 Thread GitBox
AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over 
event log files
URL: https://github.com/apache/spark/pull/25670#issuecomment-539782070
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event log files

2019-10-08 Thread GitBox
AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event 
log files
URL: https://github.com/apache/spark/pull/25670#issuecomment-539782072
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111922/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


