[GitHub] spark pull request #16186: [SPARK-18758][SS] StreamingQueryListener events f...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16186#discussion_r91240700

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala ---

```
@@ -63,18 +83,29 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
     }
   }

+  /**
+   * Dispatch events to registered StreamingQueryListeners. Only the events associated with
+   * queries started in the same SparkSession as this ListenerBus will be dispatched to the
+   * listeners.
+   */
   override protected def doPostEvent(
       listener: StreamingQueryListener,
       event: StreamingQueryListener.Event): Unit = {
+    val runIdsToReportTo = activeQueryRunIds.synchronized { activeQueryRunIds.toSet }
```

--- End diff --

Why do we need to clone the set? You can just use `activeQueryRunIds.synchronized { activeQueryRunIds.contains(...) }`, right?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
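The tradeoff in this review comment can be sketched outside Spark (hypothetical names; `activeQueryRunIds` here is a plain mutable set standing in for the real field): the diff snapshots the whole set under the lock on every event, while the reviewer's suggestion only takes the lock for a single membership check.

```scala
import scala.collection.mutable

object ListenerBusSketch {
  private val activeQueryRunIds = mutable.HashSet[String]()

  def register(runId: String): Unit =
    activeQueryRunIds.synchronized { activeQueryRunIds += runId }

  // Approach in the diff: clone the set under the lock, then check membership.
  def shouldReportSnapshot(runId: String): Boolean = {
    val runIdsToReportTo = activeQueryRunIds.synchronized { activeQueryRunIds.toSet }
    runIdsToReportTo.contains(runId)
  }

  // Reviewer's suggestion: check membership directly while holding the lock,
  // avoiding the per-event copy.
  def shouldReportDirect(runId: String): Boolean =
    activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
}
```

A snapshot can still be the right choice when the caller iterates over the ids after releasing the lock; for a single `contains` check the direct form is simpler and cheaper.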
[GitHub] spark issue #16192: [SPARK-18764][Core]Add a warning log when skipping a cor...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16192 LGTM.
[GitHub] spark issue #16191: spark decision tree
Github user lklong commented on the issue: https://github.com/apache/spark/pull/16191 Hi, could somebody help to reply to this question? Thanks very much!
[GitHub] spark issue #16192: [SPARK-18764][Core]Add a warning log when skipping a cor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16192 **[Test build #69782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69782/consoleFull)** for PR 16192 at commit [`96b4836`](https://github.com/apache/spark/commit/96b48363ef58dbf1d7f2cf695ce30f05493f2990).
[GitHub] spark pull request #16192: [SPARK-18764][Core]Add a warning log when skippin...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16192

[SPARK-18764][Core] Add a warning log when skipping a corrupted file

## What changes were proposed in this pull request?

It's better to add a warning log when skipping a corrupted file. That way we can finish the job first, then find the corrupted files in the log and fix them.

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-18764

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16192.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16192

commit 96b48363ef58dbf1d7f2cf695ce30f05493f2990
Author: Shixiong Zhu
Date: 2016-12-07T07:39:50Z

    Add a warning log when skipping a corrupted file
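A minimal sketch of the proposed behavior (hypothetical helper, not Spark's actual reader code): when a file fails to read and corrupt-file skipping is enabled, emit a warning naming the file instead of failing the job or dropping it silently.

```scala
import scala.util.control.NonFatal

object CorruptFileSketch {
  // Read each path with `read`; on failure, either propagate the error or
  // (when ignoreCorruptFiles is set) log a warning and skip the file.
  def readFiles(paths: Seq[String], ignoreCorruptFiles: Boolean)
               (read: String => String): Seq[String] =
    paths.flatMap { path =>
      try Seq(read(path))
      catch {
        case NonFatal(e) if ignoreCorruptFiles =>
          // The warning makes skipped files easy to find in the logs afterwards.
          Console.err.println(s"WARN Skipping corrupted file: $path (${e.getMessage})")
          Seq.empty
      }
    }
}
```

With the flag off, the exception propagates as before; with it on, the job finishes and the warning records which files need fixing.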
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69780/ Test FAILed.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #69780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69780/consoleFull)** for PR 16171 at commit [`a0e8433`](https://github.com/apache/spark/commit/a0e8433f03d21a728dd843feef61b264314f44f8).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Merged build finished. Test FAILed.
[GitHub] spark issue #16191: spark decision tree
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16191 Can one of the admins verify this patch?
[GitHub] spark pull request #16191: spark decision tree
GitHub user zhuangxue opened a pull request: https://github.com/apache/spark/pull/16191

spark decision tree

What algorithm is used in spark decision tree (is ID3, C4.5 or CART)?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16191.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16191

commit 16eaad9daed0b633e6a714b5704509aa7107d6e5
Author: Sean Owen
Date: 2016-11-10T18:20:03Z

[SPARK-18262][BUILD][SQL] JSON.org license is now CatX

## What changes were proposed in this pull request?

Try excluding org.json:json from hive-exec dep as it's Cat X now. It may be the case that it's not used by the part of Hive Spark uses anyway.

## How was this patch tested?

Existing tests

Author: Sean Owen

Closes #15798 from srowen/SPARK-18262.

commit b533fa2b205544b42dcebe0a6fee9d8275f6da7d
Author: Michael Allman
Date: 2016-11-10T21:41:13Z

[SPARK-17993][SQL] Fix Parquet log output redirection (Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17993)

## What changes were proposed in this pull request?

PR #14690 broke parquet log output redirection for converted partitioned Hive tables.
For example, when querying parquet files written by Parquet-mr 1.6.0, Spark prints a torrent of (harmless) warning messages from the Parquet reader:

```
Oct 18, 2016 7:42:18 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
    at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
    at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
    at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583)
    at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
    at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:162)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:372)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

This only happens during execution, not planning,
[GitHub] spark issue #16190: [SPARK-18762][WEBUI][WIP] Web UI should be http:4040 ins...
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/16190 @viirya Thanks! I've fixed it.
[GitHub] spark issue #16190: [SPARK-18761][WEBUI][WIP] Web UI should be http:4040 ins...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16190 I think the jira number should be SPARK-18762, instead of SPARK-18761.
[GitHub] spark issue #16190: [SPARK-18761][WEBUI][WIP] Web UI should be http:4040 ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16190 **[Test build #69781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69781/consoleFull)** for PR 16190 at commit [`4518010`](https://github.com/apache/spark/commit/451801007496cd853d0053285a0757e31015a12a).
[GitHub] spark pull request #16190: [SPARK-18761][WEBUI] Web UI should be http:4040 i...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/16190

[SPARK-18761][WEBUI] Web UI should be http:4040 instead of https:4040

## What changes were proposed in this pull request?

When SSL is enabled, the Spark shell shows:

```
Spark context Web UI available at https://192.168.99.1:4040
```

This is wrong because 4040 is http, not https. It redirects to the https port. More importantly, this introduces several broken links in the UI. For example, in the master UI, the worker link is https:8081 instead of http:8081 or https:8481.

CC: @mengxr @liancheng

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-18761

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16190.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16190

commit 451801007496cd853d0053285a0757e31015a12a
Author: sarutak
Date: 2016-12-07T07:14:15Z

    Reverted the change in SPARK-16988
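The intent of the fix can be stated as a tiny rule (hypothetical helper, assumed behavior, not the actual patch): the port the UI binds is plain HTTP even when SSL is enabled, since that port only redirects to the separate HTTPS port, so the advertised URL should always use the http scheme with the bound port.

```scala
object WebUiUrlSketch {
  // boundPort is the port the UI actually listens on (e.g. 4040);
  // the HTTPS redirect target lives on a separate port, so the scheme
  // for the bound port is always http regardless of the SSL setting.
  def webUiUrl(host: String, boundPort: Int): String =
    s"http://$host:$boundPort"
}
```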
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #69780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69780/consoleFull)** for PR 16171 at commit [`a0e8433`](https://github.com/apache/spark/commit/a0e8433f03d21a728dd843feef61b264314f44f8).
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16187 **[Test build #69779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69779/consoleFull)** for PR 16187 at commit [`73d7910`](https://github.com/apache/spark/commit/73d7910fec565bb61f5dcd10d6bfd9cce467193a).
[GitHub] spark issue #15422: [SPARK-17850][Core]Add a flag to ignore corrupt files
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15422 @zsxwing shouldn't we at least log the exception?
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #69777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69777/consoleFull)** for PR 16171 at commit [`ef5954b`](https://github.com/apache/spark/commit/ef5954b19b6bca5fb7b603351ce087085ac23e9b).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Merged build finished. Test FAILed.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69777/ Test FAILed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Merged build finished. Test FAILed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16043 **[Test build #69778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69778/consoleFull)** for PR 16043 at commit [`113f7be`](https://github.com/apache/spark/commit/113f7be992988ae3e8b2d11916a1731456f0647c).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ClassifiedEntries(undetermined : Seq[Expression],`
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69778/ Test FAILed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16043 **[Test build #69778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69778/consoleFull)** for PR 16043 at commit [`113f7be`](https://github.com/apache/spark/commit/113f7be992988ae3e8b2d11916a1731456f0647c).
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16187 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69772/ Test FAILed.
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16187 Merged build finished. Test FAILed.
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16187 **[Test build #69772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69772/consoleFull)** for PR 16187 at commit [`566c800`](https://github.com/apache/spark/commit/566c8007dcf74594c23ef2b1fcc394ce64029e9b).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16168 @hvanhovell @nsyca @gatorsmile @rxin Thank you for your suggestions! I will try to make a better approach ASAP! Thank you!
[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16173 OK, the BroadcastFactory comment says: `SparkContext uses a user-specified BroadcastFactory implementation to instantiate a particular broadcast for the entire Spark job.` So I think it is designed for external implementations.
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91235259

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---

```
@@ -448,19 +476,105 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("Using view after change the origin view") {
+    withView("v1", "v2") {
+      sql("CREATE VIEW v1 AS SELECT id FROM jt")
+      sql("CREATE VIEW v2 AS SELECT * FROM v1")
+      withTable("jt2", "jt3") {
+        // Don't change the view schema
+        val df2 = (1 until 10).map(i => i + i).toDF("id")
```

--- End diff --

Good point! I'll add the case.
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91235206

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---

```
@@ -55,16 +55,19 @@ private[sql] class HiveSessionCatalog(
     conf,
     hadoopConf) {

-  override def lookupRelation(name: TableIdentifier, alias: Option[String]): LogicalPlan = {
+  override def lookupRelation(
+      name: TableIdentifier,
+      alias: Option[String],
+      databaseHint: Option[String] = None): LogicalPlan = {
     val table = formatTableName(name.table)
-    val db = formatDatabaseName(name.database.getOrElse(currentDb))
+    val db = formatDatabaseName(name.database.getOrElse(databaseHint.getOrElse(currentDb)))
     if (db == globalTempViewManager.database) {
       val relationAlias = alias.getOrElse(table)
       globalTempViewManager.get(table).map { viewDef =>
         SubqueryAlias(relationAlias, viewDef, Some(name))
       }.getOrElse(throw new NoSuchTableException(db, table))
     } else if (name.database.isDefined || !tempTables.contains(table)) {
-      val database = name.database.map(formatDatabaseName)
+      val database = Some(db).map(formatDatabaseName)
```

--- End diff --

+1
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91235152 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -126,6 +146,55 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log } } + /** + * Apply Projection on unresolved logical plan to: + * 1. Omit the columns which are not referenced by the view; + * 2. Reorder the columns to keep the same order with the view; + */ + private def withProjection(plan: LogicalPlan, schema: StructType): LogicalPlan = { +// All fields in schema should exist in plan.schema, or we should throw an AnalysisException +// to notify the underlying schema has been changed. +if (schema.fields.forall { field => + plan.schema.fields.exists(other => compareStructField(field, other))}) { + val output = schema.fields.map { field => +plan.output.find { expr => + expr.name == field.name && expr.dataType == field.dataType}.getOrElse( +throw new AnalysisException("The underlying schema doesn't match the original " + + s"schema, expected ${schema.sql} but got ${plan.schema.sql}") + )} + Project(output, plan) +} else { + throw new AnalysisException("The underlying schema doesn't match the original schema, " + +s"expected ${schema.sql} but got ${plan.schema.sql}") +} + } + + /** + * Compare the both [[StructField]] to verify whether they have the same name and dataType. + */ + private def compareStructField(field: StructField, other: StructField): Boolean = { +field.name == other.name && field.dataType == other.dataType + } + + /** + * Aliases the schema of the LogicalPlan to the view attribute names + */ + private def aliasColumns(plan: LogicalPlan, fields: Seq[StructField]): LogicalPlan = { +val output = fields.map(field => (field.name, field.getComment)) +if (plan.output.size != output.size) { --- End diff -- This should not happen. 
I just want to ensure we are safe here in case the `withProjection` has been modified. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
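For readers following the thread: the `withProjection` check being discussed boils down to "every (name, dataType) pair the view saved must still exist in the resolved plan, and the output is pruned and reordered to the view's declared order." A minimal sketch of that logic in plain Python (hypothetical tuples standing in for Spark's `StructField` and output attributes, not the actual Catalyst API):

```python
def with_projection(plan_cols, view_fields):
    """Reorder and prune plan columns to match a view's saved schema.

    plan_cols: (name, data_type, expr_id) tuples standing in for the
    resolved plan's output attributes; view_fields: (name, data_type)
    tuples standing in for the view's saved StructType fields.
    Raises ValueError when the underlying schema no longer contains a
    field the view expects, mirroring the AnalysisException in the diff.
    """
    by_key = {(name, dtype): (name, dtype, expr_id)
              for name, dtype, expr_id in plan_cols}
    missing = [f for f in view_fields if f not in by_key]
    if missing:
        raise ValueError(
            "The underlying schema doesn't match the original schema, "
            "missing fields: %r" % (missing,))
    # Keep only the view's columns, in the view's declared order.
    return [by_key[f] for f in view_fields]
```

Extra columns added underneath the view are silently dropped; a removed or retyped column fails loudly, which is the behavior the reviewers are converging on.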
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91235127 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,6 +435,57 @@ private[spark] class Executor( } /** + * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally + * sending a Thread.interrupt(), and monitoring the task until it finishes. + */ + private class TaskReaper(taskRunner: TaskRunner, interruptThread: Boolean) extends Runnable { + +private[this] val killPollingFrequencyMs: Long = + conf.getTimeAsMs("spark.task.killPollingFrequency", "10s") + +private[this] val killTimeoutMs: Long = conf.getTimeAsMs("spark.task.killTimeout", "2m") --- End diff -- My goal here was to let users set this to `-1` to disable killing of the executor JVM. I'll add a test to make sure that this flag actually behaves that way.
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91235062 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,6 +435,57 @@ private[spark] class Executor( } /** + * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally + * sending a Thread.interrupt(), and monitoring the task until it finishes. + */ + private class TaskReaper(taskRunner: TaskRunner, interruptThread: Boolean) extends Runnable { + +private[this] val killPollingFrequencyMs: Long = + conf.getTimeAsMs("spark.task.killPollingFrequency", "10s") --- End diff -- I'm on the fence about documenting these publicly, but I'm willing to do so and would appreciate naming suggestions.
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91235005 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -161,12 +163,7 @@ private[spark] class Executor( * @param interruptThread whether to interrupt the task thread */ def killAllTasks(interruptThread: Boolean) : Unit = { -// kill all the running tasks -for (taskRunner <- runningTasks.values().asScala) { - if (taskRunner != null) { -taskRunner.kill(interruptThread) - } -} +runningTasks.keys().asScala.foreach(t => killTask(t, interruptThread = interruptThread)) --- End diff -- A careful reviewer will notice that it's possible for `killTask` to be called twice for the same task, either via multiple calls to `killTask` here or via a call to `killTask` followed by a later `killAllTasks` call. I think this should technically be okay in this first draft of the patch, since having multiple TaskReapers for the same task should be fine, but I can also appreciate how this could cause resource exhaustion issues in the pathological case where killTask is spammed continuously. If we think it's important to avoid multiple reapers in this case, then a simple solution would be to add a `synchronized` method on `TaskRunner` which submits a `TaskReaper` on the first kill request and is a no-op on subsequent requests.
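The "submit a reaper on the first kill request, no-op afterwards" guard Josh proposes is a standard once-only pattern under a lock. A hedged sketch (simplified stand-in for Spark's `TaskRunner`, not the actual Executor code):

```python
import threading

class TaskRunner:
    """Stand-in for Spark's TaskRunner, showing an idempotent kill guard."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reaper_submitted = False

    def submit_reaper_once(self):
        """Return True on the first kill request, False on every later one.

        The caller would only create and submit a TaskReaper when this
        returns True, so repeated killTask / killAllTasks calls cannot
        pile up multiple reapers for the same task.
        """
        with self._lock:
            if self._reaper_submitted:
                return False
            self._reaper_submitted = True
            return True
```

With this guard, spamming kill requests costs one lock acquisition per call instead of one watchdog thread per call, which addresses the resource-exhaustion concern.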
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91234885 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log alias.map(a => SubqueryAlias(a, qualifiedTable, None)).getOrElse(qualifiedTable) } else if (table.tableType == CatalogTableType.VIEW) { val viewText = table.viewText.getOrElse(sys.error("Invalid view without text.")) + val unresolvedPlan = sparkSession.sessionState.sqlParser.parsePlan(viewText).transform { +case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty => + u.copy(tableIdentifier = TableIdentifier(u.tableIdentifier.table, table.currentDatabase)) + } + // Resolve the plan and check whether the analyzed plan is valid. + val resolvedPlan = try { +val resolvedPlan = sparkSession.sessionState.analyzer.execute(unresolvedPlan) +sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan) + +resolvedPlan + } catch { +case NonFatal(e) => + throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewText", e) + } + val planWithProjection = table.originalSchema.map(withProjection(resolvedPlan, _)) +.getOrElse(resolvedPlan) + SubqueryAlias( alias.getOrElse(table.identifier.table), -sparkSession.sessionState.sqlParser.parsePlan(viewText), +aliasColumns(planWithProjection, table.schema.fields), --- End diff -- It might be a bit more complex, but I think we certainly can do that.
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91234770 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log alias.map(a => SubqueryAlias(a, qualifiedTable, None)).getOrElse(qualifiedTable) } else if (table.tableType == CatalogTableType.VIEW) { val viewText = table.viewText.getOrElse(sys.error("Invalid view without text.")) + val unresolvedPlan = sparkSession.sessionState.sqlParser.parsePlan(viewText).transform { +case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty => + u.copy(tableIdentifier = TableIdentifier(u.tableIdentifier.table, table.currentDatabase)) + } + // Resolve the plan and check whether the analyzed plan is valid. + val resolvedPlan = try { +val resolvedPlan = sparkSession.sessionState.analyzer.execute(unresolvedPlan) +sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan) + +resolvedPlan + } catch { +case NonFatal(e) => + throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewText", e) + } + val planWithProjection = table.originalSchema.map(withProjection(resolvedPlan, _)) --- End diff -- I think so.
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91234781 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -209,6 +209,41 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft assert(jobB.get() === 100) } + test("task reaper kills JVM if killed tasks keep running for too long") { +val conf = new SparkConf().set("spark.task.killTimeout", "5s") +sc = new SparkContext("local-cluster[2,1,1024]", "test", conf) + +// Add a listener to release the semaphore once any tasks are launched. +val sem = new Semaphore(0) +sc.addSparkListener(new SparkListener { + override def onTaskStart(taskStart: SparkListenerTaskStart) { +sem.release() + } +}) + +// jobA is the one to be cancelled. +val jobA = Future { + sc.setJobGroup("jobA", "this is a job to be cancelled", interruptOnCancel = true) + sc.parallelize(1 to 1, 2).map { i => +while (true) { } + }.count() +} + +// Block until both tasks of job A have started and cancel job A. +sem.acquire(2) +// Small delay to ensure tasks actually start executing the task body +Thread.sleep(1000) --- End diff -- This is slightly ugly but it's needed to avoid a race where this regression test can spuriously pass (and thereby fail to test anything) in case we cancel a task right after it has launched on the executor but before the UDF in the task has actually run.
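The race being papered over by the `Thread.sleep(1000)` is the gap between "the task has launched" (the listener fires, the semaphore is released) and "the task body is actually executing." A small self-contained illustration of that gap using Python threading primitives (the names `launched` and `in_body` are illustrative, not from Spark):

```python
import threading

launched = threading.Semaphore(0)   # released when the task "starts" (cf. onTaskStart)
in_body = threading.Event()         # set only once the task body really runs

def task():
    launched.release()
    # window: a cancel arriving here would hit a task that never entered
    # its body, so the regression test would pass without testing anything
    in_body.set()

t = threading.Thread(target=task)
t.start()
launched.acquire()       # we know the task launched...
in_body.wait(timeout=5)  # ...but still have to wait out the window that
                         # the Thread.sleep(1000) in the test papers over
t.join()
```

A fixed sleep is a pragmatic stand-in for a signal that the body has started; the test has no hook inside the spinning `while (true)` loop to signal from, hence the "slightly ugly" caveat.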
[GitHub] spark pull request #16121: [SPARK-16589][PYTHON] Chained cartesian produces ...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16121#discussion_r91234665 --- Diff: python/pyspark/serializers.py --- @@ -278,50 +278,51 @@ def __repr__(self): return "AutoBatchedSerializer(%s)" % self.serializer -class CartesianDeserializer(FramedSerializer): +class CartesianDeserializer(Serializer): """ Deserializes the JavaRDD cartesian() of two PythonRDDs. """ def __init__(self, key_ser, val_ser): -FramedSerializer.__init__(self) self.key_ser = key_ser self.val_ser = val_ser -def prepare_keys_values(self, stream): -key_stream = self.key_ser._load_stream_without_unbatching(stream) -val_stream = self.val_ser._load_stream_without_unbatching(stream) -key_is_batched = isinstance(self.key_ser, BatchedSerializer) -val_is_batched = isinstance(self.val_ser, BatchedSerializer) -for (keys, vals) in zip(key_stream, val_stream): -keys = keys if key_is_batched else [keys] -vals = vals if val_is_batched else [vals] -yield (keys, vals) +def _load_stream_without_unbatching(self, stream): +key_batch_stream = self.key_ser._load_stream_without_unbatching(stream) +val_batch_stream = self.val_ser._load_stream_without_unbatching(stream) +for (key_batch, val_batch) in zip(key_batch_stream, val_batch_stream): +yield product(key_batch, val_batch) --- End diff -- Maybe consider adding a comment here explaining the interaction of batching & product.
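The batching/product interaction holdenk asks to document can be reduced to a few lines: both sides of the cartesian were written with the same batching, so the i-th key batch lines up with the i-th value batch, and the cross product within each aligned pair reconstructs the pairs. A standalone sketch (plain lists in place of the serializer streams):

```python
from itertools import product

def cartesian_batches(key_batches, val_batches):
    """Pair up two aligned batch streams and cross-product within each pair.

    key_batches[i] and val_batches[i] come from the same position in the
    Java-side cartesian output, so zip + product yields exactly one output
    batch per input batch pair.
    """
    for key_batch, val_batch in zip(key_batches, val_batches):
        yield list(product(key_batch, val_batch))
```

Note the product is taken per batch pair, not across the whole streams; taking `product` over the full streams would pair keys with values from unrelated batches, which is the kind of bug SPARK-16589 is fixing.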
[GitHub] spark pull request #16121: [SPARK-16589][PYTHON] Chained cartesian produces ...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16121#discussion_r91232276 --- Diff: python/pyspark/serializers.py --- @@ -96,7 +96,7 @@ def load_stream(self, stream): raise NotImplementedError def _load_stream_without_unbatching(self, stream): --- End diff -- Even though this is internal, it might make sense to have a docstring for this since we're changing its behaviour.
[GitHub] spark pull request #16121: [SPARK-16589][PYTHON] Chained cartesian produces ...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16121#discussion_r91234619 --- Diff: python/pyspark/serializers.py --- @@ -278,50 +278,51 @@ def __repr__(self): return "AutoBatchedSerializer(%s)" % self.serializer -class CartesianDeserializer(FramedSerializer): +class CartesianDeserializer(Serializer): """ Deserializes the JavaRDD cartesian() of two PythonRDDs. --- End diff -- Maybe we should document this a bit given that we had problems with the implementation. (e.g. expand on the "Due to batching, we can't use the Java cartesian method." comment from `rdd.py` to explain how this is intended to function).
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91234733 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,6 +435,57 @@ private[spark] class Executor( } /** + * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally + * sending a Thread.interrupt(), and monitoring the task until it finishes. + */ + private class TaskReaper(taskRunner: TaskRunner, interruptThread: Boolean) extends Runnable { + +private[this] val killPollingFrequencyMs: Long = + conf.getTimeAsMs("spark.task.killPollingFrequency", "10s") + +private[this] val killTimeoutMs: Long = conf.getTimeAsMs("spark.task.killTimeout", "2m") + +private[this] val takeThreadDump: Boolean = + conf.getBoolean("spark.task.threadDumpKilledTasks", true) + +override def run(): Unit = { + val startTimeMs = System.currentTimeMillis() + def elapsedTimeMs = System.currentTimeMillis() - startTimeMs + + while (!taskRunner.isFinished && elapsedTimeMs < killTimeoutMs) { +taskRunner.kill(interruptThread = interruptThread) +taskRunner.synchronized { + Thread.sleep(killPollingFrequencyMs) +} +if (!taskRunner.isFinished) { + logWarning(s"Killed task ${taskRunner.taskId} is still running after $elapsedTimeMs ms") + if (takeThreadDump) { +try { + val threads = Utils.getThreadDump() + threads.find(_.threadName == taskRunner.threadName).foreach { thread => +logWarning(s"Thread dump from task ${taskRunner.taskId}:\n${thread.stackTrace}") + } +} catch { + case NonFatal(e) => +logWarning("Exception thrown while obtaining thread dump: ", e) +} + } +} + } + if (!taskRunner.isFinished && killTimeoutMs > 0 && elapsedTimeMs > killTimeoutMs) { +if (isLocal) { + logError(s"Killed task ${taskRunner.taskId} could not be stopped within " + --- End diff -- Even if we did throw an exception here, it wouldn't exit the JVM in local mode because we don't set an uncaught exception handler in local mode (see code further 
up in this file).
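The TaskReaper run loop under review follows a common watchdog shape: repeatedly re-request the kill, sleep for the polling interval, and escalate once the timeout elapses. A hedged sketch of that control flow (simplified stand-in, not the Executor code; the real reaper also takes thread dumps and distinguishes local mode):

```python
import time

def reap(task, kill_timeout_s=2.0, poll_s=0.1, now=time.monotonic):
    """Poll a killed task until it finishes or the timeout elapses.

    `task` needs .is_finished() and .kill(). Returns "finished" if the
    task stopped in time, "timed out" if the deadline passed (where the
    real reaper would kill the executor JVM, or only log in local mode).
    """
    start = now()
    while not task.is_finished() and now() - start < kill_timeout_s:
        task.kill()            # re-send the kill on every iteration
        time.sleep(poll_s)     # cf. spark.task.killPollingFrequency
    return "finished" if task.is_finished() else "timed out"
```

The loop re-sends the kill each iteration rather than once up front, which matches the patch's behavior of repeatedly calling `TaskRunner.kill()` in case the task only later reaches an interruptible point.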
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91234634 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -229,9 +230,11 @@ private[spark] class Executor( // ClosedByInterruptException during execBackend.statusUpdate which causes // Executor to crash Thread.interrupted() + notifyAll() } override def run(): Unit = { + Thread.currentThread().setName(threadName) --- End diff -- Task ids should be unique, so this thread name should be unique as well. Hence, I don't think it's super important to reset the thread's name when returning it to the task thread pool: the thread will be renamed as soon as it's recycled for a new task, and if the task has already exited it'll be clear from the thread state / context that this is just a completed task's thread that has been returned to the pool.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #69777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69777/consoleFull)** for PR 16171 at commit [`ef5954b`](https://github.com/apache/spark/commit/ef5954b19b6bca5fb7b603351ce087085ac23e9b).
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91234685 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -192,13 +189,17 @@ private[spark] class Executor( serializedTask: ByteBuffer) extends Runnable { +val threadName = s"Executor task launch worker for task $taskId" --- End diff -- This naming scheme was intentionally chosen to match the pattern that we use for sorting threads in the executor thread dump page. I'll manually verify that this worked as expected there.
[GitHub] spark issue #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" to over...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16189 **[Test build #69776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69776/consoleFull)** for PR 16189 at commit [`a46f9c2`](https://github.com/apache/spark/commit/a46f9c2436d533ff838674cb63e397d1007e34de).
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91234443 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala --- @@ -207,31 +205,56 @@ case class CreateViewCommand( } /** - * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize - * SQL based on the analyzed plan, and also creates the proper schema for the view. + * Returns a [[CatalogTable]] that can be used to save in the catalog. This stores the following + * properties for a view: + * 1. The `viewText` which is used to generate a logical plan when we resolve a view; + * 2. The `currentDatabase` which sets the current database on Analyze stage; + * 3. The `schema` which ensure we generate the correct output. */ private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = { -val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL +val currentDatabase = sparkSession.sessionState.catalog.getCurrentDatabase -// Validate the view SQL - make sure we can parse it and analyze it. -// If we cannot analyze the generated query, there is probably a bug in SQL generation. -try { - sparkSession.sql(viewSQL).queryExecution.assertAnalyzed() -} catch { - case NonFatal(e) => -throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e) -} +if (originalText.isDefined) { + val viewSQL = originalText.get + + // Validate the view SQL - make sure we can resolve it with currentDatabase. 
+ val originalSchema = try { +val unresolvedPlan = sparkSession.sessionState.sqlParser.parsePlan(viewSQL) +val resolvedPlan = sparkSession.sessionState.analyzer.execute(unresolvedPlan) +sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan) + +resolvedPlan.schema + } catch { +case NonFatal(e) => + throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e) + } -CatalogTable( - identifier = name, - tableType = CatalogTableType.VIEW, - storage = CatalogStorageFormat.empty, - schema = aliasedPlan.schema, - properties = properties, - viewOriginalText = originalText, - viewText = Some(viewSQL), - comment = comment -) + CatalogTable( +identifier = name, +tableType = CatalogTableType.VIEW, +storage = CatalogStorageFormat.empty, +schema = aliasedPlan.schema, +originalSchema = Some(originalSchema), +properties = properties, +viewOriginalText = originalText, +viewText = Some(viewSQL), +currentDatabase = Some(currentDatabase), +comment = comment + ) +} else { --- End diff -- I should add comments above this code. The originalText is non-empty only if the command is generated from SQL.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69775/ Test FAILed.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Merged build finished. Test FAILed.
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/16189 [SPARK-18761][CORE][WIP] Introduce "task reaper" to oversee task killing in executors ## What changes were proposed in this pull request? Spark's current task cancellation / task killing mechanism is "best effort" because some tasks may not be interruptible or may not respond to their "killed" flags being set. If a significant fraction of a cluster's task slots are occupied by tasks that have been marked as killed but remain running then this can lead to a situation where new jobs and tasks are starved of resources that are being used by these zombie tasks. This patch aims to address this problem by adding a "task reaper" mechanism to executors. At a high-level, task killing now launches a new thread which attempts to kill the task and then watches the task and periodically checks whether it has been killed. The TaskReaper will periodically re-attempt to call `TaskRunner.kill()` and will log warnings if the task keeps running. I modified TaskRunner to rename its thread at the start of the task, allowing TaskReaper to take a thread dump and filter it in order to log stacktraces from tasks that we are waiting to finish. After a configurable timeout, if the task has not been killed then the TaskReaper will throw an exception to trigger executor JVM death, thereby forcibly freeing any resources consumed by the zombie tasks. There are some aspects of the design that I'd like to think about a bit more, but I've opened this as `[WIP]` now in order to solicit early feedback. I'll comment on some of my thoughts directly on the diff. ## How was this patch tested? Tested via a new test case in `JobCancellationSuite`, plus manual testing. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark cancellation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16189.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16189 commit 2c28594b980845bda1d4db7ae866a91caaad4fff Author: Josh Rosen Date: 2016-12-07T06:17:38Z Add failing regression test. commit a46f9c2436d533ff838674cb63e397d1007e34de Author: Josh Rosen Date: 2016-12-07T06:18:43Z Add TaskReaper to executor.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #69775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69775/consoleFull)** for PR 16171 at commit [`a0dc2c8`](https://github.com/apache/spark/commit/a0dc2c8342df4040b1cd9c5c1827271bbe22278f). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class JavaClassificationModel(JavaPredictionModel, HasRawPredictionCol):` * `class JavaProbabilisticClassificationModel(JavaClassificationModel, HasProbabilityCol):` * `class OneVsRestModel(JavaModel, OneVsRestParams, HasFeaturesCol, HasPredictionCol,` * `class AFTSurvivalRegressionModel(JavaModel, HasFeaturesCol, HasPredictionCol,` * `class JavaPredictionModel(HasFeaturesCol, HasPredictionCol):`
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91234292 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala --- @@ -207,31 +205,56 @@ case class CreateViewCommand( } /** - * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize - * SQL based on the analyzed plan, and also creates the proper schema for the view. + * Returns a [[CatalogTable]] that can be used to save in the catalog. This stores the following + * properties for a view: + * 1. The `viewText` which is used to generate a logical plan when we resolve a view; + * 2. The `currentDatabase` which sets the current database on Analyze stage; + * 3. The `schema` which ensure we generate the correct output. */ private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = { -val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL +val currentDatabase = sparkSession.sessionState.catalog.getCurrentDatabase -// Validate the view SQL - make sure we can parse it and analyze it. -// If we cannot analyze the generated query, there is probably a bug in SQL generation. -try { - sparkSession.sql(viewSQL).queryExecution.assertAnalyzed() -} catch { - case NonFatal(e) => -throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e) -} +if (originalText.isDefined) { + val viewSQL = originalText.get + + // Validate the view SQL - make sure we can resolve it with currentDatabase. 
+ val originalSchema = try { +val unresolvedPlan = sparkSession.sessionState.sqlParser.parsePlan(viewSQL) +val resolvedPlan = sparkSession.sessionState.analyzer.execute(unresolvedPlan) +sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan) + +resolvedPlan.schema + } catch { +case NonFatal(e) => + throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e) --- End diff -- Yeah, I agree we should throw an AnalysisException here.
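The suggestion above can be sketched as follows. This is a hedged illustration only, not the actual Spark code: `AnalysisException` here is a stand-in class, and the `analyze` parameter stands in for the parse/analyze/checkAnalysis pipeline shown in the diff.

```scala
import scala.util.control.NonFatal

// Stand-in for org.apache.spark.sql.AnalysisException (illustrative only).
class AnalysisException(message: String, cause: Throwable = null)
  extends Exception(message, cause)

object ViewValidation {
  // Validate the view SQL with the given analysis function; surface failures
  // as a user-facing AnalysisException instead of a bare RuntimeException.
  def validate(viewSQL: String)(analyze: String => Unit): Unit = {
    try {
      analyze(viewSQL)
    } catch {
      case NonFatal(e) =>
        throw new AnalysisException(
          s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    }
  }
}
```

The point of the change is user experience: analysis failures on a view are expected user errors, so they should surface through the same exception type as other analysis errors rather than as an internal runtime failure.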
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #69775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69775/consoleFull)** for PR 16171 at commit [`a0dc2c8`](https://github.com/apache/spark/commit/a0dc2c8342df4040b1cd9c5c1827271bbe22278f).
[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16168#discussion_r91234171 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala --- @@ -207,31 +205,56 @@ case class CreateViewCommand( } /** - * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize - * SQL based on the analyzed plan, and also creates the proper schema for the view. + * Returns a [[CatalogTable]] that can be used to save in the catalog. This stores the following + * properties for a view: + * 1. The `viewText` which is used to generate a logical plan when we resolve a view; + * 2. The `currentDatabase` which sets the current database on Analyze stage; + * 3. The `schema` which ensure we generate the correct output. */ private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = { -val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL +val currentDatabase = sparkSession.sessionState.catalog.getCurrentDatabase -// Validate the view SQL - make sure we can parse it and analyze it. -// If we cannot analyze the generated query, there is probably a bug in SQL generation. -try { - sparkSession.sql(viewSQL).queryExecution.assertAnalyzed() -} catch { - case NonFatal(e) => -throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e) -} +if (originalText.isDefined) { --- End diff -- I should add comments over this code. The `originalText` is non-empty only if the command is generated from SQL.
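The branching being discussed above can be sketched with a minimal example: `originalText` is an `Option[String]` that is non-empty only for commands generated from SQL text. The function name and the fallback message are illustrative, not Spark's actual code.

```scala
// Illustrative only: `originalText` is Some(sql) when the view command came
// from SQL text, and None when the command was created programmatically
// (e.g. from a Dataset API call).
def describeViewSource(originalText: Option[String]): String =
  originalText match {
    case Some(sql) => s"view defined by SQL: $sql"
    case None      => "view defined programmatically; no original SQL to store"
  }
```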
[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16131 LGTM
[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16131 @srowen Is this ready to be merged?
[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16150#discussion_r91232450 --- Diff: R/pkg/R/mllib.R --- @@ -1389,7 +1399,9 @@ setMethod("spark.gaussianMixture", signature(data = "SparkDataFrame", formula = # Get the summary of a multivariate gaussian mixture model #' @param object a fitted gaussian mixture model. -#' @return \code{summary} returns the model's lambda, mu, sigma, k, dim and posterior. +#' @return \code{summary} returns summary of the fitted model, which is a list. +#' The list includes the model's \code{lambda} (lambda), \code{mu} (mu), +#' \code{sigma} (sigma), and \code{posterior} (posterior). --- End diff -- Same reason as the above one.
[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16186 @marmbrus @zsxwing @brkyvz Please review.
[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16150#discussion_r91232063 --- Diff: R/pkg/R/mllib.R --- @@ -1852,9 +1867,9 @@ summary.treeEnsemble <- function(model) { # Get the summary of a Random Forest Regression Model -#' @return \code{summary} returns a summary object of the fitted model, a list of components -#' including formula, number of features, list of features, feature importances, number of -#' trees, and tree weights +#' @return \code{summary} returns summary information of the fitted model, which is a list. +#' The list of components includes \code{ans} (formula, number of features, list of features, +#' feature importances, number of trees, and tree weights). --- End diff -- The two places return `summary.treeEnsemble(object)`. What shall I put in the `\code{}`?
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16187 Actually if possible please merge this in branch-2.1.
[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16182 **[Test build #3471 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3471/consoleFull)** for PR 16182 at commit [`184a6d1`](https://github.com/apache/spark/commit/184a6d182b84ad297c7bbff65362a703dbbad2b1).
[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16182 **[Test build #69774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69774/consoleFull)** for PR 16182 at commit [`184a6d1`](https://github.com/apache/spark/commit/184a6d182b84ad297c7bbff65362a703dbbad2b1).
[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16182 retest this please
[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16150#discussion_r91231575 --- Diff: R/pkg/R/mllib.R --- @@ -661,7 +665,10 @@ setMethod("fitted", signature(object = "KMeansModel"), # Get the summary of a k-means model #' @param object a fitted k-means model. -#' @return \code{summary} returns the model's features, coefficients, k, size and cluster. +#' @return \code{summary} returns summary information of the fitted model, which is a list. +#' The list includes the model's \code{coefficients} (model cluster centers), --- End diff -- For the returned list, I didn't see features and k. Doesn't an R function only return the value of its last expression?
[GitHub] spark issue #16165: [SPARK-8617] [WEBUI] HistoryServer: Include in-progress ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16165 **[Test build #69773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69773/consoleFull)** for PR 16165 at commit [`51401b9`](https://github.com/apache/spark/commit/51401b90cc91dcca376b66d115532c00561f7e1e).
[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16150#discussion_r91231272 --- Diff: R/pkg/R/mllib.R --- @@ -1389,7 +1399,9 @@ setMethod("spark.gaussianMixture", signature(data = "SparkDataFrame", formula = # Get the summary of a multivariate gaussian mixture model #' @param object a fitted gaussian mixture model. -#' @return \code{summary} returns the model's lambda, mu, sigma, k, dim and posterior. +#' @return \code{summary} returns summary of the fitted model, which is a list. +#' The list includes the model's \code{lambda} (lambda), \code{mu} (mu), +#' \code{sigma} (sigma), and \code{posterior} (posterior). --- End diff -- missing k, dim?
[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16186 Merged build finished. Test PASSed.
[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69769/ Test PASSed.
[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16186

**[Test build #69769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69769/consoleFull)** for PR 16186 at commit [`9585ae4`](https://github.com/apache/spark/commit/9585ae41916e577cdb1a5d822cf69efa4af3e7d8).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16150#discussion_r91231151 --- Diff: R/pkg/R/mllib.R --- @@ -661,7 +665,10 @@ setMethod("fitted", signature(object = "KMeansModel"), # Get the summary of a k-means model #' @param object a fitted k-means model. -#' @return \code{summary} returns the model's features, coefficients, k, size and cluster. +#' @return \code{summary} returns summary information of the fitted model, which is a list. +#' The list includes the model's \code{coefficients} (model cluster centers), --- End diff -- are we missing features and k?
[GitHub] spark issue #16188: Branch 1.6 decision tree
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16188 Can one of the admins verify this patch?
[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16182 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69771/ Test FAILed.
[GitHub] spark pull request #16188: Branch 1.6 decision tree
GitHub user zhuangxue opened a pull request: https://github.com/apache/spark/pull/16188

Branch 1.6 decision tree

What algorithm is used in the Spark decision tree (is it ID3, C4.5, or CART)?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16188.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16188

commit 4c28b4c8f342fde937ff77ab30f898dfe3186c03
Author: Gabriele Nizzoli
Date: 2016-02-02T18:57:18Z

[SPARK-13121][STREAMING] java mapWithState mishandles scala Option

java mapwithstate with Function3 has wrong conversion of java `Optional` to scala `Option`; fixed code uses the same conversion used in the mapwithstate call that uses Function4 as an input. `Optional.fromNullable(v.get)` fails if v is `None`; better to use `JavaUtils.optionToOptional(v)` instead.

Author: Gabriele Nizzoli
Closes #11007 from gabrielenizzoli/branch-1.6.

commit 9c0cf22f7681ae05d894ae05f6a91a9467787519
Author: Grzegorz Chilkiewicz
Date: 2016-02-02T19:16:24Z

[SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication

Fixes problem and verifies fix by test suite. Also adds an optional parameter `nullable` (Boolean) to SchemaUtils.appendColumn and deduplicates the SchemaUtils.appendColumn functions.

Author: Grzegorz Chilkiewicz
Closes #10741 from grzegorz-chilkiewicz/master. (cherry picked from commit b1835d727234fdff42aa8cadd17ddcf43b0bed15) Signed-off-by: Joseph K. Bradley

commit 3c92333ee78f249dae37070d3b6558b9c92ec7f4
Author: Daoyuan Wang
Date: 2016-02-02T19:09:40Z

[SPARK-13056][SQL] map column would throw NPE if value is null

Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like { "a": "somestring", "b": null }. A query like SELECT col["b"] FROM t1 would throw an NPE.

Author: Daoyuan Wang
Closes #10964 from adrian-wang/npewriter. (cherry picked from commit 358300c795025735c3b2f96c5447b1b227d4abc1) Signed-off-by: Michael Armbrust Conflicts: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

commit e81333be05cc5e2a41e5eb1a630c5af59a47dd23
Author: Kevin (Sangwoo) Kim
Date: 2016-02-02T21:24:09Z

[DOCS] Update StructType.scala

The example will throw an error like `:20: error: not found: value StructType`. Need to add this line: `import org.apache.spark.sql.types._`

Author: Kevin (Sangwoo) Kim
Closes #10141 from swkimme/patch-1. (cherry picked from commit b377b03531d21b1d02a8f58b3791348962e1f31b) Signed-off-by: Michael Armbrust

commit 2f8abb4afc08aa8dc4ed763bcb93ff6b1d6f0d78
Author: Adam Budde
Date: 2016-02-03T03:35:33Z

[SPARK-13122] Fix race condition in MemoryStore.unrollSafely()

https://issues.apache.org/jira/browse/SPARK-13122 A race condition can occur in MemoryStore's unrollSafely() method if two threads that return the same value for currentTaskAttemptId() execute this method concurrently. This change makes the operation of reading the initial amount of unroll memory used, performing the unroll, and updating the associated memory maps atomic in order to avoid this race condition. The initial proposed fix wraps all of unrollSafely() in a `memoryManager.synchronized { }` block. A cleaner approach might be to introduce a mechanism that synchronizes based on task attempt ID. An alternative option might be to track unroll/pending unroll memory based on block ID rather than task attempt ID.

Author: Adam Budde
Closes #11012 from budde/master. (cherry picked from commit ff71261b651a7b289ea2312abd6075da8b838ed9) Signed-off-by: Andrew Or Conflicts: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala

commit 5fe8796c2fa859e30cf5ba293bee8957e23163bc
Author: Mario Briggs
Date: 2016-02-03T17:50:28Z

[SPARK-12739][STREAMING] Details of batch in Streaming tab uses two Duration columns

I have clearly prefixed the two 'Duration' columns in the 'Details of Batch' Streaming tab as 'Output Op Duration' and 'Job Duration'.

Author: Mario Briggs
Author: mariobriggs
Closes #11022 from mariobriggs/spark-12739. (cherry picked from commit e9eb248edfa81d75f99c9afc2063e6b3d9ee7392) Signed-off-by: Shixiong Zhu

commit cdfb2a1410aa799596c8b751187dbac28b2cc678
Author: Wenchen Fan
Date: 2016-02-04T00:13:23Z

[SPARK-13101][SQL][BRANCH-1.6] nullability of array type element should not fail analysis of encoder

nullability should only be considered as an optimization rather than part of th
[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16182 Merged build finished. Test FAILed.
[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16182

**[Test build #69771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69771/consoleFull)** for PR 16182 at commit [`184a6d1`](https://github.com/apache/spark/commit/184a6d182b84ad297c7bbff65362a703dbbad2b1).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16183: [SPARK-18671][SS][test-maven] Follow up PR to fix...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16183
[GitHub] spark issue #16183: [SPARK-18671][SS][test-maven] Follow up PR to fix test f...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16183 Merging this to master and 2.1
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16187 **[Test build #69772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69772/consoleFull)** for PR 16187 at commit [`566c800`](https://github.com/apache/spark/commit/566c8007dcf74594c23ef2b1fcc394ce64029e9b).
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16187 cc @cloud-fan
[GitHub] spark pull request #16187: [SPARK-18760][SQL] Consistent format specificatio...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16187

[SPARK-18760][SQL] Consistent format specification for FileFormats

## What changes were proposed in this pull request?

We currently rely on FileFormat implementations to override toString in order to get a proper explain output. It'd be better to just depend on shortName for those.

Before:

```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```

After:

```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: text, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```

Also closes #14680.

## How was this patch tested?

Verified in spark-shell.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-18760

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16187.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16187

commit 566c8007dcf74594c23ef2b1fcc394ce64029e9b
Author: Reynold Xin
Date: 2016-12-07T05:22:40Z

[SPARK-18760][SQL] Consistent format specification for FileFormats
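The idea in the PR above, deriving the explain string from a stable `shortName` instead of each implementation's default `toString`, can be sketched like this. These are simplified, hypothetical stand-ins, not the actual Spark classes:

```scala
// Hypothetical, simplified stand-ins for Spark's FileFormat hierarchy.
trait FileFormat {
  // A short, stable name ("text", "parquet", ...), unlike the default
  // toString, which renders as ClassName@hashCode.
  def shortName: String
}

class TextFileFormat extends FileFormat {
  override def shortName: String = "text"
}

final case class FileScanDescription(format: FileFormat, location: String) {
  // Build the explain output from shortName so it stays readable regardless
  // of whether the format class bothered to override toString.
  override def toString: String =
    s"FileScan ${format.shortName}, Location: $location"
}
```

Keying the display on `shortName` also means new formats get readable explain output for free, instead of each one having to remember to override `toString`.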
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user sethah commented on the issue: https://github.com/apache/spark/pull/11119 ping?
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 ping @dbtsai :)
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Use metastore schema instead o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14537
[GitHub] spark pull request #6848: [SPARK-8398][CORE] Hadoop input/output format adva...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6848
[GitHub] spark pull request #16181: Branch 2.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16181
[GitHub] spark pull request #9543: [SPARK-11482][SQL] Make maven repo for Hive metast...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9543
[GitHub] spark pull request #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8318
[GitHub] spark pull request #7265: [SPARK-7263] Add new shuffle manager which stores ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7265
[GitHub] spark issue #16184: [SPARK-18753][SQL] Keep pushed-down null literal as a fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16184 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69767/ Test PASSed.
[GitHub] spark issue #16184: [SPARK-18753][SQL] Keep pushed-down null literal as a fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16184 Merged build finished. Test PASSed.
[GitHub] spark issue #16184: [SPARK-18753][SQL] Keep pushed-down null literal as a fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16184 **[Test build #69767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69767/consoleFull)** for PR 16184 at commit [`c6fe345`](https://github.com/apache/spark/commit/c6fe34511fc1ea5c36713d435dc64673deceae7f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9543: [SPARK-11482][SQL] Make maven repo for Hive metastore jar...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/9543 I'm going to close this one for now.
[GitHub] spark issue #8785: [Spark-10625] [SQL] Spark SQL JDBC read/write is unable t...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/8785 @tribbloid is this a problem that needs to be fixed?
[GitHub] spark issue #7265: [SPARK-7263] Add new shuffle manager which stores shuffle...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/7265 I'm going to close this for now. Next year we might actually come back and revisit this - probably not with the current parquet implementation since it is not very efficient, but some sort of columnar format.
[GitHub] spark issue #6848: [SPARK-8398][CORE] Hadoop input/output format advanced co...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/6848 I'm going to close this for now.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add ReadWriteLock for each table's re...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16135 cc @ericl can you take a look at this?
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16129 This LGTM. Now that I'm looking at it, the test suite **never actually tests for correctness**, just basic input/output sizes. We really should have better tests, but it's ok with me if it's done in a separate JIRA. Also, I'd be in favor of changing the title since, while it does affect RandomForest/ML, it's really an error in the SamplingUtils, and this method is used in at least one other place (RangePartitioner).
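The correctness concern above can be illustrated with a small standalone check (a hypothetical sketch, not Spark's SamplingUtils or its test suite; `sampleWithoutReplacement` and the counts below are invented for the example): subsampling k of n features without replacement should select every feature with roughly equal frequency, which is exactly the property a skew bug would violate.

```scala
import scala.util.Random

object SamplingCheck {
  // Illustrative uniform sampling without replacement: shuffle the indices
  // and take the first k. Each index is equally likely to be chosen.
  def sampleWithoutReplacement(n: Int, k: Int, rng: Random): Seq[Int] =
    rng.shuffle((0 until n).toList).take(k)

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val counts = Array.fill(10)(0)
    for (_ <- 1 to 10000) {
      // Draw 3 of 10 features per trial and tally how often each appears.
      sampleWithoutReplacement(10, 3, rng).foreach(i => counts(i) += 1)
    }
    // Expected count per feature is 10000 * 3/10 = 3000; a skewed sampler
    // would concentrate selections on some features and starve others.
    counts.foreach { c => assert(math.abs(c - 3000) < 300, s"skewed count: $c") }
    println("selection frequencies look uniform")
  }
}
```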
[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16122 I haven't been able to get a proper unit test environment running where the embedded metastore conf is different from the client conf. I did validate that Spark without this patch failed to execute a query on a table with an integer type partition column filtering on that column where the metastore has direct sql access disabled, whereas Spark with this patch works and behaves as expected. @wangyum I don't believe your test uses Hive in a way that's compatible with Spark. Can you please remove it? @ericl Any ideas on how to unit test the case where the client and metastore have different configurations?
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14537 Schema inference was completely replaced with the metastore schema in #14690. I think we can close this now? cc @cloud-fan @liancheng
[GitHub] spark issue #16183: [SPARK-18671][SS][test-maven] Follow up PR to fix test f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16183 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69764/ Test PASSed.